| ###################################################### |
| Code sharing between independently linked XIP binaries |
| ###################################################### |
| |
| :Author: Tamas Ban |
| :Organization: Arm Limited |
| :Contact: tamas.ban@arm.com |
| |
| ********** |
| Motivation |
| ********** |
| Cortex-M devices are usually constrained in terms of flash and RAM. Therefore, |
| it is often challenging to fit bigger projects in the available memory. The PSA |
| specifications require a device to both have a secure boot process in place at |
| device boot-up time, and to have a partition in the SPE which provides |
| cryptographic services at runtime. These two entities have some overlapping |
| functionality. Some cryptographic primitives (e.g. hash calculation and digital |
| signature verification) are required both in the bootloader and the runtime |
| environment. In the current TF-M code base, both firmware components use the |
| mbed-crypto library to implement these requirements. During the build process, |
| the mbed-crypto library is built twice, with different configurations (the |
| bootloader requires less functionality) and then linked to the corresponding |
| firmware component. As a result of this workflow, the same code is placed in the |
| flash twice. For example, the code for the SHA-256 algorithm is included in |
| MCUboot, but the exact same code is duplicated in the SPE cryptography |
| partition. In most cases, there is no memory isolation between the bootloader |
| and the SPE, because both are part of the PRoT code and run in the secure |
| domain. So, in theory, the code of the common cryptographic algorithms could be |
| reused among these firmware components. This could result in a big reduction in |
| code footprint, because the cryptographic algorithms are usually flash hungry. |
| Code size reduction can be a good opportunity for very constrained devices, |
| which might need to use TF-M Profile Small anyway. |
| |
| ******************* |
| Technical challenge |
| ******************* |
| Code sharing in a regular OS environment is easily achievable with dynamically |
| linked libraries. However, this is not the case in Cortex-M systems where |
| applications might run bare-metal, or on top of an RTOS, which usually lacks |
| dynamic loading functionality. One major challenge to be solved in the Cortex-M |
| space is how to share code between independently linked XIP applications that |
| are tied to a certain memory address range to be executable and have absolute |
| function and global data memory addresses. In this case, the code is not |
| relocatable, and in most cases, there is no loader functionality in the system |
| that can perform code relocation. Also, the lack of an MMU makes the address |
| space flat, constant and not reconfigurable at runtime by privileged code. |
| |
| One other difficulty is that the bootloader and the runtime use the same RAM |
| area during execution. The runtime firmware is executed strictly after the |
| bootloader, so normally, it can reuse the whole secure RAM area, as it would be |
| the exclusive user. No attention needs to be paid as to where global data is |
| placed by the linker. The bootloader does not need to retain its state. The low |
| level startup of the runtime firmware can freely overwrite the RAM with its data |
| without corrupting bootloader functionality. However, with code sharing between |
| bootloader and runtime firmware, these statements are no longer true. Global |
| variables used by the shared code must either retain their value or must be |
| reinitialised during low level startup of the runtime firmware. The startup code |
| is not allowed to overwrite the shared global variables with arbitrary data. The |
| following design proposal provides a solution to these challenges. |
| |
| ************** |
| Design concept |
| ************** |
| The bootloader is sometimes implemented as ROM code (BL1) or stored in a region |
| of the flash which is lockable, to prevent tampering. In a secure system, the |
| bootloader is immutable code and thus implements a part of the Root of Trust |
| anchor in the device, which is trusted implicitly. The shared code is primarily |
| part of the bootloader, and is reused by the runtime SPE firmware at a later |
| stage. Not all of the bootloader code is reused by the runtime SPE, only some |
| cryptographic functions. |
| |
| Simplified steps of building with code sharing enabled: |
| |
| - Complete the bootloader build process to have a final image that contains |
| the absolute addresses of the shared functions, and the global variables |
| used by these functions. |
| - Extract the addresses of the functions and related global variables that are |
| intended to be shared from the bootloader executable. |
| - When building runtime firmware, provide the absolute addresses of the shared |
| symbols to the linker, so that it can pick them up, instead of instantiating |
| them again. |
| |
| The execution flow looks like this: |
| |
| .. code-block:: bash |
| |
|     SPE        MCUboot func1()    MCUboot func2()    MCUboot func3() |
|      | |
|      |   Hash() |
|      |------------->| |
|                     |----------------->| |
|                                        | |
|                             Return     | |
|              Return |<-----------------| |
|      |<-------------| |
|      | |
|      | |
|      |----------------------------------------------------->| |
|                                                             | |
|             Function pointer in shared global data()        | |
|      |<-----------------------------------------------------| |
|      | |
|      | Return |
|      |----------------------------------------------------->| |
|                                                             | |
|             Return                                          | |
|      |<-----------------------------------------------------| |
|      | |
|      | |
| |
| The execution flow usually returns from a shared function back to the SPE with |
| an ordinary function return. So usually, once a shared function is called in the |
| call path, all further functions in the call chain will be shared as well. |
| However, this is not always the case, as it is possible for a shared function to |
| call a non-shared function in SPE code through a global function pointer. |
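| |
| As an illustration of this last case, the following C sketch shows a shared |
| function reaching SPE code through a function pointer kept in shared global |
| data. All names (`shared_calloc_hook`, `shared_alloc_buffer`, `spe_calloc`, |
| `spe_install_hooks`) are hypothetical and are not taken from mbed-crypto or |
| TF-M. On a real target the shared function would reside in the MCUboot image |
| and the non-shared function in the SPE image. |
| |
| .. code-block:: c |
| |
|     #include <stddef.h> |
|     #include <stdlib.h> |
| |
|     /* Function pointer kept in shared global data (hypothetical name). */ |
|     void *(*shared_calloc_hook)(size_t, size_t) = NULL; |
| |
|     /* Stands in for a shared function whose code lives in the MCUboot image. */ |
|     void *shared_alloc_buffer(size_t len) |
|     { |
|         if (shared_calloc_hook == NULL) { |
|             return NULL; /* No allocator installed yet. */ |
|         } |
|         /* Control leaves the shared code here and enters non-shared code. */ |
|         return shared_calloc_hook(1, len); |
|     } |
| |
|     /* Stands in for a non-shared function linked only into the SPE. */ |
|     static void *spe_calloc(size_t n, size_t size) |
|     { |
|         return calloc(n, size); |
|     } |
| |
|     /* Called by the SPE before it uses the shared code. */ |
|     void spe_install_hooks(void) |
|     { |
|         shared_calloc_hook = spe_calloc; |
|     } |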
| |
| For shared global variables, a dedicated data section must be allocated in the |
| linker configuration file. This area must have the same memory address in both |
| MCUboot's and the SPE's linker files, to ensure the integrity of the variables. |
| For simplicity's sake, this section is placed at the very beginning of the RAM |
| area. Also, the RAM wiping functionality at the end of the secure boot flow |
| (that is intended to remove any possible secrets from the RAM) must not clear |
| this area. Furthermore, it must be ensured that the linker places shared globals |
| into this data section. There are two ways to achieve this: |
| |
| - Put a filter pattern in the section body that matches the shared global |
| variables. |
| - Mark the global variables in the source code with a special attribute: |
| `__attribute__((section(<NAME_OF_SHARED_SYMBOL_SECTION>)))` |
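| |
| A minimal sketch of the second option, assuming a GCC-compatible ELF toolchain. |
| The section name `.tfm_shared_symbols` and the names `shared_call_count` and |
| `shared_count_call` are hypothetical. The real section name must match the |
| shared symbol section defined in both linker configuration files. |
| |
| .. code-block:: c |
| |
|     #include <stdint.h> |
| |
|     /* Hypothetical section name; it must match the output section that both |
|      * linker files place at the same RAM address. */ |
|     #define SHARED_SYMBOL_SECTION ".tfm_shared_symbols" |
| |
|     /* Hypothetical shared global: placed in the dedicated section so that |
|      * MCUboot and the SPE refer to the same RAM location. */ |
|     __attribute__((section(SHARED_SYMBOL_SECTION))) |
|     uint32_t shared_call_count = 0; |
| |
|     /* Function that could be shared: its only state is the shared global. */ |
|     uint32_t shared_count_call(void) |
|     { |
|         shared_call_count++; |
|         return shared_call_count; |
|     } |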
| |
| RAM memory layout in MCUboot with code sharing enabled: |
| |
| .. code-block:: bash |
| |
| +------------------+ |
| | Shared symbols | |
| +------------------+ |
| | Shared boot data | |
| +------------------+ |
| | Data | |
| +------------------+ |
| | Stack (MSP) | |
| +------------------+ |
| | Heap | |
| +------------------+ |
| |
| RAM memory layout in SPE with code sharing enabled: |
| |
| .. code-block:: bash |
| |
| +-------------------+ |
| | Shared symbols | |
| +-------------------+ |
| | Shared boot data | |
| +-------------------+ |
| | Stack (MSP) | |
| +-------------------+ |
| | Stack (PSP) | |
| +-------------------+ |
| | Partition X Data | |
| +-------------------+ |
| | Partition X Stack | |
| +-------------------+ |
| . |
| . |
| . |
| +-------------------+ |
| | Partition Z Data | |
| +-------------------+ |
| | Partition Z Stack | |
| +-------------------+ |
| | PRoT Data | |
| +-------------------+ |
| | Heap | |
| +-------------------+ |
| |
| Patching Mbed TLS |
| ================= |
| In order to share some global function pointers from mbed-crypto that are |
| related to dynamic memory allocation, their scope must be extended from private |
| to global. This is needed because some compiler toolchains only extract the |
| addresses of public functions and global variables, and extraction of addresses |
| is a requirement to share them among binaries. Therefore, a short patch was |
| created for the mbed-crypto library, which "globalises" these function pointers: |
| |
| `lib/ext/mbedcrypto/0002-Enable-crypto-code-sharing-between-independent-binar.patch` |
| |
| The patch must be applied manually in the Mbed TLS repo if code sharing is |
| enabled. The patch has no effect on the functional behaviour of the |
| cryptographic library; it only extends the scope of some variables. |
| |
| ************* |
| Tools support |
| ************* |
| All the currently supported compilers provide a way to achieve the above |
| objectives. However, there is no standard way, which means that the code sharing |
| functionality must be implemented on a per compiler basis. The following steps |
| are needed: |
| |
| - Extraction of the addresses of all global symbols. |
| - The filtering out of the addresses of symbols that aren't shared. The goal is |
| to not need to list all the shared symbols by name. Only a simple pattern |
| has to be provided, which matches the beginning of the symbol's name. |
| Matching symbols will be shared. Examples are in: |
| `bl2/shared_symbol_template.txt` |
| - Provision of the addresses of shared symbols to the linker during the SPE |
| build process. |
| - The resolution of symbol collisions during SPE linking. Because mbed-crypto |
| is linked to both firmware components as a static library, the external |
| shared symbols will conflict with the same symbols found within it. In order |
| to prioritize the external symbol, the symbol with the same name in |
| mbed-crypto must be marked as weak in the symbol table. |
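| |
| The prefix matching rule from the second step can be sketched in C as follows. |
| This only illustrates the rule and is not the build's own filtering code. The |
| function `is_shared_symbol` and the two patterns are hypothetical and are not |
| copied from the template file. |
| |
| .. code-block:: c |
| |
|     #include <stdbool.h> |
|     #include <stddef.h> |
|     #include <string.h> |
| |
|     /* Hypothetical prefix patterns; the real ones live in the template file. */ |
|     static const char *const shared_prefixes[] = { |
|         "mbedtls_sha256", |
|         "mbedtls_mpi", |
|     }; |
| |
|     /* Returns true if the symbol name begins with one of the patterns. */ |
|     bool is_shared_symbol(const char *name) |
|     { |
|         size_t count = sizeof(shared_prefixes) / sizeof(shared_prefixes[0]); |
| |
|         for (size_t i = 0; i < count; i++) { |
|             size_t len = strlen(shared_prefixes[i]); |
|             if (strncmp(name, shared_prefixes[i], len) == 0) { |
|                 return true; |
|             } |
|         } |
|         return false; |
|     } |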
| |
| The above functionality is implemented in the toolchain-specific CMake files: |
| |
| - `toolchain_ARMCLANG.cmake` |
| - `toolchain_GNUARM.cmake` |
| |
| These files provide it through the following two functions: |
| |
| - `target_share_symbols()`: Extract and filter shared symbol addresses |
| from MCUboot. |
| - `target_link_shared_code()`: Link shared code to the SPE and resolve symbol |
| conflict issues. |
| |
| ARMCLANG |
| ======== |
| The toolchain specific steps are: |
| |
| - Extract all symbols from MCUboot: add `--symdefs` to the compiler command line |
| - Filter shared symbols: call CMake script `FilterSharedSymbols.cmake` |
| - Weaken duplicated (shared) symbols in the mbed-crypto static library that are |
| linked to the SPE: `arm-none-eabi-objcopy` |
| - Link shared code to SPE: Add the filtered output of `--symdefs` to the SPE |
| source file list. |
| |
| GNUARM |
| ====== |
| The toolchain specific steps are: |
| |
| - Extract all symbols from MCUboot: `arm-none-eabi-nm` |
| - Filter shared symbols: call CMake script: `FilterSharedSymbols.cmake` |
| - Strip unshared code from MCUboot: `arm-none-eabi-strip` |
| - Weaken duplicated (shared) symbols in the mbed-crypto static library that are |
| linked to the SPE: `arm-none-eabi-objcopy` |
| - Link shared code to SPE: Add `-Wl,-R,<SHARED_STRIPPED_CODE.axf>` to the |
| compiler command line |
| |
| IAR |
| === |
| This functionality is currently not implemented, although the toolchain |
| supports it. |
| |
| ************************** |
| Memory footprint reduction |
| ************************** |
| - Build type: MinSizeRel |
| - Platform: mps2/an521 |
| - Version: TF-Mv1.2.0 + code sharing patches |
| - MCUboot image encryption support is disabled. |
| |
| +------------------+-------------------+-------------------+-------------------+ |
| | | ConfigDefault | ConfigProfile-M | ConfigProfile-S | |
| +------------------+----------+--------+----------+--------+----------+--------+ |
| | | ARMCLANG | GNUARM | ARMCLANG | GNUARM | ARMCLANG | GNUARM | |
| +------------------+----------+--------+----------+--------+----------+--------+ |
| | CODE_SHARING=OFF | 122268 | 124572 | 75936 | 75996 | 50336 | 50224 | |
| +------------------+----------+--------+----------+--------+----------+--------+ |
| | CODE_SHARING=ON | 113264 | 115500 | 70400 | 70336 | 48840 | 48988 | |
| +------------------+----------+--------+----------+--------+----------+--------+ |
| | Difference | 9004 | 9072 | 5536 | 5660 | 1496 | 1236 | |
| +------------------+----------+--------+----------+--------+----------+--------+ |
| |
| If MCUboot image encryption support is enabled, the saving could be up to |
| ~13-15 KB. |
| |
| .. Note:: |
| |
| Code sharing on Musca-B1 was tested only with software-only crypto, so crypto |
| hardware acceleration must be turned off: `-DCRYPTO_HW_ACCELERATOR=OFF` |
| |
| |
| ************************* |
| Usability considerations |
| ************************* |
| Functions that only use local variables can be shared easily. However, functions |
| that rely on global variables are a bit tricky. They can still be shared, but |
| all global variables must be placed in the shared symbol section, to prevent |
| overwriting and to enable the retention of their values. |
| |
| Some global variables might need to be reinitialised to their original values by |
| runtime firmware, if they have been used by the bootloader, but need to have |
| their original value when runtime firmware starts to use them. If so, the |
| reinitialising functionality must be implemented explicitly, because the low |
| level startup code in the SPE does not initialise the shared variables, which |
| means they retain their value after MCUboot stops running. |
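| |
| A minimal sketch of such explicit reinitialisation is shown below. The names |
| `shared_op_counter`, `shared_do_operation` and `spe_reinit_shared_globals` are |
| hypothetical. On a real target the variable would live in the shared symbol |
| section, and the SPE would call the reinitialising function before its first |
| use of the shared code. |
| |
| .. code-block:: c |
| |
|     #include <stdint.h> |
| |
|     /* Hypothetical shared global; on a real target it would live in the shared |
|      * symbol section and keep whatever value MCUboot left in it. */ |
|     uint32_t shared_op_counter = 0; |
| |
|     /* Stands in for shared code that modifies the shared global. */ |
|     void shared_do_operation(void) |
|     { |
|         shared_op_counter++; |
|     } |
| |
|     /* Hypothetical SPE hook: restores the value the shared code expects, |
|      * because the SPE startup code does not initialise the shared section. */ |
|     void spe_reinit_shared_globals(void) |
|     { |
|         shared_op_counter = 0; |
|     } |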
| |
| If a bug is discovered in the shared code, it cannot be fixed with a firmware |
| upgrade, if the bootloader code is immutable. If this is the case, disabling |
| code sharing might be a solution, as the new runtime firmware could contain the |
| fixed code instead of relying on the unfixed shared code. However, this would |
| increase code footprint. |
| |
| API backward compatibility can also be an issue. If the API has changed in a |
| newer version of the shared code, then new code cannot rely on the shared |
| version. The changed code, and all other shared code that references it, must |
| be ignored, and the updated version of the functions must be compiled into the |
| SPE binary. The Mbed TLS library has been API compatible with its current |
| version (``v2.24.0``) since the ``mbedtls-2.7.0`` release (2018-02-03). |
| |
| To minimise the risk of incompatibility, use the same compiler flags to build |
| both firmware components. |
| |
| The artifacts of the shared code extraction steps must be preserved so as to |
| remain available if new SPE firmware (that relies on shared code) is built and |
| released. Those files are necessary to know the address of shared symbols when |
| linking the SPE. |
| |
| ************************ |
| How to use code sharing? |
| ************************ |
| Considering the above, code sharing is an optional feature, which is disabled |
| by default. It can be enabled from the command line with a compile time switch: |
| |
| - `TFM_CODE_SHARING`: Set to `ON` to enable code sharing. |
| |
| With the default settings, only the common part of the mbed-crypto library is |
| shared between MCUboot and the SPE. However, there might be other device |
| specific code (e.g. device drivers) that could be shared. The shared |
| cryptography code consists mainly of the SHA-256 algorithm, the `bignum` library |
| and some RSA related functions. If image encryption support is enabled in |
| MCUboot, then AES algorithms can be shared as well. |
| |
| Sharing code between the SPE and an external project is possible, even if |
| MCUboot isn't used as the bootloader. For example, a custom bootloader can also |
| be built in such a way as to create the necessary artifacts to share some of its |
| code with the SPE. The same artifacts must be created as in the case of MCUboot: |
| |
| - `shared_symbols_name.txt`: Contains the names of the shared symbols. Used by |
| the script that prevents symbol collision. |
| - `shared_symbols_address.txt`: Contains the type, address and name of shared |
| symbols. Used by the linker when linking the runtime SPE. |
| - `shared_code.axf`: GNUARM specific. A stripped version of the firmware |
| component that contains only the shared code. It is used by the linker when |
| linking the SPE. |
| |
| .. Note:: |
| |
| The artifacts of the shared code extraction steps must be preserved to be |
| able to link them to any future SPE version. |
| |
| When an external project is sharing code with the SPE, the `SHARED_CODE_PATH` |
| compile time switch must be set to the path of the artifacts mentioned above. |
| |
| ******************** |
| Further improvements |
| ******************** |
| This design focuses only on sharing the cryptography code. However, other code |
| could be shared as well. Some possibilities: |
| |
| - Flash driver |
| - Serial driver |
| - Image metadata parsing code |
| - etc. |
| |
| -------------- |
| |
| *Copyright (c) 2020-2024, Arm Limited. All rights reserved.* |