########################
BL1 Immutable bootloader
########################

:Author: Raef Coles
:Organization: Arm Limited
:Contact: raef.coles@arm.com

************
Introduction
************

Some devices that use TF-M will require initial boot code that is stored in ROM.
There are a variety of reasons for this:

- The device cannot access flash memory without a driver, so some setup must
  be done before the main images in flash can be booted.
- The device has no on-chip secure flash, and therefore cannot otherwise
  maintain a tamper-resistant root of trust.
- The device has a security model that requires an immutable root of trust.

Henceforth, any bootloader stored in ROM will be referred to as BL1, as it is
necessarily the first stage in the boot chain.

TF-M provides a reference second-stage flash bootloader, BL2, to allow easier
integration. This bootloader implements all the secure boot functionality
needed to provide a secure chain of trust.

A reference ROM bootloader, BL1, has now been added with the same motivation:
allowing easier integration of TF-M on platforms that do not have their own
BL1 and require one.

****************************
BL1 Features and Motivations
****************************

The reference ROM bootloader provides the following features:

- A split between code stored in ROM and code stored in other non-volatile
  memory.

  - This can significantly reduce the cost of fixing bugs compared to
    ROM-only bootloaders.

- A secure boot mechanism that allows the next boot stage (which would usually
  be BL2) to be upgraded.

  - This allows bugs in the BL2 image to be fixed.
  - Alternatively, this could allow BL2 to be removed on devices that are
    constrained in flash space but have ROM.

- A post-quantum resistant asymmetric signature scheme for verifying the next
  boot stage image.

  - This allows devices to continue to be updated securely even if attacks
    involving quantum computers become viable, which can extend the lifespan
    of devices that may be deployed in the field for many years.

- A mechanism for passing boot measurements to the TF-M runtime so that they
  can be attested.
- Tooling to create and sign images.
- Fault Injection (FI) and Differential Power Analysis (DPA) mitigations.

*********************************
BL1_1 and BL1_2 split bootloaders
*********************************

BL1 is split into two distinct boot stages: BL1_1, which is stored in ROM, and
BL1_2, which is stored in other non-volatile storage. This is usually either
trusted or untrusted flash, but on platforms without flash memory it can be
OTP. As BL1_2 is verified against a hash stored in OTP, it is effectively
immutable after provisioning even if it is stored in mutable storage: any
modification to the image would cause the hash check to fail.

Bugs in ROM bootloaders usually cannot be fixed once a device has been
provisioned or deployed in the field. As ROM code is immutable, the only option
is to fix the bug in newly manufactured devices.

However, it can be very expensive to change the ROM code of devices once
manufacturing has begun, as this requires changes to the photolithography masks
that are used to create the device. The cost varies with the complexity of the
device and the process node it is fabricated on, but can be large, both in
engineering time and in material and process costs.

By placing the majority of the immutable bootloader in other storage, the costs
associated with changing ROM code can be mitigated, as a new BL1_2 image can be
used at provisioning time with minimal changeover cost. BL1_1 contains a
minimal codebase that is mainly responsible for verifying the BL1_2 image.

The boot flow is as follows. For simplicity, this assumes that the boot stage
after BL1 is BL2, though this is not necessarily the case:

1) BL1_1 begins executing in place from ROM.
2) BL1_1 copies BL1_2 into RAM.
3) BL1_1 verifies BL1_2 against the hash stored in OTP.
4) BL1_1 jumps to BL1_2 if the hash verification has succeeded.
5) BL1_2 copies the primary BL2 image from flash into RAM.
6) BL1_2 verifies the BL2 image using asymmetric cryptography.
7) If verification fails, BL1_2 repeats steps 5 and 6 with the secondary BL2
   image.
8) BL1_2 jumps to BL2 if either image has been successfully verified.
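
The decision logic of the boot flow above can be modelled as follows. This is a
minimal sketch rather than the BL1 source: ``boot_checks_t``,
``simulate_bl1_boot`` and the outcome names are hypothetical, the copies into
RAM are elided, and each verification step is represented by a precomputed
flag. The text does not describe what happens when no image verifies, so the
sketch simply reports ``BOOT_HALT`` in that case.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical names: where control ends up once BL1 has finished. */
    typedef enum {
        BOOT_HALT = 0,      /* nothing verified (failure behaviour assumed) */
        BOOT_BL2_PRIMARY,   /* step 8, reached through the primary slot */
        BOOT_BL2_SECONDARY  /* step 8, reached through the secondary slot */
    } boot_outcome_t;

    /* Stand-ins for the real checks: each flag records whether that
     * verification step would succeed on this device. */
    typedef struct {
        bool bl1_2_hash_matches_otp; /* step 3 */
        bool bl2_primary_verifies;   /* step 6, primary slot */
        bool bl2_secondary_verifies; /* step 7, secondary slot */
    } boot_checks_t;

    boot_outcome_t simulate_bl1_boot(const boot_checks_t *checks)
    {
        /* Steps 1-2: BL1_1 runs from ROM and copies BL1_2 into RAM (elided). */

        /* Steps 3-4: BL1_1 only jumps to BL1_2 if the OTP hash matches. */
        if (!checks->bl1_2_hash_matches_otp) {
            return BOOT_HALT;
        }

        /* Steps 5-6: BL1_2 copies the primary BL2 image into RAM and verifies it. */
        if (checks->bl2_primary_verifies) {
            return BOOT_BL2_PRIMARY;
        }

        /* Step 7: repeat the copy and verification with the secondary image. */
        if (checks->bl2_secondary_verifies) {
            return BOOT_BL2_SECONDARY;
        }

        return BOOT_HALT;
    }

A failed BL1_2 hash check stops the boot before either BL2 slot is examined,
because it is BL1_2 that performs the BL2 checks.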

.. Note::
   The BL1_2 image is not encrypted, so if it is placed in untrusted flash, its
   contents can be read.

Some optimizations have been made specifically for the case where BL1_2 is
stored in OTP:

OTP can be very expensive in terms of chip area, though newer technologies such
as antifuse OTP reduce this cost. Because of this, the code size of BL1_2 has
been minimized, and code-sharing has been configured so that BL1_2 can call
functions stored in ROM. OTP should be sized so that BL1_2 can include its own
versions of the functions it uses via code-sharing, in case the ROM versions
contain bugs. This needs less space than duplicating all the code, as it is
assumed that most functions will not contain bugs.

As OTP memory frequently has low performance, BL1_2 is copied into RAM before
it is executed. BL1_2 also copies the next stage image into RAM before
authenticating it. Because the copy that is authenticated is the copy that is
executed, modifying the flash contents after the check has no effect, which
allows the next stage to be stored in untrusted flash. This requires the device
to have sufficient RAM to hold both the BL1_2 image and the next stage image at
the same time. Note that this copying is done even if BL1_2 is located in
XIP-capable flash, as it both allows the use of untrusted flash and simplifies
the image upgrade logic.

.. Note::
   BL1_2 enables TF-M to be used on devices that contain no secure flash, though
   the ITS service will not be available. Other services that depend on ITS will
   not be available without modification.

*************************************
Secure boot / Image upgrade mechanism
*************************************

BL1_2 verifies the authenticity of the next stage image via asymmetric
cryptography, using a public key that is provisioned into OTP.

BL1_2 implements a rollback protection counter in OTP, which is used to prevent
the next stage image from being downgraded to a less secure version.
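
One common way to hold a counter in OTP, where programmed bits cannot be
cleared, is a unary (thermometer) encoding in which the counter value is the
number of programmed bits. The sketch below assumes that encoding and a 32-bit
field; it is illustrative only, the actual BL1 OTP layout may differ, and
``otp_counter_t`` and both functions are hypothetical. It shows why such a
counter can only ever increase, which is the property rollback protection
relies on.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical model of a 32-bit OTP field. Programming can only change a
     * bit from 0 to 1; no operation can clear a bit again. */
    typedef struct {
        uint32_t bits;
    } otp_counter_t;

    /* Counter value = number of programmed bits (unary encoding, assumed). */
    uint32_t otp_counter_read(const otp_counter_t *counter)
    {
        uint32_t value = counter->bits;
        uint32_t count = 0;

        while (value != 0) {
            count += value & 1u;
            value >>= 1;
        }
        return count;
    }

    /* Raise the counter to new_value by programming further bits. Lowering it
     * is impossible, so such a request is rejected, as is one that would need
     * more bits than the field has. */
    bool otp_counter_advance(otp_counter_t *counter, uint32_t new_value)
    {
        if (new_value < otp_counter_read(counter) || new_value > 32u) {
            return false;
        }
        while (otp_counter_read(counter) < new_value) {
            /* x | (x + 1) programs the lowest bit that is still 0. */
            counter->bits |= counter->bits + 1u;
        }
        return true;
    }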

BL1_2 has two image slots, which allows image upgrades to be performed. The
primary slot is always tried first. If its verification fails, either because
its signature is invalid or because its version is lower than the rollback
protection counter, the secondary slot is booted instead, subject to the same
checks.
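
The slot selection rule above can be sketched as follows, assuming the
signature check has already produced a pass/fail result and that each image
header carries a version number. ``slot_info_t``, ``security_version`` and
``select_boot_slot`` are hypothetical names, not the BL1 implementation. A
version equal to the counter is accepted, since only versions lower than the
counter are rejected.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical view of one image slot after it has been copied to RAM. */
    typedef struct {
        bool     signature_valid;  /* outcome of the asymmetric signature check */
        uint32_t security_version; /* version field from the image header */
    } slot_info_t;

    /* A slot is bootable only if its signature verifies and its version is not
     * lower than the rollback protection counter read from OTP. */
    static bool slot_is_bootable(const slot_info_t *slot, uint32_t otp_counter)
    {
        return slot->signature_valid && slot->security_version >= otp_counter;
    }

    /* Returns 0 for the primary slot, 1 for the secondary slot, or -1 if
     * neither slot passes. The primary slot is always tried first. */
    int select_boot_slot(const slot_info_t slots[2], uint32_t otp_counter)
    {
        for (int i = 0; i < 2; i++) {
            if (slot_is_bootable(&slots[i], otp_counter)) {
                return i;
            }
        }
        return -1;
    }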

BL1_2 contains no image upgrade logic. To implement over-the-air (OTA) updates
of the next stage image, a later stage in the system must handle downloading
new images and placing them in the required slot.

********************************************
Post-Quantum signature verification in BL1_2
********************************************

BL1_2 uses a post-quantum asymmetric signature scheme to verify the next stage.
The scheme used is Leighton-Micali Signatures (henceforth LMS). LMS is
standardised in `NIST SP 800-208
<https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-208.pdf>`_
and `IETF RFC 8554 <https://datatracker.ietf.org/doc/html/rfc8554>`_.

LMS is a stateful hash-based signature scheme, meaning that:

1) It is constructed from a cryptographic hash function, in this case SHA256.

   - This hash function is supported by many existing hardware accelerators,
     which can make LMS verification relatively fast compared to other
     post-quantum signature schemes that cannot yet be accelerated in hardware.

2) Each private key can only be used to sign a limited number of images, so the
   signer must keep track of which signatures have already been issued.

   - BL1_2 uses the SHA256_H10 parameter set, meaning each key can sign
     2^10 = 1024 images.
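
The statefulness in point 2 is a property of the signer rather than the
verifier: each signature consumes one leaf (one-time key) of the LMS tree,
identified by the index ``q`` carried in the signature, and a leaf must never
be used twice. The sketch below, in which all names are hypothetical, shows the
bookkeeping a signing tool has to do for a height-10 tree. Persisting the
updated index before releasing each signature is the usual precaution, so that
a crash cannot lead to a leaf being reused.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* SHA256_H10: a Merkle tree of height 10 has 2^10 leaves, one per signature. */
    #define LMS_TREE_HEIGHT    10u
    #define LMS_MAX_SIGNATURES (1u << LMS_TREE_HEIGHT) /* 1024 */

    /* Hypothetical signer-side state for one LMS private key. */
    typedef struct {
        uint32_t next_leaf; /* index q of the next unused one-time key */
    } lms_signer_state_t;

    uint32_t lms_signatures_remaining(const lms_signer_state_t *state)
    {
        return state->next_leaf >= LMS_MAX_SIGNATURES
                   ? 0u
                   : LMS_MAX_SIGNATURES - state->next_leaf;
    }

    /* Reserve the leaf for the next signature. Returns false once the key is
     * exhausted. A real signer would write the incremented index to persistent
     * storage before releasing the signature, because reusing a leaf breaks
     * the security of the scheme. */
    bool lms_reserve_leaf(lms_signer_state_t *state, uint32_t *leaf_out)
    {
        if (state->next_leaf >= LMS_MAX_SIGNATURES) {
            return false;
        }
        *leaf_out = state->next_leaf;
        state->next_leaf++;
        return true;
    }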

The main downside, the limited number of possible signatures, can be mitigated
by limiting the number of image upgrades. As BL2 is currently often not
upgradable, this limit is not expected to be a problem. If BL1 is used to boot
a combined TF-M/NS image directly, the limit is more likely to be reached, and
the expected number of updates over the lifetime of the device should be
examined carefully.

LMS public keys are 32 bytes in size, and LMS signatures are 1912 bytes in size.
The signature size is larger than that of some other asymmetric schemes, though
most devices should have enough space in flash to accommodate it.

The main upside of LMS, aside from its security against attacks involving
quantum computers, is that it is relatively simple to implement. The software
implementation used by BL1 is approximately 3 KiB in size, which is
considerably smaller than the corresponding RSA implementation, which is at
least 6.5 KiB. This simplicity of implementation helps to avoid bugs.

BL1 will use MbedTLS as the source of its LMS implementation.

.. Note::
   At the time of writing, the LMS code is still in the process of being merged
   into MbedTLS, so BL1 does not yet support asymmetric verification of the next
   boot stage. Instead, the next boot stage is hash-locked, so it cannot be
   upgraded.

   The GitHub pull request for LMS can be found `here
   <https://github.com/ARMmbed/mbedtls/pull/4826>`_.

*********************
BL1 boot measurements
*********************

BL1 outputs boot measurements in the same format as BL2, using the same shared
memory area. These measurements can then be included in the attestation token,
allowing the version of the boot stage after BL1 to be attested.
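
The shared memory area holds a sequence of records that BL1 appends to and the
runtime later reads. The sketch below illustrates that append pattern only: the
header layout, the magic value and ``add_boot_measurement`` are hypothetical
and are not the definitions TF-M uses for the shared boot data format that BL1
has in common with BL2.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical layout: a header followed by packed type-length records. */
    typedef struct {
        uint16_t magic;     /* marks the area as initialised */
        uint16_t total_len; /* bytes in use, including this header */
    } shared_area_header_t;

    typedef struct {
        uint16_t type; /* which measurement this record holds */
        uint16_t len;  /* record header plus payload, in bytes */
    } shared_record_header_t;

    #define DEMO_SHARED_AREA_MAGIC 0x2016u /* hypothetical value */

    /* Append one measurement record. Returns 0 on success, -1 if it does not fit. */
    int add_boot_measurement(uint8_t *area, size_t area_size, uint16_t type,
                             const uint8_t *payload, uint16_t payload_len)
    {
        shared_area_header_t hdr;
        shared_record_header_t rec;
        size_t record_len = sizeof(rec) + payload_len;

        if (area_size < sizeof(hdr) || area_size > UINT16_MAX) {
            return -1;
        }
        memcpy(&hdr, area, sizeof(hdr));
        if (hdr.magic != DEMO_SHARED_AREA_MAGIC) {
            /* First writer: initialise an empty area. */
            hdr.magic = DEMO_SHARED_AREA_MAGIC;
            hdr.total_len = (uint16_t)sizeof(hdr);
        }
        if (hdr.total_len < sizeof(hdr) || hdr.total_len + record_len > area_size) {
            return -1;
        }

        rec.type = type;
        rec.len = (uint16_t)record_len;
        memcpy(area + hdr.total_len, &rec, sizeof(rec));
        memcpy(area + hdr.total_len + sizeof(rec), payload, payload_len);

        hdr.total_len = (uint16_t)(hdr.total_len + record_len);
        memcpy(area, &hdr, sizeof(hdr));
        return 0;
    }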

***********
BL1 tooling
***********

Image signing scripts are provided for BL1_1 and BL1_2:

- ``bl1/bl1_1/scripts/create_bl1_2_img.py``
- ``bl1/bl1_2/scripts/create_bl2_img.py``

Although the second script is named ``create_bl2_img.py``, it can be used for
any next stage image. These scripts sign (and, in the case of
``create_bl2_img.py``, encrypt) a given image file and append the required
headers.

**************************
BL1 FI and DPA mitigations
**************************

BL1 reuses the FI countermeasures used in the TF-M runtime, which are found in
``lib/fih/``.

BL1 also implements countermeasures against fault injection in some of its own
functions, which are found in ``bl1/bl1_1/shared_lib/util.c``.

``bl_fih_memeql`` tests whether two memory regions contain the same values:

- It inserts random delays to improve resilience to FI attacks.
- It performs loop integrity checks.
- It uses FIH constructs.
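
The sketch below shows the general shape of a comparison hardened in the ways
listed above: it never exits the loop early, checks that the loop ran for the
full length, performs the final check twice, and returns values that are far
apart in Hamming distance rather than 0 and 1. It is not ``bl_fih_memeql``:
the ``demo_`` names and return values are hypothetical, and the delay is an
empty stub where a real implementation would use a TRNG-driven wait.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical result codes: 28 of their 32 bits differ, so a single
     * bit flip cannot turn one into the other. */
    #define DEMO_FIH_SUCCESS 0x1AAAAAAAu
    #define DEMO_FIH_FAILURE 0x15555555u

    /* Placeholder: a real implementation would wait for a random number of
     * cycles derived from the TRNG. */
    static void demo_random_delay(void)
    {
    }

    uint32_t demo_fih_memeql(const void *a, const void *b, size_t len)
    {
        const volatile uint8_t *pa = a;
        const volatile uint8_t *pb = b;
        volatile uint8_t diff = 0;
        volatile size_t i;

        demo_random_delay();

        /* Accumulate differences without exiting early. */
        for (i = 0; i < len; i++) {
            diff |= (uint8_t)(pa[i] ^ pb[i]);
        }

        /* Loop integrity check: a glitch that skips iterations leaves i != len. */
        if (i != len) {
            return DEMO_FIH_FAILURE;
        }

        demo_random_delay();

        /* Check the result twice; diff is volatile, so both reads happen. */
        if (diff != 0) {
            return DEMO_FIH_FAILURE;
        }
        if (diff != 0) {
            return DEMO_FIH_FAILURE;
        }
        return DEMO_FIH_SUCCESS;
    }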

**************************
Using BL1 on new platforms
**************************

New platforms must define the following macros in their ``region_defs.h``:

- ``BL1_1_HEAP_SIZE``
- ``BL1_1_STACK_SIZE``
- ``BL1_2_HEAP_SIZE``
- ``BL1_2_STACK_SIZE``
- ``BL1_1_CODE_START``
- ``BL1_1_CODE_LIMIT``
- ``BL1_1_CODE_SIZE``
- ``BL1_2_CODE_START``
- ``BL1_2_CODE_LIMIT``
- ``BL1_2_CODE_SIZE``
- ``PROVISIONING_DATA_START``
- ``PROVISIONING_DATA_LIMIT``
- ``PROVISIONING_DATA_SIZE``
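
As an illustration, a ``region_defs.h`` fragment for a hypothetical platform
might look like the following. All addresses and sizes are placeholders, not
values from any real platform, and the ``LIMIT = START + SIZE - 1`` convention
is an assumption based on common TF-M practice. The ``_Static_assert`` checks
are an optional addition: the second one encodes the requirement that the
provisioning bundle can hold the entire BL1_2 image.

.. code-block:: c

    /* Hypothetical platform: placeholder values only. */
    #define BL1_1_HEAP_SIZE         (0x00001000)
    #define BL1_1_STACK_SIZE        (0x00000800)
    #define BL1_2_HEAP_SIZE         (0x00001000)
    #define BL1_2_STACK_SIZE        (0x00000800)

    /* BL1_1 executes in place from ROM. */
    #define BL1_1_CODE_START        (0x00000000)
    #define BL1_1_CODE_SIZE         (0x00010000)
    #define BL1_1_CODE_LIMIT        (BL1_1_CODE_START + BL1_1_CODE_SIZE - 1)

    /* BL1_2 is copied into RAM before it runs, so it is linked at a RAM address. */
    #define BL1_2_CODE_START        (0x30000000)
    #define BL1_2_CODE_SIZE         (0x00004000)
    #define BL1_2_CODE_LIMIT        (BL1_2_CODE_START + BL1_2_CODE_SIZE - 1)

    /* Provisioning bundle in unused ROM, as on a development platform. */
    #define PROVISIONING_DATA_START (BL1_1_CODE_START + BL1_1_CODE_SIZE)
    #define PROVISIONING_DATA_SIZE  (0x00008000)
    #define PROVISIONING_DATA_LIMIT (PROVISIONING_DATA_START + PROVISIONING_DATA_SIZE - 1)

    /* Build-time sanity checks on the layout above. */
    _Static_assert(PROVISIONING_DATA_START > BL1_1_CODE_LIMIT,
                   "provisioning data overlaps BL1_1 code");
    _Static_assert(PROVISIONING_DATA_SIZE > BL1_2_CODE_SIZE,
                   "provisioning bundle cannot hold the whole BL1_2 image");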

The ``PROVISIONING_DATA_*`` defines are used to locate the data to be
provisioned into OTP. They are required because the provisioning bundle needs
to contain the entire BL1_2 image, which is usually 8 KiB or more in size. This
is too large to be placed in the static data area, as is done for all other
dummy provisioning data. On development platforms with reprogrammable ROM, this
region is often placed in unused ROM. On production platforms, it should be
located in RAM and then filled with provisioning data. The format of the
provisioning data expected in the ``PROVISIONING_DATA_*`` region is defined by
the struct ``bl1_assembly_and_test_provisioning_data_t`` in
``bl1/bl1_1/lib/provisioning.c``.

If the platform stores BL1_2 in flash, it must set
``BL1_2_IMAGE_FLASH_OFFSET`` to the flash offset of the start of BL1_2.

The platform must also implement the HAL functions defined in the following
headers:

- ``bl1/bl1_1/shared_lib/interface/trng.h``
- ``bl1/bl1_1/shared_lib/interface/crypto.h``
- ``bl1/bl1_1/shared_lib/interface/otp.h``

If the platform integrates a CryptoCell-312, it can reuse the existing
implementation.

***********
BL1 Testing
***********

New tests have been written to test both the HAL implementation and the
integration of those functions for verifying images. These tests are stored in
the ``tf-m-tests`` repository under the ``test/bl1/`` directory, and are further
subdivided into BL1_1 and BL1_2 tests.

--------------

*Copyright (c) 2022-2023, Arm Limited. All rights reserved.*