.. SPDX-License-Identifier: BSD-3-Clause
.. SPDX-FileCopyrightText: Copyright TF-RMM Contributors.

MMU setup and memory management design in RMM
=============================================

This document describes how the MMU is set up and how memory is managed
by the |RMM| implementation.

Physical Address Space
----------------------

The Realm Management Extension (``FEAT_RME``) defines four Physical Address
Spaces (PAS):

- Non-secure
- Secure
- Realm
- Root

|RMM| code and |RMM| data are in Realm PAS memory, loaded and allocated to
the Realm PAS at boot time by the EL3 Firmware. This is a static carveout and
it is never changed during the lifetime of the system.

The size of the |RMM| data is fixed at build time. The majority of this is the
granules array (see `Granule state tracking`_ below), whose size is
configurable and proportional to the maximum amount of delegable DRAM supported
by the system.
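
As a rough, purely illustrative sizing example: assuming a 4KB granule and a
hypothetical 4-byte entry in the granules array, 1GB of delegable DRAM needs
256K entries, i.e. about 1MB of |RMM| data, and 16GB of delegable DRAM needs
about 16MB.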

Realm data and metadata are in Realm PAS memory, which is delegated to the
Realm PAS by the Host at runtime. The |RMM| ABI ensures that this memory cannot
be returned to Non-secure PAS ("undelegated") while it is in use by the
|RMM| or by a Realm.

NS data is in Non-secure PAS memory. The Host is able to change the PAS
of this memory while it is being accessed by the |RMM|. Consequently, the
|RMM| must be able to handle a Granule Protection Fault (GPF) while accessing
NS data as part of RMI handling.
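
The hedged sketch below illustrates this pattern; the helper
``ns_copy_read()`` and the error constants are illustrative stand-ins for the
fault-tolerant access routine used by the implementation, not the actual
|RMM| code:

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /*
    * Hypothetical helper: copies from NS memory and, instead of taking a
    * fatal abort, returns false if a Granule Protection Fault occurs
    * because the Host changed the PAS of the source granule concurrently.
    */
   bool ns_copy_read(void *dst, const void *ns_src, size_t size);

   #define RMI_SUCCESS     0UL
   #define RMI_ERROR_INPUT 1UL

   static unsigned long example_rmi_handler(void *ns_args_va)
   {
           uint64_t args[4];

           /* The copy may fault part-way through; on failure the command
            * simply reports an error back to the Host. */
           if (!ns_copy_read(args, ns_args_va, sizeof(args))) {
                   return RMI_ERROR_INPUT;
           }

           /* ... validate and use args[] ... */
           return RMI_SUCCESS;
   }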

Granule state tracking
----------------------

The |RMM| manages a data structure called the `granules` array, which is
stored in |RMM| data memory.

The `granules` array contains one entry for every Granule of physical
memory which was in Non-secure PAS at |RMM| boot and can be delegated.

Each entry in the `granules` array contains a field `granule_state` which
records the *state* of the Granule and which can be one of the states
listed below:

- NS: Not Realm PAS (i.e. Non-secure PAS, Root PAS or Secure PAS)
- Delegated: Realm PAS, but not yet assigned a purpose as either Realm
  data or Realm metadata
- RD: Realm Descriptor
- REC: Realm Execution Context
- REC aux: Auxiliary storage for REC
- Data: Realm data
- RTT: Realm Stage 2 translation tables

As part of RMI SMC handling, the state of the granule can be a pre-condition
and can undergo a transition to a new state. For more details on the various
granule states and their transitions, please refer to the
`Realm Management Monitor (RMM) Specification`_.

For further details, see:

- ``enum granule_state``
- ``struct granule``
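
The layout can be pictured with the simplified sketch below; the field order,
the lock and reference count details, and the lock type are illustrative
rather than the exact definitions found in the source tree:

.. code-block:: c

   /* Simplified sketch of granule state tracking (not the exact layout). */
   enum granule_state {
           GRANULE_STATE_NS,          /* not Realm PAS */
           GRANULE_STATE_DELEGATED,   /* Realm PAS, no purpose assigned yet */
           GRANULE_STATE_RD,          /* Realm Descriptor */
           GRANULE_STATE_REC,         /* Realm Execution Context */
           GRANULE_STATE_REC_AUX,     /* auxiliary storage for a REC */
           GRANULE_STATE_DATA,        /* Realm data */
           GRANULE_STATE_RTT          /* Realm stage 2 translation tables */
   };

   struct granule {
           /* Serialises state changes on this granule (type is illustrative). */
           spinlock_t lock;
           /* Current state, checked as a pre-condition by RMI handlers. */
           enum granule_state state;
           /* Outstanding uses of the granule, e.g. live mappings. */
           unsigned long refcount;
   };

   /* One entry per granule of delegable physical memory. */
   extern struct granule granules[];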

RMM stage 1 translation regime
------------------------------

|RMM| uses the ``FEAT_VHE`` extension to split the 64-bit VA space into two
address spaces as shown in the figure below:

|full va space|

- The Low VA range: it extends from VA 0x0 up to the maximum VA size
  configured for the region (with a maximum VA size of 48 bits, or 52 bits
  if ``FEAT_LPA2`` is supported). This range is used to map the |RMM| Runtime
  (code, data, shared memory with EL3-FW and any other platform mappings).
- The High VA range: it extends from VA 0xFFFF_FFFF_FFFF_FFFF all the way down
  to an address corresponding to the maximum VA size configured for the region.
  This region is used by the `Stage 1 High VA - Slot Buffer mechanism`_
  as well as the `Per-CPU stack mapping`_.

Between the two ranges lies a range of invalid addresses which is not mapped
into either of them, as shown in the figure above. The ``TCR_EL2.TxSZ`` fields
control the maximum VA size of each region, and |RMM| programs them to fit
the mappings used for each region.
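
For example, if the Low VA range is configured for 48 bits of VA,
``TCR_EL2.T0SZ`` is programmed to ``64 - 48 = 16``; a High VA range that only
needs, say, 36 bits of VA would use ``T1SZ = 64 - 36 = 28``. Everything
between the two ranges is then unmappable by construction.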

The two VA ranges are used for two different purposes in RMM, as described
below.

Stage 1 Low VA range
^^^^^^^^^^^^^^^^^^^^

The Low VA range is used to create static mappings which are shared across all
the CPUs. It encompasses the RMM executable binary memory and the EL3 shared
memory region.

The RMM executable binary memory consists of code, RO data and RW data. Note
that the stage 1 translation tables for the Low Region are kept in RO data, so
that once the MMU is enabled, the table mappings are protected from further
modification.

The EL3 shared memory, which is allocated by the EL3 Firmware, is used by the
`RMM-EL3 communications interface`_. A pointer to the beginning of this area
is received by |RMM| during initialization. |RMM| will then map the region in
the .rw area.

The Low VA range is set up by the platform layer as part of platform
initialization.

The following mappings belong to the Low VA range:

- RMM_CODE
- RMM_RO
- RMM_RW
- RMM_SHARED

Per-platform mappings can also be added if needed, such as the UART for the
FVP platform.
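
A platform's low VA setup can be pictured with the hedged sketch below; the
region macro, the memory attribute names and the ``rmm_*`` symbols are
illustrative placeholders rather than the exact xlat library API:

.. code-block:: c

   /* Illustrative static, flat-mapped regions for the Low VA range. */
   static struct xlat_mmap_region low_va_regions[] = {
           /* RMM_CODE: executable, read-only. */
           MAP_REGION_FLAT(rmm_code_start, rmm_code_size, MT_CODE),
           /* RMM_RO: read-only data, including the low VA translation tables. */
           MAP_REGION_FLAT(rmm_ro_start, rmm_ro_size, MT_RO_DATA),
           /* RMM_RW: read-write data, BSS and per-CPU data. */
           MAP_REGION_FLAT(rmm_rw_start, rmm_rw_size, MT_RW_DATA),
           /* RMM_SHARED: buffer shared with the EL3 firmware. */
           MAP_REGION_FLAT(rmm_shared_base, rmm_shared_size, MT_RW_DATA),
   };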

Stage 1 High VA range
^^^^^^^^^^^^^^^^^^^^^

The High VA range is used to create dynamic per-CPU mappings. The tables used
for this are private to each CPU and hence it is possible for every CPU to map
a different PA at a specific VA. This property is used by the `slot-buffer`
mechanism as described later.

In order to allow the mappings for this region to be dynamic, its translation
tables are stored in the RW section of |RMM|, allowing them to be
modified as needed.

For more details, see the ``xlat_high_va.c`` file of the xlat library.

The diagram below shows the memory layout for the High VA region.

|high va region|

Stage 1 High VA - Slot Buffer mechanism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |RMM| provides a dynamic mapping mechanism called `slot-buffer` in the
high VA region. The assigned VA space for `slot-buffer` is divided into `slots`
of ``GRANULE_SIZE`` each.

The |RMM| has a fixed number of `slots` per CPU. Each `slot` is used to map
memory of a particular category. The |RMM| validates that the target physical
granule to be mapped is of the expected `granule_state` by looking up the
corresponding entry in the `granules` array.

The `slot-buffer` mechanism has `slots` for mapping memory of the following
types:

- Realm metadata: These correspond to the specific Realm and Realm
  Execution Context scheduled on the PE. These mappings are usually only
  valid during the execution of an RMI or RSI handler and are removed
  afterwards. They include Realm Descriptors (RDs), Realm Execution
  Contexts (RECs) and Realm Translation Tables (RTTs).

- NS data: RMM needs to map NS memory as part of RMIs to access parameters
  passed by the Host or to return arguments to the Host. RMM also needs
  to copy Data provided by the Host as part of populating the Realm
  data memory.

- Realm data: RMM sometimes needs to temporarily map Realm data memory
  during Realm creation in order to load the Realm image, or to access
  buffers specified by the Realm as part of RSI commands.

The `slot-buffer` design avoids the need for generic allocation of VA space.
The rationalization of all mappings ever needed for managing a realm via
`slots` is only possible due to the simple nature of the |RMM| design - in
particular, the fact that it is possible to statically determine the types
of objects which need to be mapped into the |RMM|'s address space, and the
maximum number of objects of a given type which need to be mapped at any point
in time.

During Realm entry and Realm exit, the RD is mapped in the "RD" buffer
slot. Once Realm entry or Realm exit is complete, this mapping is
removed. The RD is not mapped during Realm execution.

The REC and the `rmi_rec_run` data structures are both mapped during Realm
execution.
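
As an illustration, an RMI handler that operates on a Realm Descriptor
typically follows a lock/map/use/unmap pattern similar to the hedged sketch
below; the helper names and error codes are modelled on, but not guaranteed
to match, the actual granule and ``buffer.c`` APIs:

.. code-block:: c

   /* Hedged sketch of the slot-buffer map/use/unmap pattern. */
   static unsigned long example_rmi_realm_op(unsigned long rd_pa)
   {
           struct granule *g_rd;
           struct rd *rd;

           /* Locate the granule and check that its state is RD. */
           g_rd = find_lock_granule(rd_pa, GRANULE_STATE_RD);
           if (g_rd == NULL) {
                   return RMI_ERROR_INPUT;
           }

           /* Map the physical granule into this CPU's private "RD" slot. */
           rd = granule_map(g_rd, SLOT_RD);

           /* ... operate on the Realm Descriptor through 'rd' ... */

           /* Remove the per-CPU mapping and release the granule. */
           buffer_unmap(rd);
           granule_unlock(g_rd);
           return RMI_SUCCESS;
   }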

As the `slots` are mapped in the High VA region, each CPU
has its own private translation tables for such mappings, which means
that a particular slot has a fixed VA on every CPU. Since the translation
tables are private to a CPU, the mapping in a slot is also private to that
CPU. This allows a REC (vCPU) to be interrupted and migrated to another CPU
while RMM has live memory allocations. An example of this scenario is Realm
attestation token creation: a pending IRQ can cause RMM to yield to the NS
Host with live memory allocations in the MbedTLS heap. The NS Host can
schedule the REC on another CPU and, since the mappings for the memory
allocations remain at the same VA, the interrupted token creation can
continue.

The `slot-buffer` implementation in RMM also has some performance optimizations
such as caching of TTEs to avoid walking the Stage 1 translation tables for
every map and unmap operation.

As an alternative to the dynamic mappings required for RMI command handling,
the approach of maintaining static mappings for all physical memory was
considered, but rejected on the grounds that this could permit arbitrary
memory access for an attacker who is able to subvert |RMM| execution.

The xlat lib APIs are used by the `slot-buffer` to create dynamic mappings.
These dynamic mappings are stored in the high VA region's ``xlat_ctx``
structure and marked by the xlat library as *TRANSIENT*. This lets the xlat
library distinguish an unmapped dynamic TTE from a genuinely invalid one,
which would otherwise be identical.

For further details, see:

- ``enum buffer_slot``
- ``lib/realm/src/buffer.c``

Per-CPU stack mapping
~~~~~~~~~~~~~~~~~~~~~

Each CPU maps its stack in the High VA region, which means that the stack has
the same VA on every CPU while remaining private to that CPU. At boot time,
each CPU calculates the PA for the start of its stack and maps it to the
designated High VA address space.

The per-CPU VA mapping also includes a gap at the end of the stack VA to detect
any stack underflows. The gap is one page in size.

The rest of the VA space available below the stack is unused and therefore left
unmapped. The stage 1 translation library will not allow anything to be mapped
there.
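
The following is a minimal, hypothetical sketch of the boot-time computation;
the constants, symbols and mapping helper are illustrative only:

.. code-block:: c

   /* Illustrative per-CPU stack mapping in the High VA region. */
   #define STACK_SIZE      (RMM_CPU_STACK_PAGES * GRANULE_SIZE)

   static void map_cpu_stack(unsigned int cpuid)
   {
           /* Each CPU's stack is backed by its own slice of the RMM data. */
           uintptr_t stack_pa = rmm_stack_start_pa +
                                ((uintptr_t)cpuid * STACK_SIZE);

           /*
            * Every CPU uses the same stack VA; because the High VA tables
            * are private to the CPU, that VA resolves to a different PA on
            * each CPU. The guard gap next to the stack and the VA space
            * below the stack are simply never mapped.
            */
           high_va_map_pages(cpuid, CPU_STACK_VA_BASE, stack_pa,
                             RMM_CPU_STACK_PAGES);
   }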

Stage 1 translation library (xlat library)
------------------------------------------

The |RMM| stage 1 translation management is handled by the xlat library.
This library is able to support up to 52-bit addresses and 5 levels of
translation (when ``FEAT_LPA2`` is enabled).

The xlat library is designed to be stateless and it uses the abstraction of a
`translation context`, modelled through the ``struct xlat_ctx``. A translation
context stores all the information related to a given VA space, such as the
translation tables, the VA configuration used to initialize the context and any
internal status related to that VA space. Once a context has been initialized,
its VA configuration cannot be modified.

At the moment, although the xlat library supports the creation of multiple
contexts, it assumes that the caller will only use a single context per
CPU for a given VA region. The library does not offer support to switch
contexts on a CPU at run time. A context can be shared by several CPUs if they
share the same VA configuration and mappings, as is the case for the low VA
region.

Dynamic mappings can be created by specifying the ``TRANSIENT`` flag. The
high VA region creates its dynamic mappings using this flag.

For further details, see ``lib/xlat``.
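
A hedged illustration of the concept follows; the region macro, the mapping
helper and the symbols are simplified stand-ins and do not reflect the exact
``lib/xlat`` API:

.. code-block:: c

   /*
    * A transient region is described at context-initialization time with
    * no physical address attached; the VA can then be mapped and unmapped
    * at run time.
    */
   static struct xlat_mmap_region high_va_regions[] = {
           MAP_REGION_TRANSIENT(SLOT_VA_BASE, GRANULE_SIZE),
   };

   static void *map_slot(struct xlat_ctx *high_ctx, uintptr_t pa)
   {
           /*
            * Fill the transient TTE with a real PA and attributes.
            * Unmapping later restores the TRANSIENT marker rather than a
            * plain invalid TTE, so the library can tell "dynamic, currently
            * unmapped" apart from "never mappable".
            */
           xlat_map_page(high_ctx, SLOT_VA_BASE, pa, MT_RW_DATA);
           return (void *)SLOT_VA_BASE;
   }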

RMM executable bootstrap
------------------------

The |RMM| is loaded as a .bin file by the EL3 loader. The size of the sections
in the |RMM| binary, as well as the placing of |RMM| code and data into
appropriate sections, is controlled by the linker script in the source tree.

Platform initialization code takes care of importing the linker symbols
that define the boundaries of the different sections and creates static
memory mappings that are then used to initialize an ``xlat_ctx`` structure
for the low VA region. The RMM binary sections are flat-mapped and are shared
across all the CPUs on the system. In addition, as |RMM| is compiled as a
Position Independent Executable (PIE) at address 0x0, the Global Offset
Table (GOT) and other relocations in the binary are fixed up with the right
offsets as part of boot. This allows RMM to be run at any physical address as
a PIE regardless of the compile time address.
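
Conceptually, the fix-up walks the ``R_AARCH64_RELATIVE`` relocation entries
and rebases each one onto the run-time load address (the real code performs
this in the early boot assembly path); the sketch below is an illustrative C
rendering of that step:

.. code-block:: c

   #include <stdint.h>

   /* ELF64 RELA entry (AArch64 keeps the relocation type in the low 32 bits). */
   typedef struct {
           uint64_t r_offset;   /* link-time address of the word to patch */
           uint64_t r_info;     /* relocation type and symbol index */
           int64_t  r_addend;   /* link-time value to be rebased */
   } elf64_rela_t;

   #define R_AARCH64_RELATIVE 1027U

   static void apply_relocations(elf64_rela_t *start, elf64_rela_t *end,
                                 uint64_t run_base)
   {
           for (elf64_rela_t *r = start; r < end; r++) {
                   if ((uint32_t)r->r_info != R_AARCH64_RELATIVE) {
                           continue; /* only RELATIVE entries are expected */
                   }
                   /*
                    * The image is linked at address 0x0, so the run-time
                    * value of every patched location is run_base + addend.
                    */
                   *(uint64_t *)(run_base + r->r_offset) =
                           run_base + (uint64_t)r->r_addend;
           }
   }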

For further details, see:

- ``runtime/linker.lds``
- ``plat/common/src/plat_common_init.c``
- ``plat/fvp/src/fvp_setup.c``

_______________________________________________________________________________

.. |full va space| image:: ./diagrams/full_va_space_diagram.png
   :height: 500
.. |high va region| image:: ./diagrams/high_va_memory_map.png
   :height: 600
.. _Realm Management Monitor (RMM) Specification: https://developer.arm.com/documentation/den0137/1-0eac5/?lang=en
.. _`RMM-EL3 communications interface`: https://trustedfirmware-a.readthedocs.io/en/latest/components/rmm-el3-comms-spec.html