.. SPDX-License-Identifier: BSD-3-Clause
.. SPDX-FileCopyrightText: Copyright TF-RMM Contributors.

MMU setup and memory management design in RMM
=============================================

This document describes how the MMU is set up and how memory is managed
by the |RMM| implementation.

Physical Address Space
----------------------

The Realm Management Extension (``FEAT_RME``) defines four Physical Address
Spaces (PAS):

- Non-secure
- Secure
- Realm
- Root

|RMM| code and |RMM| data are in Realm PAS memory, loaded and allocated to
Realm PAS at boot time by the EL3 Firmware. This is a static carveout and it
is never changed during the lifetime of the system.

The size of the |RMM| data is fixed at build time. The majority of this is the
granules array (see `Granule state tracking`_ below), whose size is
configurable and proportional to the maximum amount of delegable DRAM supported
by the system.

Realm data and metadata are in Realm PAS memory, which is delegated to the
Realm PAS by the Host at runtime. The |RMM| ABI ensures that this memory cannot
be returned to Non-secure PAS ("undelegated") while it is in use by the
|RMM| or by a Realm.

NS data is in Non-secure PAS memory. The Host is able to change the PAS
of this memory while it is being accessed by the |RMM|. Consequently, the
|RMM| must be able to handle a Granule Protection Fault (GPF) while accessing
NS data as part of RMI handling.

Granule state tracking
----------------------

The |RMM| manages a data structure called the `granules` array, which is
stored in |RMM| data memory.

The `granules` array contains one entry for every Granule of physical
memory which was in Non-secure PAS at |RMM| boot and can be delegated.

Each entry in the `granules` array contains a field `granule_state` which
records the state of the Granule and which can be one of the states
listed below:

- NS: Not Realm PAS (i.e. Non-secure PAS, Root PAS or Secure PAS)
- Delegated: Realm PAS, but not yet assigned a purpose as either Realm
  data or Realm metadata
- RD: Realm Descriptor
- REC: Realm Execution Context
- REC aux: Auxiliary storage for REC
- Data: Realm data
- RTT: Realm Stage 2 translation tables

As part of RMI SMC handling, the state of the granule can be a pre-condition
and undergo transition to a new state. For more details on the various granule
states and their transitions, please refer to the
`Realm Management Monitor (RMM) Specification`_.

For further details, see:

- ``enum granule_state``
- ``struct granule``
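The state tracking described in this section can be pictured with a small,
self-contained sketch. It is not the |RMM| implementation: the enumerator
names, the ``DRAM_BASE`` and ``NR_GRANULES`` values, and the helpers
``find_granule()`` and ``granule_transition()`` are all hypothetical and exist
only to show how a state pre-condition can gate a transition.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE  UINT64_C(4096)        /* assumed 4KB granule */
#define DRAM_BASE     UINT64_C(0x80000000)  /* hypothetical base of delegable DRAM */
#define NR_GRANULES   UINT64_C(1024)        /* hypothetical build-time maximum */

/* States listed in the text; the enumerator names are illustrative. */
enum granule_state {
    GRANULE_STATE_NS,
    GRANULE_STATE_DELEGATED,
    GRANULE_STATE_RD,
    GRANULE_STATE_REC,
    GRANULE_STATE_REC_AUX,
    GRANULE_STATE_DATA,
    GRANULE_STATE_RTT
};

struct granule {
    enum granule_state state;
};

/* One entry per delegable granule; zero-initialised, so every entry starts as NS. */
static struct granule granules[NR_GRANULES];

/* Return the tracking entry for a PA, or NULL if it is unaligned or not tracked. */
static struct granule *find_granule(uint64_t pa)
{
    if ((pa % GRANULE_SIZE) != 0U || pa < DRAM_BASE) {
        return NULL;
    }
    uint64_t idx = (pa - DRAM_BASE) / GRANULE_SIZE;
    return (idx < NR_GRANULES) ? &granules[idx] : NULL;
}

/* Move a granule to 'next' only if it is currently in 'expected'. */
static int granule_transition(uint64_t pa, enum granule_state expected,
                              enum granule_state next)
{
    struct granule *g = find_granule(pa);

    if (g == NULL || g->state != expected) {
        return -1;  /* pre-condition not met */
    }
    g->state = next;
    return 0;
}
```

In this sketch a granule must be moved from NS to Delegated before it can
become an RD; a second attempt to delegate the same granule fails because the
pre-condition no longer holds.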

RMM stage 1 translation regime
------------------------------

|RMM| uses the ``FEAT_VHE`` extension to split the 64-bit VA space into two
address spaces as shown in the figure below:

|full va space|

- The Low VA range: it extends from VA 0x0 up to the maximum VA size
  configured for the region (with a maximum VA size of 48 bits, or 52 bits
  if ``FEAT_LPA2`` is supported). This range is used to map the |RMM| Runtime
  (code, data, shared memory with EL3-FW and any other platform mappings).
- The High VA range: it extends from VA 0xFFFF_FFFF_FFFF_FFFF all the way down
  to an address corresponding to the maximum VA size configured for the region.
  This region is used by the `Stage 1 High VA - Slot Buffer mechanism`_
  as well as the `Per-CPU stack mapping`_.

There is a range of invalid addresses between both ranges that is not mapped to
any of them as shown in the figure above. The ``TCR_EL2.TxSZ`` fields control
the maximum VA size of each region and |RMM| configures these fields to fit the
mappings used for each region.
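As a rough illustration of how the ``TxSZ`` fields bound each range, the
sketch below computes the first and last address of both regions from a given
``T0SZ`` or ``T1SZ`` value, using the architectural rule that a region spans
``2^(64 - TxSZ)`` bytes. The ``struct va_range`` type and the helper names are
hypothetical and are not part of the xlat library.

```c
#include <stdint.h>

struct va_range {
    uint64_t start;  /* first valid VA of the region */
    uint64_t end;    /* last valid VA of the region (inclusive) */
};

/* Low region: starts at 0 and spans 2^(64 - t0sz) bytes. Requires 1 <= t0sz <= 63. */
static struct va_range low_va_range(unsigned int t0sz)
{
    uint64_t size_minus_1 = (UINT64_C(1) << (64U - t0sz)) - 1U;
    struct va_range r = { 0U, size_minus_1 };

    return r;
}

/* High region: ends at the top of the 64-bit space and spans 2^(64 - t1sz) bytes. */
static struct va_range high_va_range(unsigned int t1sz)
{
    uint64_t size_minus_1 = (UINT64_C(1) << (64U - t1sz)) - 1U;
    struct va_range r = { UINT64_MAX - size_minus_1, UINT64_MAX };

    return r;
}
```

With ``TxSZ = 16`` (a 48-bit region) the low range ends at
0x0000_FFFF_FFFF_FFFF and the high range starts at 0xFFFF_0000_0000_0000;
every address in between belongs to neither region.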

The two VA ranges serve different purposes in RMM, as described below.

Stage 1 Low VA range
^^^^^^^^^^^^^^^^^^^^

The Low VA range is used to create static mappings which are shared across all
the CPUs. It encompasses the RMM executable binary memory and the EL3 Shared
memory region.

The RMM Executable binary memory consists of code, RO data and RW data. Note
that the stage 1 translation tables for the Low Region are kept in RO data, so
that once the MMU is enabled, the table mappings are protected from further
modification.

The EL3 shared memory, which is allocated by the EL3 Firmware, is used by the
`RMM-EL3 communications interface`_. A pointer to the beginning of this area
is received by |RMM| during initialization. |RMM| will then map the region in
the .rw area.

The Low VA range is set up by the platform layer as part of platform
initialization.

The following mappings belong to the Low VA Range:

- RMM_CODE
- RMM_RO
- RMM_RW
- RMM_SHARED

Per-platform mappings can also be added if needed, such as the UART for the
FVP platform.
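One way to picture the static Low VA mappings is as a constant table of
regions. The sketch below is only an illustration: ``struct low_va_region``,
the ``FLAT_REGION`` macro, the attribute names and every address and size are
hypothetical, and all four regions are assumed to be flat-mapped (VA equal to
PA) for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

enum region_attr { ATTR_CODE, ATTR_RO, ATTR_RW };  /* illustrative attributes */

struct low_va_region {
    const char *name;
    uint64_t base_pa;
    uint64_t base_va;
    uint64_t size;
    enum region_attr attr;
};

/* Flat mapping: the VA is the same as the PA. */
#define FLAT_REGION(n, pa, sz, a)  { (n), (pa), (pa), (sz), (a) }

/* Hypothetical addresses and sizes, chosen only for the example. */
static const struct low_va_region low_va_regions[] = {
    FLAT_REGION("RMM_CODE",   UINT64_C(0x10000000), UINT64_C(0x20000), ATTR_CODE),
    FLAT_REGION("RMM_RO",     UINT64_C(0x10020000), UINT64_C(0x10000), ATTR_RO),
    FLAT_REGION("RMM_RW",     UINT64_C(0x10030000), UINT64_C(0x40000), ATTR_RW),
    FLAT_REGION("RMM_SHARED", UINT64_C(0x10070000), UINT64_C(0x1000),  ATTR_RW),
};

#define NR_LOW_VA_REGIONS  (sizeof(low_va_regions) / sizeof(low_va_regions[0]))

/* Return the region containing 'va', or NULL if the address is unmapped. */
static const struct low_va_region *low_va_lookup(uint64_t va)
{
    for (size_t i = 0; i < NR_LOW_VA_REGIONS; i++) {
        const struct low_va_region *r = &low_va_regions[i];

        if (va >= r->base_va && (va - r->base_va) < r->size) {
            return r;
        }
    }
    return NULL;
}
```

Because the table is constant and identical for every CPU, it can be shared,
which matches the static, shared nature of the Low VA range.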

Stage 1 High VA range
^^^^^^^^^^^^^^^^^^^^^

The High VA range is used to create dynamic per-CPU mappings. The tables used
for this are private to each CPU and hence it is possible for every CPU to map
a different PA at a specific VA. This property is used by the `slot-buffer`
mechanism as described later.

In order to allow the mappings for this region to be dynamic, its translation
tables are stored in the RW section of |RMM|, allowing them to be
modified as needed.

For more details see the ``xlat_high_va.c`` file of the xlat library.

The diagram below shows the memory layout for the High VA region.

|high va region|

Stage 1 High VA - Slot Buffer mechanism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |RMM| provides a dynamic mapping mechanism called `slot-buffer` in the
high VA region. The assigned VA space for `slot-buffer` is divided into `slots`
of GRANULE_SIZE each.

The |RMM| has a fixed number of `slots` per CPU. Each `slot` is used to map
memory of a particular category. The |RMM| validates that the target physical
granule to be mapped is of the expected `granule_state` by looking up the
corresponding entry in the `granules` array.
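A minimal sketch of the two ideas above, assuming a 4KB granule: each slot has
a fixed VA derived from its index, and a mapping request is refused unless the
granule is in the state that the slot expects. The slot names, ``SLOT_VIRT``
and the helper functions are hypothetical; the real definitions live in
``enum buffer_slot`` and the slot-buffer sources. The sketch models a single
CPU and records the mapped PA in an array instead of writing translation
tables.

```c
#include <stdint.h>

#define GRANULE_SIZE  UINT64_C(4096)                /* assumed 4KB granule */
#define SLOT_VIRT     UINT64_C(0xFFFFFFFFFFE00000)  /* hypothetical slot-buffer base VA */

enum granule_state {
    GRANULE_STATE_NS,
    GRANULE_STATE_DELEGATED,
    GRANULE_STATE_RD,
    GRANULE_STATE_REC,
    GRANULE_STATE_DATA,
    GRANULE_STATE_RTT
};

/* Illustrative slot categories; one slot per kind of object. */
enum buffer_slot { SLOT_NS, SLOT_RD, SLOT_REC, SLOT_RTT, NR_SLOTS };

/* The granule state that each slot is allowed to map. */
static const enum granule_state slot_expected_state[NR_SLOTS] = {
    GRANULE_STATE_NS, GRANULE_STATE_RD, GRANULE_STATE_REC, GRANULE_STATE_RTT
};

/* PA currently mapped in each slot on this (single, simulated) CPU; 0 if free. */
static uint64_t slot_pa[NR_SLOTS];

/* Every slot occupies exactly one granule of VA space. */
static uint64_t slot_to_va(enum buffer_slot slot)
{
    return SLOT_VIRT + ((uint64_t)slot * GRANULE_SIZE);
}

/* Map 'pa' in 'slot'; returns the slot VA, or 0 if the request is rejected. */
static uint64_t slot_map(enum buffer_slot slot, uint64_t pa,
                         enum granule_state actual_state)
{
    if ((unsigned int)slot >= (unsigned int)NR_SLOTS ||
        actual_state != slot_expected_state[slot] || slot_pa[slot] != 0U) {
        return 0U;
    }
    slot_pa[slot] = pa;
    return slot_to_va(slot);
}

static void slot_unmap(enum buffer_slot slot)
{
    slot_pa[slot] = 0U;
}
```

Since the number of slots is fixed, the VA of every possible mapping is known
at build time and no general-purpose VA allocator is needed.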

The `slot-buffer` mechanism has `slots` for mapping memory of the following
types:

- Realm metadata: These correspond to the specific Realm and Realm
  Execution context scheduled on the PE. These mappings are usually only
  valid during the execution of an RMI or RSI handler and are removed
  afterwards. These include Realm Descriptors (RDs), Realm Execution
  Contexts (RECs), Realm Translation Tables (RTTs).

- NS data: RMM needs to map NS memory as part of RMIs to access parameters
  passed by the Host or to return arguments to the Host. RMM also needs
  to copy Data provided by the Host as part of populating the Realm
  data memory.

- Realm data: RMM sometimes needs to temporarily map Realm data memory
  during Realm creation in order to load the Realm image or access buffers
  specified by the Realm as part of RSI commands.

The `slot-buffer` design avoids the need for generic allocation of VA space.
The rationalization of all mappings ever needed for managing a realm via
`slots` is only possible due to the simple nature of the |RMM| design - in
particular, the fact that it is possible to statically determine the types
of objects which need to be mapped into the |RMM|'s address space, and the
maximum number of objects of a given type which need to be mapped at any point
in time.

During Realm entry and Realm exit, the RD is mapped in the "RD" buffer
slot. Once Realm entry or Realm exit is complete, this mapping is
removed. The RD is not mapped during Realm execution.

The REC and the `rmi_rec_run` data structures are both mapped during Realm
execution.

As the `slots` are mapped on the High VA region, each CPU
has its own private translation tables for such mappings, which means
that a particular slot has a fixed VA on every CPU. Since the Translation
tables are private to a CPU, the mapping to the slot is private to the CPU.
This allows the interruption and migration of a REC (vCPU) to another CPU with
live memory allocations in RMM. An example of this scenario is when the Realm
attestation token is being created in RMM, a pending IRQ can cause RMM to yield
to NS Host with live memory allocations in MbedTLS heap. The NS Host can
schedule the REC on another CPU and, since the mapping for the memory
allocations remains at the same VA, the interrupted realm token creation can
continue.
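The effect of private per-CPU tables can be modelled on a host machine with a
two-dimensional array, one row per CPU. This is a simulation under stated
assumptions, not |RMM| code: ``NR_CPUS``, ``NR_SLOTS``, ``SLOT_VIRT`` and the
``cpu_*`` helpers are hypothetical, and a table entry simply stores the mapped
PA.

```c
#include <stdint.h>

#define NR_CPUS       4U
#define NR_SLOTS      8U
#define GRANULE_SIZE  UINT64_C(4096)                /* assumed 4KB granule */
#define SLOT_VIRT     UINT64_C(0xFFFFFFFFFFE00000)  /* hypothetical slot-buffer base VA */

/* One private table per CPU: entry i holds the PA mapped at slot i, or 0. */
static uint64_t high_va_table[NR_CPUS][NR_SLOTS];

/* The VA of a slot does not depend on the CPU. */
static uint64_t slot_va(unsigned int slot)
{
    return SLOT_VIRT + (slot * GRANULE_SIZE);
}

static void cpu_map(unsigned int cpu, unsigned int slot, uint64_t pa)
{
    high_va_table[cpu][slot] = pa;
}

static void cpu_unmap(unsigned int cpu, unsigned int slot)
{
    high_va_table[cpu][slot] = 0U;
}

/* Translate 'va' using the private table of 'cpu'; returns 0 if unmapped. */
static uint64_t cpu_translate(unsigned int cpu, uint64_t va)
{
    if (cpu >= NR_CPUS || va < SLOT_VIRT) {
        return 0U;
    }

    uint64_t slot = (va - SLOT_VIRT) / GRANULE_SIZE;

    if (slot >= NR_SLOTS || high_va_table[cpu][slot] == 0U) {
        return 0U;
    }
    return high_va_table[cpu][slot] + (va % GRANULE_SIZE);
}
```

In this model two CPUs can map different granules at the same slot VA, and a
granule that was mapped on one CPU can be remapped at the same VA on another
CPU, so pointers held across the migration stay valid.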

The `slot-buffer` implementation in RMM also has some performance optimizations
like caching of TTEs to avoid walking the Stage 1 translation tables for every
map and unmap operation.

As an alternative to using dynamic mappings as required for the RMI command,
the approach of maintaining static mappings for all physical memory was
considered, but rejected on the grounds that this could permit arbitrary
memory access for an attacker who is able to subvert |RMM| execution.

The xlat lib APIs are used by the `slot-buffer` to create dynamic mappings.
These dynamic mappings are stored in the high VA region's ``xlat_ctx``
structure and marked by the xlat library as TRANSIENT. This lets the xlat
library distinguish unmapped dynamic Translation Table Entries from plain
INVALID ones, which would otherwise look identical.
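The need for a TRANSIENT marker can be sketched with a simplified descriptor
encoding. The bit chosen for ``DESC_TRANSIENT`` and all names below are
assumptions made for illustration (the xlat library may encode this
differently); the sketch relies only on the architectural fact that hardware
ignores the remaining bits of a descriptor whose valid bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_VALID      (UINT64_C(1) << 0)
#define DESC_TRANSIENT  (UINT64_C(1) << 55)  /* hypothetical software-defined marker */
#define DESC_ADDR_MASK  UINT64_C(0x0000FFFFFFFFF000)

enum tte_kind { TTE_INVALID, TTE_TRANSIENT_UNMAPPED, TTE_MAPPED };

/* Classify an entry; without the marker the first two kinds would both be 0. */
static enum tte_kind tte_classify(uint64_t tte)
{
    if ((tte & DESC_VALID) != 0U) {
        return TTE_MAPPED;
    }
    return ((tte & DESC_TRANSIENT) != 0U) ? TTE_TRANSIENT_UNMAPPED : TTE_INVALID;
}

/* Dynamic mapping is only allowed on entries reserved as transient. */
static bool tte_map(uint64_t *tte, uint64_t pa)
{
    if (tte_classify(*tte) != TTE_TRANSIENT_UNMAPPED) {
        return false;
    }
    *tte = (pa & DESC_ADDR_MASK) | DESC_TRANSIENT | DESC_VALID;
    return true;
}

/* Unmapping clears the valid bit but keeps the marker for later reuse. */
static bool tte_unmap(uint64_t *tte)
{
    const uint64_t both = DESC_VALID | DESC_TRANSIENT;

    if ((*tte & both) != both) {
        return false;
    }
    *tte = DESC_TRANSIENT;
    return true;
}
```

In this model an entry that was never reserved for dynamic use stays invalid
and cannot be mapped, while a transient entry can be mapped and unmapped
repeatedly.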

For further details, see:

- ``enum buffer_slot``
- ``lib/realm/src/buffer.c``

Per-CPU stack mapping
~~~~~~~~~~~~~~~~~~~~~

Each CPU maps its stack to the High VA region, which means that the stack has
the same VA on all the CPUs and is private to the CPU. At boot time, each CPU
calculates the PA for the start of the stack and maps it to the designated
High VA address space.

The per-CPU VA mapping also includes a gap at the end of the stack VA to detect
any stack underflows. The gap is one page in size.

|RMM| also uses a separate Per-CPU stack to handle exceptions and faults.
This stack is allocated below the general one, and it allows |RMM| to
handle a stack overflow fault. There is another page gap of unmapped
memory between both stacks to harden security.

The rest of the VA space available below the exception stack is unused and
therefore left unmapped. The stage 1 translation library does not allow
anything to be mapped there.
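The stack arrangement described above can be written down as simple address
arithmetic. The sketch assumes a 4KB page and assumes that the first gap sits
directly above the highest address of the general stack; the
``struct stack_layout`` type, the function names and the parameters are
hypothetical, and the actual layout is the one shown in the High VA diagram.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE  UINT64_C(4096)  /* assumed page size */

struct stack_layout {
    uint64_t stack_top;     /* exclusive upper bound of the general stack */
    uint64_t stack_bottom;  /* lowest mapped address of the general stack */
    uint64_t exc_top;       /* exclusive upper bound of the exception stack */
    uint64_t exc_bottom;    /* lowest mapped address of the exception stack */
};

/* Lay out both stacks downwards from 'region_top', with one-page gaps. */
static struct stack_layout compute_stack_layout(uint64_t region_top,
                                                unsigned int stack_pages,
                                                unsigned int exc_pages)
{
    struct stack_layout l;

    l.stack_top = region_top - PAGE_SIZE;                    /* gap above the stack */
    l.stack_bottom = l.stack_top - (stack_pages * PAGE_SIZE);
    l.exc_top = l.stack_bottom - PAGE_SIZE;                  /* gap between stacks */
    l.exc_bottom = l.exc_top - (exc_pages * PAGE_SIZE);
    return l;
}

/* True if 'va' falls inside one of the two stacks rather than in a gap. */
static bool stack_va_is_mapped(const struct stack_layout *l, uint64_t va)
{
    return (va >= l->stack_bottom && va < l->stack_top) ||
           (va >= l->exc_bottom && va < l->exc_top);
}
```

An access that runs off either end of the general stack lands in an unmapped
gap and faults, and the fault handler can still run because it uses the
separate exception stack.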

Stage 1 translation library (xlat library)
------------------------------------------

The |RMM| stage 1 translation management is taken care of by the xlat library.
This library is able to support up to 52-bit addresses and 5 levels of
translation (when ``FEAT_LPA2`` is enabled).

The xlat library is designed to be stateless and it uses the abstraction of
`translation context`, modelled through the ``struct xlat_ctx``. A translation
context stores all the information related to a given VA space, such as the
translation tables, the VA configuration used to initialize the context and any
internal status related to such VA. Once a context has been initialized, its
VA configuration cannot be modified.

At the moment, although the xlat library supports creation of multiple
contexts, it assumes that the caller will only use a single context per
CPU for a given VA region. The library does not offer support to switch
contexts on a CPU at run time. A context can be shared by several CPUs if they
share the same VA configuration and mappings, like on the Low VA region.

Dynamic mappings can be created by specifying the ``TRANSIENT`` flag. The
High VA region creates dynamic mappings using this flag.

For further details, see ``lib/xlat``.

RMM executable bootstrap
------------------------

The |RMM| is loaded as a .bin file by the EL3 loader. The size of the sections
in the |RMM| binary as well as the placing of |RMM| code and data into
appropriate sections is controlled by the linker script in the source tree.

Platform initialization code takes care of importing the linker symbols
that define the boundaries of the different sections and creates static
memory mappings that are then used to initialize an ``xlat_ctx`` structure
for the Low VA region. The RMM binary sections are flat-mapped and are shared
across all the CPUs on the system. In addition, as |RMM| is compiled as a
Position Independent Executable (PIE) at address 0x0, the Global Offset
Table (GOT) and other relocations in the binary are fixed up with the right
offsets as part of boot. This allows RMM to be run at any physical address as
a PIE regardless of the compile time address.

For further details, see:

- ``runtime/linker.lds``
- ``plat/common/src/plat_common_init.c``
- ``plat/fvp/src/fvp_setup.c``
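The relocation fix-up mentioned above can be illustrated with a host-side
simulation of ``R_AARCH64_RELATIVE`` processing, where each patched location
receives the load address plus the relocation addend. This is a sketch and not
the boot code of |RMM|: ``struct rela_entry`` mirrors the ELF64 ``Rela`` layout
but is declared locally, ``fixup_relative_relocs()`` is a hypothetical name,
and the image is a plain byte buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_AARCH64_RELATIVE  1027U  /* relocation type defined by the AArch64 ELF ABI */

/* Same field layout as an ELF64 Rela entry, declared locally for portability. */
struct rela_entry {
    uint64_t r_offset;  /* offset of the location to patch (image linked at 0x0) */
    uint64_t r_info;    /* low 32 bits hold the relocation type */
    int64_t  r_addend;  /* link-time address of the target */
};

/* Patch every RELATIVE relocation so that it points into the loaded image. */
static void fixup_relative_relocs(uint8_t *image, size_t image_size,
                                  uint64_t load_addr,
                                  const struct rela_entry *rela, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t value;

        if ((rela[i].r_info & UINT64_C(0xFFFFFFFF)) != R_AARCH64_RELATIVE) {
            continue;  /* other relocation types are ignored in this sketch */
        }
        if (image_size < sizeof(value) ||
            rela[i].r_offset > image_size - sizeof(value)) {
            continue;  /* never write outside the image */
        }
        value = load_addr + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof(value));
    }
}
```

Because the binary is linked at address 0x0, the addend equals the offset of
the target inside the image, so adding the load address yields the run-time
address wherever the image has been placed.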

_______________________________________________________________________________

.. |full va space| image:: ./diagrams/full_va_space_diagram.drawio.png
   :height: 500
.. |high va region| image:: ./diagrams/high_va_memory_map.drawio.png
   :height: 600
.. _Realm Management Monitor (RMM) Specification: https://developer.arm.com/documentation/den0137/1-0eac5/?lang=en
.. _`RMM-EL3 communications interface`: https://trustedfirmware-a.readthedocs.io/en/latest/components/rmm-el3-comms-spec.html