PSA API functions and shared memory
===================================

## Introduction

This document discusses the security architecture of systems where PSA API functions might receive arguments that are in memory that is shared with an untrusted process. On such systems, the untrusted process might access a shared memory buffer while the cryptography library is using it, and thus cause unexpected behavior in the cryptography code.

### Core assumptions

We assume the following scope limitations:

* Only PSA Crypto API functions are in scope (including Mbed TLS extensions to the official API specification). Legacy crypto, X.509, TLS, or any other function which is not called `psa_xxx` is out of scope.
* We only consider [input buffers](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#input-buffer-sizes) and [output buffers](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#output-buffer-sizes). Any other data is assumed to be in non-shared memory.

## System architecture discussion

### Architecture overview

We consider a system that has memory separation between partitions: a partition can't access another partition's memory directly. Partitions are meant to be isolated from each other: a partition may only affect the integrity of another partition via well-defined system interfaces. For example, this can be a Unix/POSIX-like system that isolates processes, or isolation between the secure world and the non-secure world relying on a mechanism such as TrustZone, or isolation between secure-world applications on such a system.

More precisely, we consider such a system where our PSA Crypto implementation is running inside one partition, called the **crypto service**. The crypto service receives remote procedure calls (RPC) from other partitions, validates their arguments (e.g. validation of key identifier ownership), and calls a PSA Crypto API function. This document is concerned with environments where the arguments passed to a PSA Crypto API function may be in shared memory (as opposed to environments where the inputs are always copied into memory that is solely accessible by the crypto service before calling the API function, and likewise with output buffers after the function returns).

When the data is accessible to another partition, there is a risk that this other partition will access it while the crypto implementation is working. Although this could be prevented by suspending the whole system while crypto is working, such a limitation is rarely desirable and most systems don't offer a way to do it. (Even systems that have absolute thread priorities, and where crypto has a higher priority than any untrusted partition, may be vulnerable due to having multiple cores or asynchronous data transfers with peripherals.)

The crypto service must guarantee that it behaves as if the rest of the world were suspended while it executes. A behavior that is only possible if an untrusted entity accesses a buffer while the crypto service is processing the data is a security violation.

### Risks and vulnerabilities

We consider a security architecture with two or three entities:

* a crypto service, which offers PSA crypto API calls over RPC (remote procedure call) using shared memory for some input or output arguments;
* a client of the crypto service, which makes an RPC to the crypto service;
* in some scenarios, a client of the client, which makes an RPC to the crypto client, which re-shares the memory with the crypto service.

The behavior of an RPC is defined in terms of the values of its inputs and outputs. This models an ideal world where the content of input and output buffers is not accessible outside the crypto service while it is processing an RPC. It is a security violation if the crypto service behaves in a way that cannot be achieved by setting the inputs before the RPC call, and reading the outputs after the RPC call is finished.

#### Read-read inconsistency

If an input argument is in shared memory, there is a risk of a **read-read inconsistency**:

1. The crypto code reads part of the input and validates it, or injects it into a calculation.
2. The client (or client's client) modifies the input.
3. The crypto code reads the same part again, and performs an action which would be impossible if the input had had the same value all along.

Vulnerability example (parsing): suppose the input contains data with a type-length-value or length-value encoding (for example, importing an RSA key). The crypto code reads the length field and checks that it fits within the buffer. (This could be the length of the overall data, or the length of an embedded field.) Later, the crypto code reads the length again and uses it without validation. A malicious client can modify the length field in the shared memory between the two reads and thus cause a buffer overread on the second read.
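
To make the safe pattern concrete, here is a sketch of a hypothetical length-value parser (not actual Mbed TLS code; the function name and error convention are invented for illustration) that reads the length field from the possibly-shared buffer exactly once, into a local variable, then validates and uses only that local copy:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser for a length-value encoded input that may live in
 * shared memory. Returns 0 on success, -1 on invalid input. */
int parse_length_value(const uint8_t *input, size_t input_len,
                       uint8_t *output, size_t output_size,
                       size_t *output_len)
{
    if (input_len < 1) {
        return -1;
    }
    /* Read the length field ONCE into a local variable. Reading
     * input[0] again later could observe a different value if a
     * malicious client modifies the shared buffer concurrently. */
    size_t length = input[0];
    if (length > input_len - 1 || length > output_size) {
        return -1; /* validate the local copy, not the shared field */
    }
    memcpy(output, input + 1, length); /* use the same validated copy */
    *output_len = length;
    return 0;
}
```

Each byte of the value is still read only once (by `memcpy`), so the function as a whole never reads the same shared index twice.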

Vulnerability example (dual processing): consider an RPC to perform authenticated encryption, using a mechanism with an encrypt-and-MAC structure. The authenticated encryption implementation separately calculates the ciphertext and the MAC from the plaintext. A client sets the plaintext input to `"PPPP"`, then starts the RPC call, then changes the input buffer to `"QQQQ"` while the crypto service is working.

* Any of `enc("PPPP")+mac("PPPP")`, `enc("PPQQ")+mac("PPQQ")` or `enc("QQQQ")+mac("QQQQ")` are valid outputs: they are outputs that can be produced by this authenticated encryption RPC.
* If the authenticated encryption calculates the ciphertext before the client changes the input buffer and calculates the MAC after that change, reading the input buffer again each time, the output will be `enc("PPPP")+mac("QQQQ")`. There is no input that can lead to this output, hence this behavior violates the security guarantees of the crypto service.

#### Write-read inconsistency

If an output argument is in shared memory, there is a risk of a **write-read inconsistency**:

1. The crypto code writes some intermediate data into the output buffer.
2. The client (or client's client) modifies the intermediate data.
3. The crypto code reads the intermediate data back and continues the calculation, leading to an outcome that would not be possible if the intermediate data had not been modified.

Vulnerability example: suppose that an RSA signature function works by formatting the data in place in the output buffer, then applying the RSA private-key operation in place. (This is how `mbedtls_rsa_pkcs1_sign` works.) A malicious client may write badly formatted data into the buffer, so that the private-key operation is not a valid signature (e.g. it could be a decryption), violating the RSA key's usage policy.

Vulnerability example with chained calls: we consider the same RSA signature operation as before. In this example, we additionally assume that the data to sign comes from an attestation application which signs some data on behalf of a final client: the key and the data to sign are under the attestation application's control, and the final client must not be able to obtain arbitrary signatures. The final client shares an output buffer for the signature with the attestation application, and the attestation application re-shares this buffer with the crypto service. A malicious final client can modify the intermediate data and thus sign arbitrary data.

#### Write-write disclosure

If an output argument is in shared memory, there is a risk of a **write-write disclosure**:

1. The crypto code writes some intermediate data into the output buffer. This intermediate data must remain confidential.
2. The client (or client's client) reads the intermediate data.
3. The crypto code overwrites the intermediate data.

Vulnerability example with chained calls (temporary exposure): an application encrypts some data, and lets its clients store the ciphertext. Clients may not have access to the plaintext. To save memory, when it calls the crypto service, it passes an output buffer that is in the final client's memory. Suppose the encryption mechanism works by copying its input to the output buffer then encrypting in place (for example, to simplify considerations related to overlap, or because the implementation relies on a low-level API that works in place). In this scenario, the plaintext is exposed to the final client while the encryption is in progress, which violates the confidentiality of the plaintext.

Vulnerability example with chained calls (backtrack): we consider a provisioning application that provides a data encryption service on behalf of multiple clients, using a single shared key. Clients are not allowed to access each other's data. The provisioning application isolates clients by including the client identity in the associated data. Suppose that an AEAD decryption function processes the ciphertext incrementally by simultaneously writing the plaintext to the output buffer and calculating the tag. (This is how AEAD decryption usually works.) At the end, if the tag is wrong, the decryption function wipes the output buffer. Assume that the output buffer for the plaintext is shared from the client to the provisioning application, which re-shares it with the crypto service. A malicious client can read another client (the victim)'s encrypted data by passing the ciphertext to the provisioning application, which will attempt to decrypt it with associated data identifying the requesting client. Although the operation will fail because the tag is wrong, the malicious client still reads the victim's plaintext.
Gilles Peskinef7806ca2023-10-12 16:00:11 +020075
Gilles Peskinedb005432023-10-13 19:57:53 +020076#### Write-read feedback
77
78If a function both has an input argument and an output argument in shared memory, and processes its input incrementally to emit output incrementally, the following sequence of events is possible:
79
801. The crypto code processes part of the input and writes the corresponding part of the output.
812. The client reads the early output and uses that to calculate the next part of the input.
823. The crypto code processes the rest of the input.
83
84There are cryptographic mechanisms for which this breaks security properties. An example is [CBC encryption](https://link.springer.com/content/pdf/10.1007/3-540-45708-9_2.pdf): if the client can choose the content of a plaintext block after seeing the immediately preceding ciphertext block, this gives the client a decryption oracle. This is a security violation if the key policy only allowed the client to encrypt, not to decrypt.
85
86TODO: is this a risk we want to take into account? Although this extends the possible behaviors of the one-shot interface, the client can do the same thing legitimately with the multipart interface.
87
### Possible countermeasures

In this section, we briefly discuss generic countermeasures.

#### Copying

Copying is a valid countermeasure. It is conceptually simple. However, it is often unattractive because it requires additional memory and time.

Note that although copying is very easy to write into a program, there is a risk that a compiler (especially with whole-program optimization) may optimize the copy away, if it does not understand that copies between shared memory and non-shared memory are semantically meaningful.

Example: the PSA Firmware Framework 1.0 forbids shared memory between partitions. This restriction is lifted in version 1.1 due to concerns over RAM usage.

#### Careful accesses

The following rules guarantee that shared memory cannot result in a security violation other than [write-read feedback](#write-read-feedback):

* Never read the same input twice at the same index.
* Never read back from an output.
* Never write to the output twice at the same index.
    * This rule can usefully be relaxed in many circumstances. It is ok to write data that is independent of the inputs (and not otherwise confidential), then overwrite it. For example, it is ok to zero the output buffer before starting to process the input.

These rules are very difficult to enforce.

Example: these are the rules that a GlobalPlatform TEE Trusted Application (application running on the secure side of TrustZone on Cortex-A) must follow.

## Protection requirements

### Responsibility for protection

A call to a crypto service to perform a crypto operation involves the following components:

1. The remote procedure call framework provided by the operating system.
2. The code of the crypto service.
3. The code of the PSA Crypto dispatch layer (also known as the core), which is provided by Mbed TLS.
4. The driver implementing the cryptographic mechanism, which may be provided by Mbed TLS (built-in driver) or by a third-party driver.

The [PSA Crypto API specification](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#stability-of-parameters) puts the responsibility for protection on the implementation of the PSA Crypto API, i.e. (3) or (4).

> In an environment with multiple threads or with shared memory, the implementation carefully accesses non-overlapping buffer parameters in order to prevent any security risk resulting from the content of the buffer being modified or observed during the execution of the function. (...)

In Mbed TLS 2.x and 3.x up to and including 3.5.0, there is no defense against buffers in shared memory. The responsibility shifts to (1) or (2), but this is not documented.

In the remainder of this chapter, we will discuss how to implement this high-level requirement where it belongs: inside the implementation of the PSA Crypto API. Note that this allows two possible levels: in the dispatch layer (independently of the implementation of each mechanism) or in the driver (specific to each implementation).

#### Protection in the dispatch layer

The dispatch layer has no control over how the driver layer will access buffers. Therefore the only possible protection method at this layer is to ensure that drivers have no access to shared memory. This means that any buffer located in shared memory must be copied into or out of a buffer in memory owned by the crypto service (heap or stack). This adds inefficiency, mostly in terms of RAM usage.

For buffers with a small static size limit, this is something we often do for convenience, especially with output buffers. However, as of Mbed TLS 3.5.0, it is not done systematically.

It is ok to skip the copy if it is known for sure that a buffer is not in shared memory. However, the location of the buffer is not under the control of Mbed TLS. This means skipping the copy would have to be a compile-time or run-time option which has to be set by the application using Mbed TLS. This is both an additional maintenance cost (more code to analyze, more testing burden), and a residual security risk in case the party who is responsible for setting this option does not set it correctly. As a consequence, Mbed TLS will not offer this configurability unless there is a compelling argument.

#### Protection in the driver layer

Putting the responsibility for protection in the driver layer increases the overall amount of work since there are more driver implementations than dispatch implementations. (This is true even inside Mbed TLS: almost all API functions have multiple underlying implementations, one for each algorithm.) It also increases the risk to the ecosystem since some drivers might not protect correctly. Therefore having drivers be responsible for protection is only a good choice if there is a definite benefit to it, compared to allocating an internal buffer and copying. An expected benefit in some cases is that there are practical protection methods other than copying.

Some cryptographic mechanisms are naturally implemented by processing the input in a single pass, with a low risk of ever reading the same byte twice, and by writing the final output directly into the output buffer. For such mechanisms, it is sensible to mandate that drivers respect these rules.

In the next section, we will analyze how susceptible various cryptographic mechanisms are to shared memory vulnerabilities.

### Susceptibility of different mechanisms

#### Operations involving small buffers

For operations involving **small buffers**, the cost of copying is low. For many of those, the risk of not copying is high:

* Any parsing of formatted data has a high risk of [read-read inconsistency](#read-read-inconsistency).
* An internal review shows that for RSA operations, it is natural for an implementation to have a [write-read inconsistency](#write-read-inconsistency) or a [write-write disclosure](#write-write-disclosure).

Note that in this context, a “small buffer” is one with a size limit that is known at compile time, and small enough that copying the data is not prohibitive. For example, an RSA key fits in a small buffer. A hash input is not a small buffer, even if it happens to be only a few bytes long in one particular call.

The following buffers are considered small buffers:

* Any input or output directly related to asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export.
    * Note that this does not include inputs or outputs that are not processed by an asymmetric primitive, for example the message input to `psa_sign_message` or `psa_verify_message`.
* Cooked key derivation output.
* The output of a hash or MAC operation.

**Design decision: the dispatch layer shall copy all small buffers**.

#### Symmetric cryptography inputs with small output

Message inputs to hash, MAC and key derivation operations are at a low risk of [read-read inconsistency](#read-read-inconsistency) because they are unformatted data, and for all specified algorithms, it is natural to process the input one byte at a time.

**Design decision: require symmetric cryptography drivers to read their input without a risk of read-read inconsistency**.

TODO: what about IV/nonce inputs? They are typically small, but don't necessarily have a static size limit (e.g. GCM recommends a 12-byte nonce, but also allows large nonces).

#### Key derivation outputs

Key derivation typically emits its output as a stream, with no error condition detected after setup other than operational failures (e.g. communication failure with an accelerator) or running out of data to emit (which can easily be checked before emitting any data, since the data size is known in advance).

(Note that this is about raw byte output, not about cooked key derivation, i.e. deriving a structured key, which is considered a [small buffer](#operations-involving-small-buffers).)

**Design decision: require key derivation drivers to emit their output without reading back from the output buffer**.

#### Cipher and AEAD

AEAD decryption is at risk of [write-write disclosure](#write-write-disclosure) when the tag does not match.

AEAD encryption and decryption are at risk of [read-read inconsistency](#read-read-inconsistency) if they process the input multiple times, which is natural in a number of cases:

* when encrypting with an encrypt-and-authenticate or authenticate-then-encrypt structure (one read to calculate the authentication tag and another read to encrypt);
* when decrypting with an encrypt-then-authenticate structure (one read to decrypt and one read to calculate the authentication tag);
* with SIV modes (not yet present in the PSA API, but likely to come one day): one full pass to calculate the IV, then another full pass for the core authenticated encryption.

Cipher and AEAD outputs are at risk of [write-read inconsistency](#write-read-inconsistency) and [write-write disclosure](#write-write-disclosure) if they are implemented by copying the input into the output buffer with `memmove`, then processing the data in place. In particular, this approach makes it easy to fully support overlapping, since `memmove` will take care of overlapping cases correctly, which is otherwise hard to do portably (C99 does not offer an efficient, portable way to check whether two buffers overlap).

**Design decision: the dispatch layer shall allocate an intermediate buffer for cipher and AEAD plaintext/ciphertext inputs and outputs**.

Note that this can be a single buffer for the input and the output if the driver supports in-place operation (which it is supposed to, since it is supposed to support arbitrary overlap, although this is not always the case in Mbed TLS, a [known issue](https://github.com/Mbed-TLS/mbedtls/issues/3266)). A side benefit of doing this intermediate copy is that overlap will be supported.

For all currently implemented AEAD modes, the associated data is only processed once, to calculate an intermediate value of the authentication tag.

**Design decision: for now, require AEAD drivers to read the associated data without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting an SIV mode, at which point the dispatch layer shall copy the input for modes that are not known to be low-risk.

#### Message signature

For signature algorithms with a hash-and-sign framework, the input to sign/verify-message is passed to a hash, and thus can follow the same rules as [symmetric cryptography inputs with small output](#symmetric-cryptography-inputs-with-small-output). This is also true for `PSA_ALG_RSA_PKCS1V15_SIGN_RAW`, which is the only non-hash-and-sign signature mechanism implemented in Mbed TLS 3.5. This is not true for PureEdDSA (`#PSA_ALG_PURE_EDDSA`), which is not yet implemented: [PureEdDSA signature](https://www.rfc-editor.org/rfc/rfc8032#section-5.1.6) processes the message twice. (However, PureEdDSA verification only processes the message once.)

**Design decision: for now, require sign/verify-message drivers to read their input without a risk of read-read inconsistency**. Make a note to revisit this when we start supporting PureEdDSA, at which point the dispatch layer shall copy the input for algorithms such as PureEdDSA that are not known to be low-risk.

## Design of shared memory protection

This section explains how Mbed TLS implements the shared memory protection strategy summarized below.

### Shared memory protection strategy

* The core (dispatch layer) shall make a copy of the following buffers, so that drivers do not receive arguments that are in shared memory:
    * Any input or output from asymmetric cryptography (signature, encryption/decryption, key exchange, PAKE), including key import and export.
    * Plaintext/ciphertext inputs and outputs for cipher and AEAD.
    * The output of a hash or MAC operation.
    * Cooked key derivation output.

* A document shall explain the requirements on drivers for arguments whose access needs to be protected:
    * Hash and MAC input.
    * Cipher/AEAD IV/nonce (to be confirmed).
    * AEAD associated data (to be confirmed).
    * Key derivation input (excluding key agreement).
    * Raw key derivation output (excluding cooked key derivation output).

* The built-in implementations of cryptographic mechanisms with arguments whose access needs to be protected shall protect those arguments.

Justification: see “[Susceptibility of different mechanisms](#susceptibility-of-different-mechanisms)”.

### Implementation of copying

Copy what needs copying. This is broadly straightforward; however, there are a few things to consider.

#### Compiler optimization of copies

It is unclear whether the compiler will attempt to optimize away copying operations.

Once the copying code is implemented, it should be evaluated to see whether compiler optimization is a problem. Specifically, for the major compilers and platforms supported by Mbed TLS:

* Write a small program that uses a PSA function which copies inputs or outputs.
* Build the program with link-time optimization / full-program optimization enabled (e.g. `-flto` with `gcc`).
* Inspect the generated code with `objdump` or a similar tool to see if copying operations are preserved.

If copying behaviour is preserved by all major compilers and platforms, then assume that compiler optimization is not a problem.

If copying behaviour is optimized away by the compiler, further investigation is needed. Experiment with using the `volatile` keyword to force the compiler not to optimize accesses to the copied buffers.

**Open questions: Will the compiler optimize away copies? If so, can it be prevented from doing so in a portable way?**
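
As a sketch of that experiment (the helper name `copy_no_elide` is hypothetical, not an Mbed TLS API), a copy performed through a `volatile`-qualified destination pointer cannot be elided, because accesses to `volatile` objects count as observable behavior:

```c
#include <stddef.h>

/* Sketch: copy byte by byte through a volatile-qualified destination
 * pointer, so the compiler cannot prove the stores are dead and remove
 * the copy, even under link-time optimization. */
void copy_no_elide(void *dst, const void *src, size_t len)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < len; i++) {
        d[i] = s[i];
    }
}
```

Whether this is needed at all, and what it costs compared to a plain `memcpy()`, is exactly what the evaluation steps above would determine.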

#### Copying code

We may either copy buffers on an ad-hoc basis using `memcpy()` in each PSA function, or use a unified set of functions for copying input and output data. The advantages of the latter are obvious:

* Any test hooks need only be added in one place.
* Copying code must only be reviewed for correctness in one place, rather than in all functions where it occurs.
* Copy bypass is simpler, as we can just replace these functions with no-ops in a single place.
* Any complexity needed to prevent the compiler optimizing copies away does not have to be duplicated.

On the other hand, the only advantage of ad-hoc copying is slightly greater flexibility.

**Design decision: Create a unified set of functions for copying input and output data.**
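
For illustration, a unified input-copy interface might look like the following sketch (the function names and the `-1` error code are hypothetical, not the actual Mbed TLS API):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical unified copy helpers. A local (service-owned) snapshot of
 * a possibly-shared input buffer is allocated and filled in one place,
 * so test hooks and anti-optimization measures only need to live here. */
int local_input_alloc(const unsigned char *shared, size_t len,
                      unsigned char **local)
{
    *local = NULL;
    if (len == 0) {
        return 0;
    }
    *local = malloc(len);
    if (*local == NULL) {
        return -1; /* hypothetical out-of-memory error code */
    }
    memcpy(*local, shared, len); /* snapshot the shared input once */
    return 0;
}

void local_input_free(unsigned char *local)
{
    free(local);
}
```

Because the snapshot is taken once, later modifications of the shared buffer by a malicious client cannot affect the data the driver sees.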

#### Copying in multipart APIs

Multipart APIs may follow one of two possible approaches for copying of input:

##### 1. Allocate a buffer and copy input on each call to `update()`

This is simple and mirrors the approach for one-shot APIs nicely. However, allocating memory in the middle of a multipart operation is likely to be bad for performance. Multipart APIs are designed in part for systems that do not have time to perform an operation at once, so introducing poor performance may be a problem here.

**Open question: Does memory allocation in `update()` cause a performance problem? If so, to what extent?**

##### 2. Allocate a buffer at the start of the operation and subdivide calls to `update()`

In this approach, input and output buffers are allocated at the start of the operation, large enough to hold the expected average call to `update()`. When `update()` is called with larger buffers than these, the PSA API layer makes multiple calls to the driver, chopping the input into chunks of the temporary buffer size and filling the output from the results until the operation is finished.

This would be more complicated than approach (1) and introduces some extra issues. For example, if one of the intermediate calls to the driver's `update()` returns an error, it is not possible for the driver's state to be rolled back to before the first call to `update()`. It is unclear how this could be solved.

However, this approach would reduce memory usage in some cases and prevent memory allocation during an operation. Additionally, since the input and output buffers would be fixed-size, it would be possible to allocate them statically, avoiding the need for any dynamic memory allocation at all.

**Design decision: Initially use approach (1) and treat approach (2) as an optimization to be done if necessary.**

### Validation of copying

#### Validation of copying by review

This is fairly self-explanatory. Review all functions that use shared memory and ensure that they each copy memory. This is the simplest strategy to implement, but is less reliable than automated validation.

#### Validation of copying with memory pools

Proposed general idea: have tests where the test code calling API functions allocates memory in a certain pool, and code in the library allocates memory in a different pool. Test drivers check that needs-copying arguments are within the library pool, not within the test pool.
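
The test-driver check might look like this sketch (the `pool_t` descriptor is hypothetical; a real test harness would obtain the pool bounds from its allocator):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory pool descriptor: the [start, end) address range. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} pool_t;

/* Return 1 if the whole buffer [buf, buf+len) lies inside the pool,
 * 0 otherwise. A test driver would assert this for every needs-copying
 * argument against the library pool. */
int buffer_in_pool(const void *buf, size_t len, const pool_t *pool)
{
    uintptr_t b = (uintptr_t) buf;
    return b >= pool->start && b < pool->end && len <= pool->end - b;
}
```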

#### Validation of copying by memory poisoning

Proposed general idea: in test code, poison the memory area used by input and output parameters that must be copied. Poisoning means something that prevents accessing memory while it is poisoned. This could be via memory protection (allocate with `mmap` then disable access with `mprotect`), or some kind of poisoning for an analyzer such as MSan or Valgrind.

In the library, the code that does the copying temporarily unpoisons the memory by calling a test hook.

```
static void copy_to_user(void *copy_buffer, const void *input_buffer, size_t length) {
#if defined(MBEDTLS_TEST_HOOKS)
    if (mbedtls_psa_core_poison_memory != NULL) {
        /* Temporarily unpoison the buffer so the copy can access it. */
        mbedtls_psa_core_poison_memory(copy_buffer, length, 0);
    }
#endif
    memcpy(copy_buffer, input_buffer, length);
#if defined(MBEDTLS_TEST_HOOKS)
    if (mbedtls_psa_core_poison_memory != NULL) {
        /* Re-poison the buffer now that the copy is complete. */
        mbedtls_psa_core_poison_memory(copy_buffer, length, 1);
    }
#endif
}
```
Gilles Peskine1f2802c2023-10-13 21:49:17 +0200316The reason to poison the memory before calling the library, rather than after the copy-in (and symmetrically for output buffers) is so that the test will fail if we forget to copy, or we copy the wrong thing. This would not be the case if we relied on the library's copy function to do the poisoning: that would only validate that the driver code does not access the memory on the condition that the copy is done as expected.
Gilles Peskine7bc1bb62023-10-13 20:05:25 +0200317
David Horstmann23661cc2023-10-16 19:31:41 +0100318Note: Extra work may be needed when allocating buffers to ensure that each shared buffer lies in its own separate page, allowing its permissions to be set independently.
319
320**Question: Should we try to build memory poisoning validation on existing Mbed TLS tests, or write new tests for this?**
321
322##### Validation with existing tests
323
324It should be possible to integrate memory poisoning validation with existing tests. This has two main advantages:
325* All of the tests are written already, potentially saving development time.
326* The code coverage of these tests is already much greater than would be achievable writing new tests from scratch.
327
328In an ideal world, we would be able to take a grand, encompassing approach whereby we would simply replace the implementation of `mbedtls_calloc()` and all tests would transparently run with memory poisoning enabled. Unfortunately, there are some significant difficulties with this idea:
329* We cannot automatically distinguish which allocated buffers are shared buffers that need memory poisoning enabled.
330* Some input buffers to tested functions may be stack allocated so cannot be poisoned automatically.
331
332Instead, consider a more modest strategy. Create a function:
333```c
334uint8_t *mbedtls_test_get_poisoned_copy(uint8_t *buffer, size_t len)
335```
336that creates a poisoned copy of a buffer ready to be passed to the PSA function. Also create:
337```c
338uint8_t *mbedtls_test_copy_free_poisoned_buffer(uint8_t *poisoned_buffer, uint8_t *original_buffer, size_t len)
339```
340which copies the poisoned buffer contents back into the original buffer and frees the poisoned copy.
341
342In each test case, manually wrap any calls to PSA functions in code that substitutes a poisoned buffer. For example, the code:
343```c
344psa_api_do_some_operation(input, input_len, output, output_len);
345```
346Would be transformed to:
347```c
348input_poisoned = mbedtls_test_get_poisoned_copy(input, input_len);
349output_poisoned = mbedtls_test_get_poisoned_copy(output, output_len);
350psa_api_do_some_operation(input_poisoned, input_len, output_poisoned, output_len);
351mbedtls_test_copy_free_poisoned_buffer(input_poisoned, input, input_len);
352mbedtls_test_copy_free_poisoned_buffer(output_poisoned, output, output_len);
353```
354Further interface design or careful use of macros may make this a little less cumbersome than it seems in this example.
355
356The poison copying functions should be written so as to evaluate to no-ops based on the value of a config option. They also need not be added to all tests, only to a 'covering set' of important tests.
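For instance, the wrapping could be hidden behind macros that expand to the poisoning helpers only when a config option is set, and otherwise pass the original buffers through unchanged. The option name `MBEDTLS_TEST_MEMORY_POISONING` and the macro names below are hypothetical:

```c
#include <stdint.h>

/* Hypothetical configuration option: when it is unset, the wrappers
 * compile away and the tests run on the original buffers. */
#if defined(MBEDTLS_TEST_MEMORY_POISONING)
#define TEST_POISON_COPY(buf, len) \
    mbedtls_test_get_poisoned_copy((buf), (len))
#define TEST_UNPOISON_FREE(copy, buf, len) \
    mbedtls_test_copy_free_poisoned_buffer((copy), (buf), (len))
#else
#define TEST_POISON_COPY(buf, len)         (buf)
#define TEST_UNPOISON_FREE(copy, buf, len) ((void) 0)
#endif
```

A test would then call `input_poisoned = TEST_POISON_COPY(input, input_len);` unconditionally, with zero overhead in builds where the option is disabled.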
##### Validation with new tests

Validation with newly created tests is initially simpler to implement than using the existing tests, since the tests can know about memory poisoning from the start. However, re-implementing testing for most PSA interfaces (even only basic positive testing) is a large undertaking. Furthermore, not much is gained over the previous approach, given that it seems straightforward to wrap PSA function calls in existing tests with poisoning code.

**Design decision: Add memory poisoning code to existing tests rather than creating new ones, since this is simpler and produces greater coverage.**

#### Discussion

Of all the approaches discussed, validation by memory poisoning appears to be the best. This is because it:
* Does not require complex linking against different versions of `malloc()` (as is the case with the memory pool approach).
* Allows automated testing (unlike the review approach).

**Design decision: Use a memory poisoning approach to validate copying.**

### Shared memory protection requirements

TODO: write document and reference it here.

### Validation of careful access for built-in drivers

For PSA functions whose inputs and outputs are not copied, it is important that we validate that the built-in drivers access their inputs and outputs correctly, so as not to cause a security issue. Specifically, we must check that each memory location in a shared buffer is not accessed more than once by a driver function. In this section we examine various possible methods for performing this validation.

Note: We are focusing on read-read inconsistencies for now, as most of the cases where we aren't copying are inputs.

#### Review

As with validation of copying, the simplest method of validation we can implement is careful code review. This is the least desirable method of validation for several reasons:
1. It is tedious for the reviewers.
2. Reviewers are prone to make mistakes (especially when performing tedious tasks).
3. It requires engineering time linear in the number of PSA functions to be tested.
4. It cannot assure the quality of third-party drivers, whereas automated tests can in principle be ported to any driver implementation.

If all other approaches turn out to be prohibitively difficult, code review exists as a fallback option. However, it should be understood that this is far from ideal.

#### Tests using `mprotect()`

Checking that a memory location is not accessed more than once may be achieved by using `mprotect()` on a Linux system to cause a segmentation fault whenever a memory access happens. Tests based on this approach are sketched below.

##### Linux mprotect+ptrace

Idea: call `mmap` to allocate memory for arguments and `mprotect` to deny or re-enable access. Use `ptrace` from a parent process to react to the SIGSEGV raised by a denied access. When a SIGSEGV occurs in the protected region:

1. Use `ptrace` to execute a `mprotect` system call in the child to enable access. TODO: How? `ptrace` can modify registers and memory in the child, which includes changing parameters of a syscall that's about to be executed, but not directly cause the child process to execute a syscall that it wasn't about to execute.
2. Use `ptrace` with `PTRACE_SINGLESTEP` to re-execute the failed load/store instruction.
3. Use `ptrace` to execute a `mprotect` system call in the child to disable access.
4. Use `PTRACE_CONT` to resume the child execution.

Record the addresses that are accessed. Mark the test as failed if the same address is read twice.

##### Debugger + mprotect

Idea: call `mmap` to allocate memory for arguments and `mprotect` to deny or re-enable access. Use a debugger to handle SIGSEGV (in GDB: set a signal catchpoint). If the segfault was due to accessing the protected region:

1. Execute `mprotect` to allow access.
2. Single-step the load/store instruction.
3. Execute `mprotect` to disable access.
4. Continue execution.

Record the addresses that are accessed. Mark the test as failed if the same address is read twice. This part might be hard to do in the GDB command language, so we may want to just log the addresses and analyze them with a separate program, or drive GDB from Python.

#### Instrumentation (Valgrind)

An alternative approach is to use a dynamic instrumentation tool (the most obvious being Valgrind) to trace memory accesses and check that each of the important memory addresses is accessed no more than once.

Valgrind has no tool that specifically checks the property we are looking for. However, it is possible to generate a memory trace with Valgrind using the following:

```
valgrind --tool=lackey --trace-mem=yes --log-file=logfile ./myprogram
```
This will execute `myprogram` and dump a record of every memory access to `logfile`, with its address and data width. If `myprogram` is a test that does the following:
1. Set up input and output buffers for a PSA function call.
2. Output the start and end address of each buffer (e.g. by printing them).
3. Write data into the input buffer exactly once.
4. Call the PSA function.
5. Read data from the output buffer exactly once.

Then it should be possible to parse the output from the program and from Valgrind and check that each location was accessed exactly twice: once by the program's setup and once by the PSA function.
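The trace-checking step can be quite small. The sketch below assumes Lackey's load records look like ` L <hex-address>,<width>` (instruction fetches start with `I` and stores with `S`); `check_single_read` is a hypothetical helper that operates on an in-memory trace for illustration, whereas a real checker would stream the log file and support arbitrary buffer sizes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scan a Lackey-style trace held in memory for duplicate loads inside
 * the buffer [start, start + len). Returns 1 if every byte in the
 * buffer is loaded at most once, 0 otherwise. */
static int check_single_read(const char *trace, uintptr_t start, size_t len)
{
    unsigned counts[256] = { 0 };  /* per-byte access counters (demo-sized) */
    if (len > sizeof(counts) / sizeof(counts[0])) {
        return 0;  /* buffer too large for this sketch */
    }
    const char *line = trace;
    while (*line != '\0') {
        char record[64];
        size_t n = strcspn(line, "\n");  /* length of the current record */
        if (n < sizeof(record)) {
            memcpy(record, line, n);
            record[n] = '\0';
            unsigned long long addr;
            unsigned width;
            /* Only load records (" L addr,width") are of interest. */
            if (sscanf(record, " L %llx,%u", &addr, &width) == 2) {
                for (unsigned i = 0; i < width; i++) {
                    uintptr_t a = (uintptr_t) addr + i;
                    if (a >= start && a - start < len
                        && ++counts[a - start] > 1) {
                        return 0;  /* same byte loaded twice */
                    }
                }
            }
        }
        line += n;
        if (*line == '\n') {
            line++;  /* move past the newline to the next record */
        }
    }
    return 1;
}
```

In a full test, `start` and `len` would come from the addresses the test program printed in step 2 above.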
#### Discussion

The best approach for validating the correctness of memory accesses is an open question that requires further investigation and prototyping. The above sections discuss some possibilities.

However, there is one additional consideration that may make this easier. The careful-access approach to memory protection is mainly planned for hash and MAC algorithms; these lend themselves to a linear access pattern on input data. It may therefore be simpler to test that a linear pattern is followed, rather than a random-access single-access-per-location pattern.
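If we settle for checking linearity, the core of the check reduces to a few lines. The function below is a hypothetical sketch: it takes the sequence of accessed byte addresses recovered from a trace and verifies a single strictly increasing pass, which implies that no location is read twice:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Verify that a recorded sequence of accessed byte addresses forms a
 * single linear pass: strictly increasing, hence no byte is visited
 * twice and there is no backtracking. Returns 1 if linear, 0 if not. */
static int is_single_linear_pass(const uintptr_t *addrs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (addrs[i] <= addrs[i - 1]) {
            return 0;
        }
    }
    return 1;
}
```

Note that this is stricter than the single-access-per-location property: a driver that legitimately skips around would fail it, so it only suits algorithms with a genuinely linear input pattern.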
441
Gilles Peskine28592672023-10-13 20:01:36 +0200442## Analysis of argument protection in built-in drivers
Gilles Peskinef7806ca2023-10-12 16:00:11 +0200443
Gilles Peskine7bc1bb62023-10-13 20:05:25 +0200444TODO: analyze the built-in implementations of mechanisms for which there is a requirement on drivers. By code inspection, how satisfied are we that they meet the requirement?
Gilles Peskine69987212023-10-13 20:05:32 +0200445
446## Copy bypass
447
448For efficiency, we are likely to want mechanisms to bypass the copy and process buffers directly in builds that are not affected by shared memory considerations.
449
450Expand this section to document any mechanisms that bypass the copy.
451
452Make sure that such mechanisms preserve the guarantees when buffers overlap.
David Horstmann23661cc2023-10-16 19:31:41 +0100453
454## Detailed design
455
David Horstmann3f7e42a2023-10-19 15:14:20 +0100456### Implementation by module
457
458Module | Input protection strategy | Output protection strategy | Notes
459---|---|---|---
460Hash and MAC | Careful access | Careful access | Low risk of multiple-access as the input and output are raw unformatted data.
461Cipher | Copying | Copying |
462AEAD | Copying (careful access for additional data) | Copying |
463Key derivation | Careful access | Careful access |
464Asymmetric signature | Careful access | Copying | Inputs to signatures are passed to a hash. This will no longer hold once PureEdDSA support is implemented.
465Asymmetric encryption | Copying | Copying |
466Key agreement | Copying | Copying |
467PAKE | Copying | Copying |
468Key import / export | Copying | Copying | Keys may be imported and exported in DER format, which is a structured format and therefore susceptible to read-read inconsistencies and potentially write-read inconsistencies.
469
David Horstmann23661cc2023-10-16 19:31:41 +0100470### Copying functions
471
472As discussed above, it is simpler to use a single unified API for copying. Therefore, we create the following functions:
473
474* `psa_crypto_copy_input(const uint8_t *input, size_t input_length, uint8_t *input_copy, size_t input_copy_length)`
475* `psa_crypto_copy_output(const uint8_t *output_copy, size_t output_copy_length, uint8_t *output, size_t output_length)`
476
477These seem to be a repeat of the same function, however it is useful to retain two separate functions for input and output parameters so that we can use different test hooks in each when using memory poisoning for tests.
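A minimal sketch of what these two functions might look like follows. The return codes are placeholders (`0`/`-1` rather than `psa_status_t` values), and the real functions would also contain the test hooks discussed below:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy an application input buffer into a local copy owned by the
 * crypto core. Placeholder return codes: 0 for success, -1 on size
 * mismatch. */
static int psa_crypto_copy_input(const uint8_t *input, size_t input_length,
                                 uint8_t *input_copy, size_t input_copy_length)
{
    if (input_length > input_copy_length) {
        return -1;  /* local copy too small for the application's input */
    }
    memcpy(input_copy, input, input_length);
    return 0;
}

/* Copy a locally produced output back to the application's buffer. */
static int psa_crypto_copy_output(const uint8_t *output_copy,
                                  size_t output_copy_length,
                                  uint8_t *output, size_t output_length)
{
    if (output_copy_length > output_length) {
        return -1;  /* application's output buffer is too small */
    }
    memcpy(output, output_copy, output_copy_length);
    return 0;
}
```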
Given that the majority of functions will be allocating heap memory for the copies, it may help to build convenience functions that allocate the memory as well. One function allocates and copies the buffers:

* `psa_crypto_alloc_and_copy(const uint8_t *input, size_t input_length, uint8_t *output, size_t output_length, struct {uint8_t *inp, uint8_t *out} *buffers)`

This function allocates an input buffer and an output buffer in `buffers` and copies the input from the user-supplied input buffer to `buffers->inp`.

An analogous function is needed to copy and free the buffers:

* `psa_crypto_copy_and_free(struct {uint8_t *inp, uint8_t *out} buffers, const uint8_t *input, size_t input_length, const uint8_t *output, size_t output_length)`

This function would first copy the `buffers->out` buffer to the user-supplied output buffer and then free `buffers->inp` and `buffers->out`.
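The following sketch illustrates one possible shape for this pair of functions. The anonymous struct from the signatures above is given a hypothetical name (`psa_crypto_buffer_pair_t`), and the signatures are simplified for illustration (the allocation step only needs the buffer lengths):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name for the anonymous struct in the signatures above. */
typedef struct {
    uint8_t *inp;
    uint8_t *out;
} psa_crypto_buffer_pair_t;

/* Allocate local copies of both buffers and copy the application input
 * into buffers->inp. Returns 0 on success, -1 on allocation failure. */
static int psa_crypto_alloc_and_copy(const uint8_t *input, size_t input_length,
                                     size_t output_length,
                                     psa_crypto_buffer_pair_t *buffers)
{
    buffers->inp = malloc(input_length != 0 ? input_length : 1);
    buffers->out = malloc(output_length != 0 ? output_length : 1);
    if (buffers->inp == NULL || buffers->out == NULL) {
        free(buffers->inp);
        free(buffers->out);
        buffers->inp = buffers->out = NULL;
        return -1;
    }
    memcpy(buffers->inp, input, input_length);
    return 0;
}

/* Copy the local output back to the application buffer, then free both
 * local copies. */
static void psa_crypto_copy_and_free(psa_crypto_buffer_pair_t *buffers,
                                     uint8_t *output, size_t output_length)
{
    memcpy(output, buffers->out, output_length);
    free(buffers->inp);
    free(buffers->out);
    buffers->inp = buffers->out = NULL;
}
```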
Some PSA functions may not use these convenience functions as they may have local optimizations that reduce memory usage. For example, ciphers may be able to use a single intermediate buffer for both input and output.

### Validation of copying

As discussed above, the best strategy for validation of copies appears to be validation by memory poisoning.

To implement this validation, we need several things:
1. The ability to allocate memory in individual pages.
2. The ability to poison memory pages in the copy functions.
3. Tests that exercise this functionality.

We can implement (1) as a test helper function that allocates full pages of memory so that we can safely set permissions on them:
```c
uint8_t *mbedtls_test_get_buffer_poisoned_page(size_t nmemb, size_t size)
```
This allocates a buffer of the requested size whose memory pages are guaranteed not to be shared with any other allocation. It also calls `mprotect()` so that the buffer's pages are inaccessible.
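On a POSIX system, the page-granular allocation could be built on `mmap`, which always returns dedicated, page-aligned mappings. The helper names below are hypothetical, and unlike the proposed `mbedtls_test_get_buffer_poisoned_page` this sketch leaves the pages accessible (applying the poisoning with `mprotect` would be a separate step):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate `size` bytes in dedicated pages. Because no other allocation
 * shares these pages, mprotect() can later change their permissions
 * without side effects on unrelated data. Returns NULL on failure. */
static uint8_t *test_alloc_full_pages(size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t rounded = (size + page - 1) / page * page;
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *) p;
}

/* Release pages obtained from test_alloc_full_pages(). */
static void test_free_full_pages(uint8_t *buffer, size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    munmap(buffer, (size + page - 1) / page * page);
}
```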
We also need a function to reset the permissions and free the memory:
```c
void mbedtls_test_free_buffer_poisoned_page(uint8_t *buffer, size_t len)
```
This calls `mprotect()` to restore read and write permissions to the pages of the buffer and then frees the buffer.

On top of these functions we can build the testing functions mentioned above:
```c
uint8_t *mbedtls_test_get_poisoned_copy(uint8_t *buffer, size_t len)
void mbedtls_test_copy_free_poisoned_buffer(uint8_t *poisoned_buffer, uint8_t *original_buffer, size_t len)
```

Requirement (2) can be implemented by creating a function as alluded to above:
```c
void mbedtls_psa_core_poison_memory(uint8_t *buffer, size_t len, int poisoned)
```
This function should call `mprotect()` on the buffer to prevent it from being accessed (when `poisoned == 1`) or to allow it to be accessed (when `poisoned == 0`). Note that `mprotect()` requires a page-aligned address, so the function may have to do some preliminary work to find the page-aligned region that contains `buffer`.
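The preliminary alignment work amounts to rounding the start of the buffer down, and its end up, to page boundaries. A POSIX sketch under a hypothetical name follows; it assumes the buffer's pages contain no other live data, which is exactly why the buffers must be allocated in their own pages as described above:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Apply or remove poisoning by changing page permissions. Rounds the
 * range [buffer, buffer + len) outwards to page boundaries, so it must
 * only be used on buffers that do not share pages with other live data.
 * Returns 0 on success, -1 on error (as mprotect() does). */
static int test_poison_memory(uint8_t *buffer, size_t len, int poisoned)
{
    uintptr_t page = (uintptr_t) sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t) buffer & ~(page - 1);
    uintptr_t end = ((uintptr_t) buffer + len + page - 1) & ~(page - 1);
    int prot = poisoned ? PROT_NONE : (PROT_READ | PROT_WRITE);
    return mprotect((void *) start, (size_t) (end - start), prot);
}
```

This relies on the page size being a power of two, which holds on the POSIX platforms where these tests would run.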

Requirement (3) is implemented by wrapping calls to PSA functions with code that creates poisoned copies of their inputs and outputs, as described above.

### Validation of protection by careful access