# Thread-safety of the PSA subsystem

Prior to Mbed TLS 3.6, PSA Crypto API calls were not thread-safe.

As of Mbed TLS 3.6, a minimum viable product (MVP) for making the [PSA Crypto key management API](https://arm-software.github.io/psa-api/crypto/1.1/api/keys/management.html) and [`psa_crypto_init`](https://arm-software.github.io/psa-api/crypto/1.1/api/library/library.html#c.psa_crypto_init) thread-safe has been implemented. Applications which only ever call PSA functions from a single thread are not affected by this new feature.

Summary of recent work:

- Key store:
    - Slot states are described in the [Key slot states](#key-slot-states) section. They guarantee safe concurrent access to slot contents.
    - Key slots are protected by a global mutex, as described in [Key store consistency and abstraction function](#key-store-consistency-and-abstraction-function).
    - Key destruction follows a strategy that abides by the [Key destruction guarantees](#key-destruction-guarantees); the implementation is discussed in [Key destruction implementation](#key-destruction-implementation).
- `global_data` variables in `psa_crypto.c` and `psa_crypto_slot_management.c` are now protected by mutexes, as described in the [Global data](#global-data) section.
- The testing system has been made thread-safe. Tests can now spin up multiple threads; see [Thread-safe testing](#thread-safe-testing) for details.
- Some multithreaded testing of the key management API has been added, as outlined in [Testing and analysis](#testing-and-analysis).
- The solution uses the pre-existing `MBEDTLS_THREADING_C` threading abstraction.
- The core makes no additional guarantees for drivers. See [Driver policy](#driver-policy) for details.

The other functions in the PSA Crypto API are planned to be made thread-safe in the future, but their thread-safety is not currently tested.

## Overview of the document

* The [Guarantees](#guarantees) section describes the properties that hold when PSA functions are invoked by multiple threads.
* The [Usage guide](#usage-guide) section gives guidance on initializing, using and freeing PSA when using multiple threads.
* The [Current strategy](#current-strategy) section describes how thread-safety of key management and `global_data` is achieved.
* The [Testing and analysis](#testing-and-analysis) section discusses the state of our testing, as well as how this testing will be extended in the future.
* The [Future work](#future-work) section outlines our long-term goals for thread-safety; it also analyses how we might go about achieving these goals.

## Definitions

### Concurrent calls

The PSA specification defines concurrent calls as: "In some environments, an application can make calls to the Crypto API in separate threads. In such an environment, concurrent calls are two or more calls to the API whose execution can overlap in time." (See the [PSA Crypto API specification](https://arm-software.github.io/psa-api/crypto/1.1/overview/conventions.html#concurrent-calls).)

### Thread-safety

In general, a system is thread-safe if any valid set of concurrent calls is handled as if the effect and return code of every call were equivalent to some sequential ordering. We implement a weaker notion of thread-safety: we only guarantee thread-safety in the circumstances described in the [PSA Concurrent calling conventions](#psa-concurrent-calling-conventions) section.

## Guarantees

### Correctness out of the box

Building with `MBEDTLS_PSA_CRYPTO_C` and `MBEDTLS_THREADING_C` gives code which is correct: there are no race conditions, deadlocks or livelocks when concurrently calling any set of PSA key management functions once `psa_crypto_init` has been called (see the [Initialization](#initialization) section for details on how to correctly initialize the PSA subsystem when using multiple threads).

We do not test or support calling other PSA API functions concurrently.

There is no busy-waiting in our implementation: every API call completes in a finite number of steps, regardless of the locking policy of the underlying mutexes.

When only considering key management functions, Mbed TLS 3.6 abides by the minimum expectation for concurrent calls set by the PSA specification (see [PSA Concurrent calling conventions](#psa-concurrent-calling-conventions)).

#### PSA Concurrent calling conventions

These are the conventions which are planned to be added to the PSA 1.2 specification. Mbed TLS 3.6 abides by these when only considering [key management functions](https://arm-software.github.io/psa-api/crypto/1.1/api/keys/management.html):

> The result of two or more concurrent calls must be consistent with the same set of calls being executed sequentially in some order, provided that the calls obey the following constraints:
>
> * There is no overlap between an output parameter of one call and an input or output parameter of another call. Overlap between input parameters is permitted.
>
> * A call to `psa_destroy_key()` must not overlap with a concurrent call to any of the following functions:
>   - Any call where the same key identifier is a parameter to the call.
>   - Any call in a multi-part operation, where the same key identifier was used as a parameter to a previous step in the multi-part operation.
>
> * Concurrent calls must not use the same operation object.
>
> If any of these constraints are violated, the behaviour is undefined.
>
> The consistency requirement does not apply to errors that arise from resource failures or limitations. For example, errors resulting from resource exhaustion can arise in concurrent execution that do not arise in sequential execution.
>
> As an example of this rule: suppose two calls are executed concurrently which both attempt to create a new key with the same key identifier that is not already in the key store. Then:
> * If one call returns `PSA_ERROR_ALREADY_EXISTS`, then the other call must succeed.
> * If one of the calls succeeds, then the other must fail: either with `PSA_ERROR_ALREADY_EXISTS` or some other error status.
> * Both calls can fail with error codes that are not `PSA_ERROR_ALREADY_EXISTS`.
>
> If the application concurrently modifies an input parameter while a function call is in progress, the behaviour is undefined.
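
For a multithreaded test, the three outcomes above can be checked with a small predicate over the two returned statuses. This is a sketch only: `demo_status_t`, its values and `demo_create_outcomes_valid` are hypothetical stand-ins for the PSA status codes, not part of the PSA or Mbed TLS API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PSA status codes; the values are illustrative. */
typedef enum {
    DEMO_SUCCESS,
    DEMO_ERROR_ALREADY_EXISTS,
    DEMO_ERROR_INSUFFICIENT_MEMORY
} demo_status_t;

/* Returns true if results a and b of two concurrent attempts to create a key
 * with the same, previously unused, identifier are allowed by the rule above. */
bool demo_create_outcomes_valid(demo_status_t a, demo_status_t b)
{
    /* Only the modelled status codes are expected. */
    assert(a <= DEMO_ERROR_INSUFFICIENT_MEMORY && b <= DEMO_ERROR_INSUFFICIENT_MEMORY);

    /* If one of the calls succeeds, the other must fail. */
    if (a == DEMO_SUCCESS && b == DEMO_SUCCESS) {
        return false;
    }
    /* If one call returns ALREADY_EXISTS, the other call must succeed. */
    if (a == DEMO_ERROR_ALREADY_EXISTS && b != DEMO_SUCCESS) {
        return false;
    }
    if (b == DEMO_ERROR_ALREADY_EXISTS && a != DEMO_SUCCESS) {
        return false;
    }
    /* Otherwise: one success and one failure, or two failures that are
     * both something other than ALREADY_EXISTS. */
    return true;
}
```

For example, `(DEMO_SUCCESS, DEMO_ERROR_ALREADY_EXISTS)` is valid, while `(DEMO_ERROR_ALREADY_EXISTS, DEMO_ERROR_INSUFFICIENT_MEMORY)` is not.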

### Backwards compatibility

Code that worked with earlier 3.x versions of Mbed TLS will still work with version 3.6. Applications which only ever call PSA functions from a single thread, or which protect all PSA calls using a mutex, are not affected by this new feature.

### Supported threading implementations

Currently, the only threading library with support shipped in the code base is pthread (enabled by `MBEDTLS_THREADING_PTHREAD`). The only concurrency primitives we use are mutexes; see [Condition variables](#condition-variables) for discussion about implementing new primitives in future major releases.

Users can add support for any platform which has mutexes using the Mbed TLS threading abstraction layer (see `include/mbedtls/threading.h` for details).

We intend to ship support for other platforms, including Windows, in future releases.
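
As an illustration, a build relying on the shipped pthread support would typically enable these options in the Mbed TLS configuration file (`include/mbedtls/mbedtls_config.h` in Mbed TLS 3.x). This is a minimal sketch; other options may be required depending on the rest of the configuration.

```c
/* Configuration fragment: PSA Crypto plus the threading abstraction. */
#define MBEDTLS_PSA_CRYPTO_C       /* PSA Crypto core */
#define MBEDTLS_THREADING_C        /* threading abstraction layer */
#define MBEDTLS_THREADING_PTHREAD  /* shipped pthread-based mutex implementation */
```

A platform without pthread would instead enable `MBEDTLS_THREADING_ALT` and provide its own mutex implementation, as described in `include/mbedtls/threading.h`.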

### Key destruction guarantees

Much like all other API calls, `psa_destroy_key` does not block indefinitely, and when `psa_destroy_key` returns:

1. The key identifier does not exist. This is a functional requirement for persistent keys: any thread can immediately create a new key with the same identifier.
2. The resources used by the key have been freed. This allows threads to create similar keys immediately after destruction, without running out of resources.

When `psa_destroy_key` is called on a key that is in use, guarantee 2 may be violated. This is consistent with the PSA specification requirements, as destroying a key that is in use is undefined.

In future versions we aim to enforce stronger requirements for key destruction; see [Long term key destruction requirements](#long-term-key-destruction-requirements) for details.

### Driver policy

The core makes no additional guarantees for drivers. Driver entry points may be called concurrently from multiple threads. Threads can concurrently call entry points using the same key, and there is no protection against destroying a key which is in use.

### Random number generators

The PSA RNG can be accessed both from various PSA functions and from application code via `mbedtls_psa_get_random`.

When using the built-in RNG implementations, i.e. when `MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG` is disabled, querying the RNG is thread-safe. However, `mbedtls_psa_random_init` and `mbedtls_psa_random_seed` are only thread-safe when called while holding `mbedtls_threading_psa_rngdata_mutex`, and `mbedtls_psa_random_free` is not thread-safe.

When `MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG` is enabled, it is up to the external implementation to ensure thread-safety when threading is enabled.

## Usage guide

### Initialization

The PSA subsystem is initialized via a call to [`psa_crypto_init`](https://arm-software.github.io/psa-api/crypto/1.1/api/library/library.html#c.psa_crypto_init). This is a thread-safe function, and multiple calls to `psa_crypto_init` are explicitly allowed. It is valid to have multiple threads each calling `psa_crypto_init`, followed (if the initialization succeeds) by calls to any PSA key management function.
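
The behaviour described above (any number of threads, each calling the initialization function at least once) can be modelled as follows. This is a minimal sketch assuming a single pthread mutex guarding an "initialized" flag; `demo_crypto_init` and the other `demo_*` names are hypothetical, and the real `psa_crypto_init` does considerably more work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_MAX_THREADS 16

static pthread_mutex_t demo_init_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool demo_initialized = false;
static int demo_setup_runs = 0; /* how many times the one-off setup ran */

/* Hypothetical stand-in for psa_crypto_init: thread-safe and idempotent.
 * Returns 0 on success, -1 if the mutex fails (cf. PSA_ERROR_SERVICE_FAILURE). */
int demo_crypto_init(void)
{
    if (pthread_mutex_lock(&demo_init_mutex) != 0) {
        return -1;
    }
    if (!demo_initialized) {
        demo_setup_runs++; /* stands in for the real one-off setup work */
        demo_initialized = true;
    }
    if (pthread_mutex_unlock(&demo_init_mutex) != 0) {
        return -1;
    }
    return 0;
}

static void *demo_worker(void *arg)
{
    int *status = arg;
    /* Every thread initializes first; key management calls would follow. */
    *status = demo_crypto_init();
    return NULL;
}

/* Runs n threads that all call demo_crypto_init concurrently. Returns the
 * number of times the setup ran, or -1 if any thread or call failed. */
int demo_run_init_threads(int n)
{
    pthread_t threads[DEMO_MAX_THREADS];
    int status[DEMO_MAX_THREADS];
    int created = 0;
    assert(n > 0 && n <= DEMO_MAX_THREADS);
    for (; created < n; created++) {
        if (pthread_create(&threads[created], NULL, demo_worker, &status[created]) != 0) {
            break;
        }
    }
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
    }
    if (created != n) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (status[i] != 0) {
            return -1;
        }
    }
    return demo_setup_runs;
}
```

However many threads call `demo_crypto_init`, and however often, the setup runs exactly once.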

### General usage

Once initialized, threads can use any PSA function if there is no overlap between their calls. All threads share the same set of keys: as soon as one thread returns from creating or loading a key via a key management API call, the key can be used by any thread. If multiple threads attempt to create a persistent key with the same key identifier, at most one thread can succeed; the others will typically return `PSA_ERROR_ALREADY_EXISTS`.
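
The following hypothetical model shows why at most one of several concurrent creations of the same identifier can succeed: the existence check and the insertion happen under one mutex. `demo_create_key`, the fixed-size store and the status values are assumptions made for illustration, not Mbed TLS internals.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_MAX_KEYS 8
#define DEMO_MAX_THREADS 16

enum { DEMO_SUCCESS = 0, DEMO_ERROR_ALREADY_EXISTS = -1, DEMO_ERROR_INSUFFICIENT_MEMORY = -2 };

static pthread_mutex_t demo_store_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned demo_store_ids[DEMO_MAX_KEYS];
static int demo_store_used = 0;

/* Hypothetical key creation: check and insert under the same mutex. */
int demo_create_key(unsigned id)
{
    int status = DEMO_SUCCESS;
    pthread_mutex_lock(&demo_store_mutex);
    for (int i = 0; i < demo_store_used; i++) {
        if (demo_store_ids[i] == id) {
            status = DEMO_ERROR_ALREADY_EXISTS;
            goto exit;
        }
    }
    if (demo_store_used == DEMO_MAX_KEYS) {
        status = DEMO_ERROR_INSUFFICIENT_MEMORY;
        goto exit;
    }
    demo_store_ids[demo_store_used++] = id;
exit:
    pthread_mutex_unlock(&demo_store_mutex);
    return status;
}

struct demo_job {
    unsigned id;
    int status;
};

static void *demo_create_worker(void *arg)
{
    struct demo_job *job = arg;
    job->status = demo_create_key(job->id);
    return NULL;
}

/* Runs n threads that all try to create key `id`; returns how many succeeded.
 * In this test every failure is expected to be ALREADY_EXISTS. */
int demo_count_successes(unsigned id, int n)
{
    pthread_t threads[DEMO_MAX_THREADS];
    struct demo_job jobs[DEMO_MAX_THREADS];
    int created = 0, successes = 0;
    assert(n > 0 && n <= DEMO_MAX_THREADS);
    for (; created < n; created++) {
        jobs[created].id = id;
        if (pthread_create(&threads[created], NULL, demo_create_worker, &jobs[created]) != 0) {
            break;
        }
    }
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
        if (jobs[i].status == DEMO_SUCCESS) {
            successes++;
        } else {
            assert(jobs[i].status == DEMO_ERROR_ALREADY_EXISTS);
        }
    }
    return successes;
}
```

Running `demo_count_successes(42u, 8)` on a fresh store yields one success; a second run with the same identifier yields none.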

Applications may need careful handling of resource management errors. As explained in [PSA Concurrent calling conventions](#psa-concurrent-calling-conventions), operations in progress can have memory-related side effects. It is possible for a lack of resources to cause errors which do not arise in sequential execution. For example, multiple threads attempting to load the same persistent key can lead to some threads returning `PSA_ERROR_INSUFFICIENT_MEMORY` if the key is not currently in the key store: while loading a persistent key into the key store, a thread temporarily reserves a free key slot.

If a mutex operation fails, which only happens if the mutex implementation fails, the error code `PSA_ERROR_SERVICE_FAILURE` will be returned. If this code is returned, execution of the PSA subsystem must be stopped. All functions which have internal mutex locks and unlocks will return this error code in this situation, except when the failing lock or unlock occurs in a function that has no return value.

### Freeing

There is no thread-safe way to free all PSA resources. This is because any such operation would need to wait for all other threads to complete their tasks before wiping resources.

`mbedtls_psa_crypto_free` must only be called by a single thread once all threads have completed their operations.
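
A sketch of a shutdown sequence that respects this rule, assuming pthreads: every worker is joined before the one remaining thread frees the subsystem. `demo_init` and `demo_free` are hypothetical stand-ins for `psa_crypto_init` and `mbedtls_psa_crypto_free`.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_MAX_THREADS 16

static int demo_live = 0; /* 1 between demo_init and demo_free */

/* Hypothetical stand-ins for psa_crypto_init and mbedtls_psa_crypto_free. */
static void demo_init(void) { demo_live = 1; }
static void demo_free(void) { demo_live = 0; }

static void *demo_worker(void *arg)
{
    int *saw_live = arg;
    /* Stands in for PSA calls, which must only happen while the subsystem is live. */
    *saw_live = demo_live;
    return NULL;
}

/* Initializes, runs n workers, joins all of them, and only then frees from
 * this single remaining thread. Returns 1 if every worker saw a live subsystem. */
int demo_run_then_free(int n)
{
    pthread_t threads[DEMO_MAX_THREADS];
    int saw_live[DEMO_MAX_THREADS];
    int created = 0, ok = 1;
    assert(n > 0 && n <= DEMO_MAX_THREADS);
    demo_init();
    for (; created < n; created++) {
        if (pthread_create(&threads[created], NULL, demo_worker, &saw_live[created]) != 0) {
            break;
        }
    }
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL); /* wait until every worker has finished */
    }
    demo_free(); /* no other thread can still be using the subsystem */
    for (int i = 0; i < created; i++) {
        ok = ok && saw_live[i];
    }
    return ok && created == n;
}
```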

## Current strategy

This section describes how we have implemented thread-safety. It discusses the techniques we use, the internal properties that enforce thread-safe access, how the system stays consistent, and our abstraction model.

### Protected resources

#### Global data

We have added a mutex `mbedtls_threading_psa_globaldata_mutex`, defined in `include/mbedtls/threading.h`, which is used to make `psa_crypto_init` thread-safe.

There are two `psa_global_data_t` structs, each with a single instance `global_data`:

* The struct in `library/psa_crypto.c` is protected by `mbedtls_threading_psa_globaldata_mutex`. The RNG fields within this struct are not protected by this mutex, and are not always thread-safe (see [Random number generators](#random-number-generators)).
* The struct in `library/psa_crypto_slot_management.c` has two fields: `key_slots` is protected as described in [Key slots](#key-slots), and `key_slots_initialized` is protected by the global data mutex.

#### Mutex usage

A deadlock would occur if a thread attempted to lock a mutex while already holding it. Functions which need to be called while holding the global mutex are documented as such.

To avoid performance degradation, functions must hold mutexes for as short a time as possible. In particular, they must not start expensive operations (e.g. cryptographic computations) while holding the mutex.
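
The two rules above can be illustrated with a small sketch, assuming a pthread mutex: a helper whose name ends in `_locked` documents that its caller already holds the mutex, and the expensive work is done on a private copy after the mutex is released. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned char demo_secret[4] = { 1, 2, 3, 4 };

/* Must be called while holding demo_mutex. It does not lock the mutex itself:
 * relocking a default pthread mutex from the same thread is undefined
 * behavior and typically deadlocks. */
static void demo_copy_secret_locked(unsigned char out[4])
{
    memcpy(out, demo_secret, sizeof(demo_secret));
}

/* Hypothetical stand-in for an expensive cryptographic computation. */
static unsigned demo_expensive_digest(const unsigned char *buf, size_t len)
{
    unsigned acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc = acc * 31u + buf[i];
    }
    return acc;
}

/* Holds the mutex only while copying, then computes on the private copy. */
int demo_compute(unsigned *digest)
{
    unsigned char copy[4];
    assert(digest != NULL);
    if (pthread_mutex_lock(&demo_mutex) != 0) {
        return -1;
    }
    demo_copy_secret_locked(copy); /* short critical section */
    if (pthread_mutex_unlock(&demo_mutex) != 0) {
        return -1;
    }
    *digest = demo_expensive_digest(copy, sizeof(copy)); /* mutex not held */
    /* Real code would use mbedtls_platform_zeroize, which is not optimized away. */
    memset(copy, 0, sizeof(copy));
    return 0;
}
```

With the bytes `{1, 2, 3, 4}`, `demo_compute` stores `31810` in `*digest`.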

#### Key slots

Keys are stored internally in a global array of key slots known as the "key store", defined in `library/psa_crypto_slot_management.c`.

##### Key slot states

Each key slot has a state variable and a `registered_readers` counter. These two variables dictate whether an operation can access a slot, and in what way the slot can be used.

There are four possible states for a key slot:

* `PSA_SLOT_EMPTY`: no thread is currently accessing the slot, and no information is stored in the slot. Any thread is able to change the slot's state to `PSA_SLOT_FILLING` and begin to load data into the slot.
* `PSA_SLOT_FILLING`: one thread is currently loading or creating material to fill the slot; this thread is responsible for the next state transition. Other threads cannot read the contents of a slot which is in this state.
* `PSA_SLOT_FULL`: the slot contains a key, and any thread is able to use the key after registering as a reader, increasing `registered_readers` by 1.
* `PSA_SLOT_PENDING_DELETION`: the key within the slot has been destroyed or marked for destruction, but at least one thread is still registered as a reader (`registered_readers > 0`). No thread can register to read this slot. The slot must not be wiped until the last reader unregisters. It is during the last unregister that the contents of the slot are wiped, and the slot's state is set to `PSA_SLOT_EMPTY`.
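
The states and the reader counter can be modelled as below. This is a sketch for illustration: `demo_key_slot_t` and the `DEMO_SLOT_*` names are hypothetical, and the invariant check is derived from the state descriptions above rather than taken from the Mbed TLS sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a key slot. The state names mirror the ones above,
 * but this is not the Mbed TLS psa_key_slot_t structure. */
typedef enum {
    DEMO_SLOT_EMPTY,
    DEMO_SLOT_FILLING,
    DEMO_SLOT_FULL,
    DEMO_SLOT_PENDING_DELETION
} demo_slot_state_t;

typedef struct {
    demo_slot_state_t state;
    size_t registered_readers;
    unsigned key_id;
} demo_key_slot_t;

/* New readers may only register on a FULL slot. */
bool demo_can_register_reader(const demo_key_slot_t *slot)
{
    assert(slot != NULL);
    return slot->state == DEMO_SLOT_FULL;
}

/* Checks the reader-count invariant implied by each state. */
bool demo_slot_invariant_holds(const demo_key_slot_t *slot)
{
    assert(slot != NULL);
    switch (slot->state) {
        case DEMO_SLOT_EMPTY:
        case DEMO_SLOT_FILLING:
            /* Readers only register on FULL slots, so none can be present. */
            return slot->registered_readers == 0;
        case DEMO_SLOT_FULL:
            return true;
        case DEMO_SLOT_PENDING_DELETION:
            /* The last reader to unregister wipes the slot back to EMPTY. */
            return slot->registered_readers > 0;
    }
    return false;
}
```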

###### Key slot state transition diagram
![Key slot state transitions](key-slot-state-transitions.png)

In the state transition diagram above, an arrow between two states `q1` and `q2` with label `f` indicates that if the state of a slot is `q1` immediately before `f`'s linearization point, it may be `q2` immediately after `f`'s linearization point. Internal functions have italicized labels. The `PSA_SLOT_PENDING_DELETION -> PSA_SLOT_EMPTY` transition can be done by any function which calls `psa_unregister_read`.

The state transition diagram can be generated in https://app.diagrams.net/ via this [link](https://viewer.diagrams.net/?tags=%7B%7D&highlight=0000ff&edit=_blank&layers=1&nav=1#R3Vxbd5s4EP4t%2B%2BDH5CBxf6zrJJvW7aYn7W7dFx9qZFstBg7gW379CnMxkoUtY%2BGQ%2BiVISCPQjD59mhnSU98vNg%2BRE84%2FBS7yelBxNz110IMQAEsnf9KabVZjmHnFLMJu3mhf8YxfUF6p5LVL7KKYapgEgZfgkK6cBL6PJglV50RRsKabTQOPHjV0Zuig4nnieIe1%2F2E3mWe1FjT39X8jPJsXIwPDzu4snKJx%2Fibx3HGDdaVKveup76MgSLKrxeY98tLJK%2BYl63dfc7d8sAj5iUiHH%2BBlOP338cP6i%2B37%2Ff7oV%2Fjr442aSVk53jJ%2F4R40PCKv7%2BIVuZyll%2FffhsOimsiv3OE0njvxOEKOi6K4uPszYtuzUnbzk2yLSScPTvRLCv31HCfoOXQm6Z01MbF0hGThkRIgl04cZkqf4g1yS1HVScnnaYWiBG0qVfkkPaBggZJoS5rkdzUrV1hhsUpeXlf0n1fNK6ov6pzc4mal5L1SyEWulzN0BABHSeyM%2Be671NpJaeI5cYwn9ERFwdJ30xkaKKREJifafs9v7QqjamGwqbYbbIvSBidlJ3I9qtTvu6SFoketNuJgGU3QabtMnGiGkiPttKwdcqlVfKjbiu50ju6Kugh5ToJX9NrnKTQf4SnA5M1qTUc3GJvI3jvvVV2rrCDTvrUrP4sSq6mM2GyaDsTurK2chAsMENaiBC7WcBg746UfoRmOExTtEKCy2HH9UieaGzo%2Fya5BL2wPz%2FzUmInloIhUpOsXE1h%2Bl99YYNdNZfQjFOMX5%2BdOXmpzYToLu3nR%2Bz19wLXC48uMRYpyc8lHofCbhyDKLVRMm1LZDbzMwAoxgOkSTKcxakfpIjvD3aenr6O3CfOdQ3lbOsrneK1U8BocxetyXygLo2qhZl9ojvJQEOVBt1CetpwDNBYG%2BRObRcuoXvDSU6g%2BdbA3%2Fo224wkB9QQH%2FlvD9WJhdRHXc8mQEsr2bw%2FkDzf2%2B8fh8PHzQ6exWjVeGas1kb3xrFPTX3%2FcsenVlaSLKOnp7vNgZ%2B6CehrcDe%2B%2BPv7z%2BW3qqHOkx2yL84ifUZudhZtznsKJdYrzwE5xHqiQzc%2FSoAnI2VTTDXoX1DXj1gS6CS1TJwWVES9KiIDBMCvtuozIEkEMLkciZAVFKzSeRgjtuFLsBQmfJwkCDXeYmExAwuViXBw6OWpnOVuBC12kbKUY7VosDfD4hnyYvNWbHA6zXq96POyWEzCFSkUpoNIgqEaDGkhdewVWqpZiNgNLTWHAkti6yphk237B5oA5xT6O5wLHyjcGXOVSvRi5bogVabZJQ5cqx0ItrtQrABmPkzO6nCzJRuqWFOx6YQ1xN1lzRBMNa6idQjStiNmWMdyGHi%2FdYASxB4sawCI24GwrzfLlWf%2FANo2NpqIcfy7ItAcn2mvWMfnkInvipotn0NcmAD9MQu8FLR%2Fxs%2F7uaSN2nq1hpyejMpew0pqwTzNKKjYkMZKx47tjL5j8Lvn2%2BPtFA6VyJ14Q7wj8Wb3CJbHaaq%2BDwf8wel7iuIxdDqgWvZou5Oe5ZJr0Q%2F1ae5zKS6mQQtarG5SgT6PCztuN5GiCG1u3IjnQhJSV6HrDjQ3UOdauxMRV3gmRi1UuipMo2F6OcXLwtLMQVy5jCS4IzTLoM2CxDC403xuaTdktQByXicj32nKJ%2Bym0Oh8X28e3bnltVYbX6k1D1arJOBsEibssi6t3NDR1w3YBeI4uLinUymYc9ZJwBxRujjY9CNzZuUqSjLAnlIarFj2hon4DvdPwY4Cm8MOkyhjtJUByra547orZHXCpzgKKtPSXFFCKrpKJDO3mbCP9ha%2FXK2VWn4aGJjDUHE50QTjp2Gmtxkt3NpxAhs0Y7WXe8c0O1tKZhr42eZ61NQ4PqdPbdV8dX%2FYywsvlF05yIRGorwSJPKrNaFJ6iKaxX6oryMTEGxoHSFTNvIWWpWtQszUbqpbKyqVCy1AIts6NnpC3qY4CbPohTEW9NaFS%2FtTjbwTso8IAOEeY3vzJ2gnKcLP23%2FKnMcdBQQJgKrpFc0hJFLKNbJwnvNwMp3BsWbMvqx%2F3Hye%2BH3I%2FjJHDGanEmkZf47XGGEWzFruViqMyOTI667YSxmX9hCNNHmPk2pwQYUxxBi%2FCIEsRPMtPP0M%2BipykgYM%2FCM%2BPJaT00kURXu3yfsbBMgmX1DOfn1X9GlB5FB0kIKWuAe65%2BGLvHSX0almMsLMJDCeyCeScfv6wT%2FdEAyKimUz7YFkRebtSbpNNu7IPcs6F8zEZQaIh4L0gqUvww0j7vh7F%2FW9ujL7iR%2FfmYWy1QF0KOy2JxzmWSicnvP4nF93KumPJi9n4UMmQFxOKWea550bW3W9qcrPiuCZdz4yaJ4x1gVwcXb8SyAWwDTlsQmUijIxPogmYkeL%2B3%2BJkzff%2FXEi9%2Bx8%3D).

##### Key slot access primitives

The state of a key slot is updated via the internal function `psa_key_slot_state_transition`. To change the state of `slot` from `expected_state` to `new_state`, when `new_state` is not `PSA_SLOT_EMPTY`, one must call `psa_key_slot_state_transition(slot, expected_state, new_state)`; if the state was not `expected_state`, then `PSA_ERROR_CORRUPTION_DETECTED` is returned. The sole reason for having an expected state parameter here is to help guarantee that our functions work as intended: this error code can only occur as a result of an internal coding error.

Changing a slot's state to `PSA_SLOT_EMPTY` is done via `psa_wipe_key_slot`; this function wipes the entirety of the key slot.

The reader count of a slot is incremented via `psa_register_read`, and decremented via `psa_unregister_read`. Library functions register to read a slot via the `psa_get_and_lock_key_slot_X` functions, read from the slot, then call `psa_unregister_read` to make known that they have finished reading the slot's contents.
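
A minimal model of these primitives follows, assuming the caller already holds the key slot mutex. The `demo_*` functions are hypothetical counterparts of `psa_key_slot_state_transition`, `psa_wipe_key_slot`, `psa_register_read` and `psa_unregister_read`; their exact behaviour in Mbed TLS may differ. In particular, the wipe that `psa_unregister_read` performs for destroyed keys is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEMO_SLOT_EMPTY = 0,
    DEMO_SLOT_FILLING,
    DEMO_SLOT_FULL,
    DEMO_SLOT_PENDING_DELETION
} demo_slot_state_t;

typedef struct {
    demo_slot_state_t state;
    size_t registered_readers;
    unsigned key_id;
} demo_key_slot_t;

enum { DEMO_SUCCESS = 0, DEMO_ERROR_CORRUPTION_DETECTED = -1 };

/* All functions below must be called while holding the key slot mutex,
 * which is not modelled here. */

int demo_slot_state_transition(demo_key_slot_t *slot,
                               demo_slot_state_t expected_state,
                               demo_slot_state_t new_state)
{
    assert(new_state != DEMO_SLOT_EMPTY); /* use demo_wipe_key_slot instead */
    if (slot->state != expected_state) {
        return DEMO_ERROR_CORRUPTION_DETECTED; /* only reachable through a coding error */
    }
    slot->state = new_state;
    return DEMO_SUCCESS;
}

/* The only way back to EMPTY: wipe the whole slot. */
void demo_wipe_key_slot(demo_key_slot_t *slot)
{
    memset(slot, 0, sizeof(*slot)); /* state becomes DEMO_SLOT_EMPTY (0) */
}

/* Callers look the slot up first, so a non-FULL slot here is a coding error. */
int demo_register_read(demo_key_slot_t *slot)
{
    if (slot->state != DEMO_SLOT_FULL) {
        return DEMO_ERROR_CORRUPTION_DETECTED;
    }
    slot->registered_readers++;
    return DEMO_SUCCESS;
}

/* Plain decrement; the wipe performed for destroyed keys is omitted here. */
int demo_unregister_read(demo_key_slot_t *slot)
{
    if (slot->registered_readers == 0) {
        return DEMO_ERROR_CORRUPTION_DETECTED;
    }
    slot->registered_readers--;
    return DEMO_SUCCESS;
}
```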

##### Key store consistency and abstraction function

The key store is protected by a single global mutex `mbedtls_threading_key_slot_mutex`.

We maintain the consistency of the key store by ensuring that all reads and writes to `slot->state` and `slot->registered_readers` are performed under `mbedtls_threading_key_slot_mutex`. All the access primitives described above must be called while the mutex is held; there is a convenience function `psa_unregister_read_under_mutex` which wraps a call to `psa_unregister_read` in a mutex lock/unlock pair.

A thread can only traverse the key store while holding `mbedtls_threading_key_slot_mutex`. The set of keys within the key store which the thread holding the mutex can access is:

$$
\{\, k \in \texttt{mbedtls\_svc\_key\_id\_t} \mid \exists i .\ \big(\texttt{global\_data.key\_slots}[i]\texttt{.state} = \texttt{PSA\_SLOT\_FULL}\big) \land \big(\texttt{global\_data.key\_slots}[i]\texttt{.attr.id} = k\big) \,\}
$$

The union of this set and the set of persistent keys not currently loaded into slots is our abstraction function for the key store: any key not in this union does not currently exist as far as the code is concerned (even if the key is in a slot which has a `PSA_SLOT_FILLING` or `PSA_SLOT_PENDING_DELETION` state). Attempting to start using any key which is not a member of the union will result in a `PSA_ERROR_INVALID_HANDLE` error code.
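
The abstraction function can be written as a lookup over a model of the key store plus a list of identifiers in persistent storage. This is a hypothetical sketch (`demo_key_exists`, `demo_start_using_key` and the data layout are assumptions); it returns the model's equivalent of `PSA_ERROR_INVALID_HANDLE` for keys outside the union.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DEMO_SLOT_EMPTY,
    DEMO_SLOT_FILLING,
    DEMO_SLOT_FULL,
    DEMO_SLOT_PENDING_DELETION
} demo_slot_state_t;

typedef struct {
    demo_slot_state_t state;
    unsigned key_id;
} demo_key_slot_t;

enum { DEMO_SUCCESS = 0, DEMO_ERROR_INVALID_HANDLE = -1 };

/* Abstraction function: a key exists iff it is in a FULL slot, or it is a
 * persistent key in storage that is not loaded into any slot.
 * Must be called while holding the key slot mutex. */
bool demo_key_exists(const demo_key_slot_t *slots, size_t n_slots,
                     const unsigned *stored_ids, size_t n_stored, unsigned key_id)
{
    bool in_some_slot = false;
    assert(slots != NULL || n_slots == 0);
    for (size_t i = 0; i < n_slots; i++) {
        if (slots[i].state == DEMO_SLOT_EMPTY || slots[i].key_id != key_id) {
            continue;
        }
        if (slots[i].state == DEMO_SLOT_FULL) {
            return true;
        }
        in_some_slot = true; /* FILLING or PENDING_DELETION: not visible */
    }
    if (in_some_slot) {
        return false;
    }
    for (size_t j = 0; j < n_stored; j++) {
        if (stored_ids[j] == key_id) {
            return true; /* persistent key that would be (re)loaded on use */
        }
    }
    return false;
}

int demo_start_using_key(const demo_key_slot_t *slots, size_t n_slots,
                         const unsigned *stored_ids, size_t n_stored, unsigned key_id)
{
    return demo_key_exists(slots, n_slots, stored_ids, n_stored, key_id)
           ? DEMO_SUCCESS : DEMO_ERROR_INVALID_HANDLE;
}
```

A key whose slot is `PENDING_DELETION` is reported as nonexistent even if a copy is still in storage, matching the parenthetical remark above.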

##### Locking and unlocking the mutex

If a lock or unlock operation fails and this is the first failure within a function, the function will return `PSA_ERROR_SERVICE_FAILURE`. If a lock or unlock operation fails after a different failure has been identified, the status code is not overwritten.

We have defined a set of macros in `library/psa_crypto_core.h` to capture the common pattern of (un)locking the mutex and returning or jumping to an exit label upon failure.

##### Key creation and loading

To load a new key into a slot, the following internal utility functions are used:

* `psa_reserve_free_key_slot` - This function, which must be called under `mbedtls_threading_key_slot_mutex`, iterates through the key store to find a slot whose state is `PSA_SLOT_EMPTY`. If one is found, it reserves the slot by setting its state to `PSA_SLOT_FILLING`. If none is found, it checks whether any loaded persistent keys have no registered readers; if so, it evicts one such key from the key store.
* `psa_start_key_creation` - This function wraps around `psa_reserve_free_key_slot`; if a slot has been found, the key's identifier is then set in the slot. This second step is not done under the mutex: at this point the calling thread has exclusive access to the slot.
* `psa_finish_key_creation` - After the contents of the key have been loaded (again, this loading is not done under the mutex), the thread calls `psa_finish_key_creation`. This function takes the mutex, checks that the key does not already exist in the key store (this check cannot be done before this stage), sets the slot's state to `PSA_SLOT_FULL` and releases the mutex. Upon success, any thread is immediately able to use the new key.
* `psa_fail_key_creation` - If there is a failure at any point in the key creation stage, this clean-up function takes the mutex, wipes the slot, and releases the mutex. Immediately after this unlock, any thread can start to use the slot for another key load.
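
The creation sequence can be modelled with a small slot array guarded by a pthread mutex. This is a simplified sketch with hypothetical `demo_*` names: it omits eviction, key attributes and storage, and it merges the role of `psa_start_key_creation` into `demo_create_key`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_SLOT_EMPTY = 0, DEMO_SLOT_FILLING, DEMO_SLOT_FULL } demo_slot_state_t;
typedef struct {
    demo_slot_state_t state;
    unsigned key_id;
    unsigned char material[16];
} demo_key_slot_t;

enum { DEMO_SUCCESS = 0, DEMO_ERROR_ALREADY_EXISTS = -1, DEMO_ERROR_INSUFFICIENT_MEMORY = -2 };

#define DEMO_SLOT_COUNT 4
static demo_key_slot_t demo_slots[DEMO_SLOT_COUNT];
static pthread_mutex_t demo_slot_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold demo_slot_mutex. Eviction of unused persistent keys is omitted. */
static demo_key_slot_t *demo_reserve_free_key_slot(void)
{
    for (size_t i = 0; i < DEMO_SLOT_COUNT; i++) {
        if (demo_slots[i].state == DEMO_SLOT_EMPTY) {
            demo_slots[i].state = DEMO_SLOT_FILLING; /* now exclusive to the caller */
            return &demo_slots[i];
        }
    }
    return NULL;
}

static int demo_finish_key_creation(demo_key_slot_t *slot)
{
    int status = DEMO_SUCCESS;
    pthread_mutex_lock(&demo_slot_mutex);
    /* The duplicate check must happen here: another thread may have created
     * the same identifier while this slot was being filled. */
    for (size_t i = 0; i < DEMO_SLOT_COUNT; i++) {
        if (demo_slots[i].state == DEMO_SLOT_FULL && demo_slots[i].key_id == slot->key_id) {
            status = DEMO_ERROR_ALREADY_EXISTS;
        }
    }
    if (status == DEMO_SUCCESS) {
        slot->state = DEMO_SLOT_FULL; /* the key becomes visible to all threads */
    }
    pthread_mutex_unlock(&demo_slot_mutex);
    return status;
}

static void demo_fail_key_creation(demo_key_slot_t *slot)
{
    pthread_mutex_lock(&demo_slot_mutex);
    memset(slot, 0, sizeof(*slot)); /* wipe; the state is back to EMPTY */
    pthread_mutex_unlock(&demo_slot_mutex);
}

int demo_create_key(unsigned key_id, const unsigned char material[16])
{
    demo_key_slot_t *slot;
    int status;
    assert(material != NULL);

    pthread_mutex_lock(&demo_slot_mutex);
    slot = demo_reserve_free_key_slot();
    pthread_mutex_unlock(&demo_slot_mutex);
    if (slot == NULL) {
        return DEMO_ERROR_INSUFFICIENT_MEMORY;
    }
    /* Filled without the mutex: a FILLING slot is invisible to other threads. */
    slot->key_id = key_id;
    memcpy(slot->material, material, sizeof(slot->material));

    status = demo_finish_key_creation(slot);
    if (status != DEMO_SUCCESS) {
        demo_fail_key_creation(slot);
    }
    return status;
}
```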

##### Re-loading persistent keys

As described above, persistent keys can be evicted from the key slot array provided they are not currently being used (`registered_readers == 0`). When attempting to use a persistent key that has been evicted from a slot, the call to `psa_get_and_lock_key_slot` will see that the key is not in a slot, call `psa_reserve_free_key_slot` and load the key back into the reserved slot. This entire sequence is done within a single lock/unlock of the mutex, which is necessary for thread-safety (see the documentation of `psa_get_and_lock_key_slot`).

If `psa_reserve_free_key_slot` cannot find a suitable slot, the key cannot be loaded back in. This will lead to a `PSA_ERROR_INSUFFICIENT_MEMORY` error.
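
A hypothetical sketch of the reservation step with eviction follows. It assumes that the first unused persistent key found is evicted; the selection policy in Mbed TLS may differ, and reloading the key from storage is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEMO_SLOT_EMPTY = 0,
    DEMO_SLOT_FILLING,
    DEMO_SLOT_FULL,
    DEMO_SLOT_PENDING_DELETION
} demo_slot_state_t;

typedef struct {
    demo_slot_state_t state;
    size_t registered_readers;
    bool persistent;
    unsigned key_id;
} demo_key_slot_t;

/* Caller must hold the key slot mutex. Returns a slot in the FILLING state,
 * or NULL, which the caller reports as an insufficient-memory error. */
demo_key_slot_t *demo_reserve_free_key_slot(demo_key_slot_t *slots, size_t n)
{
    demo_key_slot_t *victim = NULL;
    assert(slots != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].state == DEMO_SLOT_EMPTY) {
            slots[i].state = DEMO_SLOT_FILLING;
            return &slots[i];
        }
        /* A persistent key nobody is reading can be reloaded from storage later. */
        if (victim == NULL && slots[i].state == DEMO_SLOT_FULL &&
            slots[i].persistent && slots[i].registered_readers == 0) {
            victim = &slots[i];
        }
    }
    if (victim == NULL) {
        return NULL;
    }
    memset(victim, 0, sizeof(*victim)); /* evict from memory; the key stays in storage */
    victim->state = DEMO_SLOT_FILLING;
    return victim;
}
```

Volatile keys and keys with registered readers are never evicted, so a store full of such keys yields `NULL`.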

##### Using existing keys

One-shot operations follow a standard pattern when using an existing key:

* They call one of the `psa_get_and_lock_key_slot_X` functions, which finds the key and registers the thread as a reader.
* They operate on the key slot, usually copying the key into a separate buffer to be used by the operation. This step is not performed under the key slot mutex.
* Once finished, they call `psa_unregister_read_under_mutex`.
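
The one-shot pattern can be sketched as follows, assuming a pthread mutex and a simplified slot model. All `demo_*` names are hypothetical, and the "operation" simply adds the key bytes to its input in place of real cryptography.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical slot model: only EMPTY and FULL are needed here. */
typedef enum { DEMO_SLOT_EMPTY = 0, DEMO_SLOT_FULL } demo_slot_state_t;
typedef struct {
    demo_slot_state_t state;
    size_t registered_readers;
    unsigned key_id;
    unsigned char key[4];
} demo_key_slot_t;

enum { DEMO_SUCCESS = 0, DEMO_ERROR_INVALID_HANDLE = -1 };

#define DEMO_SLOT_COUNT 2
static demo_key_slot_t demo_slots[DEMO_SLOT_COUNT] = {
    { DEMO_SLOT_FULL, 0, 7, { 1, 2, 3, 4 } }, /* second slot is zero-initialized: EMPTY */
};
static pthread_mutex_t demo_slot_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Finds a FULL slot holding key_id and registers the caller as a reader. */
static int demo_get_and_lock_key_slot(unsigned key_id, demo_key_slot_t **p_slot)
{
    int status = DEMO_ERROR_INVALID_HANDLE;
    pthread_mutex_lock(&demo_slot_mutex);
    for (size_t i = 0; i < DEMO_SLOT_COUNT; i++) {
        if (demo_slots[i].state == DEMO_SLOT_FULL && demo_slots[i].key_id == key_id) {
            demo_slots[i].registered_readers++;
            *p_slot = &demo_slots[i];
            status = DEMO_SUCCESS;
            break;
        }
    }
    pthread_mutex_unlock(&demo_slot_mutex); /* whether the key exists is decided here */
    return status;
}

static void demo_unregister_read_under_mutex(demo_key_slot_t *slot)
{
    pthread_mutex_lock(&demo_slot_mutex);
    assert(slot->registered_readers > 0);
    slot->registered_readers--;
    pthread_mutex_unlock(&demo_slot_mutex);
}

/* One-shot operation: adds the key bytes to input (a stand-in for real crypto). */
int demo_one_shot(unsigned key_id, unsigned input, unsigned *output)
{
    demo_key_slot_t *slot = NULL;
    unsigned char key_copy[4];
    int status = demo_get_and_lock_key_slot(key_id, &slot);
    if (status != DEMO_SUCCESS) {
        return status;
    }
    /* Registered as a reader, so the slot cannot be wiped; no mutex needed. */
    memcpy(key_copy, slot->key, sizeof(key_copy));
    *output = input;
    for (size_t i = 0; i < sizeof(key_copy); i++) {
        *output += key_copy[i];
    }
    memset(key_copy, 0, sizeof(key_copy));
    demo_unregister_read_under_mutex(slot); /* done with the slot */
    return DEMO_SUCCESS;
}
```

With key 7 holding bytes `{1, 2, 3, 4}`, `demo_one_shot(7, 10, &out)` sets `out` to 20 and leaves the reader count back at zero.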

Multi-part and restartable operations each have a "setup" function where the key is passed in; these functions follow the above pattern. The key is copied into the `operation` object, and the thread unregisters from reading the key (the operations do not access the key slots again). The copy of the key will not be destroyed during a call to `psa_destroy_key`; the thread running the operation is responsible for deleting its copy in the clean-up. This may need to change to enforce the long-term key destruction requirements (see [Long term key destruction requirements](#long-term-key-destruction-requirements)).
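
A hypothetical sketch of an operation object that owns its own copy of the key is shown below. `demo_operation_t`, `demo_operation_setup` and `demo_operation_abort` are illustrative names, not the Mbed TLS operation structures; the slot registration around the setup call is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *key_copy; /* owned by the operation */
    size_t key_len;
} demo_operation_t;

/* Setup step. The caller is assumed to be registered as a reader of the slot
 * holding slot_key, and to unregister right after this returns. */
int demo_operation_setup(demo_operation_t *op, const unsigned char *slot_key, size_t key_len)
{
    assert(op != NULL && slot_key != NULL && key_len > 0);
    op->key_copy = malloc(key_len);
    if (op->key_copy == NULL) {
        return -1; /* cf. PSA_ERROR_INSUFFICIENT_MEMORY */
    }
    memcpy(op->key_copy, slot_key, key_len);
    op->key_len = key_len;
    return 0;
}

/* Clean-up: the operation, not psa_destroy_key, wipes and frees its copy. */
void demo_operation_abort(demo_operation_t *op)
{
    if (op->key_copy != NULL) {
        memset(op->key_copy, 0, op->key_len);
        free(op->key_copy);
    }
    op->key_copy = NULL;
    op->key_len = 0;
}
```

Wiping the original slot after setup leaves the operation's copy untouched, which is why the operation itself must clean up.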

##### Key destruction implementation

The locking strategy here is explained in `library/psa_crypto.c`. The destroying thread (the thread calling `psa_destroy_key`) does not always wipe the key slot. The destroying thread registers to read the key, sets the slot's state to `PSA_SLOT_PENDING_DELETION`, deletes the key from persistent storage if the key is persistent, and then unregisters from reading the slot.

`psa_unregister_read` internally calls `psa_wipe_key_slot` if and only if the slot's state is `PSA_SLOT_PENDING_DELETION` and the slot's registered reader counter is equal to 1. This implements a "last one out closes the door" approach. The final thread to unregister from reading a destroyed key will automatically wipe the contents of the slot; no readers remain to reference the slot after deletion, so there cannot be corruption.
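
The destruction sequence and the "last one out closes the door" rule can be modelled as below, assuming the caller holds the key slot mutex throughout. The `demo_*` functions are hypothetical; deletion from persistent storage is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEMO_SLOT_EMPTY = 0,
    DEMO_SLOT_FILLING,
    DEMO_SLOT_FULL,
    DEMO_SLOT_PENDING_DELETION
} demo_slot_state_t;

typedef struct {
    demo_slot_state_t state;
    size_t registered_readers;
    unsigned key_id;
} demo_key_slot_t;

enum { DEMO_SUCCESS = 0, DEMO_ERROR_CORRUPTION_DETECTED = -1 };

/* All functions below must be called while holding the key slot mutex (not modelled). */

int demo_register_read(demo_key_slot_t *slot)
{
    if (slot->state != DEMO_SLOT_FULL) {
        return DEMO_ERROR_CORRUPTION_DETECTED;
    }
    slot->registered_readers++;
    return DEMO_SUCCESS;
}

/* The final reader of a destroyed key wipes the slot. */
int demo_unregister_read(demo_key_slot_t *slot)
{
    if (slot->registered_readers == 0) {
        return DEMO_ERROR_CORRUPTION_DETECTED;
    }
    if (slot->state == DEMO_SLOT_PENDING_DELETION && slot->registered_readers == 1) {
        memset(slot, 0, sizeof(*slot)); /* state EMPTY, no readers */
        return DEMO_SUCCESS;
    }
    slot->registered_readers--;
    return DEMO_SUCCESS;
}

int demo_destroy_key(demo_key_slot_t *slot)
{
    int status;
    assert(slot != NULL);
    status = demo_register_read(slot);
    if (status != DEMO_SUCCESS) {
        return status;
    }
    slot->state = DEMO_SLOT_PENDING_DELETION; /* no new reader can register from now on */
    /* A persistent key would also be deleted from storage at this point. */
    return demo_unregister_read(slot);
}
```

If another thread is still reading when `demo_destroy_key` returns, the slot stays `PENDING_DELETION` until that reader unregisters; otherwise it is wiped immediately.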

### Linearizability of the system

To satisfy the requirements in [Correctness out of the box](#correctness-out-of-the-box), we require our functions to be "linearizable" (under certain constraints). This means that any (constraint-satisfying) set of concurrent calls is performed as if the calls were executed in some sequential order.

The standard way of reasoning that this is the case is to identify a "linearization point" for each call. This is a single execution step where the function takes effect (usually a step in which the effects of the call become visible to other threads). If every call has a linearization point, the set of calls is equivalent to sequentially performing the calls in the order in which their linearization points occurred.

We only require linearizability to hold in the case where a resource-management error is not returned. In a set of concurrent calls, a call c is permitted to fail with a `PSA_ERROR_INSUFFICIENT_MEMORY` return code even if there does not exist a sequential ordering of the calls in which c returns this error. Even if such an error occurs, all calls are still required to be functionally correct.

To help justify that our system is linearizable, here are the linearization points (or planned linearization points) of each PSA call:

* Key creation functions (including `psa_copy_key`) - The linearization point for a successful call is the mutex unlock within `psa_finish_key_creation`; it is at this point that the key becomes visible to other threads. The linearization point for a failed call is the closest mutex unlock after the failure is first identified.
* `psa_destroy_key` - The linearization point for a successful destruction is the mutex unlock; at this point the slot is in the state `PSA_SLOT_PENDING_DELETION`, meaning that the key has been destroyed. For failures, the linearization point is the same.
* `psa_purge_key`, `psa_close_key` - The linearization point is the mutex unlock after wiping the slot for a success, or after unregistering for a failure.
* One-shot operations - The linearization point is the final unlock of the mutex within `psa_get_and_lock_key_slot`, as that is the point at which it is decided whether or not the key exists.
* Multi-part operations - The linearization point of the key input function is the final unlock of the mutex within `psa_get_and_lock_key_slot`. All other steps have no side effects other than resource-related ones (except for key derivation, covered in the key creation functions).

Please note that one-shot operations and multi-part operations are not yet considered thread-safe, as we have not yet tested whether they rely on unprotected global resources. The key slot access in these operations is thread-safe.

## Testing and analysis

### Thread-safe testing

It is now possible for individual tests to spin up multiple threads. This work has made the global variables used in tests thread-safe. If multiple threads fail a test assert, the first failure will be reported with correct line numbers.

Although the `step` feature used in some tests is thread-safe, it may produce unexpected results for multi-threaded tests. `mbedtls_test_set_step` or `mbedtls_test_increment_step` calls within threads can happen in any order, so they may not produce the desired result when precise ordering is required.

### Current state of testing

Our testing is a work in progress. It is not feasible to run our traditional single-threaded tests in a way that tests concurrency. We need to write new test suites for concurrency testing.

Our tests currently only run on pthreads; we hope to expand this in the future (our API already allows this).

We run tests using [ThreadSanitizer](https://clang.llvm.org/docs/ThreadSanitizer.html) to detect data races. We test the key store, and test that our key slot state system is enforced. We also test the thread-safety of `psa_crypto_init`.

Currently, not every API call is tested, and we cannot feasibly test every combination of concurrent API calls. API calls can in general be split into a few categories, each category calling the same internal key management functions in the same order. It is these internal functions that are in charge of locking mutexes and interacting with the key store, and we test the thread-safety of these functions.

Since we do not run every cryptographic operation concurrently, we do not test that operations are free of unexpected global variables.

### Expanding testing

Through future work on testing, it would be good to:

* For every API call, have a test which runs multiple copies of the call simultaneously.
* After implementing other threading platforms, expand the tests to these platforms.
* Have increased testing for kicking persistent keys out of slots.
* Explicitly test that all global variables are protected; for this we would need to cover every operation in a concurrent scenario while running ThreadSanitizer.
* Run tests on more threading implementations, once these implementations are supported.

### Performance

Key loading does partially run in parallel: deriving the key and copying it into the slot are not done under any mutex.

Key destruction is entirely sequential. This is required for persistent keys, to prevent issues with re-loading keys that could not otherwise be avoided without changing our approach to thread-safety.
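The key loading pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the Mbed TLS code: `create_key`, `lookup` and the slot struct are hypothetical, a POSIX mutex stands in for the Mbed TLS threading abstraction, and the key material is a constant fill byte. A concurrent observer checks that a lookup sees either no key or the complete key, matching the linearization point given for key creation functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one key slot; not the Mbed TLS definitions. */
typedef enum { SLOT_EMPTY, SLOT_FILLING, SLOT_FULL } slot_state_t;

#define KEY_LEN 32

typedef struct {
    slot_state_t state;
    unsigned char key[KEY_LEN];
} key_slot_t;

static pthread_mutex_t key_slot_mutex = PTHREAD_MUTEX_INITIALIZER;

static int create_key(key_slot_t *slot, unsigned char fill)
{
    /* Step 1, under the mutex: reserve an empty slot. Other threads
     * treat a FILLING slot as "key does not exist". */
    pthread_mutex_lock(&key_slot_mutex);
    if (slot->state != SLOT_EMPTY) {
        pthread_mutex_unlock(&key_slot_mutex);
        return -1;
    }
    slot->state = SLOT_FILLING;
    pthread_mutex_unlock(&key_slot_mutex);

    /* Step 2, no mutex: produce and copy the key material. Only the
     * creating thread touches a FILLING slot. */
    memset(slot->key, fill, sizeof(slot->key));

    /* Step 3, under the mutex: publish. This unlock is the
     * linearization point at which the key becomes visible. */
    pthread_mutex_lock(&key_slot_mutex);
    slot->state = SLOT_FULL;
    pthread_mutex_unlock(&key_slot_mutex);
    return 0;
}

/* Returns 1 if the key is visible and complete, 0 if it does not exist,
 * -1 if a partially written key was observed (must never happen).
 * A real reader would also register itself before using the key; this
 * sketch never destroys the key, so that step is omitted. */
static int lookup(key_slot_t *slot, unsigned char fill)
{
    pthread_mutex_lock(&key_slot_mutex);
    int full = (slot->state == SLOT_FULL);
    pthread_mutex_unlock(&key_slot_mutex);
    if (!full) {
        return 0;
    }
    for (size_t i = 0; i < KEY_LEN; i++) {
        if (slot->key[i] != fill) {
            return -1;
        }
    }
    return 1;
}

static key_slot_t shared_slot = { SLOT_EMPTY, {0} };
static int torn_reads = 0;

/* Polls the slot while another thread creates the key. */
static void *observer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        int r = lookup(&shared_slot, 0x5A);
        if (r < 0) {
            torn_reads++;
        }
        if (r == 1) {
            break;
        }
    }
    return NULL;
}

static int run_create_and_observe(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, observer, NULL) != 0) {
        return -1;
    }
    int ret = create_key(&shared_slot, 0x5A);
    pthread_join(t, NULL);
    return ret;
}
```

Since the copy in step 2 happens before the unlock in step 3, any thread that later observes `SLOT_FULL` under the mutex also observes the finished key bytes.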

## Future work

### Long term requirements

As explained previously, we eventually aim to make the entirety of the PSA API thread-safe. This will build on the work that we have already completed. This requires a full suite of testing; see [Expanding testing](#expanding-testing) for details.

### Long term performance requirements

Our plan for cryptographic operations is that they are not performed under any global mutex. One-shot operations and multi-part operations will each hold the global mutex only while finding the relevant key slot and while unregistering as a reader after the operation. They will use their own operation-specific mutexes to guard any shared data that they use.

We aim to eventually replace some/all of the mutexes with RWLocks, if possible.
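A minimal sketch of the planned pattern for one-shot operations, assuming a single hypothetical key slot, a POSIX mutex as the global mutex, and a toy XOR checksum in place of a real cryptographic operation. Operation-specific mutexes are not modeled. The global mutex is held only around lookup/registration and around unregistration; the computation itself runs unlocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of one key slot; names and layout are illustrative. */
typedef struct {
    int full;                   /* 1 if the slot holds a usable key */
    size_t registered_readers;  /* threads currently using the key */
    unsigned char key[8];
} key_slot_t;

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static key_slot_t slot = { 1, 0, {1, 2, 4, 8, 16, 32, 64, 128} };

/* One-shot operation following the planned locking pattern. The toy
 * "operation" XORs the key bytes together. */
static int one_shot(unsigned char *out)
{
    /* Global mutex held only to find the key and register as a reader. */
    pthread_mutex_lock(&global_mutex);
    if (!slot.full) {
        pthread_mutex_unlock(&global_mutex);
        return -1;
    }
    slot.registered_readers++;
    pthread_mutex_unlock(&global_mutex);

    /* No global mutex: operations on the same key run in parallel.
     * Being a registered reader keeps the key from being wiped. */
    unsigned char acc = 0;
    for (size_t i = 0; i < sizeof(slot.key); i++) {
        acc ^= slot.key[i];
    }
    *out = acc;

    /* Global mutex held again only to unregister. */
    pthread_mutex_lock(&global_mutex);
    slot.registered_readers--;
    pthread_mutex_unlock(&global_mutex);
    return 0;
}

static void *worker(void *arg)
{
    unsigned char *out = arg;
    if (one_shot(out) != 0) {
        *out = 0;
    }
    return NULL;
}

/* Runs n one-shot operations concurrently (n <= 16); results go to out[]. */
static int run_concurrently(unsigned char *out, size_t n)
{
    pthread_t tids[16];
    size_t created = 0;
    int rc = 0;
    if (n > 16) {
        return -1;
    }
    for (; created < n; created++) {
        if (pthread_create(&tids[created], NULL, worker, &out[created]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t i = 0; i < created; i++) {
        pthread_join(tids[i], NULL);
    }
    return rc;
}
```

With an RWLock, lookups could also proceed in parallel, but registering a reader still modifies the slot, so the reader counter would then need to be atomic or registration would need the write lock.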

### Long term key destruction requirements

The [PSA Crypto Key destruction specification](https://arm-software.github.io/psa-api/crypto/1.1/api/keys/management.html#key-destruction) mandates that implementations make a best effort to ensure that the key material cannot be recovered. In the long term, it would be good to guarantee that `psa_destroy_key` wipes all copies of the key material.

Here are our long term key destruction goals:

`psa_destroy_key` does not block indefinitely, and when `psa_destroy_key` returns:

1. The key identifier does not exist. This is a functional requirement for persistent keys: any thread can immediately create a new key with the same identifier.
2. The resources from the key have been freed. This allows threads to create similar keys immediately after destruction, regardless of resources.
3. No copy of the key material exists. Rationale: this is a security requirement. We do not have this requirement yet, but we need to document this as a security weakness, and we would like to satisfy this security requirement in the future.

#### Condition variables

It would be ideal to add these to a future major version; we cannot add these as requirements to the default `MBEDTLS_THREADING_C` for backwards compatibility reasons.

Condition variables would enable us to fulfil the final requirement in [Long term key destruction requirements](#long-term-key-destruction-requirements). Destruction would then work as follows:

* When a thread calls `psa_destroy_key`, it continues as normal until the `psa_unregister_read` call.
* Instead of calling `psa_unregister_read`, the thread waits until the condition `slot->registered_readers == 1` is true (the destroying thread is the final reader).
* At this point, the destroying thread directly calls `psa_wipe_key_slot`.

A few changes are needed for this to follow our destruction requirements:

* Multi-part operations will need to remain registered as readers of their key slot until their copy of the key is destroyed, i.e. at the end of the finish/abort call.
* The functionality where `psa_unregister_read` can wipe the key slot will need to be removed; slot wiping would then only be done by the destroying/wiping thread.
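The condition-variable scheme above can be sketched with POSIX condition variables (`pthread_cond_wait`, `pthread_cond_broadcast`). This is a hypothetical illustration: the slot model, `destroy_key`, `unregister_read` and the pre-registered reader are assumptions, and a real implementation would wipe with a zeroization function the compiler cannot optimize away rather than plain `memset`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; names and layout are illustrative only. */
typedef enum { SLOT_EMPTY, SLOT_FULL, SLOT_PENDING_DELETION } slot_state_t;

typedef struct {
    slot_state_t state;
    size_t registered_readers;
    unsigned char key[16];
} key_slot_t;

static pthread_mutex_t key_slot_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t readers_changed = PTHREAD_COND_INITIALIZER;

/* One reader is already registered, e.g. an in-progress multi-part operation. */
static key_slot_t slot = { SLOT_FULL, 1, {0xAA} };
static unsigned char seen = 0;

/* Readers only decrement the counter; they never wipe the slot. */
static void unregister_read(key_slot_t *s)
{
    pthread_mutex_lock(&key_slot_mutex);
    s->registered_readers--;
    pthread_cond_broadcast(&readers_changed);
    pthread_mutex_unlock(&key_slot_mutex);
}

static int destroy_key(key_slot_t *s)
{
    pthread_mutex_lock(&key_slot_mutex);
    if (s->state != SLOT_FULL) {
        pthread_mutex_unlock(&key_slot_mutex);
        return -1;
    }
    s->registered_readers++;            /* the destroyer is a reader too */
    s->state = SLOT_PENDING_DELETION;   /* no new readers from now on */
    while (s->registered_readers != 1) {
        /* Atomically releases the mutex while waiting; the loop also
         * handles spurious wakeups. */
        pthread_cond_wait(&readers_changed, &key_slot_mutex);
    }
    /* The destroyer is the final reader: wipe before returning. */
    memset(s->key, 0, sizeof(s->key));
    s->registered_readers = 0;
    s->state = SLOT_EMPTY;
    pthread_mutex_unlock(&key_slot_mutex);
    return 0;
}

static void *existing_reader(void *arg)
{
    (void)arg;
    seen = slot.key[0];   /* still using the key material */
    unregister_read(&slot);
    return NULL;
}

static int destroy_while_reading(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, existing_reader, NULL) != 0) {
        return -1;
    }
    int ret = destroy_key(&slot);   /* blocks until the reader is done */
    pthread_join(t, NULL);
    return ret;
}
```

In this sketch `unregister_read` never wipes the slot, which corresponds to removing that functionality from `psa_unregister_read` as listed above.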

### Protecting operation contexts

Currently, we rely on the crypto service to ensure that the same operation is not invoked concurrently. This abides by the PSA Crypto API Specification ([PSA Concurrent calling conventions](#psa-concurrent-calling-conventions)).

Concurrent access to the same operation object can compromise the crypto service. For example, if the operation context contains a pointer, one thread could read it while another thread is updating it; depending on the compiler and the platform, the pointer assignment may or may not be atomic, so the reader could see a corrupted value. Such access violates the functional correctness requirement of the crypto service.

If, in future, we want to protect against this within the library, then operations will require a status field protected by a global mutex. On entry, API calls would check the state and return an error if the state is ACTIVE. If the state is INACTIVE, then the call would set the state to ACTIVE, perform the operation and then restore the state to INACTIVE before returning.
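A minimal sketch of the status-field approach described above, assuming a POSIX mutex as the global mutex. `operation_t`, `op_enter`, `op_exit`, `op_update` and `OP_ERR_BUSY` are hypothetical names, not part of the PSA Crypto API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical operation object with a status field. */
typedef enum { OP_INACTIVE, OP_ACTIVE } op_status_t;

typedef struct {
    op_status_t status;
    int progress;          /* stands in for the real operation context */
} operation_t;

#define OP_ERR_BUSY (-1)

static pthread_mutex_t op_status_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Claim the operation object, or fail if another call is using it. */
static int op_enter(operation_t *op)
{
    int ret = 0;
    pthread_mutex_lock(&op_status_mutex);
    if (op->status == OP_ACTIVE) {
        ret = OP_ERR_BUSY;
    } else {
        op->status = OP_ACTIVE;
    }
    pthread_mutex_unlock(&op_status_mutex);
    return ret;
}

/* Release the operation object before returning to the caller. */
static void op_exit(operation_t *op)
{
    pthread_mutex_lock(&op_status_mutex);
    op->status = OP_INACTIVE;
    pthread_mutex_unlock(&op_status_mutex);
}

/* Example entry point: the body runs outside the global mutex. */
static int op_update(operation_t *op, int amount)
{
    int ret = op_enter(op);
    if (ret != 0) {
        return ret;
    }
    op->progress += amount;   /* the actual work on the context */
    op_exit(op);
    return 0;
}
```

Only the status check and update happen under the global mutex; the operation body itself runs unlocked, since the ACTIVE state already excludes other callers of the same object.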

### Future driver work

A future policy we may wish to enforce for drivers is:

* By default, each driver has at most one entry point active at any given time. In other words, each driver has its own exclusive lock.
* Drivers have an optional `"thread_safe"` boolean property. If true, it allows concurrent calls to this driver.
* Even with a thread-safe driver, the core never starts the destruction of a key while there are operations in progress on it, and never performs concurrent calls on the same multipart operation.

In the non-thread-safe case we have these natural assumptions/requirements:

1. Drivers don't call the core for any operation for which they provide an entry point.
2. The core doesn't hold the driver mutex between calls to entry points.

With these, the only way a deadlock can occur is when there are several drivers with circular dependencies. That is, Driver A makes a call that is dispatched to Driver B; upon executing this call, Driver B makes a call that is dispatched to Driver A. For example, Driver A does CCM, which calls Driver B to do CBC-MAC, which in turn calls Driver A to perform AES.

Potential ways of resolving this:

1. Non-thread-safe drivers must not call the core.
2. Provide a new public API that drivers can safely call.
3. Make the dispatch layer public for drivers to call.
4. There is a whitelist of core APIs that drivers can call. Drivers providing entry points to these must not make a call to the core when handling these calls. (Drivers are still allowed to call any core API that can't have a driver entry point.)

The first is too restrictive; the second and the third would require committing to a stable API, and would likely increase the code size for a relatively rare feature. We are choosing the fourth as that is the most viable option.

Thread-safe drivers:

A driver would be thread-safe if its `"thread_safe"` property is set to true.

To make re-entrancy in non-thread-safe drivers work, thread-safe drivers must not make a call to the core when handling a call that is on the non-thread-safe driver core API whitelist.

Thread-safe drivers have fewer guarantees from the core and need to implement more complex logic. We can reasonably expect them to be more flexible in terms of re-entrancy as well. At this point it is hard to see what further guarantees would be useful and feasible. Therefore, we don't provide any further guarantees for now.

Thread-safe drivers must not make any assumption about the operation of the core beyond what is discussed here.
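To illustrate the per-driver exclusive lock policy and the circular dependency discussed in [Future driver work](#future-driver-work) (Driver A's CCM calling Driver B's CBC-MAC, which calls back into Driver A for AES), here is a hypothetical sketch. The drivers, entry points, `dispatch` and `ERR_DRIVER_BUSY` are illustrative only. It uses `pthread_mutex_trylock`, which fails with `EBUSY` when the mutex is already locked, including by the calling thread, so the re-entry is reported rather than hanging as a blocking lock would.

```c
#include <assert.h>
#include <pthread.h>

#define ERR_DRIVER_BUSY (-2)

/* Hypothetical driver descriptor. */
typedef struct {
    pthread_mutex_t *lock;  /* exclusive lock, used only if !thread_safe */
    int thread_safe;        /* models the optional "thread_safe" property */
} driver_t;

static pthread_mutex_t driver_a_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t driver_b_lock = PTHREAD_MUTEX_INITIALIZER;
static driver_t driver_a = { &driver_a_lock, 0 };
static driver_t driver_b = { &driver_b_lock, 0 };

/* Core dispatch: a non-thread-safe driver runs one entry point at a time. */
static int dispatch(const driver_t *drv, int (*entry)(void))
{
    if (drv->thread_safe) {
        return entry();
    }
    /* trylock fails if the lock is held, even by this thread. A blocking
     * pthread_mutex_lock here would deadlock on re-entry instead. */
    if (pthread_mutex_trylock(drv->lock) != 0) {
        return ERR_DRIVER_BUSY;
    }
    int ret = entry();
    pthread_mutex_unlock(drv->lock);
    return ret;
}

/* Driver A's AES entry point: a leaf that makes no core calls. */
static int a_aes(void) { return 0; }

/* Driver B's CBC-MAC calls the core for AES, which dispatches to A. */
static int b_cbc_mac(void) { return dispatch(&driver_a, a_aes); }

/* Driver A's CCM calls the core for CBC-MAC, which dispatches to B. */
static int a_ccm(void) { return dispatch(&driver_b, b_cbc_mac); }
```

Calling AES or CBC-MAC directly succeeds, while the CCM chain hits Driver A's lock a second time. Marking Driver A as thread-safe removes its exclusive lock, and the same call chain then completes.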