# Mbed TLS invasive testing strategy

## Introduction

In Mbed TLS, we use black-box testing as much as possible: test the documented behavior of the product, in a realistic environment. However, this is not always sufficient.

The goal of this document is to identify areas where black-box testing is insufficient and to propose solutions.

This is a test strategy document, not a test plan. A description of exactly what is tested is out of scope.

This document is structured as follows:

* [“Rules”](#rules) gives general rules and is written for brevity.
* [“Requirements”](#requirements) explores the reasons why invasive testing is needed and how it should be done.
* [“Possible approaches”](#possible-approaches) discusses some general methods for non-black-box testing.
* [“Solutions”](#solutions) explains how we currently solve, or intend to solve, specific problems.

### TLS

This document currently focuses on data structure manipulation and storage, which is what the crypto/keystore and X.509 parts of the library are about. More work is needed to fully take TLS into account.

## Rules

Always follow these rules unless you have a good reason not to. If you deviate, document the rationale somewhere.

See the section [“Possible approaches”](#possible-approaches) for a rationale.

### Interface design for testing

Do not add test-specific interfaces if there is a practical alternative. All public interfaces should be useful in at least some configurations. Features with a significant impact on the code size or attack surface should have a compile-time guard.

### Reliance on internal details

In unit tests and in test programs, it's ok to include header files from `library/`. Do not define non-public interfaces in public headers (`include/mbedtls` has `*_internal.h` headers for legacy reasons, but this approach is deprecated). In contrast, sample programs must not include header files from `library/`.

Sometimes it makes sense to have unit tests on functions that aren't part of the public API. Declare such functions in `library/*.h` and include the corresponding header in the test code. If the function should be `static` for optimization but can't be `static` for testing, declare it as `MBEDTLS_STATIC_TESTABLE`, and make the tests that use it depend on `MBEDTLS_TEST_HOOKS` (see [“rules for compile-time options”](#rules-for-compile-time-options)).
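
For illustration, `MBEDTLS_STATIC_TESTABLE` is typically defined along the following lines, so that the helper keeps internal linkage in production builds but is visible to the unit tests in test builds. The helper name `mbedtls_foo_helper` is invented for this sketch:

```c
/* Definition sketch (the real definition lives in a library header): */
#if defined(MBEDTLS_TEST_HOOKS)
#define MBEDTLS_STATIC_TESTABLE        /* external linkage: testable */
#else
#define MBEDTLS_STATIC_TESTABLE static /* internal linkage in production */
#endif

/* In a library/ header, declared only for test builds: */
#if defined(MBEDTLS_TEST_HOOKS)
int mbedtls_foo_helper(unsigned x);
#endif

/* In the corresponding library/ source file: */
MBEDTLS_STATIC_TESTABLE int mbedtls_foo_helper(unsigned x)
{
    return x % 7 == 0; /* some internal computation worth unit-testing */
}
```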

If test code or test data depends on internal details of the library and not just on its documented behavior, add a comment in the code that explains the dependency. For example:

> ```
> /* This test file is specific to the ITS implementation in PSA Crypto
>  * on top of stdio. It expects to know what the stdio name of a file is
>  * based on its keystore name.
>  */
> ```

> ```
> # This test assumes that PSA_MAX_KEY_BITS (currently 65536-8 bits = 8191 bytes
> # and not expected to be raised any time soon) is less than the maximum
> # output from HKDF-SHA512 (255*64 = 16320 bytes).
> ```

### Rules for compile-time options

If the most practical way to test something is to add code to the product that is only useful for testing, do so, but obey the following rules. For more information, see the [rationale](#guidelines-for-compile-time-options).

* **Only use test-specific code when necessary.** Anything that can be tested through the documented API must be tested through the documented API.
* **Test-specific code must be guarded by `#if defined(MBEDTLS_TEST_HOOKS)`**. Do not create fine-grained guards for test-specific code.
* **Do not use `MBEDTLS_TEST_HOOKS` for security checks or assertions.** Security checks belong in the product.
* **Merely defining `MBEDTLS_TEST_HOOKS` must not change the behavior**. It may define extra functions. It may add fields to structures, but if so, make it very clear that these fields have no impact on non-test-specific fields.
* **Where tests must be able to change the behavior, do it by function substitution.** See [“rules for function substitution”](#rules-for-function-substitution) for more details.

#### Rules for function substitution

The code calls a function `mbedtls_foo()`. Usually this is a macro defined to be a system function (like `mbedtls_calloc` or `mbedtls_fopen`), which we replace to mock or wrap it. This is useful to simulate I/O failure, for example.

Sometimes the substitutable function is a `static inline` function that does nothing (not a macro, to avoid accidentally skipping side effects in its parameters), to provide a hook for test code; such functions should have a name that starts with the prefix `mbedtls_test_hook_`. In such cases, the function should generally not modify its parameters, so any pointer argument should be const. The function should return void.

With `MBEDTLS_TEST_HOOKS` set, `mbedtls_foo` is a global variable of function pointer type. This global variable is initialized to the system function, or to a function that does nothing. The global variable is declared in a header in the `library` directory such as `psa_crypto_invasive.h`.

In test code that needs to modify the internal behavior (see the sketch after this list):

* The test function (or the whole test file) must depend on `MBEDTLS_TEST_HOOKS`.
* At the beginning of the function, set the global function pointers to the desired value.
* In the function's cleanup code, restore the global function pointers to their default value.
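
A minimal, self-contained sketch of this pattern follows. All the names (`mbedtls_test_hook_foo`, `library_function`, `test_hook_spy`) are invented for the example; real hooks are declared in `library/` headers and used from test suites as described above:

```c
#include <stddef.h>
#include <stdio.h>

/* Default hook: does nothing, so merely building with MBEDTLS_TEST_HOOKS
 * does not change the library's behavior. */
static void mbedtls_test_hook_foo_default(const unsigned char *data,
                                          size_t size)
{
    (void) data;
    (void) size;
}

/* With MBEDTLS_TEST_HOOKS, the hook is a substitutable global function
 * pointer, initialized to the do-nothing default. */
void (*mbedtls_test_hook_foo)(const unsigned char *, size_t) =
    mbedtls_test_hook_foo_default;

/* Library code calls the hook at the point of interest. */
static void library_function(void)
{
    unsigned char buf[4] = { 1, 2, 3, 4 };
    mbedtls_test_hook_foo(buf, sizeof(buf));
}

/* Test-side substitute that observes the internal state. */
static void test_hook_spy(const unsigned char *data, size_t size)
{
    (void) data;
    printf("hook called with %zu bytes\n", size);
}

int main(void)
{
    mbedtls_test_hook_foo = test_hook_spy;                 /* substitute */
    library_function();
    mbedtls_test_hook_foo = mbedtls_test_hook_foo_default; /* restore */
    return 0;
}
```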

## Requirements

### General goals

We need to balance the following goals, which are sometimes contradictory.

* Coverage: we need to test behaviors which are not easy to trigger by using the API or which cannot be triggered deterministically, for example I/O failures.
* Correctness: we want to test the actual product, not a modified version, since conclusions drawn from a test of a modified product may not apply to the real product.
* Effacement: the product should not include features that are solely present for test purposes, since these increase the attack surface and the code size.
* Portability: tests should work on every platform. Skipping tests on certain platforms may hide errors that are only apparent on such platforms.
* Maintainability: tests should only enforce the documented behavior of the product, to avoid extra work when the product's internal or implementation-specific behavior changes. We should also not give the impression that whatever the tests check is guaranteed behavior of the product which cannot change in future versions.

Where those goals conflict, we should at least mitigate the impact on the goals that cannot be fully met, and document the architectural choices and their rationale.

### Problem areas

#### Allocation

Resource allocation can fail, but rarely does so in a typical test environment. How does the product cope if some allocations fail?

Resources include:

* Memory.
* Files in storage (PSA API only; for the Mbed TLS API, black-box unit tests are sufficient).
* Key handles (PSA API only).
* Key slots in a secure element (PSA SE HAL).
* Communication handles (PSA crypto service only).

#### Storage

Storage can fail, either due to hardware errors or to active attacks on trusted storage. How does the code cope if some storage accesses fail?

We also need to test resilience: if the system is reset during an operation, does it restart in a correct state?

#### Cleanup

When code should clean up resources, how do we know that they have truly been cleaned up?

* Zeroization of confidential data after use.
* Freeing memory.
* Closing key handles.
* Freeing key slots in a secure element.
* Deleting files in storage (PSA API only).

#### Internal data

Sometimes it is useful to peek or poke internal data.

* Check consistency of internal data (e.g. output of key generation).
* Check the format of files (which matters so that the product can still read old files after an upgrade).
* Inject faults and test corruption checks inside the product.

## Possible approaches

Key to requirement tables:

* ++ requirement is fully met
* \+ requirement is mostly met
* ~ requirement is partially met but there are limitations
* ! requirement is somewhat problematic
* !! requirement is very problematic

### Fine-grained public interfaces

We can include all the features we want to test in the public interface. Then the tests can be truly black-box. The limitation of this approach is that it requires adding a lot of interfaces that are not useful in production. These interfaces have costs: they increase the code size, the attack surface, and the testing burden (exponentially, because we need to test all these interfaces in combination).

As a rule, we do not add public interfaces solely for testing purposes. We only add public interfaces if they are also useful in production, at least sometimes. For example, the main purpose of `mbedtls_psa_crypto_free` is to clean up all resources in tests, but this is also useful in production in some applications that only want to use PSA Crypto during part of their lifetime.

Mbed TLS traditionally has very fine-grained public interfaces, with many platform functions that can be substituted (`MBEDTLS_PLATFORM_xxx` macros). PSA Crypto has more opacity and fewer platform substitution macros.

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ~ Many useful tests are not reasonably achievable |
| Correctness | ++ Ideal |
| Effacement | !! Requires adding many otherwise-useless interfaces |
| Portability | ++ Ideal; the additional interfaces may be useful for portability beyond testing |
| Maintainability | !! Combinatorial explosion on the testing burden |
| | ! Public interfaces must remain for backward compatibility even if the test architecture changes |

### Fine-grained undocumented interfaces

We can include all the features we want to test in undocumented interfaces. Undocumented interfaces are declared in public headers for the sake of the C compiler, but are marked “do not use” in comments (or not described at all) and are not included in the Doxygen-rendered documentation. This mitigates some of the downsides of [fine-grained public interfaces](#fine-grained-public-interfaces), but not all. In particular, the extra interfaces do increase the code size, the attack surface and the test surface.

Mbed TLS traditionally has a few internal interfaces, mostly intended for cross-module abstraction leakage rather than for testing. For the PSA API, we favor [internal interfaces](#internal-interfaces).

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ~ Many useful tests are not reasonably achievable |
| Correctness | ++ Ideal |
| Effacement | !! Requires adding many otherwise-useless interfaces |
| Portability | ++ Ideal; the additional interfaces may be useful for portability beyond testing |
| Maintainability | ! Combinatorial explosion on the testing burden |

### Internal interfaces

We can write tests that call internal functions that are not exposed in the public interfaces. This is nice when it works, because it lets us test the unchanged product without compromising the design of the public interface.

A limitation is that these interfaces must exist in the first place. If they don't, this has mostly the same downside as public interfaces: the extra interfaces increase the code size and the attack surface for no direct benefit to the product.

Another limitation is that internal interfaces need to be used correctly. We may accidentally rely on internal details in the tests that are not necessarily always true (for example that are platform-specific). We may accidentally use these internal interfaces in ways that don't correspond to the actual product.

This approach is mostly portable since it only relies on C interfaces. A limitation is that the test-only interfaces must not be hidden at link time (but link-time hiding is not something we currently do). Another limitation is that this approach does not work for users who patch the library by replacing some modules; this is a secondary concern since we do not officially offer this as a feature.

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ~ Many useful tests require additional internal interfaces |
| Correctness | + Does not require a product change |
| | ~ The tests may call internal functions in a way that does not reflect actual usage inside the product |
| Effacement | ++ Fine as long as the internal interfaces aren't added solely for test purposes |
| Portability | + Fine as long as we control how the tests are linked |
| | ~ Doesn't work if the users rewrite an internal module |
| Maintainability | + Tests interfaces that are documented; dependencies in the tests are easily noticed when changing these interfaces |

### Static analysis

If we guarantee certain properties through static analysis, we don't need to test them. This puts some constraints on the properties:

* We need to have confidence in the specification (but we can gain this confidence by evaluating the specification on test data).
* This does not work for platform-dependent properties unless we have a formal model of the platform.

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ~ Good for platform-independent properties, if we can guarantee them statically |
| Correctness | + Good as long as we have confidence in the specification |
| Effacement | ++ Zero impact on the code |
| Portability | ++ Zero runtime burden |
| Maintainability | ~ Static analysis is hard, but it's also helpful |

### Compile-time options

If there's code that we want to have in the product for testing, but not in production, we can add a compile-time option to enable it. This is very powerful and usually easy to use, but comes with a major downside: we aren't testing the same code anymore.

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ++ Most things can be tested that way |
| Correctness | ! Difficult to ensure that what we test is what we run |
| Effacement | ++ No impact on the product when built normally or on the documentation, if done right |
| | ! Risk of getting “no impact” wrong |
| Portability | ++ It's just C code so it works everywhere |
| | ~ Doesn't work if the users rewrite an internal module |
| Maintainability | + Test interfaces impact the product source code, but at least they're clearly marked as such in the code |

#### Guidelines for compile-time options

* **Minimize the number of compile-time options.**<br>
  Either we're testing or we're not. Fine-grained options for testing would require more test builds, especially if combinatorics comes into play.
* **Merely enabling the compile-time option should not change the behavior.**<br>
  When building in test mode, the code should have exactly the same behavior. Changing the behavior should require some action at runtime (calling a function or changing a variable).
* **Minimize the impact on code.**<br>
  We should not have test-specific conditional compilation littered through the code, as that makes the code hard to read.

### Runtime instrumentation

Some properties can be tested through runtime instrumentation: have the compiler or a similar tool inject something into the binary.

* Sanitizers check for certain bad usage patterns (ASan, MSan, UBSan, Valgrind).
* We can inject external libraries at link time. This can be a way to make system functions fail (see the sketch below).
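
For example, on GNU toolchains, a test build can wrap an allocation function with the linker option `-Wl,--wrap=calloc`. The following sketch (the countdown variable is invented for the example) makes `calloc` fail on demand:

```c
#include <stddef.h>

/* Provided by the linker when building with -Wl,--wrap=calloc. */
void *__real_calloc(size_t nmemb, size_t size);

/* Test knob: number of allocations to allow before injecting a failure.
 * The test harness sets this before exercising the code under test. */
unsigned long test_calloc_countdown = (unsigned long) -1;

void *__wrap_calloc(size_t nmemb, size_t size)
{
    if (test_calloc_countdown == 0)
        return NULL; /* simulate an allocation failure */
    --test_calloc_countdown;
    return __real_calloc(nmemb, size);
}
```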

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ! Limited scope |
| Correctness | + Instrumentation generally does not affect the program's functional behavior |
| Effacement | ++ Zero impact on the code |
| Portability | ~ Depends on the method |
| Maintainability | ~ Depending on the instrumentation, this may require additional builds and scripts |
| | + Many properties come for free, but some require effort (e.g. the test code itself must be leak-free to avoid false positives in a leak detector) |

### Debugger-based testing

If we want to do something in a test that the product isn't capable of doing, we can use a debugger to read or modify the memory, or hook into the code at arbitrary points.

This is a very powerful approach, but it comes with limitations:

* The debugger may introduce behavior changes (e.g. timing). If we modify data structures in memory, we may do so in a way that the code doesn't expect.
* Due to compiler optimizations, the memory may not have the layout that we expect.
* Writing reliable debugger scripts is hard. We need to have confidence that we're testing what we mean to test, even in the face of compiler optimizations. Debugger scripting languages such as gdb's make it hard to automate even relatively simple things such as finding the place(s) in the binary corresponding to some place in the source code.
* Debugger scripts are very much non-portable.

| Requirement | Analysis |
| ----------- | -------- |
| Coverage | ++ The sky is the limit |
| Correctness | ++ The code is unmodified, and tested as compiled (so we even detect compiler-induced bugs) |
| | ! Compiler optimizations may hinder |
| | ~ Modifying the execution may introduce divergence |
| Effacement | ++ Zero impact on the code |
| Portability | !! Not all environments have a debugger, and even if they do, we'd need completely different scripts for every debugger |
| Maintainability | ! Writing reliable debugger scripts is hard |
| | !! Very tight coupling with the details of the source code and even with the compiler |

## Solutions

This section lists some strategies that are currently used for invasive testing, or planned to be used. This list is not intended to be exhaustive.

### Memory management

#### Zeroization testing

Goal: test that `mbedtls_platform_zeroize` does wipe the memory buffer.

Solution ([debugger](#debugger-based-testing)): implemented in `tests/scripts/test_zeroize.gdb`.

Rationale: this cannot be tested by adding C code, because the danger is that the compiler optimizes the zeroization away, and any C code that observes the zeroization would cause the compiler not to optimize it away.

#### Memory cleanup

Goal: test the absence of memory leaks.

Solution ([instrumentation](#runtime-instrumentation)): run tests with ASan. (We also use Valgrind, but it's slower than ASan, so we favor ASan.)

Since we run many test jobs with a memory leak detector, each test function must clean up after itself. Use the cleanup code (after the `exit` label) to free any memory that the function may have allocated.
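
For example, a test function in a `tests/suites/*.function` file follows this shape (a minimal sketch; the function name is invented):

```c
/* BEGIN_CASE */
void example_allocating_test(int size)
{
    unsigned char *buf = NULL;

    TEST_ASSERT((buf = mbedtls_calloc(1, size)) != NULL);
    /* ... exercise the library using buf; any failing TEST_ASSERT
     * jumps to the exit label ... */

exit:
    /* Reached on success and on failure alike: free everything this
     * function may have allocated, so the leak detector stays quiet. */
    mbedtls_free(buf);
}
/* END_CASE */
```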

#### Robustness against memory allocation failure

Solution: TODO. We don't test this at all at this point.

#### PSA key store memory cleanup

Goal: test the absence of resource leaks in the PSA key store code, in particular that `psa_close_key` and `psa_destroy_key` work correctly.

Solution ([internal interface](#internal-interfaces)): in most tests involving PSA functions, the cleanup code explicitly calls `PSA_DONE()` instead of `mbedtls_psa_crypto_free()`. `PSA_DONE` fails the test if the key store in memory is not empty.

Note that there must also be tests that call `mbedtls_psa_crypto_free` with keys still open, to verify that it does close all keys.

`PSA_DONE` is a macro defined in `psa_crypto_helpers.h` which uses `mbedtls_psa_get_stats()` to get information about the keystore content before calling `mbedtls_psa_crypto_free()`. This feature is mostly but not exclusively useful for testing, and may be moved under `MBEDTLS_TEST_HOOKS`.
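
In practice, the cleanup section of a PSA test function looks like this minimal sketch (the test name is invented; `PSA_ASSERT` comes from the same test helpers):

```c
/* BEGIN_CASE depends_on:MBEDTLS_PSA_CRYPTO_C */
void example_psa_test(void)
{
    psa_key_handle_t handle = 0;

    PSA_ASSERT(psa_crypto_init());
    /* ... create keys, exercise the API ... */

exit:
    psa_destroy_key(handle);
    PSA_DONE(); /* fails the test if the key store in memory is not
                 * empty, then deinitializes the PSA subsystem */
}
/* END_CASE */
```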
Gilles Peskine4b7279e2019-09-10 17:39:33 +0200301
302### PSA storage
303
304#### PSA storage cleanup on success
305
306Goal: test that no stray files are left over in the key store after a test that succeeded.
307
308Solution: TODO. Currently the various test suites do it differently.
309
310#### PSA storage cleanup on failure
311
312Goal: ensure that no stray files are left over in the key store even if a test has failed (as that could cause other tests to fail).
313
314Solution: TODO. Currently the various test suites do it differently.
315
316#### PSA storage resilience
317
318Goal: test the resilience of PSA storage against power failures.
319
320Solution: TODO.
321
322See the [secure element driver interface test strategy](driver-interface-test-strategy.html) for more information.
323
324#### Corrupted storage
325
326Goal: test the robustness against corrupted storage.
327
328Solution ([internal interface](#internal-interfaces)): call `psa_its` functions to modify the storage.
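
For example, a test can overwrite the stored file for a key through the ITS API. How a key identifier maps to a storage uid is itself an internal detail; the helper below is a sketch, not an existing function:

```c
#include <psa/internal_trusted_storage.h>

/* Overwrite the ITS file identified by uid with garbage, so that a
 * subsequent attempt to load the key exercises the corruption checks. */
static psa_status_t corrupt_stored_key(psa_storage_uid_t uid)
{
    static const unsigned char garbage[] = "not a valid key file";
    return psa_its_set(uid, sizeof(garbage), garbage, 0);
}
```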

#### Storage read failure

Goal: test the robustness against read errors.

Solution: TODO

#### Storage write failure

Goal: test the robustness against write errors (`STORAGE_FAILURE` or `INSUFFICIENT_STORAGE`).

Solution: TODO

#### Storage format stability

Goal: test that the storage format does not change between versions (or if it does, an upgrade path must be provided).

Solution ([internal interface](#internal-interfaces)): call internal functions to inspect the content of the file.

Note that the storage format is defined not only by the general layout, but also by the numerical values of encodings for key types and other metadata. For numerical values, there is a risk that we would accidentally modify a single value or a few values, so the tests should be exhaustive. This probably requires some compile-time analysis (perhaps the automation for `psa_constant_names` can be used here). TODO
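
Such an exhaustive test could look like the following sketch. The table would need one entry per key type, ideally generated automatically; the two encodings shown are placeholders, not the real values, and `ARRAY_LENGTH` is assumed to come from the test framework:

```c
#include <psa/crypto.h>

static const struct {
    psa_key_type_t type;
    uint32_t expected_encoding; /* value pinned by the storage format */
} key_type_encodings[] = {
    { PSA_KEY_TYPE_AES, 0x1234 },          /* placeholder value */
    { PSA_KEY_TYPE_RSA_KEY_PAIR, 0x5678 }, /* placeholder value */
    /* ... one entry per key type ... */
};

/* Returns 0 if any encoding has drifted from its pinned value. */
static int check_key_type_encodings(void)
{
    for (size_t i = 0; i < ARRAY_LENGTH(key_type_encodings); i++) {
        if (key_type_encodings[i].type !=
            key_type_encodings[i].expected_encoding)
            return 0; /* storage format break */
    }
    return 1;
}
```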

### Other fault injection

#### PSA crypto init failure

Goal: test the failure of `psa_crypto_init`.

Solution ([compile-time option](#compile-time-options)): replace entropy initialization functions by functions that can fail. This is the only failure point for `psa_crypto_init` that is present in all builds.
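
The mechanism could look like the following sketch; the hook name and its wiring are hypothetical, purely to illustrate the approach:

```c
#include <psa/crypto.h>

#if defined(MBEDTLS_TEST_HOOKS)
/* Hypothetical hook: when non-NULL, psa_crypto_init() calls this instead
 * of the normal entropy initialization. */
extern psa_status_t (*mbedtls_test_hook_entropy_init)(void);
#endif

/* Test-side substitute that always fails. */
static psa_status_t failing_entropy_init(void)
{
    return PSA_ERROR_INSUFFICIENT_ENTROPY;
}

/* In a test function that depends on MBEDTLS_TEST_HOOKS:
 *     mbedtls_test_hook_entropy_init = failing_entropy_init;
 *     TEST_ASSERT(psa_crypto_init() == PSA_ERROR_INSUFFICIENT_ENTROPY);
 *     mbedtls_test_hook_entropy_init = NULL;  // restore in cleanup
 */
```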

When we implement the PSA entropy driver interface, this should be reworked to use the entropy driver interface.

#### PSA crypto data corruption

The PSA crypto subsystem has a few checks to detect corrupted data in memory. We currently don't have a way to exercise those checks.

Solution: TODO. To corrupt a multipart operation structure, we can look inside the structure content, but only when running without isolation. To corrupt the key store, we would need to add a function to the library or to use a debugger.