##############
Virtualization
##############
OP-TEE has experimental virtualization support, meaning that one OP-TEE
instance can run TAs from multiple virtual machines. OP-TEE isolates all
VM-related state, so one VM cannot affect another in any way.

With virtualization support enabled, OP-TEE relies on a hypervisor, because
only the hypervisor knows which VM is calling OP-TEE. Naturally, the
hypervisor should also inform OP-TEE about the creation and destruction of
VMs. Besides, in almost all cases the hypervisor enables two-stage MMU
translation, so VMs do not see the real physical addresses of memory; instead
they work with intermediate physical addresses (IPAs). OP-TEE, on the other
hand, cannot translate an IPA to a PA, so it is the hypervisor's
responsibility to do this kind of translation. Thus, the hypervisor should
include a component that knows about OP-TEE protocol internals and can do this
translation. We call this component the "TEE mediator" and right now only the
Xen hypervisor has an OP-TEE mediator.

Configuration
*************
Virtualization support is enabled with the ``CFG_VIRTUALIZATION`` configuration
option. When this option is enabled, OP-TEE will **not** work without a
compatible hypervisor. This is because the hypervisor must send the
``OPTEE_SMC_VM_CREATED`` SMC with a VM ID before any standard SMC can be
received from a client.

``CFG_VIRT_GUEST_COUNT`` controls the maximum number of supported VMs. As
OP-TEE has a limited amount of memory available, increasing this count will
decrease the amount of memory available to each VM. Because we want VMs to be
independent, OP-TEE splits the available memory into equal portions for every
VM, so one VM cannot consume all the memory and cause a DoS for the other VMs.

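The split itself is nothing more than an equal division of the TA memory pool.
The snippet below only illustrates that arithmetic (it is not the actual
OP-TEE code); the 6 MiB pool size is an arbitrary example value:

.. code-block:: c

    #include <stdio.h>

    #define CFG_VIRT_GUEST_COUNT 3
    #define TA_RAM_SIZE (6 * 1024 * 1024)   /* example: 6 MiB TA pool */

    int main(void)
    {
        /* Every VM gets the same fixed share, used or not. */
        size_t per_guest = TA_RAM_SIZE / CFG_VIRT_GUEST_COUNT;

        printf("each VM may use at most %zu MiB\n",
               per_guest / (1024 * 1024));
        return 0;
    }
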
Requirements for hypervisor
***************************
As said earlier, the hypervisor should be aware of OP-TEE and of SMCs from
virtual guests to OP-TEE. This is the list of things that a compatible
hypervisor should perform (a sketch of such a mediator follows the list):

1. When a new OP-TEE-capable VM is created, the hypervisor should inform
   OP-TEE about it with the SMC ``OPTEE_SMC_VM_CREATED``. The ``a1``
   parameter should contain the VM ID. ID 0 is defined as ``HYP_CLNT_ID``
   and is reserved for the hypervisor itself.

2. When an OP-TEE-capable VM is being destroyed, the hypervisor should stop
   all its VCPUs (this ensures that OP-TEE has no active threads for that
   VM) and send the SMC ``OPTEE_SMC_VM_DESTROYED`` with the same parameters
   as for ``OPTEE_SMC_VM_CREATED``.

3. Any SMC to OP-TEE should have the VM ID in the ``a7`` parameter. This is
   either ``HYP_CLNT_ID`` if the call originates from the hypervisor, or the
   VM ID that was passed in the ``OPTEE_SMC_VM_CREATED`` call.

4. The hypervisor should perform IPA<->PA address translation for all SMCs.
   This includes both arguments in the ``a1``-``a6`` registers and in-memory
   command buffers.

5. The hypervisor should pin memory pages that a VM shares with OP-TEE. This
   means that the hypervisor should ensure that a pinned page resides at the
   original PA for as long as it is shared with OP-TEE, and that it still
   belongs to the VM that shared it. For example, the hypervisor should not
   swap out this page, transfer its ownership to another VM, unmap it from
   the VM address space and so on.

6. Naturally, the hypervisor should correctly handle the OP-TEE protocol, so
   for any VM it should look like it is working with OP-TEE directly.

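The fragment below sketches what such a mediator could look like on the
hypervisor side. Only the register layout (function ID in ``a0``, VM ID in
``a1`` for ``OPTEE_SMC_VM_CREATED`` and in ``a7`` for every call) follows the
protocol above; the ``struct smc_args`` layout and the helpers
``issue_smc()``, ``ipa_to_pa()`` and ``pin_guest_pages()`` are hypothetical
stand-ins for hypervisor-specific services:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <optee_smc.h>  /* OPTEE_SMC_VM_CREATED, HYP_CLNT_ID (assumed location) */

    struct smc_args {
        uint64_t a0, a1, a2, a3, a4, a5, a6, a7;
    };

    /* Hypervisor-specific primitives, assumed to exist. */
    uint64_t issue_smc(struct smc_args *args);
    uint64_t ipa_to_pa(uint16_t vm_id, uint64_t ipa);
    void pin_guest_pages(uint16_t vm_id, uint64_t ipa, size_t size);

    /* Point 1: announce a new guest to OP-TEE. */
    static void mediator_vm_created(uint16_t vm_id)
    {
        struct smc_args args = {
            .a0 = OPTEE_SMC_VM_CREATED,
            .a1 = vm_id,          /* VM ID travels in a1 for this call */
            .a7 = HYP_CLNT_ID,    /* the call itself comes from the hypervisor */
        };

        issue_smc(&args);
    }

    /* Points 3-5: forward a guest SMC to OP-TEE. */
    static void mediator_forward(uint16_t vm_id, struct smc_args *args)
    {
        /* Keep the shared buffer resident at a stable PA while OP-TEE uses it. */
        pin_guest_pages(vm_id, args->a1, 4096);

        /*
         * Registers that carry buffer addresses hold IPAs and must be
         * rewritten to PAs. Which registers hold addresses (and how a
         * 64-bit address is split) depends on the particular call; a
         * single address in a1 is assumed here purely for illustration.
         */
        args->a1 = ipa_to_pa(vm_id, args->a1);

        args->a7 = vm_id;         /* every SMC is tagged with the VM ID */
        issue_smc(args);
    }
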
Limitations
***********
Virtualization support is in an experimental state and it has some limitations
that the user should be aware of.

Platform support
================
Only the Armv8 architecture is supported. There is no hard restriction, but
currently the Armv7-specific code (like MMU or thread manipulation) just knows
nothing about virtualization. Only one platform has been tested right now and
that is QEMU-V8 (i.e. QEMU emulating an Arm Versatile Express with the Armv8
architecture). Support for R-Car Gen3 should be added soon.

Static VM guest count and memory allocation
============================================
Currently, a user should configure the maximum number of guests. OP-TEE will
split the memory into equal chunks, so every VM will have the same amount of
memory. For example, if you have 6MB for your TAs, you can set
``CFG_VIRT_GUEST_COUNT`` to 3 and every VM would be able to use 2MB at most,
even if no other VMs are running. This is okay for embedded setups where you
know the exact number and roles of the VMs, but it can be inconvenient for
server applications. Also, it is impossible to configure the amount of memory
available for a given VM. Every VM instance will have exactly the same amount
of memory.

Sharing hardware resources and PTAs
===================================
Right now the only hardware that can be used by multiple VMs simultaneously is
the serial console, used for logging. Devices like hardware crypto
accelerators, secure storage devices (e.g. external flash storage, accessed
directly from OP-TEE) and others are not supported right now. Drivers should
be made virtualization-aware before they can be used with virtualization
extensions.

Every VM will have its own PTA states, which is a good thing in most cases.
But if one wants a PTA to have some global state that is shared between VMs,
the PTA needs to be written accordingly.

No compatibility with "normal" mode
===================================
OP-TEE built with ``CFG_VIRTUALIZATION=y`` will not work without a hypervisor,
because before executing any standard SMC, ``OPTEE_SMC_VM_CREATED`` must be
called. This can be inconvenient if one wants to switch between virtualized
and non-virtualized environments frequently. On the other hand, it is not a
big deal in a production environment. A simple workaround could be made for
this: if OP-TEE receives a standard SMC prior to ``OPTEE_SMC_VM_CREATED``, it
implicitly creates a VM context and uses it for all subsequent calls, as
sketched below.

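The following fragment only illustrates that idea; it is not implemented in
OP-TEE. ``virt_guest_created()`` stands for whatever routine handles
``OPTEE_SMC_VM_CREATED`` internally, and both its signature and
``DEFAULT_GUEST_ID`` are assumptions made for this sketch:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define DEFAULT_GUEST_ID 1   /* arbitrary ID for the implicit VM context */

    /* Assumed entry point that normally backs OPTEE_SMC_VM_CREATED. */
    void virt_guest_created(uint16_t guest_id);

    static bool implicit_guest;       /* set when the fallback VM is created */
    static bool any_guest_registered; /* set by a real OPTEE_SMC_VM_CREATED */

    /* Called at the start of every standard SMC. */
    static uint16_t effective_guest_id(uint64_t a7)
    {
        if (!any_guest_registered && !implicit_guest) {
            /* A standard SMC arrived before any VM was announced. */
            virt_guest_created(DEFAULT_GUEST_ID);
            implicit_guest = true;
        }

        if (implicit_guest)
            return DEFAULT_GUEST_ID; /* ignore a7, use the implicit context */

        return (uint16_t)a7;
    }
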
Implementation details
======================
OP-TEE as a whole can be split into two entities. Let us call them "nexus" and
TEE. The nexus is the core part of OP-TEE that takes care of low-level things:
SMC handling, memory management, thread creation and so on. The TEE is the
part that does the actual job: it handles requests, loads TAs, executes them,
and so on. So, it is natural to have one nexus instance and multiple instances
of the TEE, one TEE instance per registered VM. This can be done either
explicitly or implicitly.

The explicit way is to move the TEE state into some sort of structure and make
all code access the fields of this structure, something like ``struct
task_struct`` and ``current`` in the Linux kernel (see the sketch below). Then
it is easy to allocate such a structure for every VM instance. But this
approach basically requires rewriting all of the OP-TEE code.

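For comparison, here is a minimal sketch of what that explicit approach could
look like; every name (``struct tee_instance``, ``current_tee`` and the
fields) is invented for illustration only:

.. code-block:: c

    #include <stdint.h>

    #ifndef CFG_VIRT_GUEST_COUNT
    #define CFG_VIRT_GUEST_COUNT 2    /* normally comes from the build configuration */
    #endif

    struct ta_ctx;                    /* opaque per-TA context, illustrative */

    struct tee_instance {
        uint16_t guest_id;
        void *heap;                   /* per-VM heap */
        struct ta_ctx *ta_ctxs;       /* per-VM list of loaded TAs */
        /* ... every other piece of formerly global TEE state ... */
    };

    /* One instance per possible guest, selected on every SMC entry. */
    static struct tee_instance instances[CFG_VIRT_GUEST_COUNT];
    static struct tee_instance *current_tee;

    /* Each access to a former global has to be rewritten along these lines. */
    static struct ta_ctx *loaded_tas(void)
    {
        return current_tee->ta_ctxs;
    }
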
The implicit way is to have banked memory sections for the TEE/VM instances.
The memory layout can then look something like this:

.. code-block:: none

    +--------------------------------------------------+
    | Nexus: .nex_bss, .nex_data, ...                  |
    +--------------------------------------------------+
    | TEE states                                       |
    |                                                  |
    | VM 1 TEE state | VM 2 TEE state | VM 3 TEE state |
    | .bss, .data    | .bss, .data    | .bss, .data    |
    +--------------------------------------------------+

This approach requires no changes in the TEE code and only some changes in the
nexus code. The idea is that the nexus state resides in separate sections
(``.nex_data``, ``.nex_bss``, ``.nex_nozi``, ``.nex_heap`` and others) and is
always mapped.

The TEE state resides in the standard sections (like ``.data``, ``.bss``,
``.heap`` and so on). There is a separate set of these sections for every
registered VM and the nexus maps them only when it receives a call from the
corresponding VM.

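Whether a given variable ends up in a banked TEE section or in an always-mapped
nexus section is decided at build time. The snippet below shows the general
idea using a plain GCC section attribute; whether OP-TEE provides a dedicated
macro for this, and what it is called, is not covered here:

.. code-block:: c

    /* Ordinary globals land in .data/.bss and are banked per VM. */
    static unsigned int ta_load_count;

    /*
     * Data the nexus needs regardless of which VM (if any) is currently
     * mapped has to be placed in a .nex_* section instead.
     */
    static unsigned int registered_guests __attribute__((section(".nex_bss")));
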
As the nexus and the TEE have separate heaps, the ``bget`` allocator was
extended to work with multiple "contexts". ``malloc()``, ``free()`` and
friends work with one context, and ``nex_malloc()`` (and the other ``nex_``
functions) were added. They use a different context, so the nexus can use a
separate heap which is always mapped into the OP-TEE address space. When
virtualization support is disabled, all those ``nex_`` functions are defined
to point to their standard ``malloc()`` counterparts.

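A small usage sketch, assuming ``nex_free()`` is among the ``nex_``
counterparts mentioned above; the ``struct guest_book`` type and the scenario
are invented for illustration:

.. code-block:: c

    #include <stdint.h>
    #include <stdlib.h>   /* malloc()/free() for a standalone build of the sketch */

    /* Declared here for the sketch; in OP-TEE these come with the nexus heap. */
    void *nex_malloc(size_t size);
    void nex_free(void *ptr);

    struct guest_book {
        uint16_t guest_id;
        /* ... bookkeeping the nexus needs even when no VM is mapped ... */
    };

    void example(uint16_t guest_id, size_t msg_size)
    {
        /* Nexus-lifetime data: allocated from the always-mapped nexus heap. */
        struct guest_book *book = nex_malloc(sizeof(*book));

        /* TEE-lifetime data: allocated from the per-VM heap of the caller. */
        void *msg_buf = malloc(msg_size);

        if (book && msg_buf) {
            book->guest_id = guest_id;
            /* ... use both buffers ... */
        }

        free(msg_buf);
        nex_free(book);
    }
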
To change memory mappings at run-time, a new entity named "partition" has been
added to the MMU code. It is defined by ``struct mmu_partition`` and holds
information about all the page tables, so the whole MMU mapping can be
switched by one write to the ``TTBR`` register.

There is a default partition that holds the MMU state when no VM context is
active, so no TEE state is mapped. When OP-TEE receives the
``OPTEE_SMC_VM_CREATED`` call, it copies the default partition into a new one
and then maps the sections with TEE data. This is done by the
``prepare_memory_map()`` function in ``virtualization.c``.

When OP-TEE receives a standard call it checks that the supplied VM ID is
valid and then activates the corresponding MMU partition, so the TEE code can
access its own data. This is basically how virtualization support works; the
sketch below summarizes the flow.
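A condensed sketch of that flow, not the actual implementation (error
unwinding is trimmed): ``prepare_memory_map()`` is the function named above,
while ``struct guest_ctx``, ``clone_default_partition()``,
``set_current_partition()``, ``register_guest()`` and ``find_guest()`` are
illustrative stand-ins:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct mmu_partition;   /* described above, holds a full set of page tables */

    struct guest_ctx {
        uint16_t id;
        struct mmu_partition *prtn;
    };

    /* Illustrative helpers standing in for the real nexus/MMU routines. */
    struct mmu_partition *clone_default_partition(void);
    void prepare_memory_map(struct mmu_partition *prtn);
    void set_current_partition(struct mmu_partition *prtn);
    void register_guest(struct guest_ctx *ctx);
    struct guest_ctx *find_guest(uint16_t id);
    void *nex_malloc(size_t size);

    /* OPTEE_SMC_VM_CREATED: set up a banked TEE state for the new guest. */
    static bool handle_vm_created(uint16_t guest_id)
    {
        struct guest_ctx *ctx = nex_malloc(sizeof(*ctx));

        if (!ctx)
            return false;

        ctx->id = guest_id;
        ctx->prtn = clone_default_partition();
        prepare_memory_map(ctx->prtn);      /* map this VM's .data/.bss/heap */
        register_guest(ctx);
        return true;
    }

    /* Entry of every standard call: switch to the caller's partition. */
    static bool enter_guest(uint16_t guest_id)
    {
        struct guest_ctx *ctx = find_guest(guest_id);

        if (!ctx)
            return false;                   /* unknown or already destroyed VM */

        set_current_partition(ctx->prtn);   /* effectively one TTBR write */
        return true;
    }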