##############
Virtualization
##############
OP-TEE has experimental virtualization support. This means that a single OP-TEE
instance can run TAs for multiple virtual machines (VMs). OP-TEE isolates all
VM-related state, so one VM cannot affect another in any way.

With virtualization support enabled, OP-TEE relies on a hypervisor, because
only the hypervisor knows which VM is calling OP-TEE. Naturally, the hypervisor
must also inform OP-TEE when VMs are created and destroyed. Furthermore, in
almost all cases the hypervisor enables two-stage MMU translation, so VMs do not
see the real physical addresses of memory; instead they work with intermediate
physical addresses (IPAs). OP-TEE cannot translate an IPA to a PA, so this
translation is the hypervisor's responsibility. Therefore, the hypervisor must
include a component that knows the internals of the OP-TEE protocol and can
perform this translation. We call this component the "TEE mediator". Currently,
only the Xen hypervisor has an OP-TEE mediator.
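
To illustrate the translation duty of the mediator, here is a minimal sketch
that translates a guest IPA into a PA through a toy, single-level stage-2 table
with 4 KiB pages. The table layout and all names in it (``struct stage2_table``,
``ipa_to_pa()`` and the ``S2_*`` macros) are hypothetical; a real hypervisor such
as Xen walks its actual multi-level stage-2 page tables.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define S2_PAGE_SHIFT 12
    #define S2_PAGE_SIZE  (UINT64_C(1) << S2_PAGE_SHIFT)
    #define S2_PAGE_MASK  (S2_PAGE_SIZE - 1)

    /* Toy single-level stage-2 table: entry i describes guest IPA page i. */
    struct stage2_table {
        const uint64_t *pa_frame;   /* physical frame number backing each IPA page */
        const bool *valid;          /* whether the IPA page is mapped at all */
        size_t num_pages;
    };

    /*
     * Translate one guest IPA to a PA. Returns false if the IPA is not mapped
     * for this VM; a mediator must then reject the SMC instead of passing an
     * unchecked address on to OP-TEE.
     */
    static bool ipa_to_pa(const struct stage2_table *s2, uint64_t ipa,
                          uint64_t *pa)
    {
        uint64_t page = ipa >> S2_PAGE_SHIFT;

        assert(s2 != NULL && pa != NULL);
        if (page >= s2->num_pages || !s2->valid[page])
            return false;
        *pa = (s2->pa_frame[page] << S2_PAGE_SHIFT) | (ipa & S2_PAGE_MASK);
        return true;
    }

Because consecutive IPA pages need not be contiguous in physical memory, a
buffer that spans several pages has to be translated page by page, not just by
translating its start address.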

Configuration
*************
Virtualization support is enabled with the ``CFG_VIRTUALIZATION`` configuration
option. When this option is enabled, OP-TEE will **not** work without a
compatible hypervisor, because the hypervisor must send the
``OPTEE_SMC_VM_CREATED`` SMC with the VM ID before any standard SMC can be
received from a client.

``CFG_VIRT_GUEST_COUNT`` controls the maximum number of supported VMs. Because
OP-TEE has a limited amount of memory available, increasing this count decreases
the amount of memory available to each VM. To keep VMs independent, OP-TEE
splits the available memory into equal portions, one per VM, so that one VM
cannot consume all memory and cause a denial of service (DoS) for other VMs.

Requirements for hypervisor
***************************
As stated earlier, the hypervisor should be aware of OP-TEE and of the SMCs that
virtual guests make to OP-TEE. A compatible hypervisor should do the following:

1. When a new OP-TEE-capable VM is created, the hypervisor should inform OP-TEE
   about it with the ``OPTEE_SMC_VM_CREATED`` SMC. The ``a1`` parameter should
   contain the VM ID. ID 0 is defined as ``HYP_CLNT_ID`` and is reserved for
   the hypervisor itself.

2. When an OP-TEE-capable VM is being destroyed, the hypervisor should stop all
   of its VCPUs (this ensures that OP-TEE has no active threads for that VM)
   and send the ``OPTEE_SMC_VM_DESTROYED`` SMC with the same parameters as for
   ``OPTEE_SMC_VM_CREATED``.

3. Every SMC to OP-TEE should carry a VM ID in the ``a7`` parameter. This is
   either ``HYP_CLNT_ID``, if the call originates from the hypervisor, or the
   VM ID that was passed in the ``OPTEE_SMC_VM_CREATED`` call.

4. The hypervisor should perform IPA<->PA address translation for all SMCs.
   This applies both to arguments in registers ``a1``-``a6`` and to arguments
   in in-memory command buffers.

5. The hypervisor should pin memory pages that a VM shares with OP-TEE. This
   means that the hypervisor should ensure that a pinned page stays at its
   original PA for as long as it is shared with OP-TEE, and that it still
   belongs to the VM that shared it. For example, the hypervisor should not
   swap out such a page, transfer its ownership to another VM, or unmap it from
   the VM's address space.

6. Naturally, the hypervisor should handle the OP-TEE protocol correctly, so
   that, from each VM's point of view, it looks as if the VM were working with
   OP-TEE directly.
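
The hypervisor side of requirements 1 to 3 can be modelled as a toy in C. The
sketch below announces VM creation and destruction and stamps every forwarded
call with the caller's ID in ``a7``; a stub stands in for OP-TEE and refuses
standard calls from VMs it has not been told about. ``struct smc_args``,
``optee_smc()``, the ``FN_*`` function IDs, the return codes and the
``mediator_*`` functions are hypothetical placeholders, not the real SMC values
or Xen's mediator code. Translation and pinning (requirements 4 and 5) are left
out.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define HYP_CLNT_ID 0   /* from the text: ID 0 is reserved for the hypervisor */
    #define MAX_VMS     8   /* hypothetical limit for this toy model */

    /* Hypothetical function IDs; they are NOT the real OP-TEE SMC values. */
    enum { FN_VM_CREATED = 1, FN_VM_DESTROYED = 2, FN_STD_CALL = 3 };
    enum { RET_OK = 0, RET_ERROR = -1 };

    struct smc_args {
        uint64_t a0, a1, a2, a3, a4, a5, a6, a7;
    };

    /* Toy stand-in for OP-TEE: it only remembers which VM IDs exist. */
    static bool vm_known[MAX_VMS];

    static int64_t optee_smc(const struct smc_args *args)
    {
        assert(args != NULL);
        switch (args->a0) {
        case FN_VM_CREATED:
        case FN_VM_DESTROYED:
            /* VM management must come from the hypervisor (a7 == HYP_CLNT_ID). */
            if (args->a7 != HYP_CLNT_ID || args->a1 == HYP_CLNT_ID ||
                args->a1 >= MAX_VMS)
                return RET_ERROR;
            vm_known[args->a1] = (args->a0 == FN_VM_CREATED);
            return RET_OK;
        case FN_STD_CALL:
            /* Standard calls are refused until the VM has been created. */
            if (args->a7 >= MAX_VMS || !vm_known[args->a7])
                return RET_ERROR;
            return RET_OK;
        default:
            return RET_ERROR;
        }
    }

    /* Requirement 1: announce a new VM, with its ID in a1. */
    static int64_t mediator_vm_created(uint64_t vm_id)
    {
        struct smc_args args = { .a0 = FN_VM_CREATED, .a1 = vm_id,
                                 .a7 = HYP_CLNT_ID };
        return optee_smc(&args);
    }

    /* Requirement 2: the caller must already have stopped all VCPUs of the VM. */
    static int64_t mediator_vm_destroyed(uint64_t vm_id)
    {
        struct smc_args args = { .a0 = FN_VM_DESTROYED, .a1 = vm_id,
                                 .a7 = HYP_CLNT_ID };
        return optee_smc(&args);
    }

    /* Requirement 3: overwrite a7 so a guest cannot pose as another VM. */
    static int64_t mediator_forward(uint64_t vm_id, struct smc_args *guest)
    {
        guest->a7 = vm_id;
        /* Requirements 4 and 5 (IPA->PA translation, pinning) would go here. */
        return optee_smc(guest);
    }

Overwriting ``a7`` in the mediator, rather than trusting whatever value the
guest put there, is what prevents one VM from issuing calls in the name of
another.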

Limitations
***********
Virtualization support is in an experimental state and has some limitations that
users should be aware of.

Platforms support
=================
Only the Armv8 architecture is supported. There is no hard restriction, but
Armv7-specific code (such as the MMU and thread-handling code) currently knows
nothing about virtualization. Only one platform has been tested so far: QEMU-V8
(QEMU emulating an Arm Versatile Express board with the Armv8 architecture).
Support for R-Car Gen3 should be added soon.

Static VMs guest count and memory allocation
============================================
Currently, the user must configure the maximum number of guests. OP-TEE splits
memory into equal chunks, so every VM gets the same amount of memory. For
example, if you have 6MB for your TAs, you can set ``CFG_VIRT_GUEST_COUNT`` to 3
and every VM will be able to use at most 2MB, even if no other VMs are running.
This is fine for embedded setups where you know the exact number and roles of
the VMs, but it can be inconvenient for server applications. Also, it is
impossible to configure the amount of memory available to a given VM: every VM
instance gets exactly the same amount of memory.
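
The equal split amounts to simple arithmetic. The sketch below assumes that
each share is rounded down to a 4 KiB page and that the shares are laid out back
to back; these details, and the names ``vm_share()`` and ``vm_share_base()``,
are illustrative assumptions rather than OP-TEE's actual code.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SPLIT_PAGE_SIZE 4096u   /* assumed page granularity */

    /* Memory each VM may use: an equal share of the pool, rounded down to a page. */
    static size_t vm_share(size_t pool_size, unsigned int guest_count)
    {
        assert(guest_count > 0);
        size_t share = pool_size / guest_count;
        return share - share % SPLIT_PAGE_SIZE;
    }

    /*
     * Start of the share for slot idx (0-based). It depends only on the
     * configured guest count, not on how many VMs are actually running.
     */
    static uintptr_t vm_share_base(uintptr_t pool_base, size_t pool_size,
                                   unsigned int guest_count, unsigned int idx)
    {
        assert(idx < guest_count);
        return pool_base + (uintptr_t)idx * vm_share(pool_size, guest_count);
    }

With a 6MB pool and a guest count of 3, ``vm_share()`` returns 2MB, as in the
example above. With a guest count of 5 the share drops to 1257472 bytes
(307 pages), whether or not five VMs are actually running.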

Sharing hardware resources and PTAs
===================================
Right now, the only hardware that can be used by multiple VMs simultaneously is
the serial console, used for logging. Devices such as hardware crypto
accelerators, secure storage devices (e.g. external flash storage accessed
directly from OP-TEE) and others are not supported yet. Drivers must be made
virtualization-aware before they can be used with virtualization extensions.

Every VM has its own PTA state, which is a good thing in most cases. However, if
a PTA needs some global state that is shared between VMs, the PTA has to be
written accordingly.

No compatibility with "normal" mode
===================================
OP-TEE built with ``CFG_VIRTUALIZATION=y`` will not work without a hypervisor,
because ``OPTEE_SMC_VM_CREATED`` must be called before any standard SMC can be
executed. This can be inconvenient if one wants to switch frequently between
virtualized and non-virtualized environments. On the other hand, it is not a big
deal in a production environment. A simple workaround is possible: if OP-TEE
receives a standard SMC before ``OPTEE_SMC_VM_CREATED``, it implicitly creates a
VM context and uses it for all subsequent calls.
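
The workaround described above could look roughly like the following sketch. It
is hypothetical, not existing OP-TEE code: ``struct guest_ctx``,
``create_guest()``, ``lookup_guest()``, ``on_vm_created()`` and
``guest_for_std_call()`` are invented names, and a real implementation would
also need locking and would have to set up TEE memory for the new context.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_GUESTS 4   /* hypothetical; plays the role of CFG_VIRT_GUEST_COUNT */

    struct guest_ctx {
        bool in_use;
        uint64_t vm_id;
    };

    static struct guest_ctx guests[MAX_GUESTS];
    static bool vm_created_seen;              /* any OPTEE_SMC_VM_CREATED so far? */
    static struct guest_ctx *implicit_guest;  /* context made by the workaround */

    static struct guest_ctx *create_guest(uint64_t vm_id)
    {
        for (size_t i = 0; i < MAX_GUESTS; i++) {
            if (!guests[i].in_use) {
                guests[i].in_use = true;
                guests[i].vm_id = vm_id;
                return &guests[i];
            }
        }
        return NULL;   /* every slot is taken */
    }

    static struct guest_ctx *lookup_guest(uint64_t vm_id)
    {
        for (size_t i = 0; i < MAX_GUESTS; i++)
            if (guests[i].in_use && guests[i].vm_id == vm_id)
                return &guests[i];
        return NULL;
    }

    /* Handler for OPTEE_SMC_VM_CREATED, sent by a hypervisor. */
    static bool on_vm_created(uint64_t vm_id)
    {
        vm_created_seen = true;
        return create_guest(vm_id) != NULL;
    }

    /* Picks the context for a standard SMC; a7 carries the VM ID. */
    static struct guest_ctx *guest_for_std_call(uint64_t a7)
    {
        if (vm_created_seen)
            return lookup_guest(a7);             /* hypervisor-driven mode */
        if (implicit_guest == NULL)
            implicit_guest = create_guest(a7);   /* first call: create implicitly */
        return implicit_guest;                   /* reused for every later call */
    }

Once the implicit context exists, the sketch ignores ``a7``, because without a
hypervisor nobody fills in a meaningful VM ID.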

Implementation details
======================
OP-TEE as a whole can be split into two entities. Let us call them "nexus" and
"TEE". The nexus is the core part of OP-TEE that takes care of low-level things:
SMC handling, memory management, thread creation and so on. The TEE is the part
that does the actual job: it handles requests, loads TAs, executes them, and so
on. So it is natural to have one nexus instance and multiple TEE instances, one
per registered VM. This can be done either explicitly or implicitly.

The explicit way is to move the TEE state into some sort of structure and make
all code access the fields of this structure, similar to ``struct task_struct``
and ``current`` in the Linux kernel. It is then easy to allocate such a
structure for every VM instance. However, this approach basically requires
rewriting all of the OP-TEE code.
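
For comparison, here is a minimal sketch of the explicit approach: former
globals move into one structure, and every access goes through a
``current``-like pointer that the nexus switches on entry. ``struct tee_state``,
``cur_tee``, ``switch_to_vm()`` and ``tee_open_session()`` are hypothetical
names used only to show the pattern the text argues against.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define NUM_VMS 3   /* hypothetical */

    /* Every former TEE global becomes a field of this structure. */
    struct tee_state {
        unsigned int open_sessions;
        unsigned int loaded_tas;
    };

    static struct tee_state vm_states[NUM_VMS];
    static struct tee_state *cur_tee;   /* plays the role of Linux's `current` */

    /* The nexus would call this on every entry from a VM. */
    static void switch_to_vm(unsigned int vm_idx)
    {
        assert(vm_idx < NUM_VMS);
        cur_tee = &vm_states[vm_idx];
    }

    /* Each TEE function has to be rewritten to go through cur_tee. */
    static void tee_open_session(void)
    {
        assert(cur_tee != NULL);
        cur_tee->open_sessions++;
    }

The switch itself is cheap; the cost lies in converting every global variable
and every access to it.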

The implicit way is to have banked memory sections for TEE/VM instances. The
memory layout can then look something like this:

.. code-block:: none

    +--------------------------------------------------+
    | Nexus: .nex_bss, .nex_data, ...                  |
    +--------------------------------------------------+
    | TEE states                                       |
    |                                                  |
    | VM 1 TEE state | VM 2 TEE state | VM 3 TEE state |
    | .bss, .data    | .bss, .data    | .bss, .data    |
    +----------------+----------------+----------------+

This approach requires no changes to the TEE code and only some changes to the
nexus code. The idea is that the nexus state resides in separate sections
(``.nex_data``, ``.nex_bss``, ``.nex_nozi``, ``.nex_heap`` and others) that are
always mapped.

The TEE state resides in the standard sections (such as ``.data``, ``.bss``,
``.heap`` and so on). There is a separate set of these sections for every
registered VM, and the nexus maps them only when it receives a call from the
corresponding VM.

Because the nexus and the TEE have separate heaps, the ``bget`` allocator was
extended to work with multiple "contexts". ``malloc()``, ``free()`` and related
functions work with one context. ``nex_malloc()`` and the other ``nex_``
functions were added; they use a different context, so the nexus can use a
separate heap that is always mapped into the OP-TEE address space. When
virtualization support is disabled, all those ``nex_`` functions are defined to
point to their standard ``malloc()`` counterparts.
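
The idea of one allocator serving several contexts can be illustrated with a
trivial bump allocator. This is not how ``bget`` works internally (``bget`` is a
general-purpose allocator that also supports freeing); the sketch only shows how
allocations are routed to different heaps through a context. ``struct
heap_ctx``, ``ctx_malloc()``, ``sketch_malloc()`` and ``sketch_nex_malloc()``
are hypothetical names chosen to avoid clashing with the C library. The
``#else`` branch mirrors the fallback the text describes for builds without
virtualization.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CFG_VIRTUALIZATION 1   /* pretend this is a virtualization-enabled build */

    /* One allocation context: a heap region plus its own bookkeeping. */
    struct heap_ctx {
        uint8_t *base;
        size_t size;
        size_t used;
    };

    /* Bump-allocate size bytes, 8-byte aligned, from one specific context. */
    static void *ctx_malloc(struct heap_ctx *ctx, size_t size)
    {
        assert(ctx != NULL && ctx->base != NULL);
        size_t start = (ctx->used + 7u) & ~(size_t)7u;

        if (start > ctx->size || size > ctx->size - start)
            return NULL;   /* this context is exhausted */
        ctx->used = start + size;
        return ctx->base + start;
    }

    /* TEE heap: in real OP-TEE a separate banked copy exists per VM. */
    static uint64_t tee_pool[256];
    static struct heap_ctx tee_heap = { (uint8_t *)tee_pool, sizeof(tee_pool), 0 };

    static void *sketch_malloc(size_t size)
    {
        return ctx_malloc(&tee_heap, size);
    }

    #ifdef CFG_VIRTUALIZATION
    /* Nexus heap: a separate context that stays mapped at all times. */
    static uint64_t nexus_pool[256];
    static struct heap_ctx nexus_heap = { (uint8_t *)nexus_pool,
                                          sizeof(nexus_pool), 0 };

    static void *sketch_nex_malloc(size_t size)
    {
        return ctx_malloc(&nexus_heap, size);
    }
    #else
    /* Without virtualization there is one heap, so nex_ maps to the plain call. */
    #define sketch_nex_malloc(size) sketch_malloc(size)
    #endif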

To change memory mappings at run time, a new entity called a "partition" was
added to the MMU code; it is defined by ``struct mmu_partition``. A partition
holds information about all page tables, so the whole MMU mapping can be
switched by a single write to the ``TTBR`` register.

There is a default partition that holds the MMU state when no VM context is
active, so no TEE state is mapped. When OP-TEE receives an
``OPTEE_SMC_VM_CREATED`` call, it copies the default partition into a new one
and then maps the sections with the TEE data. This is done by the
``prepare_memory_map()`` function in ``virtualization.c``.
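
The partition mechanism can be modelled in a few lines. In this sketch a
partition is reduced to one small table of descriptors, "activating" it is a
single write to a variable standing in for ``TTBR``, and a VM partition is
created by copying the default partition and then adding the TEE data mapping.
Only the name ``struct mmu_partition`` comes from the text; its fields,
``ttbr0``, ``activate_partition()`` and ``create_vm_partition()`` are
hypothetical, whereas OP-TEE operates on real translation tables.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define NUM_ENTRIES   8   /* hypothetical; real tables are much larger */
    #define ENTRY_INVALID 0

    /* Simplified partition: a single top-level table of descriptors. */
    struct mmu_partition {
        uint64_t table[NUM_ENTRIES];
    };

    static uint64_t ttbr0;   /* stand-in for the real translation table base register */

    /* Switching the whole address space is one register write. */
    static void activate_partition(const struct mmu_partition *prtn)
    {
        ttbr0 = (uint64_t)(uintptr_t)prtn->table;
    }

    /* Default partition: nexus mappings only, no TEE state mapped. */
    static struct mmu_partition default_prtn;

    /*
     * On OPTEE_SMC_VM_CREATED: copy the default partition, then map this
     * VM's own copy of the TEE data into the slot reserved for TEE state.
     */
    static void create_vm_partition(struct mmu_partition *prtn,
                                    unsigned int tee_slot, uint64_t tee_data_desc)
    {
        assert(tee_slot < NUM_ENTRIES);
        assert(default_prtn.table[tee_slot] == ENTRY_INVALID);
        memcpy(prtn, &default_prtn, sizeof(*prtn));
        prtn->table[tee_slot] = tee_data_desc;
    }

All VM partitions place their TEE data in the same slot, so TEE code always
finds its ``.data`` and ``.bss`` at the same virtual address; only the physical
memory behind that address changes with the active partition. This is what lets
the TEE code stay unchanged.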

When OP-TEE receives a standard (STD) call, it checks that the supplied VM ID is
valid and then activates the corresponding MMU partition, so that the TEE code
can access its own data. This is basically how virtualization support works.
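
Putting these steps together, the dispatch of a standard call can be sketched
as: validate the VM ID from ``a7``, activate the matching partition, run the TEE
handler, then return to the default partition. Switching back afterwards is an
assumption of this sketch, as are all names (``struct guest``, ``find_guest()``,
``tee_handle()``, ``handle_std_call()``); the ``active`` pointer again stands in
for the real ``TTBR`` write.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_GUESTS 2
    #define RET_OK     0
    #define RET_BAD_VM (-1)

    struct mmu_partition {
        uint64_t table[4];   /* placeholder for real page tables */
    };

    struct guest {
        uint64_t id;
        struct mmu_partition prtn;
        unsigned int calls;   /* models data living in this guest's TEE sections */
    };

    static struct mmu_partition default_prtn;
    static struct guest guests[MAX_GUESTS] = { { .id = 1 }, { .id = 2 } };
    static const struct mmu_partition *active;   /* stands in for the TTBR value */

    static struct guest *find_guest(uint64_t vm_id)
    {
        for (size_t i = 0; i < MAX_GUESTS; i++)
            if (guests[i].id == vm_id)
                return &guests[i];
        return NULL;
    }

    /*
     * Placeholder for real TEE work. Real TEE code would not get a guest
     * pointer; it would just touch its globals, now backed by this VM's memory.
     */
    static void tee_handle(struct guest *g)
    {
        assert(active == &g->prtn);   /* only this VM's TEE data is mapped */
        g->calls++;
    }

    static int handle_std_call(uint64_t a7)
    {
        struct guest *g = find_guest(a7);

        if (g == NULL)
            return RET_BAD_VM;        /* unknown ID: refuse the call */
        active = &g->prtn;            /* activate the VM's partition */
        tee_handle(g);
        active = &default_prtn;       /* back to the default partition */
        return RET_OK;
    }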