.. _core:

####
Core
####

.. _interrupt_handling:

Interrupt handling
******************
This section describes how :ref:`optee_os` handles switches of world execution
context based on :ref:`SMC` exceptions and interrupt notifications. Interrupt
notifications are IRQ/FIQ exceptions which may also imply switching of world
execution context: normal world to secure world, or secure world to normal
world.

Use cases of world context switch
=================================
This section lists all the cases where OP-TEE OS is involved in world context
switches. OP-TEE OS executes in the secure world. World switch is done by the
core's secure monitor level/mode, referred to below as the Monitor.

When the normal world invokes the secure world, the normal world executes an SMC
instruction. The SMC exception is always trapped by the Monitor. If the related
service targets the trusted OS, the Monitor will switch to OP-TEE OS world
execution. When the secure world returns to the normal world, OP-TEE OS executes
an SMC that is caught by the Monitor, which switches back to the normal world.

When a secure interrupt is signaled by the Arm GIC, it shall reach the OP-TEE OS
interrupt exception vector. If the secure world is executing, OP-TEE OS will
handle the interrupt straight from its exception vector. If the normal world is
executing when the secure interrupt is raised, the Monitor vector must handle
the exception and invoke OP-TEE OS to serve the interrupt.

When a non-secure interrupt is signaled by the Arm GIC, it shall reach the
normal world interrupt exception vector. If the normal world is executing, it
will handle the exception straight from its exception vector. If the secure
world is executing when the non-secure interrupt is raised, OP-TEE OS will
temporarily return to normal world via the Monitor to let normal world
serve the interrupt.

Core exception vectors
======================
The Monitor vector is ``VBAR_EL3`` in AArch64 and ``MVBAR`` in Armv7-A/AArch32.
The Monitor can be reached while normal world or secure world is executing. The
executing secure state is known to the Monitor through the ``SCR_NS``.

The Monitor can be reached from an SMC exception, an IRQ or FIQ exception
(so-called interrupts) and from asynchronous aborts. Obviously monitor aborts
(data, prefetch, undef) are local to the Monitor execution.

The Monitor can be external to OP-TEE OS (case ``CFG_WITH_ARM_TRUSTED_FW=y``).
If not, OP-TEE OS provides a local secure monitor ``core/arch/arm/sm``. Armv7-A
platforms should use the OP-TEE OS secure monitor. Armv8-A platforms are likely
to rely on `Trusted Firmware A`_.

When executing outside the Monitor, the system is executing either in the
normal world (``SCR_NS=1``) or in the secure world (``SCR_NS=0``). Each world
owns its own exception vector table (state vector):

    - ``VBAR_EL2`` or ``VBAR_EL1`` non-secure or ``VBAR_EL1`` secure for
      AArch64.
    - ``HVBAR`` or ``VBAR`` non-secure or ``VBAR`` secure for Armv7-A and
      AArch32.

All SMC exceptions are trapped in the Monitor vector. IRQ/FIQ exceptions can be
trapped either in the Monitor vector or in the state vector of the executing
world.

When the normal world is executing, the system is configured to route:

    - secure interrupts to the Monitor that will forward to OP-TEE OS
    - non-secure interrupts to the executing world exception vector.

When the secure world is executing, the system is configured to route:

    - secure and non-secure interrupts to the executing OP-TEE OS exception
      vector. OP-TEE OS shall forward the non-secure interrupts to the normal
      world.

With OP-TEE OS, non-secure interrupts are always trapped in the state vector of
the executing world. This is reflected by a static value of ``SCR_(IRQ|FIQ)``.
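
The routing rules above can be condensed into a small decision function. The
following is only an illustrative sketch: the enum and function names are
hypothetical and do not exist in OP-TEE OS, and real routing is done by the
hardware configuration (``SCR`` bits and the GIC), not by C code.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical names, used only to restate the routing rules. */
    enum world { WORLD_NORMAL, WORLD_SECURE };
    enum vector { VEC_MONITOR, VEC_NORMAL_WORLD, VEC_OPTEE };

    /* Vector in which an interrupt is first trapped. */
    static enum vector first_trap(enum world executing, bool secure_irq)
    {
        if (executing == WORLD_SECURE)
            return VEC_OPTEE; /* both kinds reach the OP-TEE OS vector */
        return secure_irq ? VEC_MONITOR : VEC_NORMAL_WORLD;
    }

    /* True when serving the interrupt requires a world switch. */
    static bool needs_world_switch(enum world executing, bool secure_irq)
    {
        return (executing == WORLD_SECURE) != secure_irq;
    }

For example, a non-secure interrupt raised while the secure world executes is
first trapped in the OP-TEE OS vector and does require a world switch.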

.. _native_foreign_irqs:

Native and foreign interrupts
=============================
Two types of interrupt are defined from the OP-TEE OS point of view.

    - **Native interrupt** - The interrupt handled by OP-TEE OS, secure
      interrupts targeting S-EL1 or secure privileged mode
    - **Foreign interrupt** - The interrupt not handled by OP-TEE OS, non-secure
      interrupts targeting normal world or secure interrupts targeting EL3.

For Arm **GICv2** mode, a native interrupt is signalled with a FIQ and a
foreign interrupt is signalled with an IRQ. For Arm **GICv3** mode, a
foreign interrupt is signalled as a FIQ which could be handled by either
secure world (AArch32 Monitor mode or AArch64 EL3) or normal world.

Arm GICv3 mode can be enabled by setting ``CFG_ARM_GICV3=y``.
Native interrupts must be securely routed to OP-TEE OS. Foreign interrupts, when
trapped during secure world execution, might need to be efficiently routed to
the normal world.

IRQ and FIQ keep their meaning in normal world so for clarity we will keep
using those names in the normal world context.

Normal World invokes OP-TEE OS using SMC
========================================

**Entering the Secure Monitor**

The monitor manages all entries and exits of secure world. To enter secure
world from normal world the monitor saves the state of normal world (general
purpose registers and system registers which are not banked) and restores the
previous state of secure world. Then a return from exception is performed and
the restored secure state is resumed. Exit from secure world to normal world is
the reverse.

Some general purpose registers are not saved and restored on entry and exit;
those are used to pass parameters between secure and normal world (see
ARM_DEN0028A_SMC_Calling_Convention_ for details).

**Entry and exit of Trusted OS**

On entry and exit of Trusted OS each CPU uses a separate entry stack and runs
with IRQ and FIQ masked. SMCs are categorised in two flavors: **fast** and
**yielding**.

    - For **fast** SMCs, OP-TEE OS will execute on the entry stack with IRQ/FIQ
      masked until the execution returns to normal world.

    - For **yielding** SMCs, OP-TEE OS will at some point execute the requested
      service with interrupts unmasked. In order to handle interrupts, mainly
      forwarding of foreign interrupts, OP-TEE OS assigns a trusted thread
      (`core/arch/arm/kernel/thread.c`_) to the SMC request. The trusted thread
      stores the execution context of the requested service. This context can be
      suspended and resumed as the requested service executes and is
      interrupted. The trusted thread is released only once the service
      execution returns with a completion status.

      For **yielding** SMCs, OP-TEE OS allocates or resumes a trusted thread
      then unmasks the IRQ and FIQ lines. When OP-TEE OS needs to invoke the
      normal world from a foreign interrupt or a remote service call, OP-TEE OS
      masks IRQ and FIQ and suspends the trusted thread. When suspending,
      OP-TEE OS gets back to the entry stack.

    - **Both** fast and yielding SMCs end on the entry stack with IRQ and
      FIQ masked and OP-TEE OS invokes the Monitor through an SMC to return
      to the normal world.
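
The fast/yielding distinction is encoded in the SMC function ID itself. The
sketch below decodes the fields defined by the SMC Calling Convention (bit 31
selects fast or yielding, bits 29:24 hold the owning entity, bits 15:0 the
function number). The macro, enum and function names are made up for this
example and are not the ones used in the OP-TEE OS sources; the policy enum
merely restates the bullets above.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Field layout of a 32-bit SMC function ID (names are illustrative). */
    #define SMC_FAST_CALL_BIT UINT32_C(0x80000000) /* bit 31: 1 = fast */
    #define SMC_OWNER_SHIFT   24
    #define SMC_OWNER_MASK    UINT32_C(0x3f)       /* bits 29:24 */
    #define SMC_FUNC_MASK     UINT32_C(0xffff)     /* bits 15:0 */

    static bool smc_is_fast(uint32_t func_id)
    {
        return (func_id & SMC_FAST_CALL_BIT) != 0;
    }

    static uint32_t smc_owner(uint32_t func_id)
    {
        return (func_id >> SMC_OWNER_SHIFT) & SMC_OWNER_MASK;
    }

    static uint32_t smc_func_num(uint32_t func_id)
    {
        return func_id & SMC_FUNC_MASK;
    }

    /* Hypothetical policy: where the request is executed. */
    enum entry_policy { RUN_ON_ENTRY_STACK, RUN_ON_TRUSTED_THREAD };

    static enum entry_policy entry_policy_for(uint32_t func_id)
    {
        return smc_is_fast(func_id) ? RUN_ON_ENTRY_STACK
                                    : RUN_ON_TRUSTED_THREAD;
    }

A function ID with bit 31 clear is a yielding call and would therefore be
given a trusted thread, while one with bit 31 set stays on the entry stack.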

.. uml::
    :align: center
    :caption: SMC entry to secure world

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    nwd -> smon : smc: TEE_FUNC_INVOKE
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: TEE_FUNC_INVOKE
    entry -> entry : assign thread
    entry -> optee : TEE_FUNC_INVOKE
    == IRQ and FIQ unmasked ==
    optee -> optee : process
    == IRQ and FIQ masked ==
    optee --> entry : SMC_CALL_RETURN
    entry -> smon : smc: SMC_CALL_RETURN
    smon -> smon : Save secure context
    smon -> smon : Restore non-secure context
    == IRQ and FIQ unmasked ==
    smon --> nwd : eret: return

Deliver non-secure interrupts to Normal World
=============================================

**Forward a Foreign Interrupt from Secure World to Normal World**

When a foreign interrupt is received in secure world as an IRQ or FIQ
exception then secure world:

    1. Saves trusted thread context (entire state of all processor modes for
       Armv7-A)

    2. Masks all interrupts (IRQ and FIQ)

    3. Switches to entry stack

    4. Issues an SMC with a value that indicates to normal world that an IRQ
       has been detected and that the last SMC call should be continued

The monitor restores normal world context with a return code indicating that an
IRQ is about to be delivered. Normal world issues a new SMC indicating that it
should continue the last SMC.

The monitor then restores the secure world context. Secure world locates the
previously saved thread context and checks that a return from a foreign
interrupt is what is being requested. It then restores the context and lets
the secure world foreign interrupt handler return from exception, which
resumes execution where it was interrupted.

Note that the monitor itself does not know or care that it has just forwarded
a foreign interrupt to normal world. The bookkeeping is done in the trusted
thread handling in OP-TEE OS. Normal world is responsible for deciding when
the secure world thread should resume execution (for details, see
:ref:`thread_handling`).
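
Seen from the normal world, the protocol described above amounts to a loop
that keeps resuming the secure call until it completes. The sketch below
simulates this with a stand-in for the secure side. All names and values
(``RET_FOREIGN_INTR``, ``CMD_RESUME``, ``fake_smc()`` and so on) are
hypothetical and only illustrate the control flow; the real values are defined
by the OP-TEE SMC ABI and a real caller issues an actual ``smc`` instruction.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical return codes and commands; real values are ABI-defined. */
    enum { RET_DONE = 0, RET_FOREIGN_INTR = 1 };
    enum { CMD_INVOKE = 0, CMD_RESUME = 1 };

    /*
     * Stand-in for the secure side: the service needs `work` time slices
     * and is preempted by a foreign interrupt after each slice but the last.
     */
    struct fake_secure_world {
        unsigned int work;    /* slices left before completion */
        unsigned int resumes; /* resume requests seen so far */
    };

    static int fake_smc(struct fake_secure_world *sw, int cmd)
    {
        if (cmd == CMD_RESUME)
            sw->resumes++;
        if (sw->work > 1) {
            sw->work--;
            return RET_FOREIGN_INTR; /* trusted thread suspended */
        }
        sw->work = 0;
        return RET_DONE;             /* trusted thread released */
    }

    /* Normal world caller: keep resuming until the service completes. */
    static int call_secure_service(struct fake_secure_world *sw)
    {
        int ret = fake_smc(sw, CMD_INVOKE);

        while (ret == RET_FOREIGN_INTR) {
            /*
             * The forwarded interrupt is served by the normal world
             * handler at this point, and the scheduler may run other
             * tasks before the secure call is resumed.
             */
            ret = fake_smc(sw, CMD_RESUME);
        }
        return ret;
    }

With ``work`` set to 3, the simulated service is interrupted twice, so the
caller issues two resume requests before it sees the completion status.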

.. uml::
    :align: center
    :caption: Foreign interrupt received in secure world and forwarded to
        normal world

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    optee -> optee : process
    == IRQ and FIQ unmasked,\nForeign interrupt received ==
    optee -> optee : suspend thread
    optee -> entry : forward foreign interrupt
    entry -> smon : smc: forward foreign interrupt
    smon -> smon: Save secure context
    smon -> smon: Restore non-secure context
    == IRQ and FIQ unmasked ==
    smon --> nwd : eret: IRQ forwarded
    == FIQ unmasked, IRQ received ==
    nwd -> nwd : process IRQ
    == IRQ and FIQ unmasked ==
    nwd -> smon : smc: return from IRQ
    == IRQ and FIQ masked ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: return from foreign interrupt
    entry -> entry : find thread
    entry --> optee : resume execution
    == IRQ and FIQ unmasked ==
    optee -> optee : process

**Deliver a foreign interrupt to normal world when SCR_NS is set**

Since ``SCR_IRQ`` is cleared, an IRQ will be delivered using the exception
vector (``VBAR``) in the normal world. The IRQ is received as any other
exception by normal world; the monitor and OP-TEE OS are not involved
at all.

Deliver secure interrupts to Secure World
=========================================
A secure (native) interrupt can be received during two different states,
either in normal world (``SCR_NS`` is set) or in secure world (``SCR_NS``
is cleared). When the secure monitor is active (Armv8-A EL3 or Armv7-A
Monitor mode) FIQ and IRQ are masked. FIQ reception in the two different
states is described below.

**Deliver secure interrupt to secure world when SCR_NS is set**

When the monitor traps a secure interrupt, the following happens:

    1. The monitor saves normal world context and restores secure world
       context from last secure world exit (which will have IRQ and FIQ
       blocked)
    2. The monitor clears ``SCR_FIQ`` when clearing ``SCR_NS``
    3. The monitor does a return from exception into OP-TEE OS via the secure
       interrupt entry point
    4. OP-TEE OS handles the native interrupt directly in the entry point
    5. OP-TEE OS issues an SMC to return to normal world
    6. The monitor saves the secure world context and restores the normal
       world context
    7. The monitor does a return from exception into the restored context

.. uml::
    :align: center
    :caption: Secure interrupt received when SCR_NS is set

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    == Running in non-secure world (SCR_NS set) ==
    nwd -> nwd : process
    == IRQ and FIQ masked,\nSecure interrupt received ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: native interrupt entry point
    entry -> entry: process received native interrupt
    entry -> smon: smc: return
    smon -> smon : Save secure context
    smon -> smon : Restore non-secure context
    smon --> nwd : eret: return to Normal world
    == IRQ and FIQ unmasked ==
    nwd -> nwd : process

**Deliver FIQ to secure world when SCR_NS is cleared**

.. uml::
    :align: center
    :caption: FIQ received while processing an IRQ forwarded from secure world

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    optee -> optee : process
    == IRQ and FIQ unmasked,\nForeign interrupt received ==
    optee -> optee : suspend thread
    optee -> entry : forward foreign interrupt
    entry -> smon : smc: forward foreign interrupt
    smon -> smon: Save secure context
    smon -> smon: Restore non-secure context
    == IRQ and FIQ unmasked ==
    smon --> nwd : eret: IRQ forwarded
    == FIQ unmasked, IRQ received ==
    nwd -> nwd : process IRQ
    == IRQ and FIQ masked,\nSecure interrupt received ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: native interrupt entry point
    entry -> entry : process received native interrupt
    entry -> smon: smc: return
    smon -> smon : Save secure context
    smon -> smon : Restore non-secure context
    smon --> nwd : eret: return to Normal world
    == FIQ unmasked\nIRQ still being processed ==
    nwd -> nwd : process IRQ
    == IRQ and FIQ unmasked ==
    nwd -> smon : smc: return from IRQ
    == IRQ and FIQ masked ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: return from foreign interrupt
    entry -> entry : find thread
    entry --> optee : resume execution
    == IRQ and FIQ unmasked ==
    optee -> optee : process

Trusted thread scheduling
=========================
**Trusted thread for standard services**

OP-TEE yielding services are carried through standard SMC. Execution of these
services can be interrupted by foreign interrupts. To suspend and restore the
service execution, optee_os assigns a trusted thread at yielding SMC entry.

The trusted thread terminates when optee_os returns to the normal world with a
service completion status.

A trusted thread execution can be interrupted by a native interrupt. In this
case the native interrupt is handled by the interrupt exception handlers and
once served, optee_os returns to the execution of the trusted thread.

A trusted thread execution can be interrupted by a foreign interrupt. In this
case, optee_os suspends the trusted thread and invokes the normal world through
the Monitor (the so-called optee_os RPC services). The trusted thread will
resume only once normal world invokes optee_os with the RPC service status.

A trusted thread execution can lead optee_os to invoke a service in normal
world: access a file, get the REE current time, etc. The trusted thread is
suspended while the remote service executes and resumed once it completes.

**Scheduling considerations**

When a trusted thread is interrupted by a foreign interrupt and when optee_os
invokes a normal world service, the normal world gets the opportunity to
reschedule the running applications. The trusted thread will resume only once
the client application is scheduled back. Thus, a trusted thread execution
follows the scheduling of the normal world caller context.

Optee_os does not implement any thread scheduling. Each trusted thread is
expected to track a service that is invoked from the normal world and should
return to it with an execution status.

The OP-TEE Linux driver (as implemented in `drivers/tee/optee`_ since Linux
kernel 4.12) is designed so that the Linux thread invoking OP-TEE gets assigned
a trusted thread on TEE side. The execution of the trusted thread is tied to the
execution of the caller Linux thread which is under the Linux kernel scheduling
decision. This means trusted threads are scheduled by the Linux kernel.

**Trusted thread constraints**

TEE core handles a static number of trusted threads, see ``CFG_NUM_THREADS``.

Trusted threads are expensive on memory constrained systems, mainly
because of the execution stack size.

On SMP systems, optee_os can execute several trusted threads in parallel if the
normal world supports scheduling of processes. Even on UP systems, supporting
several trusted threads in optee_os helps the normal world scheduler to be
efficient.
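
The life cycle of a trusted thread described in this section can be modelled
as a small state machine over a fixed-size pool. This is a simplified sketch
under stated assumptions, not the code in ``thread.c``: the names are
hypothetical, ``NUM_THREADS`` stands in for ``CFG_NUM_THREADS``, and the
locking that an SMP system needs is left out.

.. code-block:: c

    #include <stddef.h>

    #define NUM_THREADS 2 /* stands in for CFG_NUM_THREADS */

    enum thread_state { THREAD_FREE, THREAD_ACTIVE, THREAD_SUSPENDED };

    /* Static pool; no locking, so this sketch is single-CPU only. */
    static enum thread_state threads[NUM_THREADS];

    /* Yielding SMC entry: claim a free thread, or -1 if all are busy. */
    static int thread_alloc(void)
    {
        for (size_t n = 0; n < NUM_THREADS; n++) {
            if (threads[n] == THREAD_FREE) {
                threads[n] = THREAD_ACTIVE;
                return (int)n;
            }
        }
        return -1;
    }

    /* Foreign interrupt or RPC: park the thread until it is resumed. */
    static int thread_suspend(int id)
    {
        if (id < 0 || id >= NUM_THREADS || threads[id] != THREAD_ACTIVE)
            return -1;
        threads[id] = THREAD_SUSPENDED;
        return 0;
    }

    /* Normal world asks to continue a previously suspended thread. */
    static int thread_resume(int id)
    {
        if (id < 0 || id >= NUM_THREADS || threads[id] != THREAD_SUSPENDED)
            return -1;
        threads[id] = THREAD_ACTIVE;
        return 0;
    }

    /* Service completed: the thread goes back to the pool. */
    static int thread_release(int id)
    {
        if (id < 0 || id >= NUM_THREADS || threads[id] != THREAD_ACTIVE)
            return -1;
        threads[id] = THREAD_FREE;
        return 0;
    }

Note that a suspended thread still occupies its slot, which is why a small
pool can be exhausted while normal world is busy serving interrupts or RPCs.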

----

.. _memory_objects:

Memory objects
**************
A memory object, **MOBJ**, describes a piece of memory. The interface provided
is mostly abstract when it comes to using the MOBJ to populate translation
tables etc. There are different kinds of MOBJs describing:

    - Physically contiguous memory
        - created with ``mobj_phys_alloc(...)``.

    - Virtual memory
        - one instance with the name ``mobj_virt`` available.
        - spans the entire virtual address space.

    - Physically contiguous memory allocated from a ``tee_mm_pool_t *``
        - created with ``mobj_mm_alloc(...)``.

    - Paged memory
        - created with ``mobj_paged_alloc(...)``.
        - only contains the supplied size and makes ``mobj_is_paged(...)``
          return true if supplied as argument.

    - Secure copy paged shared memory
        - created with ``mobj_seccpy_shm_alloc(...)``.
        - makes ``mobj_is_paged(...)`` and ``mobj_is_secure(...)`` return true
          if supplied as argument.

----

.. _mmu:

MMU
***
Translation tables
==================

OP-TEE supports two translation table formats:

1. Short-descriptor translation table format, available on Armv7-A and
   Armv8-A AArch32
2. Long-descriptor translation table format, available on Armv7-A with LPAE
   and Armv8-A

Armv7-A without LPAE (Large Physical Address Extension) must use the
short-descriptor translation table format only. Armv8-A AArch64 must use
the long-descriptor translation table format only.

Translation table format is a static build time configuration option,
``CFG_WITH_LPAE``. The design around the translation table handling has
been centered around these factors:

1. Share translation tables between CPUs when possible to save memory
   and simplify paging
2. Support non-global CPU specific mappings to allow executing different
   TAs in parallel.

Short-descriptor translation table format
-----------------------------------------

Several L1 translation tables are used, one large spanning 4 GiB and two or
more small tables spanning 32 MiB. The large translation table handles kernel
mode mapping and matches all addresses not covered by the small translation
tables. The small translation tables are assigned per thread and cover the
mapping of the virtual memory space for one TA context.

Memory space between small and large translation table is configured by TTBCR.
TTBR1 always points to the large translation table. TTBR0 points to a small
translation table when user mapping is active and to the large translation table
when no user mapping is currently active. For details about registers etc,
please refer to a Technical Reference Manual for your architecture, for example
`Cortex-A53 TRM`_.

The translation tables have certain alignment constraints: the alignment (of
the physical address) has to be the same as the size of the translation table.
The translation tables are statically allocated to avoid fragmentation of
memory due to the alignment constraints.
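
The numbers involved can be worked out from the architecture. The sketch below
assumes ``TTBCR.N = 7``, which is the value that gives TTBR0 a 32 MiB region
(2^(32 - N) bytes); the macro and function names are made up for this example
and are not taken from the OP-TEE OS sources.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define L1_SECTION_SHIFT 20 /* each L1 entry maps 1 MiB */
    #define L1_ENTRY_SIZE    4  /* bytes per short-descriptor entry */

    /* Assumed TTBCR.N: TTBR0 covers 2^(32 - N) bytes, 32 MiB for N = 7. */
    #define TTBCR_N          7

    /* True if the table walk for `va` starts from TTBR0 (small table). */
    static bool va_uses_ttbr0(uint32_t va)
    {
        return (va >> (32 - TTBCR_N)) == 0;
    }

    /* Size in bytes of a small L1 table; also its required alignment. */
    static uint32_t small_l1_size(void)
    {
        return (UINT32_C(1) << (32 - TTBCR_N - L1_SECTION_SHIFT)) *
               L1_ENTRY_SIZE;
    }

    /* Size in bytes of the large L1 table spanning 4 GiB. */
    static uint32_t large_l1_size(void)
    {
        return (UINT32_C(1) << (32 - L1_SECTION_SHIFT)) * L1_ENTRY_SIZE;
    }

    /* Alignment rule: the physical address is a multiple of the size. */
    static bool table_is_aligned(uint32_t pa, uint32_t table_size)
    {
        return (pa & (table_size - 1)) == 0;
    }

Under these assumptions a small table holds 32 entries (128 bytes) and the
large table holds 4096 entries (16 KiB), so the alignment constraint is far
more demanding for the large table than for the per thread ones.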

Each thread has one small L1 translation table of its own. Each TA context has a
compact representation of its L1 translation table. The compact representation
is used to initialize the thread specific L1 translation table when the TA
context is activated.

.. graphviz::
    :align: center

    digraph xlat_table {
        graph [
            rankdir = "LR"
        ];
        node [
            shape = "ellipse"
        ];
        edge [
        ];
        "node_ttb" [
            label = "<f0> TTBR0 | <f1> TTBR1"
            shape = "record"
        ];
        "node_large_l1" [
            label = "<f0> Large L1\nSpans 4 GiB"
            shape = "record"
        ];
        "node_small_l1" [
            label = "Small L1\nSpans 32 MiB\nper entry | <f0> 0 | <f1> 1 | ... | <fn> n"
            shape = "record"
        ];

        "node_ttb":f0 -> "node_small_l1":f0 [ label = "Thread 0 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":f1 [ label = "Thread 1 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":fn [ label = "Thread n ctx active" ];
        "node_ttb":f0 -> "node_large_l1" [ label="No active ctx" ];
        "node_ttb":f1 -> "node_large_l1";
    }

Long-descriptor translation table format
----------------------------------------

Each CPU is assigned an L1 translation table which is programmed into
Translation Table Base Register 0 (``TTBR0`` or ``TTBR0_EL1`` as
appropriate).

L1 and L2 translation tables are statically allocated and initialized at
boot. Normally there is only one shared L2 table, but with ASLR enabled the
virtual address space used for the shared mapping may need to use two
tables. An unused entry in the L1 table is selected to point to the per
thread L2 table. With ASLR configured this means that a different per thread
entry may be selected each time the system boots. Note that this entry will
only point to a table when the per thread mapping is activated.

The L2 translation tables in turn point to L3 tables which use the
small page granularity of 4 KiB. The shared mappings have their L3 tables
initialized at boot too, but the per thread L3 tables are dynamic and are
only assigned when the mapping is activated.
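
How a virtual address selects an entry at each level follows from the 4 KiB
granule: every table holds 512 eight-byte entries, so each level consumes nine
address bits above the 12-bit page offset. The helper names below are
hypothetical and only illustrate the arithmetic.

.. code-block:: c

    #include <stdint.h>

    #define GRANULE_SHIFT  12 /* 4 KiB pages */
    #define BITS_PER_LEVEL 9  /* 512 eight-byte entries per 4 KiB table */
    #define IDX_MASK       ((UINT64_C(1) << BITS_PER_LEVEL) - 1)

    /* Index into the table at `level` (1, 2 or 3) for address `va`. */
    static unsigned int xlat_index(uint64_t va, unsigned int level)
    {
        unsigned int shift = GRANULE_SHIFT + (3 - level) * BITS_PER_LEVEL;

        return (unsigned int)((va >> shift) & IDX_MASK);
    }

    /* Bytes of address space covered by one entry at `level`. */
    static uint64_t xlat_entry_span(unsigned int level)
    {
        return UINT64_C(1) << (GRANULE_SHIFT + (3 - level) * BITS_PER_LEVEL);
    }

One L1 entry thus covers 1 GiB, one L2 entry 2 MiB and one L3 entry 4 KiB, so
a 4 GiB virtual address space needs only four L1 entries.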

.. graphviz::
    :align: center
    :caption: Example translation table setup with 4GiB virtual address space
        with L3 tables excluded

    digraph xlat_table {
        graph [ rankdir = "LR" ];
        node [ ];
        edge [ ];

        "ttbr0" [
            label = "TTBR0"
            shape = "record"
        ];
        "node_l1" [
            label = "<h> Per CPU L1 table | <f0> 0 | <f1> 1 | <f2> 2 | <f3> 3"
            shape = "record"
        ];
        "shared_l2_n" [
            label = "<h> Shared L2 table n | 0 | ... | 511"
            shape = "record"
        ]
        "shared_l2_m" [
            label = "<h> Shared L2 table m | 0 | ... | 511"
            shape = "record"
        ]
        "per_thread_l2" [
            label = "<h> Per thread L2 table | 0 | ... | 511"
            shape = "record"
        ]
        "ttbr0" -> "node_l1":h;
        "node_l1":f2 -> "shared_l2_n":h;
        "node_l1":f3 -> "shared_l2_m":h;
        "node_l1":f0 -> "per_thread_l2":h;
    }


Page table cache
================
Page tables used to map TAs are managed with the page table cache. When the
context of a TA is unmapped, all its page tables are released with a call
to ``pgt_free()``. All page tables needed when mapping a TA are allocated
using ``pgt_alloc()``.

A fixed maximum number of translation tables are available in a pool. One
thread may execute a TA which needs all or almost all tables. This can
block TAs from being executed by other threads. To ensure that all TAs
will eventually be permitted to execute, ``pgt_alloc()`` temporarily frees
any tables already allocated before waiting for tables to become available.

The page table cache behaves differently depending on configuration
options.

Without paging (``CFG_WITH_PAGER=n``)
-------------------------------------
This is the easiest configuration. All page tables are statically allocated
in the ``.nozi.pgt_cache`` section. ``pgt_alloc()`` allocates tables from the
free-list and ``pgt_free()`` returns the tables directly to the free-list.

With paging enabled (``CFG_WITH_PAGER=y``)
------------------------------------------

Page tables are allocated as zero initialized locked pages during boot
using ``tee_pager_alloc()``. Locked pages are populated with physical pages
on demand from the pager. The physical page can be released when not needed
any longer with ``tee_pager_release_phys()``.

With ``CFG_WITH_LPAE=y`` each translation table has the same size as a
physical page which makes it easy to release the physical page when the
translation table isn't needed any longer. With the short-descriptor table
format (``CFG_WITH_LPAE=n``) it becomes more complicated as four
translation tables are stored in each page. Additional bookkeeping is used
to tell when the page used by four separate translation tables can be
released.
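
One way to picture this bookkeeping is a per-page record of which of the four
1 KiB tables are in use. This is a minimal sketch assuming a simple bitmask
scheme, not necessarily what the page table cache does; ``struct tbl_page``
and its helpers are hypothetical names.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE      4096
    #define SMALL_TBL_SIZE 1024 /* 256 four-byte L2 entries */
    #define TBLS_PER_PAGE  (PAGE_SIZE / SMALL_TBL_SIZE)

    /* Hypothetical per-page record: one bit per table handed out. */
    struct tbl_page {
        uint8_t used_mask;
    };

    /* Take a free table slot in the page; returns its index or -1. */
    static int tbl_page_get(struct tbl_page *p)
    {
        for (int n = 0; n < TBLS_PER_PAGE; n++) {
            if (!(p->used_mask & (1u << n))) {
                p->used_mask |= (uint8_t)(1u << n);
                return n;
            }
        }
        return -1;
    }

    /*
     * Give a slot back. Returns true when no table uses the page any
     * longer, which is when the physical page could be released.
     */
    static bool tbl_page_put(struct tbl_page *p, int slot)
    {
        p->used_mask &= (uint8_t)~(1u << slot);
        return p->used_mask == 0;
    }

Only the call that returns the last table in use reports that the page is
free, which is the condition the additional bookkeeping has to detect.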

With paging of user TA enabled (``CFG_PAGED_USER_TA=y``)
--------------------------------------------------------
With paging of user TAs enabled a cache of recently used translation tables
is used. This can save us from a storm of page faults when restoring the
mappings of a recently unmapped TA. Which translation tables should be
cached is indicated with reference counting by the pager on used tables.
When a table needs to be forcefully freed
``tee_pager_pgt_save_and_release_entries()`` is called to let the pager
know that the table can't be used any longer.

When a mapping in a TA is removed it also needs to be purged from cached
tables with ``pgt_flush_ctx_range()`` to prevent old mappings from being
accidentally reused.

Switching to user mode
======================
This section only applies with the following configuration flags:

    - ``CFG_WITH_LPAE=n``
    - ``CFG_CORE_UNMAP_CORE_AT_EL0=y``

When switching to user mode only a minimal kernel mode mapping is kept. This is
achieved by selecting a zeroed out big L1 translation table in TTBR1 when
transitioning to user mode. When returning to kernel mode the original L1
translation table is restored in TTBR1.

Switching to normal world
=========================
When switching to normal world either via a foreign interrupt (see
:ref:`native_foreign_irqs`) or RPC there is a chance that secure world will
resume execution on a different CPU. This means that the new CPU needs to be
configured with the context of the currently active TA. This is solved by always
setting the TA context in the CPU when resuming execution.
Joakim Bech8e5c5b32018-10-25 08:18:32 +0200638
639----
640
641.. _pager:
642
643Pager
644*****
645OP-TEE currently requires >256 KiB RAM for OP-TEE kernel memory. This is not a
646problem if OP-TEE uses TrustZone protected DDR, but for security reasons OP-TEE
647may need to use TrustZone protected SRAM instead. The amount of available SRAM
648varies between platforms, from just a few KiB up to over 512 KiB. Platforms with
649just a few KiB of SRAM cannot be expected to be able to run a complete TEE
650solution in SRAM. But those with 128 to 256 KiB of SRAM can be expected to have
651a capable TEE solution in SRAM. The pager provides a solution to this by demand
652paging parts of OP-TEE using virtual memory.
653
654Secure memory
655=============
TrustZone protected SRAM is generally considered more secure than TrustZone
protected DRAM as there are usually more attack vectors on DRAM. The attack
vectors are hardware dependent and can be different for different platforms.

Backing store
=============
TrustZone protected DRAM or in some cases non-secure DRAM is used as backing
store. The data in the backing store is integrity protected with one hash
(SHA-256) per page (4 KiB). Read-only pages are not encrypted since the OP-TEE
binary itself is not encrypted.
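
The per-page integrity check can be sketched as follows. This is only an
illustration, not the pager's code: the helper names are hypothetical, and a
64-bit FNV-1a digest is used as a short stand-in for SHA-256 so that the
example stays self-contained.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define PAGER_PAGE_SIZE 4096u

    /* Stand-in digest (64-bit FNV-1a); the real pager uses SHA-256. */
    static uint64_t page_digest(const uint8_t *page)
    {
        uint64_t h = 0xcbf29ce484222325ULL;     /* FNV offset basis */

        for (size_t i = 0; i < PAGER_PAGE_SIZE; i++) {
            h ^= page[i];
            h *= 0x100000001b3ULL;              /* FNV prime */
        }
        return h;
    }

    /* Record one digest per page of the backing store. */
    static void hash_store(const uint8_t *store, size_t npages,
                           uint64_t *hashes)
    {
        for (size_t i = 0; i < npages; i++)
            hashes[i] = page_digest(store + i * PAGER_PAGE_SIZE);
    }

    /*
     * Copy page idx from the backing store into dst and verify it against
     * the recorded digest. Returns 0 on success, -1 on integrity failure.
     */
    static int load_page(const uint8_t *store, const uint64_t *hashes,
                         size_t idx, uint8_t *dst)
    {
        assert(store && hashes && dst);
        memcpy(dst, store + idx * PAGER_PAGE_SIZE, PAGER_PAGE_SIZE);
        return page_digest(dst) == hashes[idx] ? 0 : -1;
    }

Verifying the copy in ``dst`` rather than the backing store itself matters
when the store lives in non-secure DRAM, since the store could change between
the check and the copy.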

Partitioning of memory
======================
The code that handles demand paging must always be available, since paging it
out would lead to deadlock. The virtual memory is partitioned as:

+--------------+-------------------+
| Type         | Sections          |
+==============+===================+
| unpaged      | | text            |
|              | | rodata          |
|              | | data            |
|              | | bss             |
|              | | heap1           |
|              | | nozi            |
|              | | heap2           |
+--------------+-------------------+
| init / paged | | text_init       |
|              | | rodata_init     |
+--------------+-------------------+
| paged        | | text_pageable   |
|              | | rodata_pageable |
+--------------+-------------------+
| demand alloc |                   |
+--------------+-------------------+

Here ``nozi`` stands for "not zero initialized". This section contains entry
stacks (thread stack when TEE pager is not enabled) and translation tables (TEE
pager cached translation table when the pager is enabled and LPAE MMU is used).

The ``init`` area is available when OP-TEE is initializing and contains
everything that is needed to initialize the pager. After the pager has been
initialized this area will be used for demand paging instead.

The ``demand alloc`` area is a special area where the pages are allocated and
removed from the pager on demand. Those pages are returned when OP-TEE does not
need them any longer. The thread stacks currently belong to this area. This
means that when a stack is not used the physical pages can be used by the pager
for better performance.

The technique to gather code in the different areas is based on compiling all
functions and data into separate sections. The unpaged text and rodata are then
gathered by linking all object files with ``--gc-sections`` to eliminate
sections that are outside the dependency graph of the entry functions for
unpaged functions. A script analyzes this ELF file and generates the bits of the
final link script. The process is repeated for init text and rodata. What is
not "unpaged" or "init" becomes "paged".

Partitioning of the binary
==========================
.. note::
    The struct definitions provided in this section are explicitly covered by
    the following dual license:

    .. code-block:: none

        SPDX-License-Identifier: (BSD-2-Clause OR GPL-2.0)

The binary is partitioned into four parts as:

+----------+
| Binary   |
+==========+
| Header   |
+----------+
| Init     |
+----------+
| Hashes   |
+----------+
| Pageable |
+----------+

The header is defined as:

.. code-block:: c

    #define OPTEE_MAGIC             0x4554504f
    #define OPTEE_VERSION           1
    #define OPTEE_ARCH_ARM32        0
    #define OPTEE_ARCH_ARM64        1

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };

The header is only used by the loader of OP-TEE, not OP-TEE itself. To
initialize OP-TEE the loader loads the complete binary into memory and copies
the ``init_size`` bytes that follow the header to
``(init_load_addr_hi << 32 | init_load_addr_lo)``. ``init_mem_usage`` is used by
the loader to check that there is enough physical memory available for OP-TEE
to be able to initialize at all. The loader supplies in ``r0/x0`` the address
of the first byte following what was copied and jumps to the load address to
start OP-TEE.
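
A loader-side check of this header could look like the sketch below. The
helper names ``init_load_addr()`` and ``check_header()`` are hypothetical, the
header is assumed to be stored in the byte order of the machine running the
loader, and ``avail_mem`` stands for whatever amount of physical memory the
loader knows to be free. Note the cast to ``uint64_t`` before the shift:
shifting a ``uint32_t`` by 32 bits is undefined in C.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define OPTEE_MAGIC             0x4554504f
    #define OPTEE_VERSION           1

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };

    /* Hypothetical helper: combine the two halves into a 64-bit address. */
    static uint64_t init_load_addr(const struct optee_header *hdr)
    {
        return ((uint64_t)hdr->init_load_addr_hi << 32) |
               hdr->init_load_addr_lo;
    }

    /*
     * Hypothetical helper: copy the header out of an image of len bytes and
     * validate it. Returns 0 if the magic and version match, the init part
     * fits in the image and avail_mem covers init_mem_usage, else -1.
     */
    static int check_header(const void *image, size_t len, uint64_t avail_mem,
                            struct optee_header *hdr)
    {
        assert(image && hdr);
        if (len < sizeof(*hdr))
            return -1;
        memcpy(hdr, image, sizeof(*hdr));
        if (hdr->magic != OPTEE_MAGIC || hdr->version != OPTEE_VERSION)
            return -1;
        if (hdr->init_size > len - sizeof(*hdr))
            return -1;
        if (hdr->init_mem_usage > avail_mem)
            return -1;
        return 0;
    }

After a successful check the loader would copy ``init_size`` bytes starting at
``image + sizeof(struct optee_header)`` to ``init_load_addr(hdr)``.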

In addition to the overall binary with the partitions described above, three
extra binaries are generated during the build process for loaders that support
loading separate binaries:

+-----------+
| v2 binary |
+===========+
| Header    |
+-----------+

+-----------+
| v2 binary |
+===========+
| Init      |
+-----------+
| Hashes    |
+-----------+

+-----------+
| v2 binary |
+===========+
| Pageable  |
+-----------+

In this case, loaders load the header binary first to get the image list and
information about each image, and then load each image at the load address
assigned in the structure. These binaries are named with a ``v2`` suffix to
distinguish them from the existing binaries. The header format is updated to
help loaders load the binaries efficiently:

.. code-block:: c

    #define OPTEE_IMAGE_ID_PAGER    0
    #define OPTEE_IMAGE_ID_PAGED    1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    struct optee_header_v2 {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t nb_images;
        struct optee_image optee_image[];
    };

Magic number and architecture are identical to the original. Version is
increased to two. ``load_addr_hi`` and ``load_addr_lo`` may be ``0xFFFFFFFF``
for the pageable binary since the pageable part may get loaded by the loader
at a dynamically available position. ``image_id`` indicates how the loader
handles the current binary. Loaders that don't support separate loading just
ignore all v2 binaries.
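
As a rough illustration of how a loader might walk the image list, the sketch
below looks up an image by ID and detects the "loader chooses the address"
case. The helper names are hypothetical and the image array is passed in
directly rather than through ``struct optee_header_v2``.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define OPTEE_IMAGE_ID_PAGER    0
    #define OPTEE_IMAGE_ID_PAGED    1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    /* True when both halves are 0xFFFFFFFF: the loader picks the address. */
    static bool image_addr_is_dynamic(const struct optee_image *img)
    {
        return img->load_addr_hi == UINT32_MAX &&
               img->load_addr_lo == UINT32_MAX;
    }

    /* Combine the two halves into a 64-bit load address. */
    static uint64_t image_load_addr(const struct optee_image *img)
    {
        return ((uint64_t)img->load_addr_hi << 32) | img->load_addr_lo;
    }

    /* Hypothetical helper: find the entry with the given ID, or NULL. */
    static const struct optee_image *
    find_image(const struct optee_image *imgs, uint32_t nb_images,
               uint32_t image_id)
    {
        assert(imgs || nb_images == 0);
        for (uint32_t i = 0; i < nb_images; i++)
            if (imgs[i].image_id == image_id)
                return &imgs[i];
        return NULL;
    }
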

Initializing the pager
======================
The pager is initialized as early as possible during boot in order to minimize
the "init" area. The global variable ``tee_mm_vcore`` describes the virtual
memory range that is covered by the level 2 translation table supplied to
``tee_pager_init(...)``.

Assign pageable areas
---------------------
A virtual memory range to be handled by the pager is registered with a call to
``tee_pager_add_core_area()``.

.. code-block:: c

    bool tee_pager_add_area(tee_mm_entry_t *mm,
                            uint32_t flags,
                            const void *store,
                            const void *hashes);

The function takes a pointer to ``tee_mm_entry_t`` to tell the range, flags to
tell how memory should be mapped (read-only, execute, etc.), and pointers to
the backing store and the hashes of the pages.

Assign physical pages
---------------------
Physical SRAM pages are supplied by calling ``tee_pager_add_pages(...)``:

.. code-block:: c

    void tee_pager_add_pages(tee_vaddr_t vaddr,
                             size_t npages,
                             bool unmap);

``tee_pager_add_pages(...)`` takes the physical address stored in the entry
mapping the virtual address ``vaddr`` and ``npages`` entries after that and uses
it to map new pages when needed. The ``unmap`` parameter tells whether the pages
should be unmapped immediately since they do not contain initialized data or
be kept mapped until they need to be recycled. The pages in the "init" area are
supplied with ``unmap == false`` since those pages have valid content and are in
use.

Invocation
==========
The pager is invoked as part of the abort handler. A pool of physical pages is
used to map different virtual addresses. When a new virtual address needs to be
mapped a free physical page is mapped at the new address. If a free physical
page cannot be found the oldest physical page is selected instead. When the page
is mapped new data is copied from backing store and the hash of the page is
verified. If it is OK the pager returns from the exception to resume the
execution.
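
The page selection policy described above (take a free page, otherwise reuse
the oldest one) can be sketched as below. The types and names are
illustrative only and unrelated to the real pager structures; loading from
backing store and hash verification are left out.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NUM_PHYS_PAGES  3
    #define NO_PAGE         UINT32_MAX

    /* Illustrative bookkeeping for one physical page. */
    struct phys_page {
        uint32_t vpage;         /* virtual page mapped here, or NO_PAGE */
        uint64_t mapped_at;     /* value of the pool clock when mapped */
    };

    struct page_pool {
        struct phys_page pages[NUM_PHYS_PAGES];
        uint64_t clock;
    };

    static void pool_init(struct page_pool *pool)
    {
        pool->clock = 0;
        for (size_t i = 0; i < NUM_PHYS_PAGES; i++) {
            pool->pages[i].vpage = NO_PAGE;
            pool->pages[i].mapped_at = 0;
        }
    }

    /*
     * Handle a fault on vpage: use a free physical page if there is one,
     * otherwise reuse the page that has been mapped the longest. Returns
     * the index of the physical page now backing vpage.
     */
    static size_t pool_fault(struct page_pool *pool, uint32_t vpage)
    {
        size_t victim = 0;

        assert(vpage != NO_PAGE);
        for (size_t i = 0; i < NUM_PHYS_PAGES; i++) {
            if (pool->pages[i].vpage == NO_PAGE) {
                victim = i;     /* free page found, stop searching */
                break;
            }
            if (pool->pages[i].mapped_at < pool->pages[victim].mapped_at)
                victim = i;     /* oldest mapping seen so far */
        }
        /* Copy from backing store and hash check would happen here. */
        pool->pages[victim].vpage = vpage;
        pool->pages[victim].mapped_at = ++pool->clock;
        return victim;
    }

With three physical pages, a fourth distinct fault reuses the page that was
mapped first.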

Data structures
===============
.. figure:: ../images/core/tee_pager_area.png
    :figclass: align-center

    How the main pager data structures relate to each other

``struct tee_pager_area``
-------------------------
This is a central data structure when handling paged
memory ranges. It's defined as:

.. code-block:: c

    struct tee_pager_area {
        struct fobj *fobj;
        size_t fobj_pgoffs;
        enum tee_pager_area_type type;
        uint32_t flags;
        vaddr_t base;
        size_t size;
        struct pgt *pgt;
        TAILQ_ENTRY(tee_pager_area) link;
        TAILQ_ENTRY(tee_pager_area) fobj_link;
    };

Here ``base`` and ``size`` tell the memory range and ``fobj`` and
``fobj_pgoffs`` hold the content. A ``struct tee_pager_area`` can only use
one ``struct fobj`` and one ``struct pgt`` (translation table) so memory ranges
spanning multiple fobjs or pgts are split into multiple areas.

``struct fobj``
---------------
This is a polymorphic object, using different implementations depending on how
it's initialized. It's defined as:

.. code-block:: c

    struct fobj_ops {
        void (*free)(struct fobj *fobj);
        TEE_Result (*load_page)(struct fobj *fobj, unsigned int page_idx,
                                void *va);
        TEE_Result (*save_page)(struct fobj *fobj, unsigned int page_idx,
                                const void *va);
    };

    struct fobj {
        const struct fobj_ops *ops;
        unsigned int num_pages;
        struct refcount refc;
        struct tee_pager_area_head areas;
    };

:``num_pages``: Tells how many pages this ``fobj`` covers.
:``refc``:      A reference counter, everyone referring to a ``fobj`` needs to
                increase and decrease this as needed.
:``areas``:     A list of areas using this ``fobj``, traversed when making
                a virtual page unavailable.

``struct tee_pager_pmem``
-------------------------
This structure represents a physical page. It's defined as:

.. code-block:: c

    struct tee_pager_pmem {
        unsigned int flags;
        unsigned int fobj_pgidx;
        struct fobj *fobj;
        void *va_alias;
        TAILQ_ENTRY(tee_pager_pmem) link;
    };

:``PMEM_FLAG_DIRTY``:   Bit is set in ``flags`` when the page is mapped
                        read/write in at least one location.
:``PMEM_FLAG_HIDDEN``:  Bit is set in ``flags`` when the page is hidden, that
                        is, not accessible anywhere.
:``fobj_pgidx``:        The page at this index in the ``fobj`` is used in this
                        physical page.
:``fobj``:              The ``fobj`` backing this page.
:``va_alias``:          Virtual address where this physical page is updated
                        when loading it from backing store or when writing it
                        back.

All ``struct tee_pager_pmem`` are stored either in the global list
``tee_pager_pmem_head`` or in ``tee_pager_lock_pmem_head``. The latter is
used by pages which are mapped and then locked in memory on demand. The
pages are returned to ``tee_pager_pmem_head`` when the pages are
explicitly released with a call to ``tee_pager_release_phys()``.

A physical page can be used by more than one ``tee_pager_area``
simultaneously. This is also known as shared secure memory and will appear
as such for both read-only and read-write mappings.

When a page is hidden it's unmapped from all translation tables and the
``PMEM_FLAG_HIDDEN`` bit is set, but the page is kept in memory. When a
physical page is released it's also unmapped from all translation tables and
its content is written back to storage, then the ``fobj`` field is set to
``NULL`` to note the physical page as unused.

Note that when ``struct tee_pager_pmem`` references a ``fobj`` it doesn't
update the reference counter since it's already guaranteed to be available
due to the ``struct tee_pager_area`` which must reference the ``fobj`` too.

Paging of user TA
=================
Paging of user TAs can optionally be enabled with ``CFG_PAGED_USER_TA=y``.
Paging of user TAs is analogous to paging of OP-TEE kernel parts but with a few
differences:

    - Read/write pages are paged in addition to read-only pages
    - Page tables are managed dynamically

``tee_pager_add_uta_area(...)`` is used to set up the initial read/write mapping
needed when populating the TA. When the TA is fully populated and relocated
``tee_pager_set_uta_area_attr(...)`` changes the mapping of the area to the
strict permissions used when the TA is running.

Paging shared secure memory
---------------------------
Shared secure memory is achieved by letting several ``tee_pager_area``
use the same backing ``fobj``. When a ``tee_pager_area`` is allocated and
assigned a ``fobj`` it's also added to the list of ``tee_pager_area`` using
this ``fobj``. This helps when a physical page is released.

When a fault occurs, first a matching ``tee_pager_area`` is located. Then
``tee_pager_pmem_head`` is searched to see if a physical page already holds
the page of the ``fobj`` needed. If so the ``pgt`` is updated to map the
physical page at the appropriate location. If no physical page was holding
the page a new physical page is allocated, initialized and finally mapped.

In order to make as few updates to mappings as possible, changes to less
restricted permissions (no access to read-only, or read-only to read-write)
are done only for the virtual address that was used when the page fault
occurred. Changes in the other direction have to be done in all translation
tables used to map the physical page.
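
The lookup performed on a fault can be sketched as follows. The structures
are simplified stand-ins (an array instead of the ``TAILQ`` lists, a dummy
``struct fobj``) and the helper names are hypothetical. The point is only that
a physical page is identified by the pair (``fobj``, page index), so two areas
sharing a ``fobj`` end up on the same physical page.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Dummy stand-in for the real struct fobj. */
    struct fobj {
        unsigned int num_pages;
    };

    /* Simplified stand-in for struct tee_pager_pmem. */
    struct pmem {
        const struct fobj *fobj;    /* NULL when the page is unused */
        unsigned int fobj_pgidx;
    };

    /*
     * Return the physical page for page pgidx of fobj. If a page already
     * holds it, that page is returned and *reused is set to 1. Otherwise an
     * unused page is claimed and *reused is set to 0. Returns NULL when
     * neither exists (a real pager would then recycle a page).
     */
    static struct pmem *pmem_get(struct pmem *pmems, size_t n,
                                 const struct fobj *fobj, unsigned int pgidx,
                                 int *reused)
    {
        assert(fobj && pgidx < fobj->num_pages);
        for (size_t i = 0; i < n; i++) {
            if (pmems[i].fobj == fobj && pmems[i].fobj_pgidx == pgidx) {
                *reused = 1;
                return &pmems[i];
            }
        }
        for (size_t i = 0; i < n; i++) {
            if (!pmems[i].fobj) {
                pmems[i].fobj = fobj;
                pmems[i].fobj_pgidx = pgidx;
                *reused = 0;
                return &pmems[i];
            }
        }
        return NULL;
    }
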

----

.. _stacks:

Stacks
******
Different stacks are used during different stages. The stacks are:

    - **Secure monitor stack** (128 bytes), bound to the CPU. Only available if
      OP-TEE is compiled with a secure monitor, which is always the case if the
      target is Armv7-A but never for Armv8-A.

    - **Temp stack** (small ~1KB), bound to the CPU. Used when transitioning
      from one state to another. Interrupts are always disabled when using this
      stack, and aborts are fatal when using the temp stack.

    - **Abort stack** (medium ~2KB), bound to the CPU. Used when trapping a data
      or pre-fetch abort. Aborts from user space are never fatal, the TA is only
      killed. Aborts from kernel mode are used by the pager to do the demand
      paging. If the pager is disabled all kernel mode aborts are fatal.

    - **Thread stack** (large ~8KB), not bound to the CPU but instead used by
      the current thread/task. Interrupts are usually enabled when using this
      stack.

Notes for Armv7-A/AArch32
    .. list-table::
        :header-rows: 1
        :widths: 1 5

        * - Stack
          - Comment

        * - Temp
          - Assigned to ``SP_SVC`` during entry/exit, always assigned to
            ``SP_IRQ`` and ``SP_FIQ``

        * - Abort
          - Always assigned to ``SP_ABT``

        * - Thread
          - Assigned to ``SP_SVC`` while a thread is active

Notes for AArch64
    There are only two stack pointers, ``SP_EL1`` and ``SP_EL0``, available for
    OP-TEE in AArch64. When an exception is received the stack pointer is always
    ``SP_EL1`` which is used temporarily while assigning an appropriate stack
    pointer for ``SP_EL0``. ``SP_EL1`` is always assigned the value of
    ``thread_core_local[cpu_id]``. This structure has some spare space for
    temporary storage of registers and also keeps the relevant stack pointers.
    In general when we talk about assigning a stack pointer to the CPU below we
    mean ``SP_EL0``.

Boot
====
During early boot the CPU is configured with the temp stack which is used until
OP-TEE exits to normal world the first time.

Notes for AArch64
    ``SPSEL`` is always ``0`` on entry/exit to have ``SP_EL0`` acting as stack
    pointer.

Normal entry
============
Each time OP-TEE is entered from normal world the temp stack is used as the
initial stack. For fast calls, this is the only stack used. For normal calls an
empty thread slot is selected and the CPU switches to that stack.

Normal exit
===========
Normal exit occurs when a thread has finished its task and the thread is freed.
When the main thread function, ``tee_entry_std(...)``, returns, interrupts are
disabled and the CPU switches to the temp stack instead. The thread is freed and
OP-TEE exits to normal world.

RPC exit
========
RPC exit occurs when OP-TEE needs some service from normal world. RPC can
currently only be performed when a thread is in running state. RPC is initiated
with a call to ``thread_rpc(...)`` which saves the state in a way that when the
thread is restored it will continue at the next instruction as if this function
did a normal return. The CPU switches to use the temp stack before returning to
normal world.

Foreign interrupt exit
======================
Foreign interrupt exit occurs when OP-TEE receives a foreign interrupt. For Arm
GICv2 mode, a foreign interrupt is sent as IRQ which is always handled in normal
world. Foreign interrupt exit is similar to RPC exit but it is
``thread_irq_handler(...)`` and ``elx_irq(...)`` (respectively for
Armv7-A/AArch32 and for AArch64) that save the thread state instead. The thread
is resumed in the same way though. For Arm GICv3 mode, a foreign interrupt is
sent as FIQ which could be handled by either secure world (EL3 in AArch64) or
normal world. This mode is not supported yet.

Notes for Armv7-A/AArch32
    ``SP_IRQ`` is initialized to temp stack instead of a separate stack. Prior
    to exiting to normal world CPU state is changed to SVC and temp stack is
    selected.

Notes for AArch64
    ``SP_EL0`` is assigned temp stack and is selected during IRQ processing. The
    original ``SP_EL0`` is saved in the thread context to be restored when
    resuming.

Resume entry
============
OP-TEE is entered using the temp stack in the same way as for normal entry. The
thread to resume is looked up and the state is restored to resume execution. The
procedure to resume from an RPC exit or a foreign interrupt exit is exactly the
same.

Syscall
=======
Syscalls are executed using the thread stack.

Notes for Armv7-A/AArch32
    Nothing special, ``SP_SVC`` is already set with the thread stack.

Notes for AArch64
    Early in the exception processing the original ``SP_EL0`` is saved in
    ``struct thread_svc_regs`` in case the TA is executed in AArch64. Current
    thread stack is assigned to ``SP_EL0`` which is then selected. When
    returning ``SP_EL0`` is assigned what is in ``struct thread_svc_regs``. This
    allows ``tee_svc_sys_return_helper(...)`` having the syscall exception
    handler return directly to ``thread_unwind_user_mode(...)``.

----

.. _shared_memory:

Shared Memory
*************
Shared Memory is a block of memory that is shared between the non-secure and the
secure world. It is used to transfer data between both worlds.

The shared memory is allocated and managed by the non-secure world, i.e. the
Linux OP-TEE driver. Secure world only considers the individual shared buffers,
not their pool. Each shared memory is referenced with associated attributes:

    - Buffer start address and byte size,
    - Cache attributes of the shared memory buffer,
    - List of chunks if mapped from noncontiguous pages.

Shared memory buffer references manipulated must fit inside one of the
shared memory areas known from the OP-TEE core. OP-TEE supports two kinds
of shared memory areas: an area for contiguous buffers and an area for
noncontiguous buffers. At least one has to be enabled.

Contiguous shared buffers
=========================
Configuration directives ``CFG_SHMEM_START`` and ``CFG_SHMEM_SIZE``
define a shared memory area where shared memory buffers are contiguous.
Generic memory layout registers it as the ``MEM_AREA_NSEC_SHM`` memory area.

The non-secure world issues ``OPTEE_SMC_GET_SHM_CONFIG`` to retrieve the
contiguous shared memory area configuration:

    - Physical address of the start of the pool
    - Size of the pool
    - Whether or not the memory is cached

Contiguous shared memory (also known as static or reserved shared memory)
is enabled with the configuration flag ``CFG_CORE_RESERVED_SHM=y``.

Noncontiguous shared buffers
============================
To benefit from noncontiguous shared memory buffers, the secure world must
register dynamic shared memory areas and the non-secure world must register
noncontiguous buffers prior to referring to them using the OP-TEE API.

The OP-TEE core generic boot sequence discovers dynamic shared areas from the
device tree and/or areas explicitly registered by the platform.

The non-secure side needs to register buffers as lists of 4 KiB chunks with the
OP-TEE core using the ``OPTEE_MSG_CMD_REGISTER_SHM`` API prior to referencing
them using the OP-TEE invocation API.
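
Since registration works in 4 KiB chunks, a buffer that is not chunk aligned
occupies more list entries than its size alone suggests. The sketch below
computes that count. ``shm_num_chunks()`` is a hypothetical helper and not
part of the OP-TEE API.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SHM_CHUNK_SIZE 4096u

    /* Number of 4 KiB chunks touched by the range [addr, addr + size). */
    static size_t shm_num_chunks(uintptr_t addr, size_t size)
    {
        uintptr_t first;
        uintptr_t last;

        if (!size)
            return 0;
        assert(addr + size - 1 >= addr);    /* range must not wrap */
        first = addr / SHM_CHUNK_SIZE;
        last = (addr + size - 1) / SHM_CHUNK_SIZE;
        return (size_t)(last - first + 1);
    }

For instance, a 32-byte buffer starting 16 bytes before a chunk boundary
straddles two chunks, so two entries would be needed.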

Noncontiguous shared memory (also known as dynamic shared memory) is
enabled with the configuration flag ``CFG_CORE_DYN_SHM=y``.

For performance reasons, the TEE Client Library (``libteec``) uses
noncontiguous shared memory when available since it avoids copies in some
situations.

Shared Memory Chunk Allocation
==============================
It is the Linux kernel driver for OP-TEE that is responsible for allocating
chunks of shared memory. The OP-TEE Linux kernel driver relies on the Linux
kernel generic allocator support (``CONFIG_GENERIC_ALLOCATOR``) for
allocation/release of shared memory physical chunks. The OP-TEE Linux kernel
driver relies on the Linux kernel dma-buf support
(``CONFIG_DMA_SHARED_BUFFER``) to track shared memory buffer references.

Using shared memory
===================
From the Client Application
    The client application can ask for shared memory allocation using the
    GlobalPlatform Client API function ``TEEC_AllocateSharedMemory(...)``. The
    client application can also register a memory through the GlobalPlatform
    Client API function ``TEEC_RegisterSharedMemory(...)``. The shared memory
    reference can then be used as parameter when invoking a trusted application.

From the Linux Driver
    Occasionally the Linux kernel driver needs to allocate shared memory for the
    communication with secure world, for example when using buffers of type
    ``TEEC_TempMemoryReference``.

From OP-TEE core
    In case OP-TEE core needs information from TEE supplicant (dynamic TA
    loading, REE time request, ...), shared memory must be allocated. Allocation
    depends on the use case. OP-TEE core asks for the following shared memory
    allocation:

    - ``optee_msg_arg`` structure, used to pass the arguments to the
      non-secure world, where the allocation will be done by sending a
      ``OPTEE_SMC_RPC_FUNC_ALLOC`` message.

    - In some cases, a payload might be needed for storing the result from
      TEE supplicant, for example when loading a Trusted Application. This
      type of allocation will be done by sending the message
      ``OPTEE_MSG_RPC_CMD_SHM_ALLOC(OPTEE_MSG_RPC_SHM_TYPE_APPL,...)``,
      which then will return:

        - the physical address of the shared memory
        - a handle to the memory, that will be used later on when
          freeing this memory.

From TEE Supplicant
    TEE supplicant is also working with shared memory, used to exchange data
    between normal and secure worlds. TEE supplicant receives a memory address
    from the OP-TEE core, used to store the data. This is for example the case
    when a Trusted Application is loaded. In this case, TEE supplicant must
    register the provided shared memory in the same way a client application
    would do, involving the Linux driver.

----

.. _smc:

SMC
***
SMC Interface
=============
OP-TEE's SMC interface is defined in two levels using optee_smc.h_ and
optee_msg.h_. The former file defines SMC identifiers and what is passed in the
registers for each SMC. The latter file defines the OP-TEE Message protocol
which is not restricted to only SMC even if that currently is the only option
available.

SMC communication
=================
The main structure used for the SMC communication is defined in ``struct
optee_msg_arg`` (in optee_msg.h_). Looking at the source code, communication is
mainly achieved using ``optee_msg_arg`` and ``thread_smc_args`` (in thread.h_),
where ``optee_msg_arg`` could be seen as the main structure. The
:ref:`linux_kernel` driver gets the parameters either from :ref:`optee_client`
or directly from an internal service in the Linux kernel. The TEE driver then
populates the struct ``optee_msg_arg`` with the parameters plus some additional
bookkeeping information. Parameters for the SMC are passed in registers 1 to 7,
register 0 holds the SMC id which among other things tells whether it is a
standard or a fast call.
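
How the SMC id in register 0 encodes the call type can be sketched as below.
The bit layout is taken from the Arm SMC Calling Convention (bit 31 set means
fast call, bits 29:24 hold the owning entity, bits 15:0 the function number)
and is an assumption of this example rather than something stated above. The
macro and function names are hypothetical; the real definitions live in
optee_smc.h_.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define SMC_TYPE_FAST       UINT32_C(0x80000000)    /* bit 31 */
    #define SMC_OWNER_SHIFT     24
    #define SMC_OWNER_MASK      UINT32_C(0x3f)          /* bits 29:24 */
    #define SMC_FUNC_MASK       UINT32_C(0xffff)        /* bits 15:0 */

    /* Build an SMC32 function ID from its fields. */
    static uint32_t smc_make_id(bool fast, uint32_t owner, uint32_t func)
    {
        assert(owner <= SMC_OWNER_MASK && func <= SMC_FUNC_MASK);
        return (fast ? SMC_TYPE_FAST : 0) | (owner << SMC_OWNER_SHIFT) | func;
    }

    /* True for a fast call, false for a standard (yielding) call. */
    static bool smc_is_fast(uint32_t id)
    {
        return (id & SMC_TYPE_FAST) != 0;
    }

    static uint32_t smc_owner(uint32_t id)
    {
        return (id >> SMC_OWNER_SHIFT) & SMC_OWNER_MASK;
    }

    static uint32_t smc_func_num(uint32_t id)
    {
        return id & SMC_FUNC_MASK;
    }
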

----

.. _thread_handling:

Thread handling
***************
OP-TEE core uses a couple of threads to be able to support running jobs in
parallel (not fully enabled!). There are handlers for different purposes. In
thread.c_ you will find a function called ``thread_init_primary(...)`` which
assigns ``init_handlers`` (functions) that should be called when OP-TEE core
receives standard or fast calls, FIQ and PSCI calls. There are default handlers
for these services, but the platform can decide to implement its own platform
specific handlers instead.

Synchronization primitives
==========================
OP-TEE has three primitives for synchronization of threads and CPUs:
*spin-lock*, *mutex*, and *condvar*.

Spin-lock
    A spin-lock is represented as an ``unsigned int``. This is the most
    primitive lock. Interrupts should be disabled before attempting to take a
    spin-lock and should remain disabled until the lock is released. A spin-lock
    is initialized with ``SPINLOCK_UNLOCK``.

    .. list-table:: Spin lock functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``cpu_spin_lock(...)``
          - Locks a spin-lock

        * - ``cpu_spin_trylock(...)``
          - Locks a spin-lock if unlocked and returns ``0`` else the spin-lock
            is unchanged and the function returns ``!0``

        * - ``cpu_spin_unlock(...)``
          - Unlocks a spin-lock
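
    The semantics in the table can be illustrated with C11 atomics. This is a
    portable sketch with hypothetical ``sketch_*`` names, not the OP-TEE
    implementation, and it leaves out the interrupt masking that the text above
    requires around the critical section.

    .. code-block:: c

        #include <assert.h>
        #include <stdatomic.h>

        #define SKETCH_SPINLOCK_UNLOCK  0
        #define SKETCH_SPINLOCK_LOCK    1

        /* Returns 0 if the lock was taken, non-zero if already held. */
        static unsigned int sketch_spin_trylock(atomic_uint *lock)
        {
            unsigned int expected = SKETCH_SPINLOCK_UNLOCK;

            return !atomic_compare_exchange_strong_explicit(
                    lock, &expected, SKETCH_SPINLOCK_LOCK,
                    memory_order_acquire, memory_order_relaxed);
        }

        /* Busy-waits until the lock has been taken. */
        static void sketch_spin_lock(atomic_uint *lock)
        {
            while (sketch_spin_trylock(lock))
                ;
        }

        /* Releases a lock that must currently be held. */
        static void sketch_spin_unlock(atomic_uint *lock)
        {
            assert(atomic_load_explicit(lock, memory_order_relaxed) ==
                   SKETCH_SPINLOCK_LOCK);
            atomic_store_explicit(lock, SKETCH_SPINLOCK_UNLOCK,
                                  memory_order_release);
        }

    The acquire ordering on a successful lock and the release ordering on
    unlock keep accesses to the protected data inside the critical section.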

Mutex
    A mutex is represented by ``struct mutex``. A mutex can be locked and
    unlocked with interrupts enabled or disabled, but only from a normal thread.
    A mutex cannot be used in an interrupt handler, abort handler or before a
    thread has been selected for the CPU. A mutex is initialized with either
    ``MUTEX_INITIALIZER`` or ``mutex_init(...)``.

    .. list-table:: Mutex functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``mutex_lock(...)``
          - Locks a mutex. If the mutex is unlocked this is a fast operation,
            else the function issues an RPC to wait in normal world.

        * - ``mutex_unlock(...)``
          - Unlocks a mutex. If there are no waiters this is a fast operation,
            else the function issues an RPC to wake up a waiter in normal world.

        * - ``mutex_trylock(...)``
          - Locks a mutex if unlocked and returns ``true`` else the mutex is
            unchanged and the function returns ``false``.

        * - ``mutex_destroy(...)``
          - Asserts that the mutex is unlocked and there are no waiters, after
            this the memory used by the mutex can be freed.

    When a mutex is locked it is owned by the thread calling ``mutex_lock(...)``
    or ``mutex_trylock(...)``. The mutex may only be unlocked by the thread
    owning the mutex. A thread should not exit to TA user space when holding a
    mutex.

Condvar
    A condvar is represented by ``struct condvar``. A condvar is similar to a
    ``pthread_cond_t`` in the pthreads standard, only less advanced.
    Condition variables are used to wait for some condition to be fulfilled and
    are always used together with a mutex. Once a condition variable has been
    used together with a certain mutex, it must only be used with that mutex
    until destroyed. A condvar is initialized with ``CONDVAR_INITIALIZER`` or
    ``condvar_init(...)``.

    .. list-table:: Condvar functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``condvar_wait(...)``
          - Atomically unlocks the supplied mutex and waits in normal world via
            an RPC for the condition variable to be signaled, when the function
            returns the mutex is locked again.

        * - ``condvar_signal(...)``
          - Wakes up one waiter of the condition variable (waiting in
            ``condvar_wait(...)``).

        * - ``condvar_broadcast(...)``
          - Wakes up all waiters of the condition variable.

    The caller of ``condvar_signal(...)`` or ``condvar_broadcast(...)`` should
    hold the mutex associated with the condition variable to guarantee that a
    waiter does not miss the signal.
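
    The usage pattern is the same as with pthreads, so it can be illustrated
    with the pthreads analogues (``pthread_cond_t``, ``pthread_mutex_t``)
    instead of the OP-TEE primitives. The ``mailbox`` type and its functions
    are hypothetical. The waiter re-checks its predicate in a loop and the
    signaler holds the mutex while signaling, as recommended above.

    .. code-block:: c

        #include <assert.h>
        #include <pthread.h>
        #include <stdbool.h>

        struct mailbox {
            pthread_mutex_t lock;
            pthread_cond_t cv;
            bool ready;         /* the condition being waited for */
            int value;
        };

        static void mailbox_init(struct mailbox *mb)
        {
            assert(mb);
            pthread_mutex_init(&mb->lock, NULL);
            pthread_cond_init(&mb->cv, NULL);
            mb->ready = false;
            mb->value = 0;
        }

        /* Store a value and wake one waiter while holding the mutex. */
        static void mailbox_post(struct mailbox *mb, int value)
        {
            pthread_mutex_lock(&mb->lock);
            mb->value = value;
            mb->ready = true;
            pthread_cond_signal(&mb->cv);
            pthread_mutex_unlock(&mb->lock);
        }

        /* Wait until a value has been posted, then consume it. */
        static int mailbox_take(struct mailbox *mb)
        {
            int value;

            pthread_mutex_lock(&mb->lock);
            while (!mb->ready)
                pthread_cond_wait(&mb->cv, &mb->lock);
            value = mb->value;
            mb->ready = false;
            pthread_mutex_unlock(&mb->lock);
            return value;
        }

    Because the predicate is checked under the mutex, a value posted before the
    waiter arrives is not lost: ``mailbox_take()`` then returns without waiting.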

.. _core/arch/arm/kernel/thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c
.. _optee_msg.h: https://github.com/OP-TEE/optee_os/blob/master/core/include/optee_msg.h
.. _optee_smc.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/sm/optee_smc.h
.. _thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c
.. _thread.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/kernel/thread.h

.. _ARM_DEN0028A_SMC_Calling_Convention: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
.. _Cortex-A53 TRM: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500j/DDI0500J_cortex_a53_trm.pdf
.. _drivers/tee/optee: https://github.com/torvalds/linux/tree/master/drivers/tee/optee
.. _Trusted Firmware A: https://github.com/ARM-software/arm-trusted-firmware