.. _core:

####
Core
####

.. _interrupt_handling:

Interrupt handling
******************
This section describes how :ref:`optee_os` handles switches of world execution
context based on :ref:`SMC` exceptions and interrupt notifications. Interrupt
notifications are IRQ/FIQ exceptions which may also imply switching of world
execution context: normal world to secure world, or secure world to normal
world.

Use cases of world context switch
=================================
This section lists all the cases where OP-TEE OS is involved in world context
switches. OP-TEE OS executes in the secure world. World switch is done by the
core's secure monitor level/mode, referred to below as the Monitor.

When the normal world invokes the secure world, the normal world executes an
SMC instruction. The SMC exception is always trapped by the Monitor. If the
related service targets the trusted OS, the Monitor will switch to OP-TEE OS
world execution. When the secure world returns to the normal world, OP-TEE OS
executes an SMC that is caught by the Monitor which switches back to the normal
world.

When a secure interrupt is signaled by the Arm GIC, it shall reach the OP-TEE OS
interrupt exception vector. If the secure world is executing, OP-TEE OS will
handle the interrupt straight from its exception vector. If the normal world is
executing when the secure interrupt is raised, the Monitor vector must handle
the exception and invoke OP-TEE OS to serve the interrupt.

When a non-secure interrupt is signaled by the Arm GIC, it shall reach the
normal world interrupt exception vector. If the normal world is executing, it
will handle the exception straight from its exception vector. If the secure
world is executing when the non-secure interrupt is raised, OP-TEE OS will
temporarily return to the normal world via the Monitor to let the normal world
serve the interrupt.

Core exception vectors
======================
The Monitor vector is ``VBAR_EL3`` in AArch64 and ``MVBAR`` in Armv7-A/AArch32.
The Monitor can be reached while the normal world or the secure world is
executing. The executing secure state is known to the Monitor through
``SCR_NS``.

The Monitor can be reached from an SMC exception, an IRQ or FIQ exception
(so-called interrupts) and from asynchronous aborts. Monitor aborts (data,
prefetch, undef) are local to the Monitor execution.

The Monitor can be external to OP-TEE OS (case ``CFG_WITH_ARM_TRUSTED_FW=y``).
If not, OP-TEE OS provides a local secure monitor in ``core/arch/arm/sm``.
Armv7-A platforms should use the OP-TEE OS secure monitor. Armv8-A platforms
are likely to rely on `Trusted Firmware A`_.

When executing outside the Monitor, the system is executing either in the
normal world (``SCR_NS=1``) or in the secure world (``SCR_NS=0``). Each world
owns its own exception vector table (state vector):

    - ``VBAR_EL2`` or ``VBAR_EL1`` non-secure or ``VBAR_EL1`` secure for
      AArch64.
    - ``HVBAR`` or ``VBAR`` non-secure or ``VBAR`` secure for Armv7-A and
      AArch32.

All SMC exceptions are trapped in the Monitor vector. IRQ/FIQ exceptions can be
trapped either in the Monitor vector or in the state vector of the executing
world.

When the normal world is executing, the system is configured to route:

    - secure interrupts to the Monitor that will forward to OP-TEE OS
    - non-secure interrupts to the executing world exception vector.

When the secure world is executing, the system is configured to route:

    - secure and non-secure interrupts to the executing OP-TEE OS exception
      vector. OP-TEE OS shall forward the non-secure interrupts to the normal
      world.

With OP-TEE OS, non-secure interrupts are always trapped in the state vector of
the executing world. This is reflected by a static value of ``SCR_(IRQ|FIQ)``.
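
The routing rules above can be summarised as a small decision function. This
is an illustrative sketch only: the enum and function names are hypothetical
and do not come from the OP-TEE OS sources, where the routing is set up through
``SCR`` bits and the interrupt controller rather than computed in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum world { WORLD_NORMAL, WORLD_SECURE };
enum vector { VEC_MONITOR, VEC_NORMAL_WORLD, VEC_OPTEE };

/* Vector that first receives an interrupt, following the rules above. */
static enum vector first_vector(enum world executing, bool secure_intr)
{
    if (executing == WORLD_NORMAL)
        return secure_intr ? VEC_MONITOR : VEC_NORMAL_WORLD;

    /* Secure world executing: OP-TEE OS traps both kinds of interrupt. */
    return VEC_OPTEE;
}

/* World that finally serves the interrupt, after any forwarding. */
static enum world serving_world(bool secure_intr)
{
    return secure_intr ? WORLD_SECURE : WORLD_NORMAL;
}
```

For example, a non-secure interrupt raised while the secure world executes
first reaches ``VEC_OPTEE`` and is then served by ``WORLD_NORMAL``, which is
the forwarding case described in the list above.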

.. _native_foreign_irqs:

Native and foreign interrupts
=============================
Two types of interrupt are defined from the OP-TEE OS point of view.

    - **Native interrupt** - The interrupt handled by OP-TEE OS, secure
      interrupts targeting S-EL1 or secure privileged mode
    - **Foreign interrupt** - The interrupt not handled by OP-TEE OS, non-secure
      interrupts targeting normal world or secure interrupts targeting EL3.

For Arm **GICv2** mode, a native interrupt is signalled with a FIQ and a
foreign interrupt is signalled with an IRQ. For Arm **GICv3** mode, a
foreign interrupt is signalled as a FIQ which could be handled by either
secure world (AArch32 Monitor mode or AArch64 EL3) or normal world.

Arm GICv3 mode can be enabled by setting ``CFG_ARM_GICV3=y``.
Native interrupts must be securely routed to OP-TEE OS. Foreign interrupts,
when trapped during secure world execution, might need to be efficiently routed
to the normal world.

IRQ and FIQ keep their meaning in normal world so for clarity we will keep
using those names in the normal world context.
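
The mapping between exception lines and interrupt types can be sketched as
below. The GICv2 mapping and the foreign-as-FIQ rule for GICv3 come from the
text above. Treating an IRQ as native in GICv3 mode is an assumption made to
complete the example, and all identifiers are hypothetical.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum gic_mode { GIC_V2, GIC_V3 };
enum exc_line { EXC_IRQ, EXC_FIQ };
enum intr_type { INTR_NATIVE, INTR_FOREIGN };

/* Classify an exception taken while the secure world is executing. */
static enum intr_type classify_intr(enum gic_mode mode, enum exc_line line)
{
    if (mode == GIC_V2)
        return line == EXC_FIQ ? INTR_NATIVE : INTR_FOREIGN;

    /* GICv3 mode: FIQ is foreign. IRQ as native is an assumption. */
    return line == EXC_FIQ ? INTR_FOREIGN : INTR_NATIVE;
}
```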

Normal World invokes OP-TEE OS using SMC
========================================

**Entering the Secure Monitor**

The monitor manages all entries and exits of secure world. To enter secure
world from normal world the monitor saves the state of normal world (general
purpose registers and system registers which are not banked) and restores the
previous state of secure world. Then a return from exception is performed and
the restored secure state is resumed. Exit from secure world to normal world is
the reverse.

Some general purpose registers are not saved and restored on entry and exit;
those are used to pass parameters between secure and normal world (see
ARM_DEN0028A_SMC_Calling_Convention_ for details).

**Entry and exit of Trusted OS**

On entry and exit of Trusted OS each CPU uses a separate entry stack and runs
with IRQ and FIQ masked. SMCs are categorised in two flavors: **fast** and
**yielding**.

    - For **fast** SMCs, OP-TEE OS will execute on the entry stack with IRQ/FIQ
      masked until the execution returns to normal world.

    - For **yielding** SMCs, OP-TEE OS will at some point execute the requested
      service with interrupts unmasked. In order to handle interrupts, mainly
      forwarding of foreign interrupts, OP-TEE OS assigns a trusted thread
      (`core/arch/arm/kernel/thread.c`_) to the SMC request. The trusted thread
      stores the execution context of the requested service. This context can be
      suspended and resumed as the requested service executes and is
      interrupted. The trusted thread is released only once the service
      execution returns with a completion status.

      For **yielding** SMCs, OP-TEE OS allocates or resumes a trusted thread
      then unmasks the IRQ and FIQ lines. When OP-TEE OS needs to invoke the
      normal world from a foreign interrupt or a remote service call, OP-TEE OS
      masks IRQ and FIQ and suspends the trusted thread. When suspending,
      OP-TEE OS gets back to the entry stack.

    - **Both** fast and yielding SMCs end on the entry stack with IRQ and
      FIQ masked and OP-TEE OS invokes the Monitor through an SMC to return
      to the normal world.
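
The SMC Calling Convention referenced above encodes the call type in bit 31 of
the function identifier: 1 for a fast call, 0 for a yielding call. The sketch
below shows how an entry path could be selected from that bit. The macro,
enum and function names are hypothetical and are not taken from the OP-TEE OS
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of an SMC function ID: 1 = fast call, 0 = yielding call. */
#define SMC_TYPE_FAST_BIT (UINT32_C(1) << 31)

/* Hypothetical description of the two entry paths listed above. */
enum entry_path { PATH_FAST_ON_ENTRY_STACK, PATH_YIELDING_ON_THREAD };

static bool smc_is_fast(uint32_t func_id)
{
    return (func_id & SMC_TYPE_FAST_BIT) != 0;
}

/* Fast calls stay on the entry stack, yielding calls get a trusted thread. */
static enum entry_path select_entry_path(uint32_t func_id)
{
    return smc_is_fast(func_id) ? PATH_FAST_ON_ENTRY_STACK
                                : PATH_YIELDING_ON_THREAD;
}
```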

.. uml::
    :align: center
    :caption: SMC entry to secure world

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    nwd -> smon : smc: TEE_FUNC_INVOKE
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: TEE_FUNC_INVOKE
    entry -> entry : assign thread
    entry -> optee : TEE_FUNC_INVOKE
    == IRQ and FIQ unmasked ==
    optee -> optee : process
    == IRQ and FIQ masked ==
    optee --> entry : SMC_CALL_RETURN
    entry -> smon : smc: SMC_CALL_RETURN
    smon -> smon : Save secure context
    smon -> smon : Restore non-secure context
    == IRQ and FIQ unmasked ==
    smon --> nwd : eret: return

Deliver non-secure interrupts to Normal World
=============================================

**Forward a Foreign Interrupt from Secure World to Normal World**

When a foreign interrupt is received in secure world as an IRQ or FIQ
exception then secure world:

    1. Saves trusted thread context (entire state of all processor modes for
       Armv7-A)

    2. Masks all interrupts (IRQ and FIQ)

    3. Switches to entry stack

    4. Issues an SMC with a value indicating to normal world that an IRQ has
       been detected and that the last SMC call should be continued

The monitor restores normal world context with a return code indicating that an
IRQ is about to be delivered. Once it has served the interrupt, normal world
issues a new SMC requesting that the last SMC be continued.

The monitor restores the secure world context. The secure world locates the
previously saved context and checks that a return from a foreign interrupt
is what is requested. It then restores the context and lets the secure world
foreign interrupt handler return from exception, so that execution resumes
where it was interrupted.

Note that the monitor itself does not know or care that it has just forwarded
a foreign interrupt to normal world. The bookkeeping is done in the trusted
thread handling in OP-TEE OS. Normal world is responsible for deciding when
the secure world thread should resume execution (for details, see
:ref:`thread_handling`).
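
The bookkeeping described above can be sketched as a small state machine over
a static pool of threads. This is a simplified illustration, not the code in
``thread.c``: the structure, the function names and the pool size of two are
assumptions, and saving or restoring register context is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_THREADS 2 /* hypothetical pool size */

enum thread_state { THREAD_FREE, THREAD_ACTIVE, THREAD_SUSPENDED };

struct thread {
    enum thread_state state;
    bool waiting_foreign_intr; /* suspended to forward a foreign interrupt */
};

static struct thread threads[NUM_THREADS];

/* Yielding SMC entry: claim a free thread, or return -1 if none is left. */
static int thread_alloc(void)
{
    for (size_t i = 0; i < NUM_THREADS; i++) {
        if (threads[i].state == THREAD_FREE) {
            threads[i].state = THREAD_ACTIVE;
            threads[i].waiting_foreign_intr = false;
            return (int)i;
        }
    }
    return -1;
}

/* Foreign interrupt trapped: record why the thread is being suspended. */
static void thread_suspend_for_foreign_intr(int id)
{
    assert(id >= 0 && id < NUM_THREADS);
    assert(threads[id].state == THREAD_ACTIVE);
    threads[id].state = THREAD_SUSPENDED;
    threads[id].waiting_foreign_intr = true;
}

/* Normal world asks to continue: valid only for a matching suspension. */
static bool thread_resume_from_foreign_intr(int id)
{
    if (id < 0 || id >= NUM_THREADS)
        return false;
    if (threads[id].state != THREAD_SUSPENDED ||
        !threads[id].waiting_foreign_intr)
        return false;
    threads[id].state = THREAD_ACTIVE;
    threads[id].waiting_foreign_intr = false;
    return true;
}
```

The check in ``thread_resume_from_foreign_intr()`` mirrors the step where the
secure world verifies that a return from a foreign interrupt is what is
requested before it restores the saved context.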

.. uml::
    :align: center
    :caption: Foreign interrupt received in secure world and forwarded to
              normal world

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    optee -> optee : process
    == IRQ and FIQ unmasked,\nForeign interrupt received ==
    optee -> optee : suspend thread
    optee -> entry : forward foreign interrupt
    entry -> smon : smc: forward foreign interrupt
    smon -> smon: Save secure context
    smon -> smon: Restore non-secure context
    == IRQ and FIQ unmasked ==
    smon --> nwd : eret: IRQ forwarded
    == FIQ unmasked, IRQ received ==
    nwd -> nwd : process IRQ
    == IRQ and FIQ unmasked ==
    nwd -> smon : smc: return from IRQ
    == IRQ and FIQ masked ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: return from foreign interrupt
    entry -> entry : find thread
    entry --> optee : resume execution
    == IRQ and FIQ unmasked ==
    optee -> optee : process

**Deliver a foreign interrupt to normal world when SCR_NS is set**

Since ``SCR_IRQ`` is cleared, an IRQ will be delivered using the exception
vector (``VBAR``) in the normal world. The IRQ is received as any other
exception by normal world; the monitor and OP-TEE OS are not involved
at all.

Deliver secure interrupts to Secure World
=========================================
A secure (native) interrupt can be received during two different states,
either in normal world (``SCR_NS`` is set) or in secure world (``SCR_NS``
is cleared). When the secure monitor is active (Armv8-A EL3 or Armv7-A
Monitor mode) FIQ and IRQ are masked. FIQ reception in the two different
states is described below.

**Deliver secure interrupt to secure world when SCR_NS is set**

When the monitor traps a secure interrupt, the following steps take place:

    1. The monitor saves normal world context and restores secure world
       context from last secure world exit (which will have IRQ and FIQ
       blocked)
    2. The monitor clears ``SCR_FIQ`` when clearing ``SCR_NS``
    3. The monitor does a return from exception into OP-TEE OS via the secure
       interrupt entry point
    4. OP-TEE OS handles the native interrupt directly in the entry point
    5. OP-TEE OS issues an SMC to return to normal world
    6. The monitor saves the secure world context and restores the normal
       world context
    7. The monitor does a return from exception into the restored context

.. uml::
    :align: center
    :caption: Secure interrupt received when SCR_NS is set

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    == Running in non-secure world (SCR_NS set) ==
    nwd -> nwd : process
    == IRQ and FIQ masked,\nSecure interrupt received ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: native interrupt entry point
    entry -> entry: process received native interrupt
    entry -> smon: smc: return
    smon -> smon : Save secure context
    smon -> smon : Restore non-secure context
    smon --> nwd : eret: return to Normal world
    == IRQ and FIQ unmasked ==
    nwd -> nwd : process

**Deliver FIQ to secure world when SCR_NS is cleared**

.. uml::
    :align: center
    :caption: FIQ received while processing an IRQ forwarded from secure world

    participant "Normal World" as nwd
    participant "Secure Monitor" as smon
    participant "OP-TEE OS entry" as entry
    participant "OP-TEE OS" as optee
    == IRQ and FIQ unmasked ==
    optee -> optee : process
    == IRQ and FIQ unmasked,\nForeign interrupt received ==
    optee -> optee : suspend thread
    optee -> entry : forward foreign interrupt
    entry -> smon : smc: forward foreign interrupt
    smon -> smon: Save secure context
    smon -> smon: Restore non-secure context
    == IRQ and FIQ unmasked ==
    smon --> nwd : eret: IRQ forwarded
    == FIQ unmasked, IRQ received ==
    nwd -> nwd : process IRQ
    == IRQ and FIQ masked,\nSecure interrupt received ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: native interrupt entry point
    entry -> entry : process received native interrupt
    entry -> smon: smc: return
    smon -> smon : Save secure context
    smon -> smon : Restore non-secure context
    smon --> nwd : eret: return to Normal world
    == FIQ unmasked\nIRQ still being processed ==
    nwd -> nwd : process IRQ
    == IRQ and FIQ unmasked ==
    nwd -> smon : smc: return from IRQ
    == IRQ and FIQ masked ==
    smon -> smon : Save non-secure context
    smon -> smon : Restore secure context
    smon --> entry : eret: return from foreign interrupt
    entry -> entry : find thread
    entry --> optee : resume execution
    == IRQ and FIQ unmasked ==
    optee -> optee : process

Trusted thread scheduling
=========================
**Trusted thread for standard services**

OP-TEE yielding services are carried through standard SMC. Execution of these
services can be interrupted by foreign interrupts. To suspend and restore the
service execution, optee_os assigns a trusted thread at yielding SMC entry.

The trusted thread terminates when optee_os returns to the normal world with a
service completion status.

A trusted thread execution can be interrupted by a native interrupt. In this
case the native interrupt is handled by the interrupt exception handlers and
once served, optee_os returns to executing the trusted thread.

A trusted thread execution can be interrupted by a foreign interrupt. In this
case, optee_os suspends the trusted thread and invokes the normal world through
the Monitor (optee_os so-called RPC services). The trusted thread will resume
only once normal world invokes optee_os with the RPC service status.

A trusted thread execution can lead optee_os to invoke a service in normal
world: access a file, get the REE current time, etc. The trusted thread is
suspended while the remote service executes and resumed once it completes.

**Scheduling considerations**

When a trusted thread is interrupted by a foreign interrupt and when optee_os
invokes a normal world service, the normal world gets the opportunity to
reschedule the running applications. The trusted thread will resume only once
the client application is scheduled back. Thus, a trusted thread execution
follows the scheduling of the normal world caller context.

Optee_os does not implement any thread scheduling. Each trusted thread is
expected to track a service that is invoked from the normal world and should
return to it with an execution status.

The OP-TEE Linux driver (as implemented in `drivers/tee/optee`_ since Linux
kernel 4.12) is designed so that the Linux thread invoking OP-TEE gets assigned
a trusted thread on TEE side. The execution of the trusted thread is tied to the
execution of the caller Linux thread which is under the Linux kernel scheduling
decision. This means trusted threads are scheduled by the Linux kernel.

**Trusted thread constraints**

TEE core handles a static number of trusted threads, see ``CFG_NUM_THREADS``.

Trusted threads are expensive on memory-constrained systems, mainly
because of the execution stack size.

On SMP systems, optee_os can execute several trusted threads in parallel if the
normal world supports scheduling of processes. Even on UP systems, supporting
several trusted threads in optee_os helps the normal world scheduler to be
efficient.

----

.. _notifications:

Notifications
*************

There are two kinds of notifications that secure world can use to make
normal world aware of some event.

1. Synchronous notifications delivered with ``OPTEE_RPC_CMD_NOTIFICATION``
   using the ``OPTEE_RPC_NOTIFICATION_SEND`` parameter.
2. Asynchronous notifications delivered with a combination of a non-secure
   interrupt and a fast call from the non-secure interrupt handler.

Secure world can wait in normal world for a notification to arrive. This allows
the calling thread to sleep instead of spinning when waiting for something.
This happens for instance when a thread waits for a mutex to become
available.

Synchronous notifications depend on RPC for delivery, so they are only usable
from a normal thread context. A secure interrupt handler or other atomic
context cannot use synchronous notifications for this reason.

Asynchronous notifications use a platform specific way of triggering a
non-secure interrupt. This is done with ``itr_raise_pi()`` in a way
suitable for a secure interrupt handler or another atomic context. This is
useful when using a top half and bottom half kind of design in a device
driver. The top half is done in the secure interrupt handler which then
triggers normal world to make a yielding call into secure world to do the
bottom half processing.
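
The asynchronous path can be sketched with a pending bitmap: the top half
marks a value as pending and triggers the non-secure interrupt, and the fast
call issued by the normal world interrupt handler retrieves the pending value.
All names below are hypothetical, ``raise_ns_interrupt()`` merely stands in
for the platform specific trigger, and the locking that a real implementation
would need is left out.

```c
#include <assert.h>
#include <stdint.h>

#define ASYNC_VALUE_COUNT 64u /* asynchronous-capable values 0..63 */

static uint64_t pending;            /* one bit per pending value */
static unsigned int ns_intr_raised; /* counts simulated NS interrupts */

/* Stand-in for the platform specific non-secure interrupt trigger. */
static void raise_ns_interrupt(void)
{
    ns_intr_raised++;
}

/* Top half, atomic context: mark the value pending and poke normal world. */
static void notif_send_async_sketch(unsigned int value)
{
    assert(value < ASYNC_VALUE_COUNT);
    pending |= UINT64_C(1) << value;
    raise_ns_interrupt();
}

/* Fast call from the NS interrupt handler: fetch one pending value or -1. */
static int notif_get_pending_sketch(void)
{
    for (unsigned int v = 0; v < ASYNC_VALUE_COUNT; v++) {
        if (pending & (UINT64_C(1) << v)) {
            pending &= ~(UINT64_C(1) << v);
            return (int)v;
        }
    }
    return -1;
}
```

The normal world handler would call the fast call repeatedly until it returns
``-1`` and wake a thread for each value that requests bottom half processing.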

.. uml::
    :align: center
    :caption: Top half, bottom half example

    participant "OP-TEE OS\ninterrupt handler" as sec_itr
    participant "OP-TEE OS\nfastcall handler" as fastcall
    participant "Interrupt\ncontroller" as itc
    participant "Normal World\ninterrupt handler" as ns_itr
    participant "Normal World\nthread" as ns_thr
    participant "OP-TEE OS\nyielding do bottom half" as bottom

    itc --> sec_itr : Secure interrupt
    activate sec_itr
    sec_itr -> sec_itr : Top half processing
    sec_itr --> itc : Trigger NS interrupt
    itc --> ns_itr : Non-secure interrupt
    activate ns_itr
    sec_itr --> itc: End of interrupt
    deactivate sec_itr
    ns_itr -> fastcall ++: Get notification
    fastcall -> ns_itr --: Return notification
    alt Do bottom half notification
        ns_itr --> ns_thr : Wake thread
        activate ns_thr
        ns_itr --> itc: End of interrupt
        deactivate ns_itr
        ns_thr -> bottom ++: Do bottom half
        bottom -> bottom : Process bottom half
        bottom -> ns_thr --: Done
        deactivate ns_thr
    else Some other notification
    end

.. uml::
    :align: center
    :caption: Synchronous example

    participant "OP-TEE OS\nthread 1" as sec_thr1
    participant "Normal World\nthread 1" as ns_thr1
    participant "OP-TEE OS\nthread 2" as sec_thr2
    participant "Normal World\nthread 2" as ns_thr2

    activate ns_thr1
    ns_thr1 -> sec_thr1 ++ : Invoke
    sec_thr1 -> sec_thr1 : Lock mutex
    sec_thr1 -> sec_thr1 : Process
    activate ns_thr2
    ns_thr2 -> sec_thr2 ++: Invoke
    sec_thr2 -> ns_thr2 -- : RPC: Wait for mutex
    ns_thr2 -> ns_thr2 : Wait for notification
    deactivate ns_thr2
    sec_thr1 -> sec_thr1 : Unlock mutex
    sec_thr1 -> ns_thr1 -- : RPC: Notify mutex unlocked
    ns_thr1 --> ns_thr2 : Notify mutex unlocked
    activate ns_thr2
    ns_thr1 -> sec_thr1 ++ : Return from RPC
    sec_thr1 -> sec_thr1 : Process
    sec_thr1 -> ns_thr1 -- : Return from Invoke
    deactivate ns_thr1
    ns_thr2 -> sec_thr2 ++ : Return from RPC
    sec_thr2 -> sec_thr2 : Lock mutex
    sec_thr2 -> sec_thr2 : Process
    sec_thr2 -> sec_thr2 : Unlock mutex
    sec_thr2 -> sec_thr2 : Process
    sec_thr2 -> ns_thr2 -- : Return from Invoke
    deactivate ns_thr2

Notifications are identified with a value, allocated as:

0 - 63
    Mixed asynchronous and synchronous range
64 - Max
    Synchronous only range

If the **Max** value is smaller than 63, then there's only the mixed range.

If asynchronous notifications are enabled, the value 0 is reserved for
signalling that a driver needs a bottom half call, that is, the yielding call
``OPTEE_MSG_CMD_DO_BOTTOM_HALF``.

The rest of the asynchronous notification values are managed with two
functions ``notif_alloc_async_value()`` and ``notif_free_async_value()``.
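
The value ranges can be expressed as two small predicates. This is a sketch
of the allocation rule stated above, not the OP-TEE OS implementation: the
macro and function names are hypothetical, and ``max`` stands for the **Max**
value of the configuration.

```c
#include <assert.h>
#include <stdbool.h>

#define NOTIF_ASYNC_LIMIT 63u /* last value of the mixed range */

/* True if the value lies in the mixed range and can be sent asynchronously. */
static bool notif_value_is_async_capable(unsigned int value, unsigned int max)
{
    return value <= max && value <= NOTIF_ASYNC_LIMIT;
}

/* True if the value lies in the synchronous only range, 64 up to max. */
static bool notif_value_is_sync_only(unsigned int value, unsigned int max)
{
    return value > NOTIF_ASYNC_LIMIT && value <= max;
}
```

With ``max`` set to 31, no value is synchronous only, which matches the case
where only the mixed range exists.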

----

.. _memory_objects:

Memory objects
**************
A memory object, **MOBJ**, describes a piece of memory. The interface provided
is mostly abstract when it comes to using the MOBJ to populate translation
tables etc. There are different kinds of MOBJs describing:

    - Physically contiguous memory
        - created with ``mobj_phys_alloc(...)``.

    - Virtual memory
        - one instance with the name ``mobj_virt`` available.
        - spans the entire virtual address space.

    - Physically contiguous memory allocated from a ``tee_mm_pool_t *``
        - created with ``mobj_mm_alloc(...)``.

    - Paged memory
        - created with ``mobj_paged_alloc(...)``.
        - only contains the supplied size and makes ``mobj_is_paged(...)``
          return true if supplied as argument.

    - Secure copy paged shared memory
        - created with ``mobj_seccpy_shm_alloc(...)``.
        - makes ``mobj_is_paged(...)`` and ``mobj_is_secure(...)`` return true
          if supplied as argument.

----

.. _mmu:

MMU
***
Translation tables
==================

OP-TEE supports two translation table formats:

1. Short-descriptor translation table format, available on ARMv7-A and
   ARMv8-A AArch32
2. Long-descriptor translation table format, available on ARMv7-A with LPAE
   and ARMv8-A

ARMv7-A without LPAE (Large Physical Address Extension) must use the
short-descriptor translation table format only. ARMv8-A AArch64 must use
the long-descriptor translation table format only.

Translation table format is a static build time configuration option,
``CFG_WITH_LPAE``. The design around the translation table handling has
been centered around these factors:

1. Share translation tables between CPUs when possible to save memory
   and simplify paging
2. Support non-global CPU specific mappings to allow executing different
   TAs in parallel.

Short-descriptor translation table format
-----------------------------------------

Several L1 translation tables are used, one large spanning 4 GiB and two or
more small tables spanning 32 MiB. The large translation table handles kernel
mode mapping and matches all addresses not covered by the small translation
tables. The small translation tables are assigned per thread and cover the
mapping of the virtual memory space for one TA context.

The split of the memory space between the small and large translation tables
is configured by ``TTBCR``. ``TTBR1`` always points to the large translation
table. ``TTBR0`` points to a small translation table when user mapping is
active and to the large translation table when no user mapping is currently
active. For details about registers etc, please refer to a Technical Reference
Manual for your architecture, for example `Cortex-A53 TRM`_.

The translation tables have certain alignment constraints: the alignment (of
the physical address) has to be the same as the size of the translation table.
The translation tables are statically allocated to avoid fragmentation of
memory due to the alignment constraints.

Each thread has one small L1 translation table of its own. Each TA context has a
compact representation of its L1 translation table. The compact representation
is used to initialize the thread specific L1 translation table when the TA
context is activated.
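
The split and the table sizes can be worked through in a few lines of C. The
sketch assumes a split value of 7, which is what a 32 MiB span for the small
table implies (2^(32-7) bytes), together with the 1 MiB sections and 4-byte
descriptors of the short-descriptor L1 format. The macro and function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TTBCR_N         7u                  /* assumed split value */
#define L1_SECTION_SIZE (UINT32_C(1) << 20) /* 1 MiB mapped per L1 entry */
#define L1_ENTRY_SIZE   4u                  /* bytes per L1 descriptor */
#define SMALL_SPAN      (UINT32_C(1) << (32 - TTBCR_N)) /* 32 MiB */

/* TTBR0 translates addresses whose top TTBCR_N bits are all zero. */
static bool va_uses_ttbr0(uint32_t va)
{
    return va < SMALL_SPAN;
}

/* Size in bytes of a small L1 table, which is also its alignment. */
static uint32_t small_l1_size(void)
{
    return (SMALL_SPAN / L1_SECTION_SIZE) * L1_ENTRY_SIZE;
}

/* Size in bytes of the large L1 table spanning 4 GiB (4096 entries). */
static uint32_t large_l1_size(void)
{
    return 4096u * L1_ENTRY_SIZE;
}

/* A table must be aligned to its own size (sizes are powers of two). */
static bool table_is_aligned(uint32_t pa, uint32_t table_size)
{
    return (pa & (table_size - 1)) == 0;
}
```

Under these assumptions a small table takes 128 bytes and the large table
takes 16 KiB, which illustrates why dynamic allocation of such aligned tables
would tend to fragment memory.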

.. graphviz::
    :align: center

    digraph xlat_table {
        graph [
            rankdir = "LR"
        ];
        node [
            shape = "ellipse"
        ];
        edge [
        ];
        "node_ttb" [
            label = "<f0> TTBR0 | <f1> TTBR1"
            shape = "record"
        ];
        "node_large_l1" [
            label = "<f0> Large L1\nSpans 4 GiB"
            shape = "record"
        ];
        "node_small_l1" [
            label = "Small L1\nSpans 32 MiB\nper entry | <f0> 0 | <f1> 1 | ... | <fn> n"
            shape = "record"
        ];

        "node_ttb":f0 -> "node_small_l1":f0 [ label = "Thread 0 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":f1 [ label = "Thread 1 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":fn [ label = "Thread n ctx active" ];
        "node_ttb":f0 -> "node_large_l1" [ label="No active ctx" ];
        "node_ttb":f1 -> "node_large_l1";
    }

Long-descriptor translation table format
----------------------------------------

Each CPU is assigned an L1 translation table which is programmed into
Translation Table Base Register 0 (``TTBR0`` or ``TTBR0_EL1`` as
appropriate).

L1 and L2 translation tables are statically allocated and initialized at
boot. Normally there is only one shared L2 table, but with ASLR enabled the
virtual address space used for the shared mapping may need to use two
tables. An unused entry in the L1 table is selected to point to the per
thread L2 table. With ASLR configured this means that a different per thread
entry may be selected each time the system boots. Note that this entry will
only point to a table when the per thread mapping is activated.

The L2 translation tables in their turn point to L3 tables which use the
small page granularity of 4 KiB. The shared mappings have their L3 tables
initialized at boot too, but the per thread L3 tables are dynamic and are
only assigned when the mapping is activated.
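
With the 4 KiB granule each table holds 512 eight-byte descriptors, so every
level consumes 9 bits of the virtual address: an L1 entry covers 1 GiB, an L2
entry 2 MiB and an L3 entry 4 KiB. The helper names in this sketch are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512u /* 4 KiB table / 8-byte descriptors */

/* Index into the L1 table: each entry covers 1 GiB. */
static unsigned int l1_index(uint64_t va)
{
    return (unsigned int)((va >> 30) & (ENTRIES_PER_TABLE - 1));
}

/* Index into an L2 table: each entry covers 2 MiB. */
static unsigned int l2_index(uint64_t va)
{
    return (unsigned int)((va >> 21) & (ENTRIES_PER_TABLE - 1));
}

/* Index into an L3 table: each entry covers one 4 KiB page. */
static unsigned int l3_index(uint64_t va)
{
    return (unsigned int)((va >> 12) & (ENTRIES_PER_TABLE - 1));
}
```

For a 4 GiB virtual address space only L1 indexes 0 to 3 are used, so the
per CPU L1 table needs just four entries.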

.. graphviz::
    :align: center
    :caption: Example translation table setup with 4GiB virtual address space
              with L3 tables excluded

    digraph xlat_table {
        graph [ rankdir = "LR" ];
        node [ ];
        edge [ ];

        "ttbr0" [
            label = "TTBR0"
            shape = "record"
        ];
        "node_l1" [
            label = "<h> Per CPU L1 table | <f0> 0 | <f1> 1 | <f2> 2 | <f3> 3"
            shape = "record"
        ];
        "shared_l2_n" [
            label = "<h> Shared L2 table n | 0 | ... | 511"
            shape = "record"
        ]
        "shared_l2_m" [
            label = "<h> Shared L2 table m | 0 | ... | 511"
            shape = "record"
        ]
        "per_thread_l2" [
            label = "<h> Per thread L2 table | 0 | ... | 511"
            shape = "record"
        ]
        "ttbr0" -> "node_l1":h;
        "node_l1":f2 -> "shared_l2_n":h;
        "node_l1":f3 -> "shared_l2_m":h;
        "node_l1":f0 -> "per_thread_l2":h;
    }


Jens Wiklander03b05a02019-02-25 13:44:38 +0100682Page table cache
683================
684Page tables used to map TAs are managed with the page table cache. When the
685context of a TA is unmapped, all its page tables are released with a call
686to ``pgt_free()``. All page tables needed when mapping a TA are allocated
687using ``pgt_alloc()``.
688
A fixed maximum number of translation tables is available in a pool. One
thread may execute a TA which needs all or almost all of the tables. This can
block TAs from being executed by other threads. To ensure that all TAs
eventually will be permitted to execute, ``pgt_alloc()`` temporarily frees
any tables already allocated before waiting for tables to become available.
694
695The page table cache behaves differently depending on configuration
696options.
697
698Without paging (``CFG_WITH_PAGER=n``)
699-------------------------------------
700This is the easiest configuration. All page tables are statically allocated
701in the ``.nozi.pgt_cache`` section. ``pgt_alloc()`` allocates tables from the
702free-list and ``pgt_free()`` returns the tables directly to the free-list.
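
The free-list behaviour described above can be sketched as follows. This is a
minimal illustration only, not the OP-TEE implementation: the names
``struct table``, ``pool_init()``, ``pool_alloc()`` and ``pool_free()`` and the
pool size are hypothetical.

.. code-block:: c

    #include <stddef.h>

    #define NUM_TABLES 4 /* hypothetical pool size */

    /* A statically allocated table, linked into a free-list when unused */
    struct table {
        struct table *next;
    };

    static struct table pool[NUM_TABLES];
    static struct table *free_list;

    /* Put every statically allocated table on the free-list */
    void pool_init(void)
    {
        free_list = NULL;
        for (size_t i = 0; i < NUM_TABLES; i++) {
            pool[i].next = free_list;
            free_list = &pool[i];
        }
    }

    /* Take one table from the free-list, or return NULL if it is empty */
    struct table *pool_alloc(void)
    {
        struct table *t = free_list;

        if (t)
            free_list = t->next;
        return t;
    }

    /* Return a table directly to the free-list */
    void pool_free(struct table *t)
    {
        t->next = free_list;
        free_list = t;
    }

Because the pool is fixed, ``pool_alloc()`` returns ``NULL`` once all tables
are handed out, which is the situation where a caller would have to wait.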
703
704With paging enabled (``CFG_WITH_PAGER=y``)
705------------------------------------------
706
707Page tables are allocated as zero initialized locked pages during boot
708using ``tee_pager_alloc()``. Locked pages are populated with physical pages
709on demand from the pager. The physical page can be released when not needed
710any longer with ``tee_pager_release_phys()``.
711
With ``CFG_WITH_LPAE=y`` each translation table has the same size as a
physical page which makes it easy to release the physical page when the
translation table isn't needed any longer. With the short-descriptor table
format (``CFG_WITH_LPAE=n``) it becomes more complicated as four
translation tables are stored in each page. Additional bookkeeping is used
to tell when a page used by four separate translation tables can be
released.
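
One way to picture this bookkeeping is a small in-use mask per physical page.
The sketch below is an assumption made for illustration; ``struct pgt_page``,
``pgt_page_get()`` and ``pgt_page_put()`` are hypothetical names and do not
reflect how OP-TEE actually tracks the tables.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define TABLES_PER_PAGE 4 /* short-descriptor format: four tables per page */

    struct pgt_page {
        uint8_t used_mask; /* bit n set: table n in this page is in use */
    };

    /* Mark table idx (0..TABLES_PER_PAGE-1) in the page as in use */
    void pgt_page_get(struct pgt_page *p, unsigned int idx)
    {
        p->used_mask |= (uint8_t)(1u << idx);
    }

    /*
     * Mark table idx as unused. Returns true when no table in the page is
     * in use any longer, that is, when the physical page can be released.
     */
    bool pgt_page_put(struct pgt_page *p, unsigned int idx)
    {
        p->used_mask &= (uint8_t)~(1u << idx);
        return p->used_mask == 0;
    }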
719
720With paging of user TA enabled (``CFG_PAGED_USER_TA=y``)
721--------------------------------------------------------
722With paging of user TAs enabled a cache of recently used translation tables
723is used. This can save us from a storm of page faults when restoring the
724mappings of a recently unmapped TA. Which translation tables should be
725cached is indicated with reference counting by the pager on used tables.
726When a table needs to be forcefully freed
727``tee_pager_pgt_save_and_release_entries()`` is called to let the pager
728know that the table can't be used any longer.
729
When a mapping in a TA is removed it also needs to be purged from cached
tables with ``pgt_flush_ctx_range()`` to prevent old mappings from being
accidentally reused.

Switching to user mode
======================
This section only applies with the following configuration flags:
737
738 - ``CFG_WITH_LPAE=n``
739 - ``CFG_CORE_UNMAP_CORE_AT_EL0=y``
740
When switching to user mode only a minimal kernel mode mapping is kept. This is
achieved by selecting a zeroed-out big L1 translation table in TTBR1 when
transitioning to user mode. When returning to kernel mode the original L1
translation table is restored in TTBR1.
745
746Switching to normal world
747=========================
When switching to normal world either via a foreign interrupt (see
:ref:`native_foreign_irqs`) or RPC there is a chance that secure world will
resume execution on a different CPU. This means that the new CPU needs to be
configured with the context of the currently active TA. This is solved by always
setting the TA context in the CPU when resuming execution.

----

.. _pager:
757
758Pager
759*****
760OP-TEE currently requires >256 KiB RAM for OP-TEE kernel memory. This is not a
761problem if OP-TEE uses TrustZone protected DDR, but for security reasons OP-TEE
762may need to use TrustZone protected SRAM instead. The amount of available SRAM
763varies between platforms, from just a few KiB up to over 512 KiB. Platforms with
764just a few KiB of SRAM cannot be expected to be able to run a complete TEE
765solution in SRAM. But those with 128 to 256 KiB of SRAM can be expected to have
766a capable TEE solution in SRAM. The pager provides a solution to this by demand
767paging parts of OP-TEE using virtual memory.
768
769Secure memory
770=============
TrustZone protected SRAM is generally considered more secure than TrustZone
protected DRAM as there are usually more attack vectors on DRAM. The attack
vectors are hardware dependent and can be different for different platforms.
774
775Backing store
776=============
TrustZone protected DRAM or in some cases non-secure DRAM is used as backing
store. The data in the backing store is integrity protected with one hash
(SHA-256) per page (4 KiB). Read-only pages are not encrypted since the OP-TEE
binary itself is not encrypted.
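
As a worked example of the cost of this scheme, the space needed for the
hashes follows directly from the page size (4 KiB) and the SHA-256 digest size
(32 bytes). The helper names below are hypothetical and only illustrate the
arithmetic.

.. code-block:: c

    #include <stddef.h>

    #define SMALL_PAGE_SIZE  4096u /* page granularity used by the pager */
    #define SHA256_HASH_SIZE 32u   /* size of one SHA-256 digest in bytes */

    /* Number of 4 KiB pages needed to hold len bytes, rounding up */
    size_t backing_num_pages(size_t len)
    {
        return (len + SMALL_PAGE_SIZE - 1) / SMALL_PAGE_SIZE;
    }

    /* Bytes of hash data needed to protect len bytes of backing store */
    size_t backing_hashes_size(size_t len)
    {
        return backing_num_pages(len) * SHA256_HASH_SIZE;
    }

For instance, 1 MiB of backing store spans 256 pages and therefore needs
8 KiB of hashes.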
781
782Partitioning of memory
783======================
784The code that handles demand paging must always be available as it would
785otherwise lead to deadlock. The virtual memory is partitioned as:
786
    +--------------+-------------------+
    | Type         | Sections          |
    +==============+===================+
    | unpaged      | | text            |
    |              | | rodata          |
    |              | | data            |
    |              | | bss             |
    |              | | heap1           |
    |              | | nozi            |
    |              | | heap2           |
    +--------------+-------------------+
    | init / paged | | text_init       |
    |              | | rodata_init     |
    +--------------+-------------------+
    | paged        | | text_pageable   |
    |              | | rodata_pageable |
    +--------------+-------------------+
    | demand alloc |                   |
    +--------------+-------------------+
806
Here ``nozi`` stands for "not zero initialized". This section contains entry
stacks (thread stacks when the TEE pager is not enabled) and translation tables
(TEE pager cached translation tables when the pager is enabled and the LPAE MMU
is used).

The ``init`` area is available when OP-TEE is initializing and contains
everything that is needed to initialize the pager. After the pager has been
initialized this area will be used for demand paging instead.

The ``demand alloc`` area is a special area where the pages are allocated and
removed from the pager on demand. Those pages are returned when OP-TEE does not
need them any longer. The thread stacks currently belong to this area. This
means that when a stack is not used the physical pages can be used by the pager
for better performance.
820
The technique to gather code in the different areas is based on compiling all
functions and data into separate sections. The unpaged text and rodata are then
gathered by linking all object files with ``--gc-sections`` to eliminate
sections that are outside the dependency graph of the entry functions for
unpaged functions. A script analyzes this ELF file and generates the bits of the
final link script. The process is repeated for init text and rodata. What is
not "unpaged" or "init" becomes "paged".
828
829Partitioning of the binary
830==========================
831.. note::
832 The struct definitions provided in this section are explicitly covered by
833 the following dual license:
834
835 .. code-block:: none
836
837 SPDX-License-Identifier: (BSD-2-Clause OR GPL-2.0)
838
839The binary is partitioned into four parts as:
840
841
    +----------+
    | Binary   |
    +==========+
    | Header   |
    +----------+
    | Init     |
    +----------+
    | Hashes   |
    +----------+
    | Pageable |
    +----------+
853
854The header is defined as:
855
.. code-block:: c

    #define OPTEE_MAGIC             0x4554504f
    #define OPTEE_VERSION           1
    #define OPTEE_ARCH_ARM32        0
    #define OPTEE_ARCH_ARM64        1

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };
874
The header is only used by the loader of OP-TEE, not OP-TEE itself. To
initialize OP-TEE the loader loads the complete binary into memory and copies
the ``init_size`` bytes that follow the header to
``(init_load_addr_hi << 32 | init_load_addr_lo)``. ``init_mem_usage`` is used by
the loader to check that there is enough physical memory available for OP-TEE
to be able to initialize at all. The loader supplies in ``r0/x0`` the address of
the first byte that was not copied and jumps to the load address to start
OP-TEE.
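
The loader steps above can be sketched as follows. This is an illustrative
sketch under stated assumptions, not code from any real loader:
``struct load_plan`` and ``plan_load()`` are hypothetical names, the header
fields are assumed to be stored in the byte order of the CPU running the
loader, and the actual copy and jump are left out. Note the cast to
``uint64_t`` before shifting, which is needed for the 64-bit load address to be
computed correctly in C.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define OPTEE_MAGIC 0x4554504f

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };

    /* Hypothetical result of parsing the header */
    struct load_plan {
        uint64_t init_load_addr;  /* where the init part must be copied */
        const uint8_t *init;      /* first byte to copy (follows the header) */
        size_t init_size;         /* number of bytes to copy */
        const uint8_t *remainder; /* first byte not copied, passed in r0/x0 */
    };

    /* Validate the header of a binary held in memory and fill in the plan */
    bool plan_load(const uint8_t *bin, size_t bin_size, struct load_plan *plan)
    {
        struct optee_header hdr;

        if (bin_size < sizeof(hdr))
            return false;
        memcpy(&hdr, bin, sizeof(hdr));

        if (hdr.magic != OPTEE_MAGIC)
            return false;
        if (hdr.init_size > bin_size - sizeof(hdr))
            return false;

        plan->init_load_addr = ((uint64_t)hdr.init_load_addr_hi << 32) |
                               hdr.init_load_addr_lo;
        plan->init = bin + sizeof(hdr);
        plan->init_size = hdr.init_size;
        plan->remainder = plan->init + hdr.init_size;
        return true;
    }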
883
In addition to the overall binary with the partitions described above, three
extra binaries are generated during the build process for loaders that
support loading separate binaries:
887
    +-----------+
    | v2 binary |
    +===========+
    | Header    |
    +-----------+

    +-----------+
    | v2 binary |
    +===========+
    | Init      |
    +-----------+
    | Hashes    |
    +-----------+

    +-----------+
    | v2 binary |
    +===========+
    | Pageable  |
    +-----------+
907
In this case, loaders load the header binary first to get the image list and
information about each image, and then load each of them at the load address
assigned in the structure. These binaries are named with a ``v2`` suffix to
distinguish them from the existing binaries. The header format is updated to
help loaders load binaries efficiently:
913
.. code-block:: c

    #define OPTEE_IMAGE_ID_PAGER    0
    #define OPTEE_IMAGE_ID_PAGED    1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    struct optee_header_v2 {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t nb_images;
        struct optee_image optee_image[];
    };
934
The magic number and architecture are identical to the original. The version is
increased to two. ``load_addr_hi`` and ``load_addr_lo`` may be ``0xFFFFFFFF``
for the pageable binary since the pageable part may be loaded by the loader at
a dynamically chosen position. ``image_id`` indicates how the loader handles
the current binary. Loaders that don't support separate loading just ignore all
v2 binaries.
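
A loader could interpret an image entry along the following lines. This is a
hedged sketch: the helper names are hypothetical, and treating an entry as
"loader chooses the address" only when both halves are ``0xFFFFFFFF`` is an
assumption made for this example.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define OPTEE_IMAGE_ID_PAGER 0
    #define OPTEE_IMAGE_ID_PAGED 1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    /* True when the loader is free to pick the load address (assumption) */
    bool image_addr_is_dynamic(const struct optee_image *img)
    {
        return img->load_addr_hi == 0xFFFFFFFFu &&
               img->load_addr_lo == 0xFFFFFFFFu;
    }

    /* Combine the two 32-bit halves into one 64-bit load address */
    uint64_t image_load_addr(const struct optee_image *img)
    {
        return ((uint64_t)img->load_addr_hi << 32) | img->load_addr_lo;
    }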
940
941Initializing the pager
942======================
943The pager is initialized as early as possible during boot in order to minimize
944the "init" area. The global variable ``tee_mm_vcore`` describes the virtual
945memory range that is covered by the level 2 translation table supplied to
946``tee_pager_init(...)``.
947
948Assign pageable areas
949---------------------
950A virtual memory range to be handled by the pager is registered with a call to
951``tee_pager_add_core_area()``.
952
.. code-block:: c

    bool tee_pager_add_area(tee_mm_entry_t *mm,
                            uint32_t flags,
                            const void *store,
                            const void *hashes);
959
960which takes a pointer to ``tee_mm_entry_t`` to tell the range, flags to tell how
961memory should be mapped (readonly, execute etc), and pointers to backing store
962and hashes of the pages.
963
964Assign physical pages
965---------------------
Physical SRAM pages are supplied by calling ``tee_pager_add_pages(...)``:
967
.. code-block:: c

    void tee_pager_add_pages(tee_vaddr_t vaddr,
                             size_t npages,
                             bool unmap);
973
``tee_pager_add_pages(...)`` takes the physical address stored in the entry
mapping the virtual address ``vaddr`` and ``npages`` entries after that and uses
it to map new pages when needed. The ``unmap`` parameter tells whether the pages
should be unmapped immediately since they do not contain initialized data or
be kept mapped until they need to be recycled. The pages in the "init" area are
supplied with ``unmap == false`` since those pages have valid content and are in
use.
981
982Invocation
983==========
The pager is invoked as part of the abort handler. A pool of physical pages is
used to map different virtual addresses. When a new virtual address needs to be
mapped a free physical page is mapped at the new address. If a free physical
page cannot be found the oldest physical page is selected instead. When the page
is mapped new data is copied from backing store and the hash of the page is
verified. If it is OK the pager returns from the exception to resume the
execution.
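
The page selection policy can be modelled with a small simulation. This is
only a sketch under simplifying assumptions: the names are hypothetical, the
pool holds just two pages, and the copy from backing store and the hash
verification are reduced to a comment.

.. code-block:: c

    #include <stddef.h>

    #define NUM_PHYS_PAGES 2            /* hypothetical, tiny pool */
    #define PAGE_FREE ((size_t)-1)      /* marks an unused physical page */

    struct phys_page {
        size_t vpage;       /* virtual page held, or PAGE_FREE */
        unsigned long age;  /* value of the clock when the page was loaded */
    };

    static struct phys_page pages[NUM_PHYS_PAGES];
    static unsigned long clock_ticks;

    /* Mark all physical pages as free */
    void pager_reset(void)
    {
        clock_ticks = 0;
        for (size_t i = 0; i < NUM_PHYS_PAGES; i++) {
            pages[i].vpage = PAGE_FREE;
            pages[i].age = 0;
        }
    }

    /*
     * Handle a fault on vpage: use a free physical page if there is one,
     * otherwise recycle the oldest. Returns the index of the page used.
     */
    size_t pager_handle_fault(size_t vpage)
    {
        size_t victim = 0;

        for (size_t i = 0; i < NUM_PHYS_PAGES; i++) {
            if (pages[i].vpage == PAGE_FREE) {
                victim = i;
                break;
            }
            if (pages[i].age < pages[victim].age)
                victim = i;
        }

        /* A real pager would load the page and verify its hash here */
        pages[victim].vpage = vpage;
        pages[victim].age = ++clock_ticks;
        return victim;
    }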
991
Data structures
===============
.. figure:: ../images/core/tee_pager_area.png
    :figclass: align-center

    How the main pager data structures relate to each other
998
999``struct tee_pager_area``
1000-------------------------
1001This is a central data structure when handling paged
1002memory ranges. It's defined as:
1003
.. code-block:: c

    struct tee_pager_area {
        struct fobj *fobj;
        size_t fobj_pgoffs;
        enum tee_pager_area_type type;
        uint32_t flags;
        vaddr_t base;
        size_t size;
        struct pgt *pgt;
        TAILQ_ENTRY(tee_pager_area) link;
        TAILQ_ENTRY(tee_pager_area) fobj_link;
    };
1017
Where ``base`` and ``size`` tell the memory range and ``fobj`` and
``fobj_pgoffs`` hold the content. A ``struct tee_pager_area`` can only use
one ``struct fobj`` and one ``struct pgt`` (translation table) so memory ranges
spanning multiple fobjs or pgts are split into multiple areas.
1022
1023``struct fobj``
1024---------------
This is a polymorphic object, using different implementations depending on how
it's initialized. It's defined as:
1027
.. code-block:: c

    struct fobj_ops {
        void (*free)(struct fobj *fobj);
        TEE_Result (*load_page)(struct fobj *fobj, unsigned int page_idx,
                                void *va);
        TEE_Result (*save_page)(struct fobj *fobj, unsigned int page_idx,
                                const void *va);
    };

    struct fobj {
        const struct fobj_ops *ops;
        unsigned int num_pages;
        struct refcount refc;
        struct tee_pager_area_head areas;
    };
1044
:``num_pages``: Tells how many pages this ``fobj`` covers.
:``refc``:      A reference counter, everyone referring to a ``fobj`` needs to
                increase and decrease this as needed.
:``areas``:     A list of areas using this ``fobj``, traversed when making
                a virtual page unavailable.
1050
1051``struct tee_pager_pmem``
1052-------------------------
1053This structure represents a physical page. It's defined as:
1054
.. code-block:: c

    struct tee_pager_pmem {
        unsigned int flags;
        unsigned int fobj_pgidx;
        struct fobj *fobj;
        void *va_alias;
        TAILQ_ENTRY(tee_pager_pmem) link;
    };
1064
1065:``PMEM_FLAG_DIRTY``: Bit is set in ``flags`` when the page is mapped
1066 read/write at at least one location.
1067:``PMEM_FLAG_HIDDEN``: Bit is set in ``flags`` when the page is hidden, that
1068 is, not accessible anywhere.
1069:``fobj_pgidx``: The page at this index in the ``fobj`` is used in this
1070 physical page.
1071:``fobj``: The ``fobj`` backing this page.
1072:``va_alias``: Virtual address where this physical page is updated
1073 when loading it from backing store or when writing it
1074 back.
1075
All ``struct tee_pager_pmem`` are stored either in the global list
``tee_pager_pmem_head`` or in ``tee_pager_lock_pmem_head``. The latter is
used by pages which are mapped and then locked in memory on demand. The
pages are returned to ``tee_pager_pmem_head`` when the pages are
explicitly released with a call to ``tee_pager_release_phys()``.

A physical page can be used by more than one ``tee_pager_area``
simultaneously. This is also known as shared secure memory and will appear
as such for both read-only and read-write mappings.

When a page is hidden it's unmapped from all translation tables and the
``PMEM_FLAG_HIDDEN`` bit is set, but the page is kept in memory. When a
physical page is released it's also unmapped from all translation tables and
its content is written back to storage, then the ``fobj`` field is set to
``NULL`` to mark the physical page as unused.

Note that when ``struct tee_pager_pmem`` references a ``fobj`` it doesn't
update the reference counter since the ``fobj`` is already guaranteed to be
available due to the ``struct tee_pager_area`` which must reference the
``fobj`` too.
1095
Paging of user TA
=================
1098Paging of user TAs can optionally be enabled with ``CFG_PAGED_USER_TA=y``.
1099Paging of user TAs is analogous to paging of OP-TEE kernel parts but with a few
1100differences:
1101
1102 - Read/write pages are paged in addition to read-only pages
1103 - Page tables are managed dynamically
1104
``tee_pager_add_uta_area(...)`` is used to set up the initial read/write mapping
needed when populating the TA. When the TA is fully populated and relocated
``tee_pager_set_uta_area_attr(...)`` changes the mapping of the area to the
strict permissions used when the TA is running.
1109
Paging shared secure memory
---------------------------
Shared secure memory is achieved by letting several ``tee_pager_area``
use the same backing ``fobj``. When a ``tee_pager_area`` is allocated and
assigned a ``fobj`` it's also added to a list of the ``tee_pager_area`` using
this ``fobj``. This helps when a physical page is released.
1116
When a fault occurs, a matching ``tee_pager_area`` is first located. Then
``tee_pager_pmem_head`` is searched to see if a physical page already holds
the page of the ``fobj`` needed. If so the ``pgt`` is updated to map the
physical page at the appropriate location. If no physical page was holding
the page a new physical page is allocated, initialized and finally mapped.

In order to make as few updates to mappings as possible, changes to less
restricted permissions, no access to read-only or read-only to read-write, are
done only for the virtual address that was used when the page fault occurred.
Changes in the other direction have to be done in all translation tables used
to map the physical page.
1128
----
1130
1131.. _stacks:
1132
1133Stacks
1134******
1135Different stacks are used during different stages. The stacks are:
1136
    - **Secure monitor stack** (128 bytes), bound to the CPU. Only available if
      OP-TEE is compiled with a secure monitor, which is always the case if the
      target is Armv7-A but never for Armv8-A.

    - **Temp stack** (small ~1KB), bound to the CPU. Used when transitioning
      from one state to another. Interrupts are always disabled when using this
      stack, and aborts are fatal when using the temp stack.

    - **Abort stack** (medium ~2KB), bound to the CPU. Used when trapping a data
      or pre-fetch abort. Aborts from user space are never fatal; the TA is only
      killed. Aborts from kernel mode are used by the pager to do the demand
      paging; if the pager is disabled all kernel mode aborts are fatal.

    - **Thread stack** (large ~8KB), not bound to the CPU but instead used by
      the current thread/task. Interrupts are usually enabled when using this
      stack.
1152
1153Notes for Armv7-A/AArch32
1154 .. list-table::
1155 :header-rows: 1
1156 :widths: 1 5
1157
1158 * - Stack
1159 - Comment
1160
1161 * - Temp
1162 - Assigned to ``SP_SVC`` during entry/exit, always assigned to
1163 ``SP_IRQ`` and ``SP_FIQ``
1164
1165 * - Abort
1166 - Always assigned to ``SP_ABT``
1167
1168 * - Thread
1169 - Assigned to ``SP_SVC`` while a thread is active
1170
Notes for AArch64
    There are only two stack pointers, ``SP_EL1`` and ``SP_EL0``, available for
    OP-TEE in AArch64. When an exception is received the stack pointer is always
    ``SP_EL1`` which is used temporarily while assigning an appropriate stack
    pointer for ``SP_EL0``. ``SP_EL1`` is always assigned the value of
    ``thread_core_local[cpu_id]``. This structure has some spare space for
    temporary storage of registers and also keeps the relevant stack pointers.
    In general when we talk about assigning a stack pointer to the CPU below we
    mean ``SP_EL0``.
1180
1181Boot
1182====
1183During early boot the CPU is configured with the temp stack which is used until
1184OP-TEE exits to normal world the first time.
1185
1186Notes for AArch64
1187 ``SPSEL`` is always ``0`` on entry/exit to have ``SP_EL0`` acting as stack
1188 pointer.
1189
1190Normal entry
1191============
1192Each time OP-TEE is entered from normal world the temp stack is used as the
1193initial stack. For fast calls, this is the only stack used. For normal calls an
1194empty thread slot is selected and the CPU switches to that stack.
1195
1196Normal exit
1197===========
1198Normal exit occurs when a thread has finished its task and the thread is freed.
1199When the main thread function, ``tee_entry_std(...)``, returns interrupts are
1200disabled and the CPU switches to the temp stack instead. The thread is freed and
1201OP-TEE exits to normal world.
1202
1203RPC exit
1204========
RPC exit occurs when OP-TEE needs some service from normal world. RPC can
currently only be performed when a thread is in the running state. RPC is
initiated with a call to ``thread_rpc(...)`` which saves the state in a way that
when the thread is restored it will continue at the next instruction as if this
function did a normal return. The CPU switches to the temp stack before
returning to normal world.
1211
1212Foreign interrupt exit
1213======================
Foreign interrupt exit occurs when OP-TEE receives a foreign interrupt. In Arm
GICv2 mode, a foreign interrupt is sent as an IRQ which is always handled in
normal world. Foreign interrupt exit is similar to RPC exit but it is
``thread_irq_handler(...)`` and ``elx_irq(...)`` (for Armv7-A/AArch32 and
AArch64 respectively) that save the thread state instead. The thread
is resumed in the same way though. In Arm GICv3 mode, a foreign interrupt is
sent as an FIQ which could be handled by either secure world (EL3 in AArch64) or
normal world. This mode is not supported yet.
1222
Notes for Armv7-A/AArch32
    ``SP_IRQ`` is initialized to the temp stack instead of a separate stack.
    Prior to exiting to normal world the CPU state is changed to SVC and the
    temp stack is selected.
1227
1228Notes for AArch64
1229 ``SP_EL0`` is assigned temp stack and is selected during IRQ processing. The
1230 original ``SP_EL0`` is saved in the thread context to be restored when
1231 resuming.
1232
1233Resume entry
1234============
OP-TEE is entered using the temp stack in the same way as for normal entry. The
thread to resume is looked up and the state is restored to resume execution. The
procedure to resume from an RPC exit or a foreign interrupt exit is exactly the
same.
1239
1240Syscall
1241=======
Syscalls are executed using the thread stack.

Notes for Armv7-A/AArch32
    Nothing special, ``SP_SVC`` is already set to the thread stack.

Notes for AArch64
    Early in the exception processing the original ``SP_EL0`` is saved in
    ``struct thread_svc_regs`` in case the TA is executed in AArch64. The current
    thread stack is assigned to ``SP_EL0`` which is then selected. When
    returning, ``SP_EL0`` is assigned what is in ``struct thread_svc_regs``. This
    allows ``tee_svc_sys_return_helper(...)`` to have the syscall exception
    handler return directly to ``thread_unwind_user_mode(...)``.
1254
1255----
1256
1257.. _shared_memory:
1258
1259Shared Memory
1260*************
1261Shared Memory is a block of memory that is shared between the non-secure and the
1262secure world. It is used to transfer data between both worlds.
1263
The shared memory is allocated and managed by the non-secure world, i.e. the
Linux OP-TEE driver. Secure world only considers the individual shared buffers,
not their pool. Each shared memory buffer is referenced with associated
attributes:

    - Buffer start address and byte size,
    - Cache attributes of the shared memory buffer,
    - List of chunks if mapped from noncontiguous pages.

Shared memory buffer references manipulated must fit inside one of the
shared memory areas known to the OP-TEE core. OP-TEE supports two kinds
of shared memory areas: an area for contiguous buffers and an area for
noncontiguous buffers. At least one has to be enabled.

Contiguous shared buffers
=========================
Configuration directives ``CFG_SHMEM_START`` and ``CFG_SHMEM_SIZE``
define a shared memory area where shared memory buffers are contiguous.
Generic memory layout registers it as the ``MEM_AREA_NSEC_SHM`` memory area.
1282
The non-secure world issues ``OPTEE_SMC_GET_SHM_CONFIG`` to retrieve the
contiguous shared memory area configuration:

    - Physical address of the start of the pool
    - Size of the pool
    - Whether or not the memory is cached

Contiguous shared memory (also known as static or reserved shared memory)
is enabled with the configuration flag ``CFG_CORE_RESERVED_SHM=y``.
1292
Noncontiguous shared buffers
============================
To benefit from noncontiguous shared memory buffers, secure world registers
dynamic shared memory areas and non-secure world must register noncontiguous
buffers prior to referring to them using the OP-TEE API.

The OP-TEE core generic boot sequence discovers dynamic shared areas from the
device tree and/or areas explicitly registered by the platform.

The non-secure side needs to register buffers as lists of 4 KiB chunks with the
OP-TEE core using the ``OPTEE_MSG_CMD_REGISTER_SHM`` API prior to referencing
them using the OP-TEE invocation API.
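
As a small worked example, the number of 4 KiB chunks needed to describe a
buffer depends on both its size and its alignment, since a buffer that does
not start on a chunk boundary spills into one more chunk. The helper name
below is hypothetical and only illustrates the arithmetic.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define SHM_CHUNK_SIZE 4096u /* chunk granularity used for registration */

    /* Number of 4 KiB chunks touched by the buffer [addr, addr + size) */
    size_t shm_num_chunks(uintptr_t addr, size_t size)
    {
        uintptr_t first = addr & ~(uintptr_t)(SHM_CHUNK_SIZE - 1);
        uintptr_t end = addr + size;

        if (!size)
            return 0;
        return (end - first + SHM_CHUNK_SIZE - 1) / SHM_CHUNK_SIZE;
    }

A chunk-aligned 4 KiB buffer needs one chunk, while the same size starting in
the middle of a chunk needs two.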
1305
Noncontiguous shared memory (also known as dynamic shared memory) is
enabled with the configuration flag ``CFG_CORE_DYN_SHM=y``.

For performance reasons, the TEE Client Library (``libteec``) uses
noncontiguous shared memory when available since it avoids copies in some
situations.
1312
Shared Memory Chunk Allocation
==============================
It is the Linux kernel driver for OP-TEE that is responsible for allocating
chunks of shared memory. The OP-TEE Linux kernel driver relies on the Linux
kernel generic allocation support (``CONFIG_GENERIC_ALLOCATOR``) to allocate
and release physical chunks of shared memory. It relies on the Linux kernel
dma-buf support (``CONFIG_DMA_SHARED_BUFFER``) to track references to shared
memory buffers.

Registering shared memory
=========================
1324
1325Only dynamic or physically non-contiguous shared memory needs to be
1326registered. Static or physically contiguous shared memory is already known
1327to OP-TEE OS.
1328
1329SMC based OP-TEE MSG ABI
1330------------------------
1331
1332With the SMC based OP-TEE MSG ABI there are a few exceptions where memory
1333doesn't need to be shared before it can be accessed from OP-TEE OS. These
1334are:
1335
13361. When issuing the SMC ``OPTEE_SMC_CALL_WITH_ARG`` where the physical
1337 address of the supplied ``struct optee_msg_arg`` is passed in one of the
1338 registers.
13392. When issuing the SMC ``OPTEE_SMC_CALL_RETURN_FROM_RPC`` as a return from
1340 the request ``OPTEE_SMC_RETURN_RPC_ALLOC`` to allocate memory. This RPC
1341 return is combined with an implicit registration of shared memory. The
1342 registration is ended with a ``OPTEE_SMC_RETURN_RPC_FREE`` request.
1343
1344.. uml::
1345 :align: center
1346 :caption: Register shared memory example
1347
1348 participant "Normal World\nOS Kernel" as ns
1349 participant "Secure World\nOP-TEE OS" as sec
1350
1351 ns -> sec : OPTEE_MSG_CMD_REGISTER_SHM(Cookie, memory)
1352 sec -> sec : Register shared memory passed
1353 sec -> ns : Return
1354
1355.. uml::
1356 :align: center
1357 :caption: Unregister shared memory example
1358
1359 participant "Normal World\nOS Kernel" as ns
1360 participant "Secure World\nOP-TEE OS" as sec
1361
1362 ns -> sec : OPTEE_MSG_CMD_UNREGISTER_SHM(Cookie)
1363 sec -> sec : Unregister shared memory
1364 sec -> ns : Return
1365
1366FF-A based OP-TEE MSG ABI
1367-------------------------
1368
With the FF-A based OP-TEE MSG ABI memory must always be registered before
it can be used by OP-TEE OS. This case can potentially also involve another
component in secure world, the SPMC at ``S-EL2``, a secure hypervisor which
controls which memory OP-TEE OS can see or use.

In the case where there is no SPMC at ``S-EL2``, OP-TEE OS will take care
of that part of the communication with normal world. This means that for
normal world, communication with OP-TEE OS is the same regardless of the
presence of a secure hypervisor.
1378
Registration of shared memory is a two-step procedure. It's first
registered with a call to the SPMC which returns a cookie or global memory
handle. This cookie is later used when calling OP-TEE OS. If the cookie
isn't already known to OP-TEE OS it will ask the SPMC to make the memory
available. This lazy second step is a way of saving an extra round trip to
secure world.
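
The lazy second step can be modelled as a lookup that only falls back to the
SPMC for cookies that have not been seen before. The sketch below is a
simplified model with hypothetical names; ``spmc_retrieve()`` merely stands in
for the real request to the SPMC and always succeeds.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_SHM 8 /* hypothetical limit on registered cookies */

    static uint64_t known_cookies[MAX_SHM];
    static size_t num_known;
    static unsigned int spmc_retrievals; /* counts simulated SPMC requests */

    /* Check whether the cookie has already been made available */
    static bool cookie_known(uint64_t cookie)
    {
        for (size_t i = 0; i < num_known; i++)
            if (known_cookies[i] == cookie)
                return true;
        return false;
    }

    /* Stand-in for asking the SPMC to make the memory available */
    static bool spmc_retrieve(uint64_t cookie)
    {
        (void)cookie;
        spmc_retrievals++;
        return true;
    }

    /* Handle a call referring to cookie, registering it lazily if needed */
    bool yielding_call(uint64_t cookie)
    {
        if (!cookie_known(cookie)) {
            if (num_known == MAX_SHM || !spmc_retrieve(cookie))
                return false;
            known_cookies[num_known++] = cookie;
        }

        /* The actual request would be processed here */
        return true;
    }

Only the first call with a given cookie reaches the SPMC; later calls with the
same cookie are served from the local list.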
1385
1386.. uml::
1387 :align: center
1388 :caption: Register shared memory example
1389
1390 participant "Normal World\nOS Kernel" as ns
1391 participant "Secure World\nSPMC" as spmc
1392 participant "Secure World\nOP-TEE OS" as sec
1393
1394 ns -> spmc : FFA_MEM_SHARE(memory)
1395 spmc -> spmc : Register shared memory passed
1396 spmc -> ns : Return cookie
1397
1398.. uml::
1399 :align: center
1400 :caption: Calling OP-TEE OS with shared memory
1401
1402 participant "Normal World\nOS Kernel" as ns
1403 participant "Secure World\nSPMC" as spmc
1404 participant "Secure World\nOP-TEE OS" as sec
1405
1406 ns -> sec: OPTEE_FFA_YIELDING_CALL_WITH_ARG(cookie)
1407 alt cookie not known
1408 sec -> spmc : FFA_MEM_RETRIEVE_REQ(cookie)
1409 spmc -> sec : Return memory description
1410 sec -> sec : Register shared memory
1411 end
1412 sec -> sec : Process the yielding call
1413 sec -> ns : Return
1414
Unregistration of shared memory is also done in two steps. First with a
call to OP-TEE and then with a call to the SPMC. If the lazy second
step of shared memory registration has not been done, then OP-TEE OS doesn't
need to interact with the SPMC.
1419
.. uml::
    :align: center
    :caption: Unregister shared memory
1423
1424 participant "Normal World\nOS Kernel" as ns
1425 participant "Secure World\nSPMC" as spmc
1426 participant "Secure World\nOP-TEE OS" as sec
1427
1428 ns -> sec: OPTEE_FFA_UNREGISTER_SHM(cookie)
1429 alt cookie known
1430 sec -> sec : Unregister shared memory
1431 sec -> spmc : FFA_MEM_RELINQUISH(cookie)
1432 spmc -> sec : Return
1433 end
1434 sec -> ns : Return
1435
1436 ns -> spmc : FFA_MEM_RECLAIM(cookie)
1437 spmc -> spmc : Unregister shared memory
1438 spmc -> ns : Return
1439
Using shared memory
===================
From the Client Application
    The client application can ask for shared memory allocation using the
    GlobalPlatform Client API function ``TEEC_AllocateSharedMemory(...)``. The
    client application can also register memory through the GlobalPlatform
    Client API function ``TEEC_RegisterSharedMemory(...)``. The shared memory
    reference can then be used as a parameter when invoking a trusted
    application.

From the Linux Driver
    Occasionally the Linux kernel driver needs to allocate shared memory for the
    communication with secure world, for example when using buffers of type
    ``TEEC_TempMemoryReference``.
1453
1454From OP-TEE core
1455 In case OP-TEE core needs information from TEE supplicant (dynamic TA
1456 loading, REE time request,...), shared memory must be allocated. Allocation
1457 depends on the use case. OP-TEE core asks for the following shared memory
1458 allocation:
1459
1460 - ``optee_msg_arg`` structure, used to pass the arguments to the
1461 non-secure world, where the allocation will be done by sending a
1462 ``OPTEE_SMC_RPC_FUNC_ALLOC`` message.
1463
1464 - In some cases, a payload might be needed for storing the result from
1465 TEE supplicant, for example when loading a Trusted Application. This
1466 type of allocation will be done by sending the message
1467 ``OPTEE_MSG_RPC_CMD_SHM_ALLOC(OPTEE_MSG_RPC_SHM_TYPE_APPL,...)``,
1468 which then will return:
1469
1470 - the physical address of the shared memory
1471 - a handle to the memory, that later on will be used later on when
1472 freeing this memory.
1473
1474From TEE Supplicant
1475 TEE supplicant is also working with shared memory, used to exchange data
1476 between normal and secure worlds. TEE supplicant receives a memory address
1477 from the OP-TEE core, used to store the data. This is for example the case
1478 when a Trusted Application is loaded. In this case, TEE supplicant must
1479 register the provided shared memory in the same way a client application
1480 would do, involving the Linux driver.
1481
----

.. _smc:

SMC
***
SMC Interface
=============
OP-TEE's SMC interface is defined in two levels using optee_smc.h_ and
optee_msg.h_. The former file defines SMC identifiers and what is passed in the
registers for each SMC. The latter file defines the OP-TEE Message protocol
which is not restricted to only SMC even if that currently is the only option
available.

SMC communication
=================
The main structure used for the SMC communication is ``struct optee_msg_arg``
(defined in optee_msg.h_). Looking into the source code, communication is
mainly achieved using ``optee_msg_arg`` and ``thread_smc_args`` (in thread.h_),
where ``optee_msg_arg`` can be seen as the main structure. The
:ref:`linux_kernel` driver gets the parameters either from :ref:`optee_client`
or directly from an internal service in the Linux kernel. The TEE driver
populates ``struct optee_msg_arg`` with the parameters plus some additional
bookkeeping information. Parameters for the SMC are passed in registers 1 to 7;
register 0 holds the SMC ID, which among other things tells whether it is a
standard or a fast call.

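How register 0 can tell a standard call from a fast call is easiest to see
from the function ID layout in the Arm SMC Calling Convention: bit 31 is set
for fast calls and clear for yielding (standard) calls, bit 30 selects SMC64,
bits 29:24 hold the owning entity (50 to 63 for a Trusted OS) and bits 15:0
hold the function number. The following is a minimal standalone sketch of that
layout. The ``smc_*`` helpers and ``struct smc_regs`` are names made up for
this example and are not taken from optee_smc.h_ or thread.h_.

.. code-block:: c

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Field layout of an SMC function ID (Arm SMC Calling Convention). */
    #define SMC_FAST_CALL   UINT32_C(0x80000000) /* bit 31 */
    #define SMC_64          UINT32_C(0x40000000) /* bit 30 */
    #define SMC_OWNER_SHIFT 24
    #define SMC_OWNER_MASK  UINT32_C(0x3f)
    #define SMC_FUNC_MASK   UINT32_C(0xffff)

    /* Hypothetical register block: a[0] is the SMC ID, a[1..7] parameters. */
    struct smc_regs {
        uint64_t a[8];
    };

    static uint32_t smc_make_id(bool fast, bool smc64, uint32_t owner,
                                uint32_t func)
    {
        uint32_t id = 0;

        if (fast)
            id |= SMC_FAST_CALL;
        if (smc64)
            id |= SMC_64;
        id |= (owner & SMC_OWNER_MASK) << SMC_OWNER_SHIFT;
        id |= func & SMC_FUNC_MASK;
        return id;
    }

    static bool smc_is_fast(uint32_t id)
    {
        return (id & SMC_FAST_CALL) != 0;
    }

    static uint32_t smc_owner(uint32_t id)
    {
        return (id >> SMC_OWNER_SHIFT) & SMC_OWNER_MASK;
    }

    static uint32_t smc_func(uint32_t id)
    {
        return id & SMC_FUNC_MASK;
    }

    int main(void)
    {
        struct smc_regs regs = { { 0 } };

        /* Standard 32-bit call, Trusted OS owner 50, function number 4. */
        regs.a[0] = smc_make_id(false, false, 50, 4);
        regs.a[1] = 0x1000; /* illustrative parameter value */

        uint32_t id = (uint32_t)regs.a[0];

        printf("id=0x%08" PRIx32 " fast=%d owner=%u func=%u\n", id,
               smc_is_fast(id), (unsigned int)smc_owner(id),
               (unsigned int)smc_func(id));
        return 0;
    }

The program prints ``id=0x32000004 fast=0 owner=50 func=4``. Setting the first
argument of ``smc_make_id()`` to ``true`` would set bit 31 and mark the same
function as a fast call.
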
----

.. _thread_handling:

Thread handling
***************
OP-TEE core uses a couple of threads to be able to support running jobs in
parallel (not fully enabled!). There are handlers for different purposes. In
thread.c_ you will find a function called ``thread_init_primary(...)`` which
assigns ``init_handlers`` (functions) that should be called when OP-TEE core
receives standard or fast calls, FIQs, and PSCI calls. There are default
handlers for these services, but a platform can choose to implement its own
platform-specific handlers instead.

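The idea of default handlers that a platform may override can be sketched
with a table of function pointers. This is a simplified, hypothetical model:
``struct handlers``, ``install_handlers()`` and the handler functions below
are invented for illustration and do not reproduce the definitions in
thread.c_.

.. code-block:: c

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical handler type: returns the name of whoever handled it. */
    typedef const char *(*call_handler_t)(unsigned int func_id);

    struct handlers {
        call_handler_t std_call;  /* standard (yielding) calls */
        call_handler_t fast_call; /* fast calls */
    };

    static const char *default_std_call(unsigned int func_id)
    {
        (void)func_id;
        return "default std";
    }

    static const char *default_fast_call(unsigned int func_id)
    {
        (void)func_id;
        return "default fast";
    }

    /* Handlers currently in effect. */
    static struct handlers active;

    /* Start from the defaults, then apply any non-NULL platform override. */
    static void install_handlers(const struct handlers *plat)
    {
        active.std_call = default_std_call;
        active.fast_call = default_fast_call;
        if (plat && plat->std_call)
            active.std_call = plat->std_call;
        if (plat && plat->fast_call)
            active.fast_call = plat->fast_call;
    }

    /* A platform that only cares about fast calls overrides just that one. */
    static const char *plat_fast_call(unsigned int func_id)
    {
        (void)func_id;
        return "platform fast";
    }

    int main(void)
    {
        const struct handlers plat = { .fast_call = plat_fast_call };

        install_handlers(&plat);
        printf("%s / %s\n", active.std_call(0), active.fast_call(0));
        return 0;
    }

With only the fast call handler overridden, the program prints
``default std / platform fast``.
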
Synchronization primitives
==========================
OP-TEE has three primitives for synchronization of threads and CPUs:
*spin-lock*, *mutex*, and *condvar*.

Spin-lock
    A spin-lock is represented as an ``unsigned int``. This is the most
    primitive lock. Interrupts should be disabled before attempting to take a
    spin-lock and should remain disabled until the lock is released. A spin-lock
    is initialized with ``SPINLOCK_UNLOCK``.

    .. list-table:: Spin lock functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``cpu_spin_lock(...)``
          - Locks a spin-lock

        * - ``cpu_spin_trylock(...)``
          - Locks a spin-lock if unlocked and returns ``0``; otherwise the
            spin-lock is unchanged and the function returns ``!0``

        * - ``cpu_spin_unlock(...)``
          - Unlocks a spin-lock

Mutex
    A mutex is represented by ``struct mutex``. A mutex can be locked and
    unlocked with interrupts enabled or disabled, but only from a normal thread.
    A mutex cannot be used in an interrupt handler, abort handler or before a
    thread has been selected for the CPU. A mutex is initialized with either
    ``MUTEX_INITIALIZER`` or ``mutex_init(...)``.

    .. list-table:: Mutex functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``mutex_lock(...)``
          - Locks a mutex. If the mutex is unlocked this is a fast operation,
            else the function issues an RPC to wait in normal world.

        * - ``mutex_unlock(...)``
          - Unlocks a mutex. If there are no waiters this is a fast operation,
            else the function issues an RPC to wake up a waiter in normal
            world.

        * - ``mutex_trylock(...)``
          - Locks a mutex if unlocked and returns ``true``; otherwise the mutex
            is unchanged and the function returns ``false``.

        * - ``mutex_destroy(...)``
          - Asserts that the mutex is unlocked and that there are no waiters.
            After this the memory used by the mutex can be freed.

    When a mutex is locked it is owned by the thread calling ``mutex_lock(...)``
    or ``mutex_trylock(...)``. The mutex may only be unlocked by the thread
    owning it. A thread should not exit to TA user space when holding a mutex.

Condvar
    A condvar is represented by ``struct condvar``. A condvar is similar to a
    ``pthread_cond_t`` in POSIX threads (pthreads), only less advanced.
    Condition variables are used to wait for some condition to be fulfilled and
    are always used together with a mutex. Once a condition variable has been
    used together with a certain mutex, it must only be used with that mutex
    until destroyed. A condvar is initialized with ``CONDVAR_INITIALIZER`` or
    ``condvar_init(...)``.

    .. list-table:: Condvar functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``condvar_wait(...)``
          - Atomically unlocks the supplied mutex and waits in normal world via
            an RPC for the condition variable to be signaled. When the function
            returns the mutex is locked again.

        * - ``condvar_signal(...)``
          - Wakes up one waiter of the condition variable (waiting in
            ``condvar_wait(...)``).

        * - ``condvar_broadcast(...)``
          - Wakes up all waiters of the condition variable.

    The caller of ``condvar_signal(...)`` or ``condvar_broadcast(...)`` should
    hold the mutex associated with the condition variable to guarantee that a
    waiter does not miss the signal.

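The rule that the signaler should hold the mutex can be illustrated with POSIX
threads, which the condvar description above names as the closest analogue.
The sketch below uses ``pthread_mutex_t`` and ``pthread_cond_t`` rather than
the OP-TEE ``struct mutex`` and ``struct condvar``, so it only demonstrates
the usage pattern: the waiter checks a predicate in a loop while holding the
mutex, and the signaler changes the predicate and signals while holding the
same mutex. The ``ready`` flag and all function names are invented for the
example.

.. code-block:: c

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static bool ready; /* the condition, protected by mtx */

    /* Signaler: update the condition and signal with the mutex held. */
    static void *producer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&mtx);
        ready = true;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mtx);
        return NULL;
    }

    /* Waiter: the wait call releases the mutex and reacquires it on wakeup. */
    static bool wait_until_ready(void)
    {
        bool seen;

        pthread_mutex_lock(&mtx);
        while (!ready)
            pthread_cond_wait(&cv, &mtx);
        seen = ready;
        pthread_mutex_unlock(&mtx);
        return seen;
    }

    static bool run_demo(void)
    {
        pthread_t t;
        bool ok;

        ready = false; /* no other thread is running yet */
        if (pthread_create(&t, NULL, producer, NULL))
            return false;
        ok = wait_until_ready();
        pthread_join(t, NULL);
        return ok;
    }

    int main(void)
    {
        printf("ready=%d\n", run_demo());
        return 0;
    }

Because ``ready`` is only changed with the mutex held, the waiter either sees
it already set and never waits, or is already waiting when the signal is sent.
In both cases the wakeup cannot be missed.
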
.. _core/arch/arm/kernel/thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c
.. _optee_msg.h: https://github.com/OP-TEE/optee_os/blob/master/core/include/optee_msg.h
.. _optee_smc.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/sm/optee_smc.h
.. _thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c
.. _thread.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/kernel/thread.h

.. _ARM_DEN0028A_SMC_Calling_Convention: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
.. _Cortex-A53 TRM: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500j/DDI0500J_cortex_a53_trm.pdf
.. _drivers/tee/optee: https://github.com/torvalds/linux/tree/master/drivers/tee/optee
.. _Trusted Firmware A: https://github.com/ARM-software/arm-trusted-firmware