.. _core:

####
Core
####

.. _interrupt_handling:

Interrupt handling
******************
This section describes how :ref:`optee_os` handles switches of world execution
context based on :ref:`SMC` exceptions and interrupt notifications. Interrupt
notifications are IRQ/FIQ exceptions which may also imply switching of world
execution context: normal world to secure world, or secure world to normal
world.

Use cases of world context switch
=================================
This section lists all the cases where optee_os is involved in world context
switches. Optee_os executes in the secure world. World switch is done by the
core's secure monitor level/mode, referred to below as the Monitor.

When the normal world invokes the secure world, the normal world executes an
SMC instruction. The SMC exception is always trapped by the Monitor. If the
related service targets the trusted OS, the Monitor will switch to optee_os
world execution. When the secure world returns to the normal world, optee_os
executes an SMC that is caught by the Monitor which switches back to the normal
world.

When a secure interrupt is signaled by the Arm GIC, it shall reach the optee_os
interrupt exception vector. If the secure world is executing, optee_os will
handle the interrupt directly from its exception vector. If the normal world is
executing when the secure interrupt is raised, the Monitor vector must handle
the exception and invoke optee_os to serve the interrupt.

When a non-secure interrupt is signaled by the Arm GIC, it shall reach the
normal world interrupt exception vector. If the normal world is executing, it
will handle the exception directly from its exception vector. If the secure
world is executing when the non-secure interrupt is raised, optee_os will
temporarily return to the normal world via the Monitor to let the normal world
serve the interrupt.

Core exception vectors
======================
The Monitor vector is ``VBAR_EL3`` in AArch64 and ``MVBAR`` in Armv7-A/AArch32.
The Monitor can be reached while normal world or secure world is executing. The
executing secure state is known to the Monitor through ``SCR_NS``.

The Monitor can be reached from an SMC exception, an IRQ or FIQ exception
(so-called interrupts) and from asynchronous aborts. Monitor aborts (data,
prefetch, undef) are local to the Monitor execution.

The Monitor can be external to optee_os (case ``CFG_WITH_ARM_TRUSTED_FW=y``).
If not, optee_os provides a local secure monitor in ``core/arch/arm/sm``.
Armv7-A platforms should use the optee_os secure monitor. Armv8-A platforms are
likely to rely on `Trusted Firmware A`_.

When executing outside the Monitor, the system is executing either in the
normal world (``SCR_NS=1``) or in the secure world (``SCR_NS=0``). Each world
owns its own exception vector table (state vector):

    - ``VBAR_EL2`` or ``VBAR_EL1`` non-secure or ``VBAR_EL1`` secure for
      AArch64.
    - ``HVBAR`` or ``VBAR`` non-secure or ``VBAR`` secure for Armv7-A and
      AArch32.

All SMC exceptions are trapped in the Monitor vector. IRQ/FIQ exceptions can be
trapped either in the Monitor vector or in the state vector of the executing
world.

When the normal world is executing, the system is configured to route:

    - secure interrupts to the Monitor that will forward to optee_os
    - non-secure interrupts to the executing world exception vector.

When the secure world is executing, the system is configured to route:

    - secure and non-secure interrupts to the executing optee_os exception
      vector. optee_os shall forward the non-secure interrupts to the normal
      world.

Optee_os non-secure interrupts are always trapped in the state vector of the
executing world. This is reflected by a static value of ``SCR_(IRQ|FIQ)``.

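The routing rules above can be summarized as a small decision function. The
following is only an illustrative model, not optee_os code: the enum and
function names are hypothetical and no real register is programmed.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical names, used only to model the routing rules above. */
    enum world { WORLD_NORMAL, WORLD_SECURE };
    enum irq_owner { IRQ_SECURE, IRQ_NON_SECURE };
    enum irq_target { TARGET_MONITOR_VECTOR, TARGET_STATE_VECTOR };

    /* Vector that first traps an interrupt, given the executing world. */
    static enum irq_target irq_first_target(enum world executing,
                                            enum irq_owner owner)
    {
        /* Only a secure interrupt taken in normal world goes to the Monitor. */
        if (executing == WORLD_NORMAL && owner == IRQ_SECURE)
            return TARGET_MONITOR_VECTOR;

        return TARGET_STATE_VECTOR;
    }

    /* True when the interrupt must be served by the world not executing. */
    static bool irq_needs_world_switch(enum world executing,
                                       enum irq_owner owner)
    {
        bool secure_irq = (owner == IRQ_SECURE);
        bool secure_world = (executing == WORLD_SECURE);

        return secure_irq != secure_world;
    }

In this model a non-secure interrupt taken while the secure world executes is
trapped by the state vector (the optee_os vector) and still requires a world
switch, which matches the forwarding case described above.
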
.. _native_foreign_irqs:

Native and foreign interrupts
=============================
Two types of interrupt are defined in optee_os:

    - **Native interrupt** - The interrupt handled by optee_os (for example:
      secure interrupt)
    - **Foreign interrupt** - The interrupt not handled by optee_os (for
      example: non-secure interrupt which is handled by normal world)

For Arm **GICv2** mode, native interrupt is sent as FIQ and foreign interrupt
is sent as IRQ. For Arm **GICv3** mode, foreign interrupt is sent as FIQ which
could be handled by either secure world (aarch32 Monitor mode or aarch64 EL3)
or normal world. Arm GICv3 mode can be enabled by setting ``CFG_ARM_GICV3=y``.
For clarity, this document mainly chooses the GICv2 convention and refers to
IRQ as optee_os foreign interrupts, and to FIQ as optee_os native interrupts.
Native interrupts must be securely routed to optee_os. Foreign interrupts, when
trapped during secure world execution, might need to be efficiently routed to
the normal world.

Normal World invokes optee_os using SMC
=======================================

**Entering the Secure Monitor**

The monitor manages all entries and exits of secure world. To enter secure
world from normal world the monitor saves the state of normal world (general
purpose registers and system registers which are not banked) and restores the
previous state of secure world. Then a return from exception is performed and
the restored secure state is resumed. Exit from secure world to normal world is
the reverse.

Some general purpose registers are not saved and restored on entry and exit;
those are used to pass parameters between secure and normal world (see
ARM_DEN0028A_SMC_Calling_Convention_ for details).

**Entry and exit of Trusted OS**

On entry and exit of Trusted OS each CPU uses a separate entry stack and runs
with IRQ and FIQ blocked. SMCs are categorised in two flavors: **fast** and
**standard**.

    - For **fast** SMCs, optee_os will execute on the entry stack with IRQ/FIQ
      blocked until the execution returns to normal world.

    - For **standard** SMCs, optee_os will at some point execute the requested
      service with interrupts unblocked. In order to handle interrupts, mainly
      forwarding of foreign interrupts, optee_os assigns a trusted thread
      (`core/arch/arm/kernel/thread.c`_) to the SMC request. The trusted thread
      stores the execution context of the requested service. This context can be
      suspended and resumed as the requested service executes and is
      interrupted. The trusted thread is released only once the service
      execution returns with a completion status.

      For **standard** SMCs, optee_os allocates or resumes a trusted thread then
      unblocks the IRQ/FIQ lines. When optee_os needs to invoke the normal
      world from a foreign interrupt or a remote service call, optee_os blocks
      IRQ/FIQ and suspends the trusted thread. When suspending, optee_os gets
      back to the entry stack.

    - **Both** fast and standard SMCs end on the entry stack with IRQ/FIQ
      blocked and optee_os invokes the Monitor through an SMC to return to the
      normal world.

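As a rough illustration of the fast/standard split, the sketch below classifies
an SMC by its function ID. It relies on the SMC Calling Convention rule that
bit 31 of the function ID marks a fast call. The helper and enum names are
hypothetical and are not taken from optee_os.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* SMC Calling Convention: bit 31 of the function ID marks a fast call. */
    #define SMC_FAST_CALL_BIT (UINT32_C(1) << 31)

    /* Hypothetical description of where the request is executed. */
    enum entry_mode {
        ENTRY_ON_ENTRY_STACK,    /* fast SMC: IRQ/FIQ stay blocked */
        ENTRY_ON_TRUSTED_THREAD  /* standard SMC: runs in a trusted thread */
    };

    static bool smc_is_fast(uint32_t func_id)
    {
        return (func_id & SMC_FAST_CALL_BIT) != 0;
    }

    static enum entry_mode smc_entry_mode(uint32_t func_id)
    {
        return smc_is_fast(func_id) ? ENTRY_ON_ENTRY_STACK
                                    : ENTRY_ON_TRUSTED_THREAD;
    }
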
.. figure:: ../images/core/interrupt_handling/tee_invoke.png
    :figclass: align-center

    SMC entry to secure world

Deliver non-secure interrupts to Normal World
=============================================
This section uses the Arm GICv1/v2 conventions: IRQ signals non-secure
interrupts while FIQ signals secure interrupts. On a GICv3 configuration, one
should exchange IRQ and FIQ in this section.

**Forward a Foreign Interrupt from Secure World to Normal World**

When an IRQ is received in secure world as an IRQ exception then secure world:

    1. Saves trusted thread context (entire state of all processor modes for
       Armv7-A)

    2. Blocks (masks) all interrupts (IRQ and FIQ)

    3. Switches to entry stack

    4. Issues an SMC with a value that indicates to normal world that an IRQ
       has been delivered and that the last SMC call should be continued

The monitor restores normal world context with a return code indicating that an
IRQ is about to be delivered. Normal world issues a new SMC indicating that it
should continue the last SMC.

The monitor restores secure world context, which locates the previously saved
context and checks that it is a return from IRQ that is requested before
restoring the context. The secure world IRQ handler then returns from exception
to the point where execution was interrupted.

Note that the monitor itself does not know/care that it has just forwarded an
IRQ to normal world. The bookkeeping is done in the trusted thread handling in
Trusted OS. Normal world is responsible for deciding when the secure world
thread should resume execution (for details, see :ref:`thread_handling`).

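The bookkeeping mentioned above can be pictured as a small state machine per
trusted thread. The sketch below is a simplified model under stated
assumptions: ``SKETCH_NUM_THREADS`` stands in for ``CFG_NUM_THREADS`` and the
``sketch_thread_*`` helpers are hypothetical; they are not the functions
implemented in ``thread.c``.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define SKETCH_NUM_THREADS 2 /* stand-in for CFG_NUM_THREADS */

    enum thread_state { THREAD_FREE, THREAD_ACTIVE, THREAD_SUSPENDED };

    struct thread_slot {
        enum thread_state state;
    };

    static struct thread_slot threads[SKETCH_NUM_THREADS];

    /* Standard SMC entry: claim a free slot, or return -1 if all are busy. */
    static int sketch_thread_alloc(void)
    {
        for (size_t i = 0; i < SKETCH_NUM_THREADS; i++) {
            if (threads[i].state == THREAD_FREE) {
                threads[i].state = THREAD_ACTIVE;
                return (int)i;
            }
        }
        return -1;
    }

    /* Foreign interrupt or RPC: park the thread before leaving secure world. */
    static void sketch_thread_suspend(int id)
    {
        assert(id >= 0 && id < SKETCH_NUM_THREADS);
        assert(threads[id].state == THREAD_ACTIVE);
        threads[id].state = THREAD_SUSPENDED;
    }

    /* Normal world asks to continue: only a suspended thread can resume. */
    static bool sketch_thread_resume(int id)
    {
        if (id < 0 || id >= SKETCH_NUM_THREADS ||
            threads[id].state != THREAD_SUSPENDED)
            return false;

        threads[id].state = THREAD_ACTIVE;
        return true;
    }

    /* Service completed: the slot becomes available again. */
    static void sketch_thread_free(int id)
    {
        assert(id >= 0 && id < SKETCH_NUM_THREADS);
        threads[id].state = THREAD_FREE;
    }

Rejecting a resume request for a thread that is not suspended mirrors the check
described above, where secure world verifies that a return from IRQ is really
what is being requested.
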
.. figure:: ../images/core/interrupt_handling/irq.png
    :figclass: align-center

    IRQ received in secure world and forwarded to normal world

**Deliver a non-secure interrupt to normal world when SCR_NS is set**

Since ``SCR_IRQ`` is cleared, an IRQ will be delivered using the state vector
(``VBAR``) in the normal world. The IRQ is received as any other exception by
normal world; the monitor and the Trusted OS are not involved at all.

Deliver secure interrupts to Secure World
=========================================
This section uses the Arm GICv1/v2 conventions: FIQ signals secure interrupts
while IRQ signals non-secure interrupts. On a GICv3 configuration, one should
exchange IRQ and FIQ in this section. A FIQ can be received during two different
states, either in normal world (``SCR_NS`` is set) or in secure world
(``SCR_NS`` is cleared). When the secure monitor is active (Armv8-A EL3 or
Armv7-A Monitor mode) FIQ is masked. FIQ reception in the two different states
is described below.

**Deliver FIQ to secure world when SCR_NS is set**

When the monitor gets an FIQ exception it:

    1. Saves normal world context and restores secure world context from last
       secure world exit (which will have IRQ and FIQ blocked)
    2. Clears ``SCR_FIQ`` when clearing ``SCR_NS``
    3. Sets FIQ as parameter to secure world entry
    4. Does a return from exception into secure context
    5. Secure world unmasks FIQs because of the FIQ parameter
    6. FIQ is received as an exception using the state vector
    7. The state vector handler returns from exception in secure world
    8. Secure world issues an SMC to return to normal world
    9. Monitor saves secure world context and restores normal world context
    10. Does a return from exception into restored context

.. figure:: ../images/core/interrupt_handling/fiq.png
    :figclass: align-center

    FIQ received when SCR_NS is set

.. figure:: ../images/core/interrupt_handling/irq_fiq.png
    :figclass: align-center

    FIQ received while processing an IRQ forwarded from secure world

**Deliver FIQ to secure world when SCR_NS is cleared**

Since ``SCR_FIQ`` is cleared when ``SCR_NS`` is cleared, a FIQ will be delivered
using the state vector (``VBAR``) in secure world. The FIQ is received as any
other exception by Trusted OS; the monitor is not involved at all.

Trusted thread scheduling
=========================
**Trusted thread for standard services**

OP-TEE standard services are carried through standard SMCs. Execution of these
services can be interrupted by foreign interrupts. To suspend and restore the
service execution, optee_os assigns a trusted thread at standard SMC entry.

The trusted thread terminates when optee_os returns to the normal world with a
service completion status.

A trusted thread execution can be interrupted by a native interrupt. In this
case the native interrupt is handled by the interrupt exception handlers and
once served, optee_os returns to the execution of the trusted thread.

A trusted thread execution can be interrupted by a foreign interrupt. In this
case, optee_os suspends the trusted thread and invokes the normal world through
the Monitor (optee_os so-called RPC services). The trusted thread will resume
only once normal world invokes optee_os with the RPC service status.

A trusted thread execution can lead optee_os to invoke a service in normal
world: access a file, get the REE current time, etc. The trusted thread is
suspended/resumed during remote service execution.

**Scheduling considerations**

When a trusted thread is interrupted by a foreign interrupt and when optee_os
invokes a normal world service, the normal world gets the opportunity to
reschedule the running applications. The trusted thread will resume only once
the client application is scheduled back. Thus, a trusted thread execution
follows the scheduling of the normal world caller context.

Optee_os does not implement any thread scheduling. Each trusted thread is
expected to track a service that is invoked from the normal world and should
return to it with an execution status.

The OP-TEE Linux driver (as implemented in `drivers/tee/optee`_ since Linux
kernel 4.12) is designed so that the Linux thread invoking OP-TEE gets assigned
a trusted thread on TEE side. The execution of the trusted thread is tied to the
execution of the caller Linux thread which is under the Linux kernel scheduling
decision. This means trusted threads are scheduled by the Linux kernel.

**Trusted thread constraints**

TEE core handles a static number of trusted threads, see ``CFG_NUM_THREADS``.

Trusted threads are only expensive on memory-constrained systems, mainly
regarding the execution stack size.

On SMP systems, optee_os can execute several trusted threads in parallel if the
normal world supports scheduling of processes. Even on UP systems, supporting
several trusted threads in optee_os helps the normal world scheduler to be
efficient.

----

.. _memory_objects:

Memory objects
**************
A memory object, **MOBJ**, describes a piece of memory. The interface provided
is mostly abstract when it comes to using the MOBJ to populate translation
tables etc. There are different kinds of MOBJs describing:

    - Physically contiguous memory
        - created with ``mobj_phys_alloc(...)``.

    - Virtual memory
        - one instance with the name ``mobj_virt`` available.
        - spans the entire virtual address space.

    - Physically contiguous memory allocated from a ``tee_mm_pool_t *``
        - created with ``mobj_mm_alloc(...)``.

    - Paged memory
        - created with ``mobj_paged_alloc(...)``.
        - only contains the supplied size and makes ``mobj_is_paged(...)``
          return true if supplied as argument.

    - Secure copy paged shared memory
        - created with ``mobj_seccpy_shm_alloc(...)``.
        - makes ``mobj_is_paged(...)`` and ``mobj_is_secure(...)`` return true
          if supplied as argument.

----

.. _mmu:

MMU
***
Translation tables
==================
OP-TEE uses several L1 translation tables, one large spanning 4 GiB and two or
more small tables spanning 32 MiB. The large translation table handles kernel
mode mapping and matches all addresses not covered by the small translation
tables. The small translation tables are assigned per thread and cover the
mapping of the virtual memory space for one TA context.

The split of the memory space between the small and large translation tables is
configured by TTBCR. TTBR1 always points to the large translation table. TTBR0
points to a small translation table when user mapping is active and to the
large translation table when no user mapping is currently active. For details
about registers etc, please refer to a Technical Reference Manual for your
architecture, for example `Cortex-A53 TRM`_.

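As a minimal sketch of this split, the function below picks the translation
table base register used for a given virtual address while a user mapping is
active. It assumes the Armv7-A short-descriptor behaviour where ``TTBCR.N``
selects how many upper address bits must be zero for TTBR0 to be used, so that
a 32 MiB span corresponds to the lowest 32 MiB of the address space. The names
are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Span of one small L1 table, taken from the text above: 32 MiB. */
    #define SMALL_TABLE_SPAN (UINT32_C(32) << 20)

    enum ttbr { USE_TTBR0, USE_TTBR1 };

    /*
     * Assumption: the small table covers virtual addresses starting at 0.
     * Everything above its span is translated through the large table.
     */
    static enum ttbr ttbr_for_va(uint32_t va)
    {
        return va < SMALL_TABLE_SPAN ? USE_TTBR0 : USE_TTBR1;
    }
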
The translation tables have certain alignment constraints: the alignment (of
the physical address) has to be the same as the size of the translation table.
The translation tables are statically allocated to avoid fragmentation of
memory due to the alignment constraints.

Each thread has one small L1 translation table of its own. Each TA context has a
compact representation of its L1 translation table. The compact representation
is used to initialize the thread specific L1 translation table when the TA
context is activated.

.. graphviz::

    digraph xlat_table {
        graph [
            rankdir = "LR"
        ];
        node [
            fontsize = "16"
            shape = "ellipse"
        ];
        edge [
        ];
        "node_ttb" [
            label = "<f0> TTBR0 | <f1> TTBR1"
            shape = "record"
        ];
        "node_large_l1" [
            label = "<f0> Large L1\nSpans 4 GiB"
            shape = "record"
        ];
        "node_small_l1" [
            label = "Small L1\nSpans 32 MiB\nper entry | <f0> 0 | <f1> 1 | ... | <fn> n"
            shape = "record"
        ];

        "node_ttb":f0 -> "node_small_l1":f0 [ label = "Thread 0 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":f1 [ label = "Thread 1 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":fn [ label = "Thread n ctx active" ];
        "node_ttb":f0 -> "node_large_l1" [ label="No active ctx" ];
        "node_ttb":f1 -> "node_large_l1";
    }

Page table cache
================
Page tables used to map TAs are managed with the page table cache. When the
context of a TA is unmapped, all its page tables are released with a call
to ``pgt_free()``. All page tables needed when mapping a TA are allocated
using ``pgt_alloc()``.

A fixed maximum number of translation tables are available in a pool. One
thread may execute a TA which needs all or almost all tables. This can
block TAs from being executed by other threads. To ensure that all TAs
will eventually be permitted to execute, ``pgt_alloc()`` temporarily frees
any tables it has already allocated before waiting for tables to become
available.

The page table cache behaves differently depending on configuration
options.

Without paging (``CFG_WITH_PAGER=n``)
-------------------------------------
This is the easiest configuration. All page tables are statically allocated
in the ``.nozi.pgt_cache`` section. ``pgt_alloc()`` allocates tables from the
free-list and ``pgt_free()`` returns the tables directly to the free-list.

With paging enabled (``CFG_WITH_PAGER=y``)
------------------------------------------

Page tables are allocated as zero initialized locked pages during boot
using ``tee_pager_alloc()``. Locked pages are populated with physical pages
on demand from the pager. The physical page can be released when not needed
any longer with ``tee_pager_release_phys()``.

With ``CFG_WITH_LPAE=y`` each translation table has the same size as a
physical page which makes it easy to release the physical page when the
translation table isn't needed any longer. With the short-descriptor table
format (``CFG_WITH_LPAE=n``) it becomes more complicated as four
translation tables are stored in each page. Additional bookkeeping is used
to tell when the page used by four separate translation tables can be
released.

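One way to picture this bookkeeping is a per-page bitmask recording which of
the four tables are in use (a short-descriptor second-level table occupies
1 KiB, so four fit in a 4 KiB page). The sketch below is an illustration only:
the structure and function names are hypothetical and do not describe the
optee_os implementation.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TABLES_PER_PAGE 4 /* four 1 KiB tables share one 4 KiB page */

    /* Hypothetical bookkeeping: one bit per translation table in the page. */
    struct pgt_page {
        uint8_t used_mask;
    };

    /* Claim a free table in the page; returns its index or -1 if full. */
    static int pgt_page_take(struct pgt_page *page)
    {
        for (int idx = 0; idx < TABLES_PER_PAGE; idx++) {
            if (!(page->used_mask & (1u << idx))) {
                page->used_mask |= (uint8_t)(1u << idx);
                return idx;
            }
        }
        return -1;
    }

    /*
     * Give a table back. Returns true when no table in the page is in use
     * any longer, meaning the physical page itself could be released.
     */
    static bool pgt_page_put(struct pgt_page *page, int idx)
    {
        assert(idx >= 0 && idx < TABLES_PER_PAGE);
        page->used_mask &= (uint8_t)~(1u << idx);
        return page->used_mask == 0;
    }
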
With paging of user TA enabled (``CFG_PAGED_USER_TA=y``)
--------------------------------------------------------
With paging of user TAs enabled a cache of recently used translation tables
is used. This can save us from a storm of page faults when restoring the
mappings of a recently unmapped TA. Which translation tables should be
cached is indicated with reference counting by the pager on used tables.
When a table needs to be forcefully freed
``tee_pager_pgt_save_and_release_entries()`` is called to let the pager
know that the table can't be used any longer.

When a mapping in a TA is removed it also needs to be purged from cached
tables with ``pgt_flush_ctx_range()`` to prevent old mappings from being
accidentally reused.

Switching to user mode
======================
This section only applies with the following configuration flags:

    - ``CFG_WITH_LPAE=n``
    - ``CFG_CORE_UNMAP_CORE_AT_EL0=y``

When switching to user mode only a minimal kernel mode mapping is kept. This is
achieved by selecting a zeroed out big L1 translation table in TTBR1 when
transitioning to user mode. When returning to kernel mode the original L1
translation table is restored in TTBR1.

Switching to normal world
=========================
When switching to normal world either via a foreign interrupt (see
:ref:`native_foreign_irqs`) or RPC there is a chance that secure world will
resume execution on a different CPU. This means that the new CPU needs to be
configured with the context of the currently active TA. This is solved by always
setting the TA context in the CPU when resuming execution.

----

.. _pager:

Pager
*****
OP-TEE currently requires more than 256 KiB of RAM for OP-TEE kernel memory.
This is not a problem if OP-TEE uses TrustZone protected DDR, but for security
reasons OP-TEE may need to use TrustZone protected SRAM instead. The amount of
available SRAM varies between platforms, from just a few KiB up to over
512 KiB. Platforms with just a few KiB of SRAM cannot be expected to be able to
run a complete TEE solution in SRAM. But those with 128 to 256 KiB of SRAM can
be expected to have a capable TEE solution in SRAM. The pager provides a
solution to this by demand paging parts of OP-TEE using virtual memory.

Secure memory
=============
TrustZone protected SRAM is generally considered more secure than TrustZone
protected DRAM as there are usually more attack vectors on DRAM. The attack
vectors are hardware dependent and can be different for different platforms.

Backing store
=============
TrustZone protected DRAM or in some cases non-secure DRAM is used as backing
store. The data in the backing store is integrity protected with one hash
(SHA-256) per page (4 KiB). Read-only pages are not encrypted since the OP-TEE
binary itself is not encrypted.

Partitioning of memory
======================
The code that handles demand paging must always be available as it would
otherwise lead to deadlock. The virtual memory is partitioned as:

    +--------------+-------------------+
    | Type         | Sections          |
    +==============+===================+
    | unpaged      | | text            |
    |              | | rodata          |
    |              | | data            |
    |              | | bss             |
    |              | | heap1           |
    |              | | nozi            |
    |              | | heap2           |
    +--------------+-------------------+
    | init / paged | | text_init       |
    |              | | rodata_init     |
    +--------------+-------------------+
    | paged        | | text_pageable   |
    |              | | rodata_pageable |
    +--------------+-------------------+
    | demand alloc |                   |
    +--------------+-------------------+

Where ``nozi`` stands for "not zero initialized", this section contains entry
stacks (thread stack when TEE pager is not enabled) and translation tables (TEE
pager cached translation table when the pager is enabled and LPAE MMU is used).

The ``init`` area is available when OP-TEE is initializing and contains
everything that is needed to initialize the pager. After the pager has been
initialized this area will be used for demand paging instead.

The ``demand alloc`` area is a special area where the pages are allocated and
removed from the pager on demand. Those pages are returned when OP-TEE does not
need them any longer. The thread stacks currently belong to this area. This
means that when a stack is not used the physical pages can be used by the pager
for better performance.

The technique to gather code in the different areas is based on compiling all
functions and data into separate sections. The unpaged text and rodata are then
gathered by linking all object files with ``--gc-sections`` to eliminate
sections that are outside the dependency graph of the entry functions for
unpaged functions. A script analyzes this ELF file and generates the bits of the
final link script. The process is repeated for init text and rodata. What is
not "unpaged" or "init" becomes "paged".

Partitioning of the binary
==========================
.. note::
    The struct definitions provided in this section are explicitly covered by
    the following dual license:

    .. code-block:: none

        SPDX-License-Identifier: (BSD-2-Clause OR GPL-2.0)

The binary is partitioned into four parts as:


    +----------+
    | Binary   |
    +==========+
    | Header   |
    +----------+
    | Init     |
    +----------+
    | Hashes   |
    +----------+
    | Pageable |
    +----------+

The header is defined as:

.. code-block:: c

    #define OPTEE_MAGIC             0x4554504f
    #define OPTEE_VERSION           1
    #define OPTEE_ARCH_ARM32        0
    #define OPTEE_ARCH_ARM64        1

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };

The header is only used by the loader of OP-TEE, not OP-TEE itself. To
initialize OP-TEE the loader loads the complete binary into memory and copies
the ``init_size`` bytes that follow the header to
``(init_load_addr_hi << 32 | init_load_addr_lo)``. ``init_mem_usage`` is used by
the loader to check that there is enough physical memory available for OP-TEE
to be able to initialize at all. The loader supplies in ``r0/x0`` the address of
the first byte that was not copied and jumps to the load address to start
OP-TEE.

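A loader-side check could look roughly like the sketch below. It reuses the
header definition above; the two helper functions are hypothetical and are not
taken from any existing loader. Note that the high word is widened to 64 bits
before shifting, since shifting a 32-bit value by 32 is undefined in C.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define OPTEE_MAGIC   0x4554504f
    #define OPTEE_VERSION 1

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };

    /* Hypothetical helper: combine the two 32-bit halves of the address. */
    static uint64_t optee_init_load_addr(const struct optee_header *hdr)
    {
        return ((uint64_t)hdr->init_load_addr_hi << 32) |
               hdr->init_load_addr_lo;
    }

    /*
     * Hypothetical helper: accept the image only if the magic and version
     * match and the memory needed for initialization is available.
     */
    static bool optee_header_is_usable(const struct optee_header *hdr,
                                       uint32_t avail_mem)
    {
        return hdr->magic == OPTEE_MAGIC &&
               hdr->version == OPTEE_VERSION &&
               hdr->init_mem_usage <= avail_mem;
    }
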
In addition to the overall binary partitioned as described above, three extra
binaries are generated during the same build for loaders that support loading
separate binaries:

    +-----------+
    | v2 binary |
    +===========+
    | Header    |
    +-----------+

    +-----------+
    | v2 binary |
    +===========+
    | Init      |
    +-----------+
    | Hashes    |
    +-----------+

    +-----------+
    | v2 binary |
    +===========+
    | Pageable  |
    +-----------+

In this case, loaders load the header binary first to get the image list and
information about each image, and then load each image at the load address
assigned in the structure. These binaries are named with a ``v2`` suffix to
distinguish them from the existing binaries. The header format is updated to
help loaders load the binaries efficiently:

.. code-block:: c

    #define OPTEE_IMAGE_ID_PAGER    0
    #define OPTEE_IMAGE_ID_PAGED    1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    struct optee_header_v2 {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t nb_images;
        struct optee_image optee_image[];
    };

The magic number and architecture are identical to the original header. The
version is increased to two. ``load_addr_hi`` and ``load_addr_lo`` may be
``0xFFFFFFFF`` for the pageable binary since the pageable part may get loaded
by the loader into a dynamically available position. ``image_id`` indicates how
the loader handles the current binary. Loaders that don't support separate
loading just ignore all v2 binaries.

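The sketch below shows how a loader might walk the image list and detect an
image without a fixed load address. It is only an illustration: the helper
names are hypothetical, and it assumes that "no fixed address" is signalled by
both address words being ``0xFFFFFFFF``.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define OPTEE_IMAGE_ID_PAGER 0
    #define OPTEE_IMAGE_ID_PAGED 1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    /* Hypothetical helper: false when the loader may choose the address. */
    static bool optee_image_has_fixed_addr(const struct optee_image *img)
    {
        return !(img->load_addr_hi == UINT32_C(0xFFFFFFFF) &&
                 img->load_addr_lo == UINT32_C(0xFFFFFFFF));
    }

    /* Hypothetical helper: find an image by ID, or NULL if it is absent. */
    static const struct optee_image *
    optee_find_image(const struct optee_image *images, uint32_t nb_images,
                     uint32_t image_id)
    {
        for (uint32_t i = 0; i < nb_images; i++) {
            if (images[i].image_id == image_id)
                return &images[i];
        }
        return NULL;
    }
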
Initializing the pager
======================
The pager is initialized as early as possible during boot in order to minimize
the "init" area. The global variable ``tee_mm_vcore`` describes the virtual
memory range that is covered by the level 2 translation table supplied to
``tee_pager_init(...)``.

Assign pageable areas
---------------------
A virtual memory range to be handled by the pager is registered with a call to
``tee_pager_add_core_area()``.

.. code-block:: c

    bool tee_pager_add_area(tee_mm_entry_t *mm,
                            uint32_t flags,
                            const void *store,
                            const void *hashes);

which takes a pointer to ``tee_mm_entry_t`` to tell the range, flags to tell how
memory should be mapped (readonly, execute etc), and pointers to backing store
and hashes of the pages.

Assign physical pages
---------------------
Physical SRAM pages are supplied by calling ``tee_pager_add_pages(...)``:

.. code-block:: c

    void tee_pager_add_pages(tee_vaddr_t vaddr,
                             size_t npages,
                             bool unmap);

``tee_pager_add_pages(...)`` takes the physical address stored in the entry
mapping the virtual address ``vaddr`` and ``npages`` entries after that and uses
it to map new pages when needed. The unmap parameter tells whether the pages
should be unmapped immediately since they do not contain initialized data or
be kept mapped until they need to be recycled. The pages in the "init" area are
supplied with ``unmap == false`` since those pages have valid content and are
in use.

Invocation
==========
The pager is invoked as part of the abort handler. A pool of physical pages is
used to map different virtual addresses. When a new virtual address needs to be
mapped a free physical page is mapped at the new address; if a free physical
page cannot be found the oldest physical page is selected instead. When the page
is mapped new data is copied from backing store and the hash of the page is
verified. If it is OK the pager returns from the exception to resume the
execution.

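The invocation flow can be modelled as follows. This is a simplified sketch
and not the pager implementation: all names are hypothetical, the pool holds
only two pages, and a 32-bit FNV-1a digest is used as a stand-in for the
SHA-256 hash so that the example stays short and self-contained.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define SKETCH_PAGE_SIZE 4096
    #define SKETCH_NUM_PHYS  2
    #define NO_VPAGE         SIZE_MAX

    struct phys_page {
        size_t vpage;  /* virtual page currently held, or NO_VPAGE */
        uint64_t age;  /* tick at which the page was last mapped */
        uint8_t data[SKETCH_PAGE_SIZE];
    };

    struct pager {
        struct phys_page pool[SKETCH_NUM_PHYS];
        uint64_t tick;
    };

    /* Stand-in digest (FNV-1a). The real pager verifies a SHA-256 hash. */
    static uint32_t page_digest(const uint8_t *page)
    {
        uint32_t h = UINT32_C(2166136261);

        for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
            h ^= page[i];
            h *= UINT32_C(16777619);
        }
        return h;
    }

    static void pager_init(struct pager *pg)
    {
        memset(pg, 0, sizeof(*pg));
        for (size_t i = 0; i < SKETCH_NUM_PHYS; i++)
            pg->pool[i].vpage = NO_VPAGE;
    }

    /* Prefer a free physical page, otherwise pick the oldest one. */
    static struct phys_page *pager_pick(struct pager *pg)
    {
        struct phys_page *victim = &pg->pool[0];

        for (size_t i = 0; i < SKETCH_NUM_PHYS; i++) {
            struct phys_page *p = &pg->pool[i];

            if (p->vpage == NO_VPAGE)
                return p;
            if (p->age < victim->age)
                victim = p;
        }
        return victim;
    }

    /* Map vpage: copy it from the backing store and verify its digest. */
    static bool pager_handle_fault(struct pager *pg, size_t vpage,
                                   const uint8_t *store,
                                   const uint32_t *digests)
    {
        struct phys_page *p = pager_pick(pg);

        memcpy(p->data, store + vpage * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
        if (page_digest(p->data) != digests[vpage]) {
            p->vpage = NO_VPAGE; /* never expose unverified content */
            return false;
        }
        p->vpage = vpage;
        p->age = ++pg->tick;
        return true;
    }

Selecting the page with the smallest mapping tick gives the "oldest page"
policy described above; a failed digest check leaves the physical page
unmapped instead of resuming execution.
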
Paging of user TA
=================
Paging of user TAs can optionally be enabled with ``CFG_PAGED_USER_TA=y``.
Paging of user TAs is analogous to paging of OP-TEE kernel parts but with a few
differences:

    - Read/write pages are paged in addition to read-only pages
    - Page tables are managed dynamically

``tee_pager_add_uta_area(...)`` is used to set up the initial read/write mapping
needed when populating the TA. When the TA is fully populated and relocated
``tee_pager_set_uta_area_attr(...)`` changes the mapping of the area to the
strict permissions used when the TA is running.

----

.. _stacks:

Stacks
******
Different stacks are used during different stages. The stacks are:

    - **Secure monitor stack** (128 bytes), bound to the CPU. Only available if
      OP-TEE is compiled with a secure monitor, which is always the case if the
      target is Armv7-A but never for Armv8-A.

    - **Temp stack** (small ~1KB), bound to the CPU. Used when transitioning
      from one state to another. Interrupts are always disabled when using this
      stack, and aborts are fatal when using the temp stack.

    - **Abort stack** (medium ~2KB), bound to the CPU. Used when trapping a data
      or pre-fetch abort. Aborts from user space are never fatal; only the TA
      is killed. Aborts from kernel mode are used by the pager to do the demand
      paging; if the pager is disabled all kernel mode aborts are fatal.

    - **Thread stack** (large ~8KB), not bound to the CPU but instead used by
      the current thread/task. Interrupts are usually enabled when using this
      stack.

Notes for Armv7-A/AArch32
    .. list-table::
        :header-rows: 1
        :widths: 1 5

        * - Stack
          - Comment

        * - Temp
          - Assigned to ``SP_SVC`` during entry/exit, always assigned to
            ``SP_IRQ`` and ``SP_FIQ``

        * - Abort
          - Always assigned to ``SP_ABT``

        * - Thread
          - Assigned to ``SP_SVC`` while a thread is active

Notes for AArch64
    There are only two stack pointers, ``SP_EL1`` and ``SP_EL0``, available for
    OP-TEE in AArch64. When an exception is received the stack pointer is always
    ``SP_EL1``, which is used temporarily while assigning an appropriate stack
    pointer for ``SP_EL0``. ``SP_EL1`` is always assigned the value of
    ``thread_core_local[cpu_id]``. This structure has some spare space for
    temporary storage of registers and also keeps the relevant stack pointers.
    In general when we talk about assigning a stack pointer to the CPU below we
    mean ``SP_EL0``.

764Boot
765====
During early boot the CPU is configured with the temp stack, which is used until
OP-TEE exits to normal world the first time.

Notes for AArch64
    ``SPSEL`` is always ``0`` on entry/exit to have ``SP_EL0`` acting as stack
    pointer.

Normal entry
============
Each time OP-TEE is entered from normal world, the temp stack is used as the
initial stack. For fast calls, this is the only stack used. For normal calls, an
empty thread slot is selected and the CPU switches to that stack.

Normal exit
===========
Normal exit occurs when a thread has finished its task and the thread is freed.
When the main thread function, ``tee_entry_std(...)``, returns, interrupts are
disabled and the CPU switches to the temp stack instead. The thread is freed and
OP-TEE exits to normal world.

RPC exit
========
RPC exit occurs when OP-TEE needs some service from normal world. RPC can
currently only be performed when a thread is in the running state. RPC is
initiated with a call to ``thread_rpc(...)``, which saves the state in such a
way that when the thread is restored it continues at the next instruction, as if
this function did a normal return. The CPU switches to the temp stack before
returning to normal world.

Foreign interrupt exit
======================
Foreign interrupt exit occurs when OP-TEE receives a foreign interrupt. In Arm
GICv2 mode, a foreign interrupt is sent as an IRQ, which is always handled in
normal world. Foreign interrupt exit is similar to RPC exit, but it is
``thread_irq_handler(...)`` and ``elx_irq(...)`` (for Armv7-A/AArch32 and
AArch64 respectively) that save the thread state instead. The thread is resumed
in the same way though. In Arm GICv3 mode, a foreign interrupt is sent as an
FIQ, which could be handled by either secure world (EL3 in AArch64) or normal
world. This mode is not supported yet.

Notes for Armv7-A/AArch32
    ``SP_IRQ`` is initialized to the temp stack instead of a separate stack.
    Prior to exiting to normal world, the CPU state is changed to SVC and the
    temp stack is selected.

Notes for AArch64
    ``SP_EL0`` is assigned the temp stack and is selected during IRQ processing.
    The original ``SP_EL0`` is saved in the thread context to be restored when
    resuming.

Resume entry
============
OP-TEE is entered using the temp stack in the same way as for normal entry. The
thread to resume is looked up and the state is restored to resume execution. The
procedure to resume from an RPC exit or a foreign interrupt exit is exactly the
same.
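
The entry and exit paths described in this section can be summarized as a
small state machine per thread slot. The following is a simplified sketch
under stated assumptions, not OP-TEE's actual thread code: the state names,
the ``sketch_`` functions, and the number of slots are all hypothetical, and
stack switching and register saving are left out. It only shows which
transitions the text allows: normal entry takes a free slot, RPC exit and
foreign interrupt exit both suspend an active thread, resume entry treats both
kinds of suspension identically, and normal exit frees the slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_THREADS 2  /* assumption: number of thread slots */

/* Hypothetical thread states; the real names may differ. */
enum sketch_thread_state {
    SKETCH_THREAD_FREE,       /* slot is available for a new normal call */
    SKETCH_THREAD_ACTIVE,     /* running on a CPU, using its thread stack */
    SKETCH_THREAD_SUSPENDED,  /* left secure world via RPC or foreign interrupt */
};

/* Static storage is zero-initialized, so every slot starts out FREE. */
static enum sketch_thread_state sketch_threads[SKETCH_NUM_THREADS];

static bool sketch_transition(int n, enum sketch_thread_state from,
                              enum sketch_thread_state to)
{
    if (n < 0 || n >= SKETCH_NUM_THREADS || sketch_threads[n] != from)
        return false;
    sketch_threads[n] = to;
    return true;
}

/* Normal entry: select an empty slot. Returns its index, or -1 if none. */
static int sketch_normal_entry(void)
{
    for (int n = 0; n < SKETCH_NUM_THREADS; n++)
        if (sketch_transition(n, SKETCH_THREAD_FREE, SKETCH_THREAD_ACTIVE))
            return n;
    return -1;
}

/* RPC exit and foreign interrupt exit both suspend an active thread. */
static bool sketch_suspend(int n)
{
    return sketch_transition(n, SKETCH_THREAD_ACTIVE, SKETCH_THREAD_SUSPENDED);
}

/* Resume entry: the same procedure whatever caused the suspension. */
static bool sketch_resume(int n)
{
    return sketch_transition(n, SKETCH_THREAD_SUSPENDED, SKETCH_THREAD_ACTIVE);
}

/* Normal exit: the thread has finished and its slot is freed. */
static bool sketch_normal_exit(int n)
{
    return sketch_transition(n, SKETCH_THREAD_ACTIVE, SKETCH_THREAD_FREE);
}
```

Fast calls do not appear in the sketch because, as described under normal
entry, they run entirely on the temp stack and never occupy a thread slot.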

Syscall
=======
Syscalls are executed using the thread stack.

Notes for Armv7-A/AArch32
    Nothing special; ``SP_SVC`` is already set to the thread stack.

Notes for AArch64
    Early in the exception processing, the original ``SP_EL0`` is saved in
    ``struct thread_svc_regs`` in case the TA is executed in AArch64. The
    current thread stack is assigned to ``SP_EL0``, which is then selected. When
    returning, ``SP_EL0`` is assigned what is in ``struct thread_svc_regs``.
    This allows ``tee_svc_sys_return_helper(...)`` to have the syscall exception
    handler return directly to ``thread_unwind_user_mode(...)``.

----

.. _shared_memory:

Shared Memory
*************
Shared Memory is a block of memory that is shared between the non-secure and the
secure world. It is used to transfer data between both worlds.

Shared Memory Allocation
========================
The shared memory is allocated by the Linux driver from a pool, ``struct
shm_pool``. The pool contains:

    - The physical address of the start of the pool
    - The size of the pool
    - Whether or not the memory is cached
    - A list of the chunks of memory allocated

.. note::
    - The shared memory pool is physically contiguous.
    - The shared memory area is **not secure** as it is used by both non-secure
      and secure world.

Shared Memory Configuration
===========================
It is the Linux kernel driver for OP-TEE that is responsible for initializing
the shared memory pool, given information provided by the OP-TEE core. The
Linux driver issues an SMC call, ``OPTEE_SMC_GET_SHM_CONFIG``, to retrieve the
following information:

    - Physical address of the start of the pool
    - Size of the pool
    - Whether or not the memory is cached
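
To make the three values concrete, the sketch below stores them in a small
structure and uses them to check whether a buffer lies inside the pool. This
is a hedged illustration, not code from OP-TEE or the Linux driver:
``struct sketch_shm_config`` and ``sketch_shm_contains()`` are hypothetical
names, and the text above does not say where such a check is performed. The
comparison is written so that no intermediate sum can overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values listed above. */
struct sketch_shm_config {
    uint64_t start;  /* physical address of the start of the pool */
    uint64_t size;   /* size of the pool in bytes */
    bool cached;     /* whether or not the memory is cached */
};

/* Returns true if [pa, pa + len) lies entirely inside the pool. */
static bool sketch_shm_contains(const struct sketch_shm_config *cfg,
                                uint64_t pa, uint64_t len)
{
    uint64_t offs = 0;

    if (!cfg || !len || pa < cfg->start)
        return false;

    /* Offset of the buffer from the start of the pool. */
    offs = pa - cfg->start;

    /* Compare against the remaining space instead of computing pa + len. */
    return offs < cfg->size && len <= cfg->size - offs;
}
```

For example, with a pool of ``0x200000`` bytes starting at the arbitrary
address ``0x42000000``, a ``0x1000`` byte buffer at ``0x421ff000`` is accepted
while a ``0x1001`` byte buffer at the same address is rejected.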

The shared memory pool configuration is platform specific. The memory mapping,
including the area ``MEM_AREA_NSEC_SHM`` (shared memory with non-secure world),
is retrieved by calling the platform-specific function ``bootcfg_get_memory()``.
Please refer to this function and the area type ``MEM_AREA_NSEC_SHM`` to see the
configuration for the platform of interest. The Linux driver will then
initialize the shared memory pool accordingly.

.. todo::

    Joakim: bootcfg_get_memory(...) is no longer in our code. Text needs update.

Shared Memory Chunk Allocation
==============================
It is the Linux kernel driver for OP-TEE that is responsible for allocating
chunks of shared memory. The OP-TEE Linux kernel driver relies on the Linux
kernel generic allocator (``CONFIG_GENERIC_ALLOCATOR``) to allocate and release
physical chunks of shared memory. It relies on the Linux kernel dma-buf support
(``CONFIG_DMA_SHARED_BUFFER``) to track references to shared memory buffers.

Using shared memory
===================
From the Client Application
    The client application can ask for shared memory allocation using the
    GlobalPlatform Client API function ``TEEC_AllocateSharedMemory(...)``. The
    client application can also provide shared memory through the GlobalPlatform
    Client API function ``TEEC_RegisterSharedMemory(...)``. In such a case, the
    provided memory must be physically contiguous, since OP-TEE core, which does
    not handle scatter-gather memory, must be able to use the provided range of
    memory addresses. Note that the reference count of a shared memory chunk is
    incremented when shared memory is registered, and initialized to 1 on
    allocation.

From the Linux Driver
    Occasionally the Linux kernel driver needs to allocate shared memory for the
    communication with secure world, for example when using buffers of type
    ``TEEC_TempMemoryReference``.

From OP-TEE core
    In case OP-TEE core needs information from TEE supplicant (dynamic TA
    loading, REE time request, ...), shared memory must be allocated. Allocation
    depends on the use case. OP-TEE core asks for the following shared memory
    allocations:

        - An ``optee_msg_arg`` structure, used to pass the arguments to the
          non-secure world, where the allocation is done by sending an
          ``OPTEE_SMC_RPC_FUNC_ALLOC`` message.

        - In some cases, a payload might be needed for storing the result from
          TEE supplicant, for example when loading a Trusted Application. This
          type of allocation is done by sending the message
          ``OPTEE_MSG_RPC_CMD_SHM_ALLOC(OPTEE_MSG_RPC_SHM_TYPE_APPL,...)``,
          which then returns:

            - the physical address of the shared memory
            - a handle to the memory, which is used later on when freeing this
              memory.

From TEE Supplicant
    TEE supplicant also works with shared memory, used to exchange data
    between normal and secure worlds. TEE supplicant receives a memory address
    from the OP-TEE core, used to store the data. This is for example the case
    when a Trusted Application is loaded. In this case, TEE supplicant must
    register the provided shared memory in the same way a client application
    would do, involving the Linux driver.
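
The reference counting rule mentioned for the client application case
(initialized to 1 on allocation, incremented on registration) can be sketched
as follows. This is only an illustration: ``struct sketch_shm`` and the
``sketch_shm_*`` functions are hypothetical, and releasing the chunk when the
count drops to zero is an assumption that the text above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared memory chunk; only the reference count is modelled. */
struct sketch_shm {
    unsigned int refcount;
    bool released;  /* set once the last reference has been dropped */
};

/* Allocation initializes the reference count to 1. */
static void sketch_shm_alloc(struct sketch_shm *shm)
{
    shm->refcount = 1;
    shm->released = false;
}

/* Registration takes an additional reference on an existing chunk. */
static void sketch_shm_register(struct sketch_shm *shm)
{
    assert(shm->refcount > 0);
    shm->refcount++;
}

/*
 * Drops one reference. Assumption: the chunk is released when the count
 * reaches zero. Returns true if this call released the chunk.
 */
static bool sketch_shm_put(struct sketch_shm *shm)
{
    assert(shm->refcount > 0);
    if (--shm->refcount == 0)
        shm->released = true;
    return shm->released;
}
```

With this model, a chunk that is allocated and then registered once needs two
calls to ``sketch_shm_put()`` before it is released.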

----

.. _smc:

SMC
***
SMC Interface
=============
OP-TEE's SMC interface is defined in two levels using optee_smc.h_ and
optee_msg.h_. The former file defines SMC identifiers and what is passed in the
registers for each SMC. The latter file defines the OP-TEE Message protocol,
which is not restricted to SMC even if that is currently the only option
available.

SMC communication
=================
The main structure used for the SMC communication is ``struct optee_msg_arg``
(in optee_msg.h_). Looking at the source code, communication is mainly achieved
using ``optee_msg_arg`` and ``thread_smc_args`` (in thread.h_), where
``optee_msg_arg`` can be seen as the main structure. The :ref:`linux_kernel`
driver gets the parameters either from :ref:`optee_client` or directly from an
internal service in the Linux kernel. The TEE driver then populates ``struct
optee_msg_arg`` with the parameters plus some additional bookkeeping
information. Parameters for the SMC are passed in registers 1 to 7; register 0
holds the SMC id, which among other things tells whether it is a standard or a
fast call.
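
How register 0 distinguishes a fast call from a standard call can be shown
with a short decoder. The sketch assumes the function identifier layout of the
Arm SMC Calling Convention (bit 31 is the fast call flag, bits 29:24 hold the
owning entity number, bits 15:0 hold the function number). The structure and
the ``SKETCH_`` and ``sketch_`` names are hypothetical and are not the
definitions found in optee_smc.h_ or thread.h_.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file: a[0] is the SMC id, a[1]..a[7] are parameters. */
struct sketch_smc_args {
    uint64_t a[8];
};

/* Field layout assumed from the Arm SMC Calling Convention. */
#define SKETCH_SMC_FAST_CALL   (UINT32_C(1) << 31)  /* set: fast call */
#define SKETCH_SMC_OWNER_SHIFT 24
#define SKETCH_SMC_OWNER_MASK  UINT32_C(0x3f)
#define SKETCH_SMC_FUNC_MASK   UINT32_C(0xffff)

/* True for a fast call, false for a standard call. */
static bool sketch_smc_is_fast(const struct sketch_smc_args *args)
{
    return ((uint32_t)args->a[0] & SKETCH_SMC_FAST_CALL) != 0;
}

/* Owning entity number, which identifies the service being addressed. */
static unsigned int sketch_smc_owner(const struct sketch_smc_args *args)
{
    return ((uint32_t)args->a[0] >> SKETCH_SMC_OWNER_SHIFT) &
           SKETCH_SMC_OWNER_MASK;
}

/* Function number within the owning entity. */
static unsigned int sketch_smc_func_num(const struct sketch_smc_args *args)
{
    return (uint32_t)args->a[0] & SKETCH_SMC_FUNC_MASK;
}
```

With these helpers, the example identifiers ``0xb2000004`` and ``0x32000004``
decode to the same owner (50) and function number (4), and differ only in that
the first is a fast call and the second a standard call.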

----

.. _thread_handling:

Thread handling
***************
OP-TEE core uses a couple of threads to be able to support running jobs in
parallel (not fully enabled!). There are handlers for different purposes. In
thread.c_ you will find a function called ``thread_init_primary(...)`` which
assigns ``init_handlers`` (functions) that should be called when OP-TEE core
receives standard or fast calls, FIQ and PSCI calls. There are default handlers
for these services, but the platform can decide to implement its own
platform-specific handlers instead.

Synchronization primitives
==========================
OP-TEE has three primitives for synchronization of threads and CPUs:
*spin-lock*, *mutex*, and *condvar*.

Spin-lock
    A spin-lock is represented as an ``unsigned int``. This is the most
    primitive lock. Interrupts should be disabled before attempting to take a
    spin-lock and should remain disabled until the lock is released. A spin-lock
    is initialized with ``SPINLOCK_UNLOCK``.

    .. list-table:: Spin lock functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``cpu_spin_lock(...)``
          - Locks a spin-lock

        * - ``cpu_spin_trylock(...)``
          - Locks a spin-lock if unlocked and returns ``0`` else the spin-lock
            is unchanged and the function returns ``!0``

        * - ``cpu_spin_unlock(...)``
          - Unlocks a spin-lock

Mutex
    A mutex is represented by ``struct mutex``. A mutex can be locked and
    unlocked with interrupts enabled or disabled, but only from a normal thread.
    A mutex cannot be used in an interrupt handler, abort handler or before a
    thread has been selected for the CPU. A mutex is initialized with either
    ``MUTEX_INITIALIZER`` or ``mutex_init(...)``.

    .. list-table:: Mutex functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``mutex_lock(...)``
          - Locks a mutex. If the mutex is unlocked this is a fast operation,
            else the function issues an RPC to wait in normal world.

        * - ``mutex_unlock(...)``
          - Unlocks a mutex. If there are no waiters this is a fast operation,
            else the function issues an RPC to wake up a waiter in normal world.

        * - ``mutex_trylock(...)``
          - Locks a mutex if unlocked and returns ``true``, else the mutex is
            unchanged and the function returns ``false``.

        * - ``mutex_destroy(...)``
          - Asserts that the mutex is unlocked and that there are no waiters.
            After this the memory used by the mutex can be freed.

    When a mutex is locked it is owned by the thread calling ``mutex_lock(...)``
    or ``mutex_trylock(...)``. The mutex may only be unlocked by the thread
    owning the mutex. A thread should not exit to TA user space when holding a
    mutex.

Condvar
    A condvar is represented by ``struct condvar``. A condvar is similar to a
    ``pthread_cond_t`` in the pthreads standard, only less advanced.
    Condition variables are used to wait for some condition to be fulfilled and
    are always used together with a mutex. Once a condition variable has been
    used together with a certain mutex, it must only be used with that mutex
    until destroyed. A condvar is initialized with ``CONDVAR_INITIALIZER`` or
    ``condvar_init(...)``.

    .. list-table:: Condvar functions
        :header-rows: 1
        :widths: 1 5

        * - Function
          - Purpose

        * - ``condvar_wait(...)``
          - Atomically unlocks the supplied mutex and waits in normal world via
            an RPC for the condition variable to be signaled. When the function
            returns the mutex is locked again.

        * - ``condvar_signal(...)``
          - Wakes up one waiter of the condition variable (waiting in
            ``condvar_wait(...)``).

        * - ``condvar_broadcast(...)``
          - Wakes up all waiters of the condition variable.

    The caller of ``condvar_signal(...)`` or ``condvar_broadcast(...)`` should
    hold the mutex associated with the condition variable to guarantee that a
    waiter does not miss the signal.
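
The return convention of the spin-lock functions (``0`` when the lock was
taken, non-zero otherwise) can be illustrated with C11 atomics. This is a
sketch only and not the OP-TEE implementation: the ``sketch_`` names are
hypothetical, the numeric values of the two lock states are assumptions, an
``atomic_uint`` stands in for the plain ``unsigned int`` mentioned above, and
the interrupt masking that must surround a real spin-lock is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_SPINLOCK_UNLOCK 0  /* assumption: value of a free lock */
#define SKETCH_SPINLOCK_LOCK   1  /* assumption: value of a held lock */

/* Returns 0 if the lock was taken, non-zero if it was already held. */
static unsigned int sketch_spin_trylock(atomic_uint *lock)
{
    unsigned int expected = SKETCH_SPINLOCK_UNLOCK;

    /* Swap in LOCK only if the current value is UNLOCK. */
    return !atomic_compare_exchange_strong_explicit(lock, &expected,
                                                    SKETCH_SPINLOCK_LOCK,
                                                    memory_order_acquire,
                                                    memory_order_relaxed);
}

/* Busy-waits until the lock has been taken. */
static void sketch_spin_lock(atomic_uint *lock)
{
    while (sketch_spin_trylock(lock))
        ;  /* spin until the holder releases the lock */
}

/* Releases the lock so that another CPU can take it. */
static void sketch_spin_unlock(atomic_uint *lock)
{
    atomic_store_explicit(lock, SKETCH_SPINLOCK_UNLOCK, memory_order_release);
}
```

The acquire ordering on a successful lock and the release ordering on unlock
ensure that accesses made inside the critical section are not reordered
outside of it.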

.. _core/arch/arm/kernel/thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c
.. _optee_msg.h: https://github.com/OP-TEE/optee_os/blob/master/core/include/optee_msg.h
.. _optee_smc.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/sm/optee_smc.h
.. _thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c
.. _thread.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/kernel/thread.h

.. _ARM_DEN0028A_SMC_Calling_Convention: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
.. _Cortex-A53 TRM: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500j/DDI0500J_cortex_a53_trm.pdf
.. _drivers/tee/optee: https://github.com/torvalds/linux/tree/master/drivers/tee/optee
.. _Trusted Firmware A: https://github.com/ARM-software/arm-trusted-firmware