.. _core:

####
Core
####

.. _interrupt_handling:

Interrupt handling
******************
This section describes how :ref:`optee_os` handles switches of world execution
context based on :ref:`SMC` exceptions and interrupt notifications. Interrupt
notifications are IRQ/FIQ exceptions which may also imply switching of world
execution context: normal world to secure world, or secure world to normal
world.

Use cases of world context switch
=================================
This section lists all the cases where optee_os is involved in world context
switches. Optee_os executes in the secure world. World switch is done by the
core's secure monitor level/mode, referred to below as the Monitor.

When the normal world invokes the secure world, the normal world executes an
SMC instruction. The SMC exception is always trapped by the Monitor. If the
related service targets the trusted OS, the Monitor will switch to optee_os
world execution. When the secure world returns to the normal world, optee_os
executes an SMC that is caught by the Monitor which switches back to the normal
world.

When a secure interrupt is signaled by the Arm GIC, it shall reach the optee_os
interrupt exception vector. If the secure world is executing, optee_os will
handle the interrupt directly from its exception vector. If the normal world is
executing when the secure interrupt is raised, the Monitor vector must handle
the exception and invoke optee_os to serve the interrupt.

When a non-secure interrupt is signaled by the Arm GIC, it shall reach the
normal world interrupt exception vector. If the normal world is executing, it
will handle the exception directly from its exception vector. If the secure
world is executing when the non-secure interrupt is raised, optee_os will
temporarily return to normal world via the Monitor to let normal world serve
the interrupt.

Core exception vectors
======================
The Monitor vector is ``VBAR_EL3`` in AArch64 and ``MVBAR`` in Armv7-A/AArch32.
The Monitor can be reached while normal world or secure world is executing. The
executing secure state is known to the Monitor through ``SCR_NS``.

The Monitor can be reached from an SMC exception, an IRQ or FIQ exception
(so-called interrupts) and from asynchronous aborts. Monitor aborts (data,
prefetch, undef) are local to the Monitor execution.

The Monitor can be external to optee_os (case ``CFG_WITH_ARM_TRUSTED_FW=y``).
If not, optee_os provides a local secure monitor in ``core/arch/arm/sm``.
Armv7-A platforms should use the optee_os secure monitor. Armv8-A platforms are
likely to rely on `Trusted Firmware A`_.

When executing outside the Monitor, the system is executing either in the
normal world (``SCR_NS=1``) or in the secure world (``SCR_NS=0``). Each world
owns its own exception vector table (state vector):

    - ``VBAR_EL2`` or ``VBAR_EL1`` non-secure or ``VBAR_EL1`` secure for
      AArch64.
    - ``HVBAR`` or ``VBAR`` non-secure or ``VBAR`` secure for Armv7-A and
      AArch32.

All SMC exceptions are trapped in the Monitor vector. IRQ/FIQ exceptions can be
trapped either in the Monitor vector or in the state vector of the executing
world.

When the normal world is executing, the system is configured to route:

    - secure interrupts to the Monitor that will forward to optee_os
    - non-secure interrupts to the executing world exception vector.

When the secure world is executing, the system is configured to route:

    - secure and non-secure interrupts to the executing optee_os exception
      vector. optee_os shall forward the non-secure interrupts to the normal
      world.
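
The routing rules above can be summarized as a small decision function. The
following is a minimal sketch and not optee_os code: the enum and function
names are hypothetical and only model which exception vector first takes an
interrupt.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical names, used for illustration only. */
    enum world { WORLD_NORMAL, WORLD_SECURE };
    enum irq_owner { IRQ_SECURE, IRQ_NON_SECURE };

    enum first_handler {
        HANDLER_MONITOR_VECTOR,      /* Monitor traps it, then enters optee_os */
        HANDLER_NORMAL_WORLD_VECTOR, /* normal world state vector */
        HANDLER_OPTEE_VECTOR,        /* optee_os state vector */
    };

    /* Return the vector that first takes the interrupt. */
    static enum first_handler route_interrupt(enum world executing,
                                              enum irq_owner irq)
    {
        assert(irq == IRQ_SECURE || irq == IRQ_NON_SECURE);

        /* Secure world executing: everything lands in the optee_os vector. */
        if (executing == WORLD_SECURE)
            return HANDLER_OPTEE_VECTOR;

        /* Normal world executing: only secure interrupts go to the Monitor. */
        return irq == IRQ_SECURE ? HANDLER_MONITOR_VECTOR
                                 : HANDLER_NORMAL_WORLD_VECTOR;
    }

    /* True when the interrupt is served by the world that is not executing. */
    static bool needs_world_switch(enum world executing, enum irq_owner irq)
    {
        return (executing == WORLD_SECURE) != (irq == IRQ_SECURE);
    }

For example, ``route_interrupt(WORLD_SECURE, IRQ_NON_SECURE)`` selects the
optee_os vector and ``needs_world_switch()`` reports that the interrupt must
then be forwarded to the normal world.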

Optee_os non-secure interrupts are always trapped in the state vector of the
executing world. This is reflected by a static value of ``SCR_(IRQ|FIQ)``.

.. _native_foreign_irqs:

Native and foreign interrupts
=============================
Two types of interrupt are defined in optee_os:

    - **Native interrupt** - The interrupt handled by optee_os (for example:
      secure interrupt)
    - **Foreign interrupt** - The interrupt not handled by optee_os (for
      example: non-secure interrupt which is handled by normal world)

For Arm **GICv2** mode, native interrupt is sent as FIQ and foreign interrupt
is sent as IRQ. For Arm **GICv3** mode, foreign interrupt is sent as FIQ which
could be handled by either secure world (AArch32 Monitor mode or AArch64 EL3)
or normal world. Arm GICv3 mode can be enabled by setting ``CFG_ARM_GICV3=y``.
For clarity, this document mainly uses the GICv2 convention and refers to IRQ
as optee_os foreign interrupts, and FIQ as optee_os native interrupts.
Native interrupts must be securely routed to optee_os. Foreign interrupts, when
trapped during secure world execution, might need to be efficiently routed to
the normal world.

Normal World invokes optee_os using SMC
=======================================

**Entering the Secure Monitor**

The monitor manages all entries and exits of secure world. To enter secure
world from normal world the monitor saves the state of normal world (general
purpose registers and system registers which are not banked) and restores the
previous state of secure world. Then a return from exception is performed and
the restored secure state is resumed. Exit from secure world to normal world is
the reverse.

Some general purpose registers are not saved and restored on entry and exit
because they are used to pass parameters between secure and normal world (see
ARM_DEN0028A_SMC_Calling_Convention_ for details).

**Entry and exit of Trusted OS**

On entry and exit of Trusted OS each CPU uses a separate entry stack and runs
with IRQ and FIQ blocked. SMCs are categorised in two flavors: **fast** and
**standard**.

    - For **fast** SMCs, optee_os will execute on the entry stack with IRQ/FIQ
      blocked until the execution returns to normal world.

    - For **standard** SMCs, optee_os will at some point execute the requested
      service with interrupts unblocked. In order to handle interrupts, mainly
      forwarding of foreign interrupts, optee_os assigns a trusted thread
      (`core/arch/arm/kernel/thread.c`_) to the SMC request. The trusted thread
      stores the execution context of the requested service. This context can
      be suspended and resumed as the requested service executes and is
      interrupted. The trusted thread is released only once the service
      execution returns with a completion status.

      For **standard** SMCs, optee_os allocates or resumes a trusted thread
      then unblocks the IRQ/FIQ lines. When optee_os needs to invoke the normal
      world from a foreign interrupt or a remote service call, optee_os blocks
      IRQ/FIQ and suspends the trusted thread. When suspending, optee_os gets
      back to the entry stack.

    - **Both** fast and standard SMCs end on the entry stack with IRQ/FIQ
      blocked and optee_os invokes the Monitor through an SMC to return to the
      normal world.
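
The split between the two flavors can be illustrated with the function
identifier passed in the SMC. In the SMC Calling Convention, bit 31 of the
function identifier is set for fast calls and cleared for standard (yielding)
calls. The sketch below assumes that encoding; the helper and enum names are
hypothetical and are not taken from optee_os.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Bit 31 of the SMC function identifier: 1 = fast, 0 = standard. */
    #define SMC_FAST_CALL_BIT (UINT32_C(1) << 31)

    static_assert(SMC_FAST_CALL_BIT == UINT32_C(0x80000000), "bit 31 expected");

    /* Hypothetical labels for the two entry paths described above. */
    enum entry_path {
        ENTRY_FAST_ON_ENTRY_STACK,   /* runs with IRQ/FIQ blocked throughout */
        ENTRY_STD_ON_TRUSTED_THREAD, /* gets a trusted thread, may be suspended */
    };

    static bool smc_is_fast(uint32_t func_id)
    {
        return (func_id & SMC_FAST_CALL_BIT) != 0;
    }

    static enum entry_path smc_entry_path(uint32_t func_id)
    {
        return smc_is_fast(func_id) ? ENTRY_FAST_ON_ENTRY_STACK
                                    : ENTRY_STD_ON_TRUSTED_THREAD;
    }

Whatever the path, the call ends on the entry stack with IRQ/FIQ blocked
before the SMC that returns to the normal world.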

.. figure:: ../images/core/interrupt_handling/tee_invoke.png
    :figclass: align-center

    SMC entry to secure world

Deliver non-secure interrupts to Normal World
=============================================
This section uses the Arm GICv1/v2 conventions: IRQ signals non-secure
interrupts while FIQ signals secure interrupts. On a GICv3 configuration, one
should exchange IRQ and FIQ in this section.

**Forward a Foreign Interrupt from Secure World to Normal World**

When an IRQ is received in secure world as an IRQ exception then secure world:

    1. Saves trusted thread context (entire state of all processor modes for
       Armv7-A)

    2. Blocks (masks) all interrupts (IRQ and FIQ)

    3. Switches to entry stack

    4. Issues an SMC with a value to indicate to normal world that an IRQ has
       been delivered and last SMC call should be continued

The monitor restores normal world context with a return code indicating that an
IRQ is about to be delivered. Normal world issues a new SMC indicating that it
should continue last SMC.

The monitor restores the secure world context. Secure world then locates the
previously saved context, checks that a return from IRQ is what is requested,
restores the context and lets the secure world IRQ handler return from
exception to the point where execution is resumed.

Note that the monitor itself does not know/care that it has just forwarded an
IRQ to normal world. The bookkeeping is done in the trusted thread handling in
Trusted OS. Normal world is responsible for deciding when the secure world
thread should resume execution (for details, see :ref:`thread_handling`).
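
The bookkeeping described above can be modelled as a small state machine. This
is a simplified sketch under stated assumptions, not the code in
``thread.c``: all names are hypothetical, ``NUM_THREADS`` stands in for
``CFG_NUM_THREADS``, and the saved context is reduced to a single integer.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    #define NUM_THREADS 2 /* stands in for CFG_NUM_THREADS */

    enum thread_state { THREAD_FREE, THREAD_ACTIVE, THREAD_SUSPENDED };

    struct thread {
        enum thread_state state;
        int saved_ctx; /* stands in for the full saved register context */
    };

    /* Static storage is zero initialized, so every thread starts FREE. */
    static struct thread threads[NUM_THREADS];

    /* Standard SMC entry: pick a free thread, or return -1 if none is left. */
    static int thread_alloc(void)
    {
        for (int i = 0; i < NUM_THREADS; i++) {
            if (threads[i].state == THREAD_FREE) {
                threads[i].state = THREAD_ACTIVE;
                return i;
            }
        }
        return -1;
    }

    /* Foreign interrupt: save the context and suspend the thread. */
    static void thread_suspend_for_foreign_irq(int id, int ctx)
    {
        assert(id >= 0 && id < NUM_THREADS);
        assert(threads[id].state == THREAD_ACTIVE);
        threads[id].saved_ctx = ctx;
        threads[id].state = THREAD_SUSPENDED;
    }

    /* Normal world asks to continue: only a suspended thread may resume. */
    static bool thread_resume(int id, int *ctx)
    {
        if (id < 0 || id >= NUM_THREADS ||
            threads[id].state != THREAD_SUSPENDED)
            return false;
        threads[id].state = THREAD_ACTIVE;
        *ctx = threads[id].saved_ctx;
        return true;
    }

    /* Service completed: release the thread. */
    static void thread_complete(int id)
    {
        assert(id >= 0 && id < NUM_THREADS);
        threads[id].state = THREAD_FREE;
    }

The check in ``thread_resume()`` mirrors the step where secure world verifies
that a return from IRQ is what was requested before restoring the context.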

.. figure:: ../images/core/interrupt_handling/irq.png
    :figclass: align-center

    IRQ received in secure world and forwarded to normal world

**Deliver a non-secure interrupt to normal world when SCR_NS is set**

Since ``SCR_IRQ`` is cleared, an IRQ will be delivered using the state vector
(``VBAR``) in the normal world. The IRQ is received as any other exception by
normal world; the monitor and the Trusted OS are not involved at all.

Deliver secure interrupts to Secure World
=========================================
This section uses the Arm GICv1/v2 conventions: FIQ signals secure interrupts
while IRQ signals non-secure interrupts. On a GICv3 configuration, one should
exchange IRQ and FIQ in this section. A FIQ can be received during two
different states, either in normal world (``SCR_NS`` is set) or in secure world
(``SCR_NS`` is cleared). When the secure monitor is active (Armv8-A EL3 or
Armv7-A Monitor mode) FIQ is masked. FIQ reception in the two different states
is described below.

**Deliver FIQ to secure world when SCR_NS is set**

When the monitor gets an FIQ exception it:

    1. Saves normal world context and restores secure world context from last
       secure world exit (which will have IRQ and FIQ blocked)
    2. Clears ``SCR_FIQ`` when clearing ``SCR_NS``
    3. Sets “FIQ” as parameter to secure world entry
    4. Does a return from exception into secure context
    5. Secure world unmasks FIQs because of the “FIQ” parameter
    6. FIQ is received as an exception using the state vector
    7. The state vector handler returns from exception in secure world
    8. Secure world issues an SMC to return to normal world
    9. Monitor saves secure world context and restores normal world context
    10. Does a return from exception into restored context

.. figure:: ../images/core/interrupt_handling/fiq.png
    :figclass: align-center

    FIQ received when SCR_NS is set

.. figure:: ../images/core/interrupt_handling/irq_fiq.png
    :figclass: align-center

    FIQ received while processing an IRQ forwarded from secure world

**Deliver FIQ to secure world when SCR_NS is cleared**

Since ``SCR_FIQ`` is cleared when ``SCR_NS`` is cleared, a FIQ will be
delivered using the state vector (``VBAR``) in secure world. The FIQ is
received as any other exception by Trusted OS; the monitor is not involved at
all.
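
The ``SCR`` settings implied by the two cases above can be sketched as follows.
The sketch assumes the GICv2 convention used in this section and the
architectural bit positions of ``NS`` (bit 0), ``IRQ`` (bit 1) and ``FIQ``
(bit 2) in ``SCR``/``SCR_EL3``. The helper name is hypothetical and this is not
the Monitor implementation.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define SCR_NS  (UINT32_C(1) << 0) /* set while normal world executes */
    #define SCR_IRQ (UINT32_C(1) << 1) /* set: IRQ is taken to the Monitor */
    #define SCR_FIQ (UINT32_C(1) << 2) /* set: FIQ is taken to the Monitor */

    /* Compute the SCR value to use when entering the given world. */
    static uint32_t scr_for_world(uint32_t scr, bool enter_normal_world)
    {
        scr &= ~(SCR_NS | SCR_IRQ | SCR_FIQ);

        /*
         * Normal world: secure interrupts (FIQ) trap to the Monitor.
         * Secure world: FIQ is cleared together with NS, so FIQ reaches
         * the optee_os state vector directly.
         */
        if (enter_normal_world)
            scr |= SCR_NS | SCR_FIQ;

        /* IRQ keeps a static value: it always uses the state vector. */
        assert(!(scr & SCR_IRQ));
        return scr;
    }

Bits other than ``NS``, ``IRQ`` and ``FIQ`` are passed through unchanged.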

Trusted thread scheduling
=========================
**Trusted thread for standard services**

OP-TEE standard services are carried through standard SMCs. Execution of these
services can be interrupted by foreign interrupts. To suspend and restore the
service execution, optee_os assigns a trusted thread at standard SMC entry.

The trusted thread terminates when optee_os returns to the normal world with a
service completion status.

A trusted thread execution can be interrupted by a native interrupt. In this
case the native interrupt is handled by the interrupt exception handlers and
once served, optee_os returns to the execution of the trusted thread.

A trusted thread execution can be interrupted by a foreign interrupt. In this
case, optee_os suspends the trusted thread and invokes the normal world through
the Monitor (optee_os so-called RPC services). The trusted thread will resume
only once normal world invokes optee_os with the RPC service status.

A trusted thread execution can lead optee_os to invoke a service in normal
world: access a file, get the REE current time, etc. The trusted thread is
suspended/resumed during remote service execution.

**Scheduling considerations**

When a trusted thread is interrupted by a foreign interrupt and when optee_os
invokes a normal world service, the normal world gets the opportunity to
reschedule the running applications. The trusted thread will resume only once
the client application is scheduled back. Thus, a trusted thread execution
follows the scheduling of the normal world caller context.

Optee_os does not implement any thread scheduling. Each trusted thread is
expected to track a service that is invoked from the normal world and should
return to it with an execution status.

The OP-TEE Linux driver (as implemented in `drivers/tee/optee`_ since Linux
kernel 4.12) is designed so that the Linux thread invoking OP-TEE gets assigned
a trusted thread on TEE side. The execution of the trusted thread is tied to
the execution of the caller Linux thread which is under the Linux kernel
scheduling decision. This means trusted threads are scheduled by the Linux
kernel.

**Trusted thread constraints**

TEE core handles a static number of trusted threads, see ``CFG_NUM_THREADS``.

Trusted threads are only expensive on memory constrained systems, mainly
regarding the execution stack size.

On SMP systems, optee_os can execute several trusted threads in parallel if the
normal world supports scheduling of processes. Even on UP systems, supporting
several trusted threads in optee_os helps the normal world scheduler to be
efficient.

----

.. _memory_objects:

Memory objects
**************
A memory object, **MOBJ**, describes a piece of memory. The interface provided
is mostly abstract when it comes to using the MOBJ to populate translation
tables etc. There are different kinds of MOBJs describing:

    - Physically contiguous memory

      - created with ``mobj_phys_alloc(...)``.

    - Virtual memory

      - one instance with the name ``mobj_virt`` available.
      - spans the entire virtual address space.

    - Physically contiguous memory allocated from a ``tee_mm_pool_t *``

      - created with ``mobj_mm_alloc(...)``.

    - Paged memory

      - created with ``mobj_paged_alloc(...)``.
      - only contains the supplied size and makes ``mobj_is_paged(...)``
        return true if supplied as argument.

    - Secure copy paged shared memory

      - created with ``mobj_seccpy_shm_alloc(...)``.
      - makes ``mobj_is_paged(...)`` and ``mobj_is_secure(...)`` return true
        if supplied as argument.
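
One common way to build such an abstract interface in C is a table of function
pointers shared by all objects of one kind. The sketch below illustrates that
pattern only. Every ``sketch_`` name is hypothetical and the layout is not the
actual optee_os definition. Properties that the list above does not state are
left unset and therefore read as false.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct sketch_mobj;

    /* Per-kind operations; a NULL entry means "property not provided". */
    struct sketch_mobj_ops {
        bool (*is_paged)(const struct sketch_mobj *m);
        bool (*is_secure)(const struct sketch_mobj *m);
    };

    struct sketch_mobj {
        const struct sketch_mobj_ops *ops;
        size_t size;
    };

    static bool always_true(const struct sketch_mobj *m)
    {
        (void)m;
        return true;
    }

    /* Paged memory: only the size is kept and is_paged reports true. */
    static const struct sketch_mobj_ops sketch_paged_ops = {
        .is_paged = always_true,
    };

    /* Secure copy paged shared memory: paged and secure both report true. */
    static const struct sketch_mobj_ops sketch_seccpy_shm_ops = {
        .is_paged = always_true,
        .is_secure = always_true,
    };

    /* Generic accessors dispatch through the ops table. */
    static bool sketch_mobj_is_paged(const struct sketch_mobj *m)
    {
        assert(m && m->ops);
        return m->ops->is_paged && m->ops->is_paged(m);
    }

    static bool sketch_mobj_is_secure(const struct sketch_mobj *m)
    {
        assert(m && m->ops);
        return m->ops->is_secure && m->ops->is_secure(m);
    }

Callers only use the generic accessors, so code that populates translation
tables does not need to know which kind of object it was given.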

----

.. _mmu:

MMU
***

Translation tables
==================
OP-TEE uses several L1 translation tables, one large spanning 4 GiB and two or
more small tables spanning 32 MiB. The large translation table handles kernel
mode mapping and matches all addresses not covered by the small translation
tables. The small translation tables are assigned per thread and cover the
mapping of the virtual memory space for one TA context.

Memory space between small and large translation table is configured by TTBCR.
TTBR1 always points to the large translation table. TTBR0 points to a small
translation table when user mapping is active and to the large translation
table when no user mapping is currently active. For details about registers
etc, please refer to a Technical Reference Manual for your architecture, for
example `Cortex-A53 TRM`_.

The translation tables have certain alignment constraints: the alignment (of
the physical address) has to be the same as the size of the translation table.
The translation tables are statically allocated to avoid fragmentation of
memory due to the alignment constraints.
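
The split between the small and large tables can be made concrete with a short
calculation. The sketch below assumes the Armv7-A short-descriptor format,
where ``TTBCR.N`` selects how many upper address bits must be zero for TTBR0 to
be used and the TTBR0 table occupies ``16 KiB >> N``. ``N = 7`` is inferred
here from the 32 MiB span mentioned above; the helper names are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* TTBCR.N is a 3-bit field; N = 7 gives a 32 MiB TTBR0 region. */
    #define TTBCR_N_MAX 7u

    /* True if the virtual address is translated through TTBR0. */
    static bool va_uses_ttbr0(uint32_t va, unsigned int n)
    {
        assert(n <= TTBCR_N_MAX);
        if (n == 0)
            return true; /* TTBR1 is unused when N is 0 */
        return (va >> (32 - n)) == 0;
    }

    /* Size of the virtual range covered by the TTBR0 table. */
    static uint64_t ttbr0_span_bytes(unsigned int n)
    {
        assert(n <= TTBCR_N_MAX);
        return UINT64_C(1) << (32 - n);
    }

    /* Size, and therefore required alignment, of the TTBR0 table. */
    static uint32_t ttbr0_table_bytes(unsigned int n)
    {
        assert(n <= TTBCR_N_MAX);
        return UINT32_C(1) << (14 - n);
    }

Under these assumptions a small table is 128 bytes (32 entries of 1 MiB each)
and must be 128-byte aligned, while the large table is 16 KiB and must be
16 KiB aligned.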

Each thread has one small L1 translation table of its own. Each TA context has
a compact representation of its L1 translation table. The compact
representation is used to initialize the thread specific L1 translation table
when the TA context is activated.

.. graphviz::

    digraph xlat_table {
        graph [
            rankdir = "LR"
        ];
        node [
            fontsize = "16"
            shape = "ellipse"
        ];
        edge [
        ];
        "node_ttb" [
            label = "<f0> TTBR0 | <f1> TTBR1"
            shape = "record"
        ];
        "node_large_l1" [
            label = "<f0> Large L1\nSpans 4 GiB"
            shape = "record"
        ];
        "node_small_l1" [
            label = "Small L1\nSpans 32 MiB\nper entry | <f0> 0 | <f1> 1 | ... | <fn> n"
            shape = "record"
        ];

        "node_ttb":f0 -> "node_small_l1":f0 [ label = "Thread 0 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":f1 [ label = "Thread 1 ctx active" ];
        "node_ttb":f0 -> "node_small_l1":fn [ label = "Thread n ctx active" ];
        "node_ttb":f0 -> "node_large_l1" [ label="No active ctx" ];
        "node_ttb":f1 -> "node_large_l1";
    }

Page table cache
================
Page tables used to map TAs are managed with the page table cache. When the
context of a TA is unmapped, all its page tables are released with a call
to ``pgt_free()``. All page tables needed when mapping a TA are allocated
using ``pgt_alloc()``.

A fixed maximum number of translation tables are available in a pool. One
thread may execute a TA which needs all or almost all tables. This can
block TAs from being executed by other threads. To ensure that all TAs
will eventually be permitted to execute, ``pgt_alloc()`` temporarily frees
any tables already allocated before waiting for tables to become available.
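
The "release before waiting" rule can be sketched as an all-or-nothing
allocation. This is an illustration only: the pool is reduced to a counter, the
``sketch_`` names are hypothetical, and the actual waiting (done by the caller
before it retries) is left out.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define POOL_SIZE 4 /* fixed maximum number of tables in this sketch */

    static size_t pool_free = POOL_SIZE;

    /*
     * Try to take n tables. On shortage, give back what was taken so that
     * other threads can make progress, and report failure to the caller.
     */
    static bool sketch_pgt_try_alloc(size_t n)
    {
        size_t taken = 0;

        assert(n <= POOL_SIZE);
        while (taken < n && pool_free > 0) {
            pool_free--;
            taken++;
        }
        if (taken == n)
            return true;

        pool_free += taken; /* release the partial allocation */
        return false;
    }

    /* Return n tables to the pool. */
    static void sketch_pgt_free(size_t n)
    {
        assert(pool_free + n <= POOL_SIZE);
        pool_free += n;
    }

Because a failed attempt never keeps tables, two threads that each need most of
the pool cannot end up holding a partial set while waiting for each other.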

The page table cache behaves differently depending on configuration
options.

Without paging (``CFG_WITH_PAGER=n``)
-------------------------------------
This is the easiest configuration. All page tables are statically allocated
in the ``.nozi.pgt_cache`` section. ``pgt_alloc()`` allocates tables from the
free-list and ``pgt_free()`` returns the tables directly to the free-list.

With paging enabled (``CFG_WITH_PAGER=y``)
------------------------------------------

Page tables are allocated as zero initialized locked pages during boot
using ``tee_pager_alloc()``. Locked pages are populated with physical pages
on demand from the pager. The physical page can be released when not needed
any longer with ``tee_pager_release_phys()``.

With ``CFG_WITH_LPAE=y`` each translation table has the same size as a
physical page which makes it easy to release the physical page when the
translation table isn't needed any longer. With the short-descriptor table
format (``CFG_WITH_LPAE=n``) it becomes more complicated as four
translation tables are stored in each page. Additional bookkeeping is used
to tell when the page used by four separate translation tables can be
released.
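
The extra bookkeeping for the short-descriptor case can be pictured as a
per-page usage mask. The sketch below is illustrative and uses hypothetical
names. It relies on the fact that a short-descriptor second-level table is
1 KiB (256 entries of 4 bytes), so four tables fit in one 4 KiB page.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096u
    #define SHORT_DESC_L2_TABLE_SIZE 1024u /* 256 entries x 4 bytes */
    #define TABLES_PER_PAGE (PAGE_SIZE / SHORT_DESC_L2_TABLE_SIZE)

    struct table_page {
        uint8_t in_use_mask; /* bit i set: table i of this page is in use */
    };

    /* Mark one of the tables in the page as used. */
    static void table_get(struct table_page *p, unsigned int idx)
    {
        assert(idx < TABLES_PER_PAGE);
        p->in_use_mask |= (uint8_t)(1u << idx);
    }

    /* Drop one table; returns true when the whole page can be released. */
    static bool table_put(struct table_page *p, unsigned int idx)
    {
        assert(idx < TABLES_PER_PAGE);
        p->in_use_mask &= (uint8_t)~(1u << idx);
        return p->in_use_mask == 0;
    }

With LPAE the mask would have a single bit, which is why releasing the
physical page is straightforward in that configuration.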

With paging of user TA enabled (``CFG_PAGED_USER_TA=y``)
--------------------------------------------------------
With paging of user TAs enabled a cache of recently used translation tables
is used. This can save us from a storm of page faults when restoring the
mappings of a recently unmapped TA. Which translation tables should be
cached is indicated with reference counting by the pager on used tables.
When a table needs to be forcefully freed
``tee_pager_pgt_save_and_release_entries()`` is called to let the pager
know that the table can't be used any longer.

When a mapping in a TA is removed it also needs to be purged from cached
tables with ``pgt_flush_ctx_range()`` to prevent old mappings from being
accidentally reused.

Switching to user mode
======================
This section only applies with the following configuration flags:

    - ``CFG_WITH_LPAE=n``
    - ``CFG_CORE_UNMAP_CORE_AT_EL0=y``

When switching to user mode only a minimal kernel mode mapping is kept. This is
achieved by selecting a zeroed out big L1 translation table in TTBR1 when
transitioning to user mode. When returning to kernel mode the original L1
translation table is restored in TTBR1.

Switching to normal world
=========================
When switching to normal world either via a foreign interrupt (see
:ref:`native_foreign_irqs`) or RPC there is a chance that secure world will
resume execution on a different CPU. This means that the new CPU needs to be
configured with the context of the currently active TA. This is solved by
always setting the TA context in the CPU when resuming execution.

----

.. _pager:

Pager
*****
OP-TEE currently requires more than 256 KiB of RAM for OP-TEE kernel memory.
This is not a problem if OP-TEE uses TrustZone protected DDR, but for security
reasons OP-TEE may need to use TrustZone protected SRAM instead. The amount of
available SRAM varies between platforms, from just a few KiB up to over
512 KiB. Platforms with just a few KiB of SRAM cannot be expected to be able to
run a complete TEE solution in SRAM. But those with 128 to 256 KiB of SRAM can
be expected to have a capable TEE solution in SRAM. The pager provides a
solution to this by demand paging parts of OP-TEE using virtual memory.

Secure memory
=============
TrustZone protected SRAM is generally considered more secure than TrustZone
protected DRAM as there are usually more attack vectors on DRAM. The attack
vectors are hardware dependent and can be different for different platforms.

Backing store
=============
TrustZone protected DRAM or in some cases non-secure DRAM is used as backing
store. The data in the backing store is integrity protected with one hash
(SHA-256) per page (4 KiB). Read-only pages are not encrypted since the OP-TEE
binary itself is not encrypted.
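
The cost of this integrity protection is easy to estimate: one 32-byte SHA-256
digest per 4 KiB page. The sketch below only shows that arithmetic. The helper
names are hypothetical and no hashing is performed.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define SMALL_PAGE_SIZE 4096u
    #define SHA256_DIGEST_SIZE 32u

    static_assert(SHA256_DIGEST_SIZE == 256 / 8, "SHA-256 digest is 256 bits");

    /* Bytes of hashes needed to cover paged_size bytes of backing store. */
    static size_t hash_table_size(size_t paged_size)
    {
        size_t npages = (paged_size + SMALL_PAGE_SIZE - 1) / SMALL_PAGE_SIZE;

        return npages * SHA256_DIGEST_SIZE;
    }

    /* Offset of the hash that protects the page containing page_offset. */
    static size_t hash_offset(size_t page_offset)
    {
        return (page_offset / SMALL_PAGE_SIZE) * SHA256_DIGEST_SIZE;
    }

For instance, 256 KiB of pageable data is 64 pages and needs 2048 bytes of
hashes, an overhead of under 1%.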

Partitioning of memory
======================
The code that handles demand paging must always be available as it would
otherwise lead to deadlock. The virtual memory is partitioned as:

+--------------+-------------------+
| Type         | Sections          |
+==============+===================+
| unpaged      | | text            |
|              | | rodata          |
|              | | data            |
|              | | bss             |
|              | | heap1           |
|              | | nozi            |
|              | | heap2           |
+--------------+-------------------+
| init / paged | | text_init       |
|              | | rodata_init     |
+--------------+-------------------+
| paged        | | text_pageable   |
|              | | rodata_pageable |
+--------------+-------------------+
| demand alloc |                   |
+--------------+-------------------+

Here ``nozi`` stands for "not zero initialized". This section contains entry
stacks (thread stack when TEE pager is not enabled) and translation tables (TEE
pager cached translation table when the pager is enabled and LPAE MMU is used).

The ``init`` area is available when OP-TEE is initializing and contains
everything that is needed to initialize the pager. After the pager has been
initialized this area will be used for demand paging instead.

The ``demand alloc`` area is a special area where the pages are allocated and
removed from the pager on demand. Those pages are returned when OP-TEE does not
need them any longer. The thread stacks currently belong to this area. This
means that when a stack is not used the physical pages can be used by the pager
for better performance.

The technique to gather code in the different areas is based on compiling all
functions and data into separate sections. The unpaged text and rodata is then
gathered by linking all object files with ``--gc-sections`` to eliminate
sections that are outside the dependency graph of the entry functions for
unpaged functions. A script analyzes this ELF file and generates the bits of
the final link script. The process is repeated for init text and rodata. What
is not "unpaged" or "init" becomes "paged".

Partitioning of the binary
==========================
.. note::
    The struct definitions provided in this section are explicitly covered by
    the following dual license:

    .. code-block:: none

        SPDX-License-Identifier: (BSD-2-Clause OR GPL-2.0)

The binary is partitioned into four parts as:

+----------+
| Binary   |
+==========+
| Header   |
+----------+
| Init     |
+----------+
| Hashes   |
+----------+
| Pageable |
+----------+

The header is defined as:

.. code-block:: c

    #define OPTEE_MAGIC             0x4554504f
    #define OPTEE_VERSION           1
    #define OPTEE_ARCH_ARM32        0
    #define OPTEE_ARCH_ARM64        1

    struct optee_header {
            uint32_t magic;
            uint8_t version;
            uint8_t arch;
            uint16_t flags;
            uint32_t init_size;
            uint32_t init_load_addr_hi;
            uint32_t init_load_addr_lo;
            uint32_t init_mem_usage;
            uint32_t paged_size;
    };

The header is only used by the loader of OP-TEE, not OP-TEE itself. To
initialize OP-TEE the loader loads the complete binary into memory and copies
the ``init_size`` bytes that follow the header to
``(init_load_addr_hi << 32 | init_load_addr_lo)``. ``init_mem_usage`` is used
by the loader to check that there is enough physical memory available for
OP-TEE to be able to initialize at all. The loader supplies in ``r0/x0`` the
address of the first byte that was not copied and jumps to the load address to
start OP-TEE.
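
A loader-side reading of the header could look like the sketch below. It is a
hedged illustration, not code from any particular loader: the two helper
functions are hypothetical, and checking ``magic`` and ``version`` before use
is an assumption about reasonable loader behaviour. Note that the high word
must be widened to 64 bits before shifting, since shifting a ``uint32_t`` by 32
is undefined in C.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define OPTEE_MAGIC   0x4554504f
    #define OPTEE_VERSION 1

    struct optee_header {
        uint32_t magic;
        uint8_t version;
        uint8_t arch;
        uint16_t flags;
        uint32_t init_size;
        uint32_t init_load_addr_hi;
        uint32_t init_load_addr_lo;
        uint32_t init_mem_usage;
        uint32_t paged_size;
    };

    /* Combine the two 32-bit halves into the 64-bit load address. */
    static uint64_t init_load_addr(const struct optee_header *h)
    {
        assert(h);
        return ((uint64_t)h->init_load_addr_hi << 32) | h->init_load_addr_lo;
    }

    /* Hypothetical sanity check done before copying the init part. */
    static bool header_is_usable(const struct optee_header *h,
                                 uint32_t avail_mem)
    {
        assert(h);
        return h->magic == OPTEE_MAGIC &&
               h->version == OPTEE_VERSION &&
               h->init_mem_usage <= avail_mem;
    }

Here ``avail_mem`` stands for the amount of physical memory that the loader
knows it can give to OP-TEE.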
| 589 | |
| 590 | In addition to overall binary with partitions inside described as above, three |
| 591 | extra binaries are generated simultaneously during build process for loaders who |
| 592 | support loading separate binaries: |
| 593 | |
| 594 | +-----------+ |
| 595 | | v2 binary | |
| 596 | +===========+ |
| 597 | | Header | |
| 598 | +-----------+ |
| 599 | |
| 600 | +-----------+ |
| 601 | | v2 binary | |
| 602 | +===========+ |
| 603 | | Init | |
| 604 | +-----------+ |
| 605 | | Hashes | |
| 606 | +-----------+ |
| 607 | |
| 608 | +-----------+ |
| 609 | | v2 binary | |
| 610 | +===========+ |
| 611 | | Pageable | |
| 612 | +-----------+ |
| 613 | |
| 614 | In this case, loaders load header binary first to get image list and information |
| 615 | of each image; and then load each of them into specific load address assigned in |
| 616 | structure. These binaries are named with `v2` suffix to distinguish from the |
| 617 | existing binaries. Header format is updated to help loaders loading binaries |
| 618 | efficiently: |
| 619 | |
| 620 | .. code-block:: c |
| 621 | |
| 622 | #define OPTEE_IMAGE_ID_PAGER 0 |
| 623 | #define OPTEE_IMAGE_ID_PAGED 1 |
| 624 | |
| 625 | struct optee_image { |
| 626 | uint32_t load_addr_hi; |
| 627 | uint32_t load_addr_lo; |
| 628 | uint32_t image_id; |
| 629 | uint32_t size; |
| 630 | }; |
| 631 | |
| 632 | struct optee_header_v2 { |
| 633 | uint32_t magic; |
| 634 | uint8_t version; |
| 635 | uint8_t arch; |
| 636 | uint16_t flags; |
| 637 | uint32_t nb_images; |
| 638 | struct optee_image optee_image[]; |
| 639 | }; |
| 640 | |
| 641 | The magic number and architecture are the same as in the original header. The |
| 642 | version is increased to two. ``load_addr_hi`` and ``load_addr_lo`` may be |
| 643 | ``0xFFFFFFFF`` for the pageable binary, since the loader may place the pageable |
| 644 | part at a dynamically chosen position. ``image_id`` tells the loader how to handle |
| 645 | the binary. Loaders that don't support separate loading ignore all v2 binaries. |
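
As a rough illustration of how a loader might interpret one ``struct
optee_image`` entry, the sketch below combines the two 32-bit halves into a
64-bit load address and detects the ``0xFFFFFFFF`` marker. The helper names
``image_load_addr()`` and ``image_addr_is_dynamic()`` are hypothetical and not
part of OP-TEE, and treating both halves equal to ``0xFFFFFFFF`` as the
"loader chooses the address" case is an assumption based on the text above.

.. code-block:: c

    #include <stdint.h>

    #define OPTEE_IMAGE_ID_PAGER 0
    #define OPTEE_IMAGE_ID_PAGED 1

    struct optee_image {
        uint32_t load_addr_hi;
        uint32_t load_addr_lo;
        uint32_t image_id;
        uint32_t size;
    };

    /* Hypothetical helper: combine the two halves into one 64-bit address. */
    static uint64_t image_load_addr(const struct optee_image *img)
    {
        return ((uint64_t)img->load_addr_hi << 32) | img->load_addr_lo;
    }

    /*
     * Hypothetical helper: non-zero when both halves carry the 0xFFFFFFFF
     * marker, which this sketch takes to mean that the loader picks the
     * position itself.
     */
    static int image_addr_is_dynamic(const struct optee_image *img)
    {
        return img->load_addr_hi == UINT32_MAX &&
               img->load_addr_lo == UINT32_MAX;
    }

A loader would typically call ``image_addr_is_dynamic()`` first and only use
``image_load_addr()`` when it returns zero.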
| 646 | |
| 647 | Initializing the pager |
| 648 | ====================== |
| 649 | The pager is initialized as early as possible during boot in order to minimize |
| 650 | the "init" area. The global variable ``tee_mm_vcore`` describes the virtual |
| 651 | memory range that is covered by the level 2 translation table supplied to |
| 652 | ``tee_pager_init(...)``. |
| 653 | |
| 654 | Assign pageable areas |
| 655 | --------------------- |
| 656 | A virtual memory range to be handled by the pager is registered with a call to |
| 657 | ``tee_pager_add_core_area()``. |
| 658 | |
| 659 | .. code-block:: c |
| 660 | |
| 661 | bool tee_pager_add_area(tee_mm_entry_t *mm, |
| 662 | uint32_t flags, |
| 663 | const void *store, |
| 664 | const void *hashes); |
| 665 | |
| 666 | which takes a pointer to ``tee_mm_entry_t`` to tell the range, flags to tell how |
| 667 | memory should be mapped (readonly, execute etc), and pointers to backing store |
| 668 | and hashes of the pages. |
| 669 | |
| 670 | Assign physical pages |
| 671 | --------------------- |
| 672 | Physical SRAM pages are supplied by calling ``tee_pager_add_pages(...)``: |
| 673 | |
| 674 | .. code-block:: c |
| 675 | |
| 676 | void tee_pager_add_pages(tee_vaddr_t vaddr, |
| 677 | size_t npages, |
| 678 | bool unmap); |
| 679 | |
| 680 | ``tee_pager_add_pages(...)`` takes the physical address stored in the entry |
| 681 | mapping the virtual address ``vaddr`` and ``npages`` entries after that and uses |
| 682 | it to map new pages when needed. The ``unmap`` parameter tells whether the pages |
| 683 | should be unmapped immediately since they do not contain initialized data or |
| 684 | be kept mapped until they need to be recycled. The pages in the "init" area are |
| 685 | supplied with ``unmap == false`` since those pages have valid content and are in |
| 686 | use. |
| 687 | |
| 688 | Invocation |
| 689 | ========== |
| 690 | The pager is invoked as part of the abort handler. A pool of physical pages is |
| 691 | used to map different virtual addresses. When a new virtual address needs to be |
| 692 | mapped, a free physical page is mapped at the new address. If a free physical |
| 693 | page cannot be found, the oldest physical page is selected instead. When the page |
| 694 | is mapped, new data is copied from the backing store and the hash of the page is |
| 695 | verified. If it is OK, the pager returns from the exception to resume |
| 696 | execution. |
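
The selection policy described above (use a free physical page, otherwise
recycle the oldest one) can be sketched with a toy model. Everything here is
illustrative: ``struct toy_pool``, ``toy_pool_map()`` and the pool size are
made-up names and values, and the sketch leaves out what the real pager does
around the selection, such as copying from the backing store and verifying the
hash.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define POOL_SIZE 4            /* illustrative pool size */
    #define NO_VADDR  UINTPTR_MAX  /* marks an unused physical page */

    struct toy_pmem {
        uintptr_t vaddr;    /* virtual page currently backed, or NO_VADDR */
        unsigned long age;  /* clock value when the page was last mapped */
    };

    struct toy_pool {
        struct toy_pmem pages[POOL_SIZE];
        unsigned long clock;
    };

    static void toy_pool_init(struct toy_pool *pool)
    {
        for (size_t i = 0; i < POOL_SIZE; i++) {
            pool->pages[i].vaddr = NO_VADDR;
            pool->pages[i].age = 0;
        }
        pool->clock = 0;
    }

    /* Returns the index of the physical page that now backs vaddr. */
    static size_t toy_pool_map(struct toy_pool *pool, uintptr_t vaddr)
    {
        size_t victim = 0;

        for (size_t i = 0; i < POOL_SIZE; i++) {
            if (pool->pages[i].vaddr == NO_VADDR) {
                victim = i;  /* a free page always wins */
                break;
            }
            if (pool->pages[i].age < pool->pages[victim].age)
                victim = i;  /* oldest page seen so far */
        }

        pool->pages[victim].vaddr = vaddr;
        pool->pages[victim].age = ++pool->clock;
        return victim;
    }

With a pool of four pages, the fifth distinct address mapped ends up in the
page that was mapped first.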
| 697 | |
| 698 | Data structures |
| 699 | =============== |
| 700 | .. figure:: ../images/core/tee_pager_area.png |
| 701 | :figclass: align-center |
| 702 | |
| 703 | How the main pager data structures relate to each other |
| 704 | |
| 705 | ``struct tee_pager_area`` |
| 706 | ------------------------- |
| 707 | This is a central data structure when handling paged |
| 708 | memory ranges. It's defined as: |
| 709 | |
| 710 | .. code-block:: c |
| 711 | |
| 712 | struct tee_pager_area { |
| 713 | struct fobj *fobj; |
| 714 | size_t fobj_pgoffs; |
| 715 | enum tee_pager_area_type type; |
| 716 | uint32_t flags; |
| 717 | vaddr_t base; |
| 718 | size_t size; |
| 719 | struct pgt *pgt; |
| 720 | TAILQ_ENTRY(tee_pager_area) link; |
| 721 | TAILQ_ENTRY(tee_pager_area) fobj_link; |
| 722 | }; |
| 723 | |
| 724 | Where ``base`` and ``size`` tell the memory range and ``fobj`` and |
| 725 | ``fobj_pgoffs`` hold the content. A ``struct tee_pager_area`` can only use one |
| 726 | ``struct fobj`` and one ``struct pgt`` (translation table), so memory ranges |
| 727 | spanning multiple fobjs or pgts are split into multiple areas. |
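
To make the splitting rule concrete, here is a minimal sketch that cuts a
virtual range at translation table boundaries. It only models the ``pgt`` half
of the rule (not the ``fobj`` half), and the names ``toy_split_range()`` and
``TOY_PGT_SPAN`` are hypothetical. The 1 MiB span per table is an assumed
value for illustration; the real span depends on the translation table format
in use.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed for illustration: bytes covered by one translation table. */
    #define TOY_PGT_SPAN 0x100000u

    struct toy_range {
        uintptr_t base;
        size_t size;
    };

    /*
     * Split [base, base + size) so that no piece crosses a TOY_PGT_SPAN
     * boundary. Writes at most max pieces to out and returns how many
     * pieces were written.
     */
    static size_t toy_split_range(uintptr_t base, size_t size,
                                  struct toy_range *out, size_t max)
    {
        size_t n = 0;

        while (size && n < max) {
            /* First boundary strictly above base. */
            uintptr_t next = (base / TOY_PGT_SPAN + 1) * TOY_PGT_SPAN;
            size_t chunk = next - base;

            if (chunk > size)
                chunk = size;

            out[n].base = base;
            out[n].size = chunk;
            n++;

            base += chunk;
            size -= chunk;
        }

        return n;
    }

A three page range starting one page below a table boundary is split into two
pieces, one per table.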
| 728 | |
| 729 | ``struct fobj`` |
| 730 | --------------- |
| 731 | This is a polymorphic object, using different implementations depending on how |
| 732 | it's initialized. It's defined as: |
| 733 | |
| 734 | .. code-block:: c |
| 735 | |
| 736 | struct fobj_ops { |
| 737 | void (*free)(struct fobj *fobj); |
| 738 | TEE_Result (*load_page)(struct fobj *fobj, unsigned int page_idx, |
| 739 | void *va); |
| 740 | TEE_Result (*save_page)(struct fobj *fobj, unsigned int page_idx, |
| 741 | const void *va); |
| 742 | }; |
| 743 | |
| 744 | struct fobj { |
| 745 | const struct fobj_ops *ops; |
| 746 | unsigned int num_pages; |
| 747 | struct refcount refc; |
| 748 | struct tee_pager_area_head areas; |
| 749 | }; |
| 750 | |
| 751 | :``num_pages``: Tells how many pages this ``fobj`` covers. |
| 752 | :``refc``: A reference counter, everyone referring to a ``fobj`` needs to |
| 753 | increase and decrease this as needed. |
| 754 | :``areas``: A list of areas using this ``fobj``, traversed when making |
| 755 | a virtual page unavailable. |
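
The polymorphism comes from the ``ops`` pointer: each kind of ``fobj`` supplies
its own function table and callers dispatch through it. The sketch below shows
the pattern with simplified stand-in types. ``struct toy_fobj``, ``struct
toy_rom_fobj`` and the ``int`` return codes are assumptions made for this
example and do not mirror the real OP-TEE definitions, which use
``TEE_Result`` and more operations.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define TOY_PAGE_SIZE 4096u

    struct toy_fobj;

    struct toy_fobj_ops {
        int (*load_page)(struct toy_fobj *fobj, unsigned int page_idx,
                         void *va);
    };

    struct toy_fobj {
        const struct toy_fobj_ops *ops;
        unsigned int num_pages;
    };

    /* One concrete implementation: pages served from a read-only buffer. */
    struct toy_rom_fobj {
        struct toy_fobj fobj;  /* must stay first, see toy_rom_load_page() */
        const uint8_t *store;
    };

    static int toy_rom_load_page(struct toy_fobj *fobj, unsigned int page_idx,
                                 void *va)
    {
        /* Valid because fobj is the first member of struct toy_rom_fobj. */
        struct toy_rom_fobj *rom = (struct toy_rom_fobj *)fobj;

        if (page_idx >= fobj->num_pages)
            return -1;

        memcpy(va, rom->store + (size_t)page_idx * TOY_PAGE_SIZE,
               TOY_PAGE_SIZE);
        return 0;
    }

    static const struct toy_fobj_ops toy_rom_ops = {
        .load_page = toy_rom_load_page,
    };

    static void toy_rom_init(struct toy_rom_fobj *rom, const uint8_t *store,
                             unsigned int num_pages)
    {
        rom->fobj.ops = &toy_rom_ops;
        rom->fobj.num_pages = num_pages;
        rom->store = store;
    }

    /* Callers only see struct toy_fobj and dispatch through ops. */
    static int toy_fobj_load_page(struct toy_fobj *fobj, unsigned int page_idx,
                                  void *va)
    {
        return fobj->ops->load_page(fobj, page_idx, va);
    }

A second implementation, for instance one that decrypts pages, would only need
its own ``toy_fobj_ops`` table; code calling ``toy_fobj_load_page()`` would not
change.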
| 756 | |
| 757 | ``struct tee_pager_pmem`` |
| 758 | ------------------------- |
| 759 | This structure represents a physical page. It's defined as: |
| 760 | |
| 761 | .. code-block:: c |
| 762 | |
| 763 | struct tee_pager_pmem { |
| 764 | unsigned int flags; |
| 765 | unsigned int fobj_pgidx; |
| 766 | struct fobj *fobj; |
| 767 | void *va_alias; |
| 768 | TAILQ_ENTRY(tee_pager_pmem) link; |
| 769 | }; |
| 770 | |
| 771 | :``PMEM_FLAG_DIRTY``: Bit is set in ``flags`` when the page is mapped |
| 772 | read/write at at least one location. |
| 773 | :``PMEM_FLAG_HIDDEN``: Bit is set in ``flags`` when the page is hidden, that |
| 774 | is, not accessible anywhere. |
| 775 | :``fobj_pgidx``: The page at this index in the ``fobj`` is used in this |
| 776 | physical page. |
| 777 | :``fobj``: The ``fobj`` backing this page. |
| 778 | :``va_alias``: Virtual address where this physical page is updated |
| 779 | when loading it from backing store or when writing it |
| 780 | back. |
| 781 | |
| 782 | All ``struct tee_pager_pmem`` are stored either in the global list |
| 783 | ``tee_pager_pmem_head`` or in ``tee_pager_lock_pmem_head``. The latter is |
| 784 | used by pages which are mapped and then locked in memory on demand. The |
| 785 | pages are returned to ``tee_pager_pmem_head`` when the pages are |
| 786 | explicitly released with a call to ``tee_pager_release_phys()``. |
| 787 | |
| 788 | A physical page can be used by more than one ``tee_pager_area`` |
| 789 | simultaneously. This is also known as shared secure memory and will appear |
| 790 | as such for both read-only and read-write mappings. |
| 791 | |
| 792 | When a page is hidden it's unmapped from all translation tables and the |
| 793 | ``PMEM_FLAG_HIDDEN`` bit is set, but the page is kept in memory. When a physical |
| 794 | page is released it's also unmapped from all translation tables and its content |
| 795 | is written back to storage, then the ``fobj`` field is set to ``NULL`` to |
| 796 | note the physical page as unused. |
| 797 | |
| 798 | Note that when ``struct tee_pager_pmem`` references a ``fobj`` it doesn't |
| 799 | update the reference counter since it's already guaranteed to be available |
| 800 | due to the ``struct tee_pager_area`` which must reference the ``fobj`` too. |
| 801 | |
| 802 | Paging of user TA |
| 803 | ================= |
| 804 | Paging of user TAs can optionally be enabled with ``CFG_PAGED_USER_TA=y``. |
| 805 | Paging of user TAs is analogous to paging of OP-TEE kernel parts but with a few |
| 806 | differences: |
| 807 | |
| 808 | - Read/write pages are paged in addition to read-only pages |
| 809 | - Page tables are managed dynamically |
| 810 | |
| 811 | ``tee_pager_add_uta_area(...)`` is used to setup initial read/write mapping |
| 812 | needed when populating the TA. When the TA is fully populated and relocated |
| 813 | ``tee_pager_set_uta_area_attr(...)`` changes the mapping of the area to strict |
| 814 | permissions used when the TA is running. |
| 815 | |
| 816 | Paging shared secure memory |
| 817 | --------------------------- |
| 818 | Shared secure memory is achieved by letting several ``tee_pager_area`` |
| 819 | use the same backing ``fobj``. When a ``tee_pager_area`` is allocated and |
| 820 | assigned a ``fobj`` it's also added to a list of ``tee_pager_area`` using |
| 821 | this ``fobj``. This helps when a physical page is released. |
| 822 | |
| 823 | When a fault occurs, a matching ``tee_pager_area`` is located first. Then |
| 824 | ``tee_pager_pmem_head`` is searched to see if a physical page already holds |
| 825 | the page of the ``fobj`` needed. If so the ``pgt`` is updated to map the |
| 826 | physical page at the appropriate location. If no physical page was holding |
| 827 | the page a new physical page is allocated, initialized and finally mapped. |
| 828 | |
| 829 | In order to make as few updates to mappings as possible, changes to less |
| 830 | restricted permissions (no access to read-only, or read-only to read-write) are |
| 831 | done only for the virtual address that was used when the page fault occurred. |
| 832 | Changes in the other direction have to be done in all translation tables used to |
| 833 | map the physical page. |
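
The lookup step in the fault path, finding a physical page that already holds
a given page of a ``fobj``, can be sketched as a search keyed on the pair
(``fobj``, page index). The array-based pool and the name ``toy_pmem_find()``
are assumptions for this example; the text above describes a list,
``tee_pager_pmem_head``.

.. code-block:: c

    #include <stddef.h>

    struct toy_pmem {
        const void *fobj;         /* backing object, NULL when unused */
        unsigned int fobj_pgidx;  /* page index inside that object */
    };

    /*
     * Return the physical page that already holds page pgidx of fobj, or
     * NULL if no such page exists and a new one has to be allocated.
     */
    static struct toy_pmem *toy_pmem_find(struct toy_pmem *pool, size_t n,
                                          const void *fobj,
                                          unsigned int pgidx)
    {
        if (!fobj)
            return NULL;  /* never match unused pages */

        for (size_t i = 0; i < n; i++) {
            if (pool[i].fobj == fobj && pool[i].fobj_pgidx == pgidx)
                return &pool[i];
        }

        return NULL;
    }

Two areas that reference the same ``fobj`` and fault on the same page index get
the same physical page back, which is what makes the memory shared.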
| 834 | |
| 835 | ---- |
| 836 | |
| 837 | .. _stacks: |
| 838 | |
| 839 | Stacks |
| 840 | ****** |
| 841 | Different stacks are used during different stages. The stacks are: |
| 842 | |
| 843 | - **Secure monitor stack** (128 bytes), bound to the CPU. Only available if |
| 844 | OP-TEE is compiled with a secure monitor, which is always the case if the |
| 845 | target is Armv7-A but never for Armv8-A. |
| 846 | |
| 847 | - **Temp stack** (small ~1KB), bound to the CPU. Used when transitioning |
| 848 | from one state to another. Interrupts are always disabled when using this |
| 849 | stack, and aborts are fatal when using the temp stack. |
| 850 | |
| 851 | - **Abort stack** (medium ~2KB), bound to the CPU. Used when trapping a data |
| 852 | or pre-fetch abort. Aborts from user space are never fatal, the TA is only |
| 853 | killed. Aborts from kernel mode are used by the pager to do the demand |
| 854 | paging. If the pager is disabled all kernel mode aborts are fatal. |
| 855 | |
| 856 | - **Thread stack** (large ~8KB), not bound to the CPU but instead used by the |
| 857 | current thread/task. Interrupts are usually enabled when using this stack. |
| 858 | |
| 859 | Notes for Armv7-A/AArch32 |
| 860 | .. list-table:: |
| 861 | :header-rows: 1 |
| 862 | :widths: 1 5 |
| 863 | |
| 864 | * - Stack |
| 865 | - Comment |
| 866 | |
| 867 | * - Temp |
| 868 | - Assigned to ``SP_SVC`` during entry/exit, always assigned to |
| 869 | ``SP_IRQ`` and ``SP_FIQ`` |
| 870 | |
| 871 | * - Abort |
| 872 | - Always assigned to ``SP_ABT`` |
| 873 | |
| 874 | * - Thread |
| 875 | - Assigned to ``SP_SVC`` while a thread is active |
| 876 | |
| 877 | Notes for AArch64 |
| 878 | There are only two stack pointers, ``SP_EL1`` and ``SP_EL0``, available for |
| 879 | OP-TEE in AArch64. When an exception is received the stack pointer is always |
| 880 | ``SP_EL1`` which is used temporarily while assigning an appropriate stack |
| 881 | pointer for ``SP_EL0``. ``SP_EL1`` is always assigned the value of |
| 882 | ``thread_core_local[cpu_id]``. This structure has some spare space for |
| 883 | temporary storage of registers and also keeps the relevant stack pointers. |
| 884 | In general when we talk about assigning a stack pointer to the CPU below we |
| 885 | mean ``SP_EL0``. |
| 886 | |
| 887 | Boot |
| 888 | ==== |
| 889 | During early boot the CPU is configured with the temp stack which is used until |
| 890 | OP-TEE exits to normal world the first time. |
| 891 | |
| 892 | Notes for AArch64 |
| 893 | ``SPSEL`` is always ``0`` on entry/exit to have ``SP_EL0`` acting as stack |
| 894 | pointer. |
| 895 | |
| 896 | Normal entry |
| 897 | ============ |
| 898 | Each time OP-TEE is entered from normal world the temp stack is used as the |
| 899 | initial stack. For fast calls, this is the only stack used. For normal calls an |
| 900 | empty thread slot is selected and the CPU switches to that stack. |
| 901 | |
| 902 | Normal exit |
| 903 | =========== |
| 904 | Normal exit occurs when a thread has finished its task and the thread is freed. |
| 905 | When the main thread function, ``tee_entry_std(...)``, returns interrupts are |
| 906 | disabled and the CPU switches to the temp stack instead. The thread is freed and |
| 907 | OP-TEE exits to normal world. |
| 908 | |
| 909 | RPC exit |
| 910 | ======== |
| 911 | RPC exit occurs when OP-TEE needs some service from normal world. RPC can |
| 912 | currently only be performed when a thread is in running state. RPC is initiated |
| 913 | with a call to ``thread_rpc(...)`` which saves the state in a way that when the |
| 914 | thread is restored it will continue at the next instruction as if this function |
| 915 | did a normal return. The CPU switches to the temp stack before returning to |
| 916 | normal world. |
| 917 | |
| 918 | Foreign interrupt exit |
| 919 | ====================== |
| 920 | Foreign interrupt exit occurs when OP-TEE receives a foreign interrupt. For Arm |
| 921 | GICv2 mode, a foreign interrupt is sent as IRQ which is always handled in normal |
| 922 | world. Foreign interrupt exit is similar to RPC exit but it is |
| 923 | ``thread_irq_handler(...)`` and ``elx_irq(...)`` (respectively for |
| 924 | Armv7-A/AArch32 and for AArch64) that save the thread state instead. The thread |
| 925 | is resumed in the same way though. For Arm GICv3 mode, a foreign interrupt is sent |
| 926 | as FIQ which could be handled by either secure world (EL3 in AArch64) or normal |
| 927 | world. This mode is not supported yet. |
| 928 | |
| 929 | Notes for Armv7-A/AArch32 |
| 930 | SP_IRQ is initialized to temp stack instead of a separate stack. Prior to |
| 931 | exiting to normal world CPU state is changed to SVC and temp stack is |
| 932 | selected. |
| 933 | |
| 934 | Notes for AArch64 |
| 935 | ``SP_EL0`` is assigned temp stack and is selected during IRQ processing. The |
| 936 | original ``SP_EL0`` is saved in the thread context to be restored when |
| 937 | resuming. |
| 938 | |
| 939 | Resume entry |
| 940 | ============ |
| 941 | OP-TEE is entered using the temp stack in the same way as for normal entry. The |
| 942 | thread to resume is looked up and the state is restored to resume execution. The |
| 943 | procedure to resume from an RPC exit or a foreign interrupt exit is exactly the |
| 944 | same. |
| 945 | |
| 946 | Syscall |
| 947 | ======= |
| 948 | Syscalls are executed using the thread stack. |
| 949 | |
| 950 | Notes for Armv7-A/AArch32 |
| 951 | Nothing special, ``SP_SVC`` is already set to the thread stack. |
| 952 | |
| 953 | Notes for AArch64 |
| 954 | Early in the exception processing the original ``SP_EL0`` is saved in |
| 955 | ``struct thread_svc_regs`` in case the TA is executed in AArch64. Current |
| 956 | thread stack is assigned to ``SP_EL0`` which is then selected. When |
| 957 | returning ``SP_EL0`` is assigned what is in ``struct thread_svc_regs``. This |
| 958 | allows ``tee_svc_sys_return_helper(...)`` having the syscall exception |
| 959 | handler return directly to ``thread_unwind_user_mode(...)``. |
| 960 | |
| 961 | ---- |
| 962 | |
| 963 | .. _shared_memory: |
| 964 | |
| 965 | Shared Memory |
| 966 | ************* |
| 967 | Shared Memory is a block of memory that is shared between the non-secure and the |
| 968 | secure world. It is used to transfer data between both worlds. |
| 969 | |
| 970 | The shared memory is allocated and managed by the non-secure world, i.e. the |
| 971 | Linux OP-TEE driver. Secure world only considers the individual shared buffers, |
| 972 | not their pool. Each shared memory is referenced with associated attributes: |
| 973 | |
| 974 | - Buffer start address and byte size, |
| 975 | - Cache attributes of the shared memory buffer, |
| 976 | - List of chunks if mapped from noncontiguous pages. |
| 977 | |
| 978 | Shared memory buffer references must fit inside one of the |
| 979 | shared memory areas known to the OP-TEE core. OP-TEE supports two kinds |
| 980 | of shared memory areas: a mandatory area for contiguous buffers and |
| 981 | optional extra memory areas for noncontiguous buffers. |
| 982 | |
| 983 | Contiguous shared buffers |
| 984 | ========================= |
| 985 | Configuration directives ``CFG_SHMEM_START`` and ``CFG_SHMEM_SIZE`` |
| 986 | define a shared memory area where shared memory buffers are contiguous. |
| 987 | Generic memory layout registers it as the ``MEM_AREA_NSEC_SHM`` memory area. |
| 988 | |
| 989 | The non-secure world issues ``OPTEE_SMC_GET_SHM_CONFIG`` to retrieve contiguous |
| 990 | shared memory area configuration: |
| 991 | |
| 992 | - Physical address of the start of the pool |
| 993 | - Size of the pool |
| 994 | - Whether or not the memory is cached |
| 995 | |
| 996 | Contiguous shared memory (also known as static or reserved shared memory) |
| 997 | is enabled with the configuration flag ``CFG_CORE_RESERVED_SHM=y``. |
| 998 | |
| 999 | Noncontiguous shared buffers |
| 1000 | ============================ |
| 1001 | To benefit from noncontiguous shared memory buffers, secure world registers |
| 1002 | dynamic shared memory areas and non-secure world must register noncontiguous |
| 1003 | buffers prior to referring to them using the OP-TEE API. |
| 1004 | |
| 1005 | The OP-TEE core generic boot sequence discovers dynamic shared areas from the |
| 1006 | device tree and/or areas explicitly registered by the platform. |
| 1007 | |
| 1008 | The non-secure side needs to register buffers as lists of 4 kByte chunks with the |
| 1009 | OP-TEE core using the ``OPTEE_MSG_CMD_REGISTER_SHM`` API prior to referencing |
| 1010 | them using the OP-TEE invocation API. |
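
Because buffers are registered as lists of 4 kByte chunks, a buffer that does
not start or end on a chunk boundary touches more chunks than its size alone
suggests. The sketch below counts them. The helper name ``toy_num_chunks()``
is hypothetical, and whether the real driver accepts unaligned buffers is not
stated here, so treat the unaligned case as an assumption.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define TOY_CHUNK_SIZE 4096u

    /* Number of 4 kByte chunks touched by [start, start + size). */
    static size_t toy_num_chunks(uintptr_t start, size_t size)
    {
        /* Round the start down to the chunk that contains it. */
        uintptr_t first = start & ~(uintptr_t)(TOY_CHUNK_SIZE - 1);
        uintptr_t end = start + size;

        if (!size)
            return 0;

        /* Round the covered span up to whole chunks. */
        return (end - first + TOY_CHUNK_SIZE - 1) / TOY_CHUNK_SIZE;
    }

For example, a 32 byte buffer that straddles a chunk boundary needs two list
entries.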
| 1011 | |
| 1012 | Noncontiguous shared memory (also known as dynamic shared memory) is |
| 1013 | enabled with the configuration flag ``CFG_CORE_DYN_SHM=y``. |
| 1014 | |
| 1015 | Shared Memory Chunk Allocation |
| 1016 | ============================== |
| 1017 | It is the Linux kernel driver for OP-TEE that is responsible for allocating |
| 1018 | chunks of shared memory. The OP-TEE Linux kernel driver relies on the Linux kernel |
| 1019 | generic allocator (``CONFIG_GENERIC_ALLOCATOR``) to allocate and release |
| 1020 | shared memory physical chunks. The driver relies on the Linux kernel dma-buf |
| 1021 | support (``CONFIG_DMA_SHARED_BUFFER``) to track shared memory buffer |
| 1022 | references. |
| 1023 | |
| 1024 | Using shared memory |
| 1025 | =================== |
| 1026 | From the Client Application |
| 1027 | The client application can ask for shared memory allocation using the |
| 1028 | GlobalPlatform Client API function ``TEEC_AllocateSharedMemory(...)``. The |
| 1029 | client application can also register memory through the GlobalPlatform |
| 1030 | Client API function ``TEEC_RegisterSharedMemory(...)``. The shared memory |
| 1031 | reference can then be used as a parameter when invoking a trusted application. |
| 1032 | |
| 1033 | From the Linux Driver |
| 1034 | Occasionally the Linux kernel driver needs to allocate shared memory for the |
| 1035 | communication with secure world, for example when using buffers of type |
| 1036 | ``TEEC_TempMemoryReference``. |
| 1037 | |
| 1038 | From OP-TEE core |
| 1039 | In case OP-TEE core needs information from TEE supplicant (dynamic TA |
| 1040 | loading, REE time request,...), shared memory must be allocated. Allocation |
| 1041 | depends on the use case. OP-TEE core asks for the following shared memory |
| 1042 | allocation: |
| 1043 | |
| 1044 | - ``optee_msg_arg`` structure, used to pass the arguments to the |
| 1045 | non-secure world, where the allocation will be done by sending a |
| 1046 | ``OPTEE_SMC_RPC_FUNC_ALLOC`` message. |
| 1047 | |
| 1048 | - In some cases, a payload might be needed for storing the result from |
| 1049 | TEE supplicant, for example when loading a Trusted Application. This |
| 1050 | type of allocation will be done by sending the message |
| 1051 | ``OPTEE_MSG_RPC_CMD_SHM_ALLOC(OPTEE_MSG_RPC_SHM_TYPE_APPL,...)``, |
| 1052 | which then will return: |
| 1053 | |
| 1054 | - the physical address of the shared memory |
| 1055 | - a handle to the memory, that will be used later on when |
| 1056 | freeing this memory. |
| 1057 | |
| 1058 | From TEE Supplicant |
| 1059 | TEE supplicant is also working with shared memory, used to exchange data |
| 1060 | between normal and secure worlds. TEE supplicant receives a memory address |
| 1061 | from the OP-TEE core, used to store the data. This is for example the case |
| 1062 | when a Trusted Application is loaded. In this case, TEE supplicant must |
| 1063 | register the provided shared memory in the same way a client application |
| 1064 | would do, involving the Linux driver. |
| 1065 | |
| 1066 | ---- |
| 1067 | |
| 1068 | .. _smc: |
| 1069 | |
| 1070 | SMC |
| 1071 | *** |
| 1072 | SMC Interface |
| 1073 | ============= |
| 1074 | OP-TEE's SMC interface is defined in two levels using optee_smc.h_ and |
| 1075 | optee_msg.h_. The former file defines SMC identifiers and what is passed in the |
| 1076 | registers for each SMC. The latter file defines the OP-TEE Message protocol |
| 1077 | which is not restricted to only SMC even if that currently is the only option |
| 1078 | available. |
| 1079 | |
| 1080 | SMC communication |
| 1081 | ================= |
| 1082 | The main structure used for the SMC communication is defined in ``struct |
| 1083 | optee_msg_arg`` (in optee_msg.h_). Looking into the source code, we can see |
| 1084 | that communication is mainly achieved using ``optee_msg_arg`` and |
| 1085 | ``thread_smc_args`` (in thread.h_), where ``optee_msg_arg`` could be seen as the |
| 1086 | main structure. What will happen is that the :ref:`linux_kernel` driver will get |
| 1087 | the parameters either from :ref:`optee_client` or directly from an internal |
| 1088 | service in Linux kernel. The TEE driver will populate the struct |
| 1089 | ``optee_msg_arg`` with the parameters plus some additional bookkeeping |
| 1090 | information. Parameters for the SMC are passed in registers 1 to 7. Register 0 |
| 1091 | holds the SMC id which among other things tells whether it is a standard or a |
| 1092 | fast call. |
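
How the id in register 0 tells a standard call from a fast call follows from
the bit layout in the Arm SMC Calling Convention: bit 31 is set for fast
calls, bits 29:24 hold the owning entity and bits 15:0 the function number.
The sketch below encodes and decodes that layout. The ``toy_`` names are
hypothetical; the real macros live in optee_smc.h_ and are not reproduced
here. Owner value 50 is used only as an example from the Trusted OS range.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define TOY_SMC_FAST        (UINT32_C(1) << 31)  /* fast call flag */
    #define TOY_SMC_OWNER_SHIFT 24
    #define TOY_SMC_OWNER_MASK  UINT32_C(0x3f)
    #define TOY_SMC_FUNC_MASK   UINT32_C(0xffff)

    /* Build a 32-bit calling convention function id. */
    static uint32_t toy_smc_id(bool fast, uint32_t owner, uint32_t func)
    {
        return (fast ? TOY_SMC_FAST : 0) |
               ((owner & TOY_SMC_OWNER_MASK) << TOY_SMC_OWNER_SHIFT) |
               (func & TOY_SMC_FUNC_MASK);
    }

    static bool toy_smc_is_fast(uint32_t id)
    {
        return (id & TOY_SMC_FAST) != 0;
    }

    static uint32_t toy_smc_owner(uint32_t id)
    {
        return (id >> TOY_SMC_OWNER_SHIFT) & TOY_SMC_OWNER_MASK;
    }

    static uint32_t toy_smc_func(uint32_t id)
    {
        return id & TOY_SMC_FUNC_MASK;
    }

A dispatcher would check ``toy_smc_is_fast()`` first, since fast calls run to
completion on the entry stack while standard calls are assigned a thread.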
| 1093 | |
| 1094 | ---- |
| 1095 | |
| 1096 | .. _thread_handling: |
| 1097 | |
| 1098 | Thread handling |
| 1099 | *************** |
| 1100 | OP-TEE core uses a couple of threads to be able to support running jobs in |
| 1101 | parallel (not fully enabled!). There are handlers for different purposes. In |
| 1102 | thread.c_ you will find a function called ``thread_init_primary(...)`` which |
| 1103 | assigns ``init_handlers`` (functions) that should be called when OP-TEE core |
| 1104 | receives standard or fast calls, FIQ and PSCI calls. There are default handlers |
| 1105 | for these services, but the platform can decide if they want to implement their |
| 1106 | own platform specific handlers instead. |
| 1107 | |
| 1108 | Synchronization primitives |
| 1109 | ========================== |
| 1110 | OP-TEE has three primitives for synchronization of threads and CPUs: |
| 1111 | *spin-lock*, *mutex*, and *condvar*. |
| 1112 | |
| 1113 | Spin-lock |
| 1114 | A spin-lock is represented as an ``unsigned int``. This is the most |
| 1115 | primitive lock. Interrupts should be disabled before attempting to take a |
| 1116 | spin-lock and should remain disabled until the lock is released. A spin-lock |
| 1117 | is initialized with ``SPINLOCK_UNLOCK``. |
| 1118 | |
| 1119 | .. list-table:: Spin lock functions |
| 1120 | :header-rows: 1 |
| 1121 | :widths: 1 5 |
| 1122 | |
| 1123 | * - Function |
| 1124 | - Purpose |
| 1125 | |
| 1126 | * - ``cpu_spin_lock(...)`` |
| 1127 | - Locks a spin-lock |
| 1128 | |
| 1129 | * - ``cpu_spin_trylock(...)`` |
| 1130 | - Locks a spin-lock if unlocked and returns ``0`` else the spin-lock |
| 1131 | is unchanged and the function returns ``!0`` |
| 1132 | |
| 1133 | * - ``cpu_spin_unlock(...)`` |
| 1134 | - Unlocks a spin-lock |
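
The semantics in the table, in particular ``cpu_spin_trylock(...)`` returning
``0`` on success and non-zero otherwise, can be mimicked with C11 atomics as
shown below. This is only a sketch of the behaviour and not the OP-TEE
implementation. The ``toy_`` names are hypothetical, and the sketch does not
disable interrupts, which real callers must do as described above.

.. code-block:: c

    #include <stdatomic.h>

    #define TOY_SPINLOCK_UNLOCK 0
    #define TOY_SPINLOCK_LOCK   1

    /* Returns 0 if the lock was taken, non-zero if it was already held. */
    static unsigned int toy_spin_trylock(atomic_uint *lock)
    {
        unsigned int expected = TOY_SPINLOCK_UNLOCK;

        return !atomic_compare_exchange_strong_explicit(lock, &expected,
                                                        TOY_SPINLOCK_LOCK,
                                                        memory_order_acquire,
                                                        memory_order_relaxed);
    }

    static void toy_spin_lock(atomic_uint *lock)
    {
        while (toy_spin_trylock(lock))
            ;  /* busy-wait until the holder releases the lock */
    }

    static void toy_spin_unlock(atomic_uint *lock)
    {
        atomic_store_explicit(lock, TOY_SPINLOCK_UNLOCK,
                              memory_order_release);
    }

The acquire and release orderings make writes done inside the critical section
visible to the next CPU that takes the lock.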
| 1135 | |
| 1136 | Mutex |
| 1137 | A mutex is represented by ``struct mutex``. A mutex can be locked and |
| 1138 | unlocked with interrupts enabled or disabled, but only from a normal thread. |
| 1139 | A mutex cannot be used in an interrupt handler, abort handler or before a |
| 1140 | thread has been selected for the CPU. A mutex is initialized with either |
| 1141 | ``MUTEX_INITIALIZER`` or ``mutex_init(...)``. |
| 1142 | |
| 1143 | .. list-table:: Mutex functions |
| 1144 | :header-rows: 1 |
| 1145 | :widths: 1 5 |
| 1146 | |
| 1147 | * - Function |
| 1148 | - Purpose |
| 1149 | |
| 1150 | * - ``mutex_lock(...)`` |
| 1151 | - Locks a mutex. If the mutex is unlocked this is a fast operation, |
| 1152 | else the function issues an RPC to wait in normal world. |
| 1153 | |
| 1154 | * - ``mutex_unlock(...)`` |
| 1155 | - Unlocks a mutex. If there are no waiters this is a fast operation, |
| 1156 | else the function issues an RPC to wake up a waiter in normal world. |
| 1157 | |
| 1158 | * - ``mutex_trylock(...)`` |
| 1159 | - Locks a mutex if unlocked and returns ``true`` else the mutex is |
| 1160 | unchanged and the function returns ``false``. |
| 1161 | |
| 1162 | * - ``mutex_destroy(...)`` |
| 1163 | - Asserts that the mutex is unlocked and there are no waiters, after |
| 1164 | this the memory used by the mutex can be freed. |
| 1165 | |
| 1166 | When a mutex is locked it is owned by the thread calling ``mutex_lock(...)`` |
| 1167 | or ``mutex_trylock(...)``. The mutex may only be unlocked by the thread |
| 1168 | owning the mutex. A thread should not exit to TA user space when holding a |
| 1169 | mutex. |
| 1170 | |
| 1171 | Condvar |
| 1172 | A condvar is represented by ``struct condvar``. A condvar is similar to a |
| 1173 | ``pthread_cond_t`` in the pthreads standard, only less advanced. |
| 1174 | Condition variables are used to wait for some condition to be fulfilled and |
| 1175 | are always used together with a mutex. Once a condition variable has been used |
| 1176 | together with a certain mutex, it must only be used with that mutex until |
| 1177 | destroyed. A condvar is initialized with ``CONDVAR_INITIALIZER`` or |
| 1178 | ``condvar_init(...)``. |
| 1179 | |
| 1180 | .. list-table:: Condvar functions |
| 1181 | :header-rows: 1 |
| 1182 | :widths: 1 5 |
| 1183 | |
| 1184 | * - Function |
| 1185 | - Purpose |
| 1186 | |
| 1187 | * - ``condvar_wait(...)`` |
| 1188 | - Atomically unlocks the supplied mutex and waits in normal world via |
| 1189 | an RPC for the condition variable to be signaled, when the function |
| 1190 | returns the mutex is locked again. |
| 1191 | |
| 1192 | * - ``condvar_signal(...)`` |
| 1193 | - Wakes up one waiter of the condition variable (waiting in |
| 1194 | ``condvar_wait(...)``). |
| 1195 | |
| 1196 | * - ``condvar_broadcast(...)`` |
| 1197 | - Wakes up all waiters of the condition variable. |
| 1198 | |
| 1199 | The caller of ``condvar_signal(...)`` or ``condvar_broadcast(...)`` should |
| 1200 | hold the mutex associated with the condition variable to guarantee that a |
| 1201 | waiter does not miss the signal. |
| 1202 | |
| 1203 | .. _core/arch/arm/kernel/thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c |
| 1204 | .. _optee_msg.h: https://github.com/OP-TEE/optee_os/blob/master/core/include/optee_msg.h |
| 1205 | .. _optee_smc.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/sm/optee_smc.h |
| 1206 | .. _thread.c: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/kernel/thread.c |
| 1207 | .. _thread.h: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/include/kernel/thread.h |
| 1208 | |
| 1209 | .. _ARM_DEN0028A_SMC_Calling_Convention: http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf |
| 1210 | .. _Cortex-A53 TRM: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500j/DDI0500J_cortex_a53_trm.pdf |
| 1211 | .. _drivers/tee/optee: https://github.com/torvalds/linux/tree/master/drivers/tee/optee |
| 1212 | .. _Trusted Firmware A: https://github.com/ARM-software/arm-trusted-firmware |