.. SPDX-License-Identifier: BSD-3-Clause
.. SPDX-FileCopyrightText: Copyright TF-RMM Contributors.

.. _locking_rmm:

RMM Locking Guidelines
=========================

This document outlines the locking requirements, discusses the implementation
and provides guidelines for a deadlock-free |RMM| implementation. The document
is based upon the |RMM| Alpha-05 specification and is expected to change as
the implementation proceeds.

.. _locking_intro:

Introduction
-------------
In order to meet the requirement for the |RMM| to be small, simple to reason
about, and to co-exist with contemporary hypervisors which are already
designed to manage system memory, the |RMM| does not include a memory
allocator. It instead relies on an untrusted caller providing granules of
memory used to hold both metadata to manage realms as well as code and data
for realms.

To maintain confidentiality and integrity of these granules, the |RMM|
implements memory access controls by maintaining awareness of the state of
each granule (aka Granule State, see :ref:`locking_impl`) and enforcing rules
on how memory granules can transition from one state to another and how a
granule can be used depending on its state. For example, all granules that can
be accessed by software outside the |PAR| of a realm are in a specific state,
and a granule that holds metadata for a realm is in another specific state
that prevents it from being used as data in a realm and accidentally corrupted
by a realm, which could lead to internal failure in the |RMM|.

Due to the complex nature of the operations supported by the |RMM|, for
example when managing page tables for realms, the |RMM| must be able to hold
locks on multiple objects at the same time. It is well known that holding
multiple locks at the same time can easily lead to deadlocking the system, as
illustrated for example by the dining philosophers problem [EWD310]_. In
traditional operating systems software such issues are avoided by defining a
partial order on all system objects and always acquiring a lower-ordered
object before a higher-ordered object. This solution was shown to be correct
by Dijkstra [EWD625]_. The order is typically obtained by assigning an
arbitrary value based upon certain attributes of the objects, for example by
using the memory address of the object.
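
As a simple illustration of ordering by memory address, the following sketch
(hypothetical POSIX-threads code, not part of the |RMM|) acquires the locks of
two objects in a globally consistent order derived from their addresses, so no
two threads can ever take the same pair of locks in opposite order:

.. code-block:: C

    #include <pthread.h>
    #include <stdint.h>

    struct object {
        pthread_mutex_t lock;
    };

    /*
     * Lock two distinct objects in address order. Any two threads
     * locking the same pair take the locks in the same order, so they
     * cannot deadlock on this pair.
     */
    static void lock_pair(struct object *x, struct object *y)
    {
        struct object *lo = ((uintptr_t)x < (uintptr_t)y) ? x : y;
        struct object *hi = ((uintptr_t)x < (uintptr_t)y) ? y : x;

        pthread_mutex_lock(&lo->lock);
        pthread_mutex_lock(&hi->lock);
    }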

Unfortunately, software such as the |RMM| cannot use these methods directly
because the |RMM| receives an opaque pointer from the untrusted caller and it
cannot know, before locking the object, whether it is indeed in the expected
state. Furthermore, MMU page tables are hierarchical data structures, and
operations on the page tables typically must be able to locate a leaf node in
the hierarchy based on a single value (a virtual address) and therefore must
walk the page tables in their hierarchical order. This implies an order of
objects in the same Granule State which is not known by a process executing in
the |RMM| before holding at least one lock on an object in the page table
hierarchy. An obvious solution to these problems would be to use a single
global lock for the |RMM|, but that would serialize all operations across all
shared data structures in the system and severely impact performance.


.. _locking_reqs:

Requirements
-------------

To address the synchronization needs of the |RMM| described above, we must
employ locking and lock-free mechanisms which satisfy a number of properties.
These are discussed below:

Critical Section
*****************

A critical section can be defined as a section of code within a process that
requires access to shared resources and that must not be executed while
another process is in a corresponding section of code [WS2001]_.

Further, access to shared resources without appropriate synchronization can
lead to **race conditions**, which can be defined as a situation in which
multiple threads or processes read and write a shared item and the final
result depends on the relative timing of their execution [WS2001]_.

In terms of the |RMM|, an access to a shared resource can be considered as a
list of operations/instructions in program order that either read from or
write to a shared memory location (e.g. the granule data structure or the
memory granule described by the granule data structure, see
:ref:`locking_impl`). It is also understood that this list of operations does
not execute indefinitely, but eventually terminates.

We can now define our desired properties as follows:

Mutual Exclusion
*****************

Mutual exclusion can be defined as the requirement that when one process is in
a critical section that accesses shared resources, no other process may be in
a critical section that accesses any of those shared resources [WS2001]_.

The following example illustrates how an implementation might enforce mutual
exclusion of critical sections using a lock on a valid granule data structure
`struct granule *a`:

.. code-block:: C

    struct granule *a;
    bool r;

    r = try_lock(a);
    if (!r) {
        return -ERROR;
    }
    critical_section(a);
    unlock(a);
    other_work();

We note that a process either fails the `try_lock` operation on object `a` and
returns an error, or successfully acquires the lock, executes
`critical_section()` and `unlock()`, and then continues to make forward
progress to the `other_work()` function.

Deadlock Avoidance
*******************

A deadlock can be defined as a situation in which two or more processes are
unable to proceed because each is waiting for one of the others to do
something [WS2001]_.

In other words, one or more processes are trying to enter their critical
sections but none of them makes forward progress.
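
For illustration, the following sketch (hypothetical POSIX-threads code, not
part of the |RMM|) shows the classic deadlock-prone pattern: two threads
acquiring the same two locks in opposite order:

.. code-block:: C

    #include <pthread.h>

    static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

    /* Thread 1 takes `a` then `b`. */
    static void *thread1(void *unused)
    {
        pthread_mutex_lock(&a);
        pthread_mutex_lock(&b);    /* may block forever */
        /* ... critical section ... */
        pthread_mutex_unlock(&b);
        pthread_mutex_unlock(&a);
        return NULL;
    }

    /* Thread 2 takes `b` then `a`: the opposite order. If each thread
     * has taken its first lock when it requests the second, both wait
     * on each other indefinitely. */
    static void *thread2(void *unused)
    {
        pthread_mutex_lock(&b);
        pthread_mutex_lock(&a);    /* may block forever */
        /* ... critical section ... */
        pthread_mutex_unlock(&a);
        pthread_mutex_unlock(&b);
        return NULL;
    }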

We can then define the deadlock avoidance property as the inverse scenario:
when one or more processes are trying to enter their critical sections, at
least one of them makes forward progress.

A deadlock is a fatal event if it occurs in supervisory software such as the
|RMM|. It must be avoided, as it can render the system vulnerable to exploits
and/or unresponsive, which may lead to data loss, interrupted service and
eventually economic loss.

Starvation Avoidance
*********************

Starvation can be defined as a situation in which a runnable process is
overlooked indefinitely by the scheduler; although it is able to proceed, it
is never chosen [WS2001]_.

Starvation avoidance can then be defined as: all processes that are trying to
enter their critical sections eventually make forward progress.

Starvation must be avoided, because if one or more processes do not make
forward progress, the PE on which the process runs will not perform useful
work and will be lost to the user, resulting in issues similar to those of a
deadlocked system.

Nested Critical Sections
*************************

A critical section for an object may be nested within the critical section
for another object for the same process. In other words, a process may enter
more than one critical section at the same time.

For example, if the |RMM| needs to copy data from one granule to another
granule, and must be sure that both granules can only be modified by the
process itself, it may be implemented in the following way:

.. code-block:: C

    struct granule *a;
    struct granule *b;
    bool r;

    r = try_lock(a);
    if (!r) {
        return -ERROR;
    }

    /* critical section for granule a -- ENTER */

    r = try_lock(b);
    if (r) {
        /* critical section for granule b -- ENTER */
        b->foo = a->foo;
        /* critical section for granule b -- EXIT */
        unlock(b);
    }

    /* critical section for granule a -- EXIT */
    unlock(a);

.. _locking_impl:

Implementation
---------------

The |RMM| maintains granule states by defining a data structure for each
memory granule in the system. Conceptually, the data structure contains the
following fields:

* Granule State
* Lock
* Reference Count
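
A minimal sketch of this data structure is shown below, with `spinlock_t` as
described in the Locking section later in this document; this is conceptual
only, and the actual `struct granule` in the |RMM| sources may declare and
pack these fields differently:

.. code-block:: C

    /* Conceptual layout only; actual field types and packing differ. */
    struct granule {
        /* Lock protecting the granule state and any metadata content */
        spinlock_t lock;
        /* Current Granule State */
        enum granule_state state;
        /* Number of references held on this granule */
        unsigned long refcount;
    };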

The Lock field provides mutual exclusion of processes executing in their
critical sections which may access the shared granule data structure and the
shared metadata which may be stored in the memory granule when the granule is
in one of the |RD|, |REC|, and Table states. Both the data structure
describing the memory granule and the contents of the memory granule itself
can be accessed by multiple PEs concurrently, and we therefore require some
concurrency protocol to avoid corruption of shared data structures. An
alternative to using a lock providing mutual exclusion would be to design all
operations that access shared data structures as lock-free algorithms, but due
to the complexity of the data structures and the operation of the |RMM| we
consider this too difficult to accomplish in practice.

The Reference Count field is used to keep track of references between
granules. For example, an |RD| describes a realm, and a |REC| describes an
execution context within that realm, and therefore an |RD| must always exist
when a |REC| exists. To prevent the |RMM| from destroying an |RD| while a
|REC| still exists, the |RMM| holds a reference count on the |RD| for each
|REC| associated with the same realm, and only when all the RECs in a realm
have been destroyed and the reference count on the |RD| drops to zero can the
|RD| be destroyed and the granule be repurposed for other use.
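
As an illustration, a realm-destroy flow might guard the |RD| as sketched
below, using the `find_lock_unused_granule()` helper described later in this
document. This is a sketch only: it assumes the helper returns NULL on
failure, and `rd_addr` and `teardown_rd()` are hypothetical names:

.. code-block:: C

    /*
     * Sketch: the RD granule can only be torn down and returned to the
     * Delegated state when it is locked and no REC holds a reference
     * on it (refcount == 0).
     */
    struct granule *g_rd;

    g_rd = find_lock_unused_granule(rd_addr, GRANULE_STATE_RD);
    if (g_rd == NULL) {
        return -ERROR; /* not found, wrong state, or still referenced */
    }

    teardown_rd(g_rd); /* hypothetical helper */
    granule_unlock_transition(g_rd, GRANULE_STATE_DELEGATED);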
221
222Based on the above, we now describe the Granule State field and the current
223locking/refcount implementation:
224
225* **UnDelegated:** These are granules for which |RMM| does not prevent the |PAS|
226 of the granule from being changed by another agent to any value.
227 In this state, the granule content access is not protected by granule::lock,
228 as it is always subject to reads and writes from Non-Realm worlds.
229
230* **Delegated:** These are granules with memory only accessible by the |RMM|.
231 The granule content is protected by granule::lock. No reference counts are
232 held on this granule state.
233
234* **Realm Descriptor (RD):** These are granules containing meta data describing
235 a realm, and only accessible by the |RMM|. Granule content access is protected
236 by granule::lock. A reference count is also held on this granule for each
237 associated |REC| granule.
238
239* **Realm Execution Context (REC):** These are granules containing meta data
240 describing a virtual PE running in a realm, and are only accessible by the
241 |RMM|. The execution content access is not protected by granule::lock, because
242 we cannot enter a realm while holding the lock. Further, the following rules
243 apply with respect to the granule's reference counts:
244
245 - A reference count is held on this granule when a |REC| is running.
246
247 - As |REC| cannot be run on two PEs at the same time, the maximum value
248 of the reference count is one.
249
250 - When the |REC| is entered, the reference count is incremented
251 (set to 1) atomically while granule::lock is held.
252
253 - When the |REC| exits, the reference counter is released (set to 0)
254 atomically with store-release semantics without granule::lock being
255 held.
256
257 - The |RMM| can access the granule's content on the entry and exit path
258 from the |REC| while the reference is held.
259
260* **Translation Table:** These are granules containing meta data describing
261 virtual to physical address translation for the realm, accessible by the |RMM|
262 and the hardware Memory Management Unit (MMU). Granule content access is
263 protected by granule::lock, but hardware translation table walks may read the
264 RTT at any point in time. Multiple granules in this state can only be locked
265 at the same time if they are part of the same tree, and only in topological
266 order from root to leaf. The topological order of concatenated root level RTTs
267 is from the lowest address to the highest address. The complete internal
268 locking order for RTT granules is: RD -> [RTT] -> ... -> RTT. A reference
269 count is held on this granule for each entry in the RTT that refers to a
270 granule:
271
272 - Table s2tte.
273
274 - Valid s2tte.
275
276 - Valid_NS s2tte.
277
278 - Assigned s2tte.
279
280* **Data:** These are granules containing realm data, accessible by the |RMM|
281 and by the realm to which it belongs. Granule content access is not protected
282 by granule::lock, as it is always subject to reads and writes from within a
283 realm. A granule in this state is always referenced from exactly one entry in
284 an RTT granule which must be locked before locking this granule. Only a single
285 DATA granule can be locked at a time on a given PE. The complete internal
286 locking order for DATA granules is: RD -> RTT -> RTT -> ... -> DATA.
287 No reference counts are held on this granule type.


Locking
********

The |RMM| uses spinlocks, along with the object state, for its locking
implementation. The lock provides exclusive acquire semantics similar to a
trivial spinlock implementation, but in addition allows verification of
whether the locked object is in an expected state.

The data structure for the spinlock can be described in C as follows:

.. code-block:: C

    typedef struct {
        unsigned int val;
    } spinlock_t;

This data structure can be embedded in any object that requires
synchronization of access, such as the `struct granule` described above.

The following operations are defined on spinlocks:

.. code-block:: C
    :caption: **Typical spinlock operations**

    /*
     * Locks a spinlock with acquire memory ordering semantics, or goes
     * into a tight loop (spins) and repeatedly checks the lock variable
     * atomically until it becomes available.
     */
    void spinlock_acquire(spinlock_t *l);

    /*
     * Unlocks a spinlock with release memory ordering semantics. Must
     * only be called if the calling PE already holds the lock.
     */
    void spinlock_release(spinlock_t *l);
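
A minimal sketch of how these semantics can be expressed with C11 atomics is
shown below. It declares the lock word `_Atomic` for the sake of the sketch;
the actual |RMM| implementation is architecture-specific and may differ:

.. code-block:: C

    #include <stdatomic.h>

    /* Sketch only: the lock word is _Atomic here for portability. */
    typedef struct {
        atomic_uint val;
    } sketch_spinlock_t;

    static void sketch_spinlock_acquire(sketch_spinlock_t *l)
    {
        unsigned int expected = 0U;

        /* Spin until the lock word transitions 0 -> 1 (acquire). */
        while (!atomic_compare_exchange_weak_explicit(&l->val,
                                                      &expected, 1U,
                                                      memory_order_acquire,
                                                      memory_order_relaxed)) {
            expected = 0U;
        }
    }

    static void sketch_spinlock_release(sketch_spinlock_t *l)
    {
        /* Publish the critical section's writes, then free the lock. */
        atomic_store_explicit(&l->val, 0U, memory_order_release);
    }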

The above functions should not be used directly for locking/unlocking
granules; instead, the following should be used:

.. code-block:: C
    :caption: **Granule locking operations**

    /*
     * Acquires a lock (or spins until the lock is available), then
     * checks if the granule is in the `expected_state`. If the
     * `expected_state` is matched, then returns `true`. Otherwise,
     * releases the lock and returns `false`.
     */
    bool granule_lock_on_state_match(struct granule *g,
                                     enum granule_state expected_state);

    /*
     * Used when we are certain of the state of an object (e.g. because
     * we hold a reference to it) or when locking objects whose
     * reference is obtained from another object, after that object is
     * locked.
     */
    void granule_lock(struct granule *g,
                      enum granule_state expected_state);

    /*
     * Obtains a pointer to a locked granule at `addr` if `addr` is a
     * valid granule physical address and the state of the granule at
     * `addr` is `expected_state`.
     */
    struct granule *find_lock_granule(unsigned long addr,
                                      enum granule_state expected_state);

    /* Finds two granules and locks them in order of their address. */
    return_code_t find_lock_two_granules(unsigned long addr1,
                                         enum granule_state expected_state1,
                                         struct granule **g1,
                                         unsigned long addr2,
                                         enum granule_state expected_state2,
                                         struct granule **g2);

    /*
     * Obtains a pointer to a locked granule at `addr` which is unused
     * (refcount == 0), if `addr` is a valid granule physical address
     * and the state of the granule at `addr` is `expected_state`.
     */
    struct granule *find_lock_unused_granule(unsigned long addr,
                                             enum granule_state expected_state);

.. code-block:: C
    :caption: **Granule unlocking operations**

    /*
     * Releases a spinlock held on a granule. Must only be called if the
     * calling PE already holds the lock.
     */
    void granule_unlock(struct granule *g);

    /*
     * Sets the state and releases a spinlock held on a granule. Must
     * only be called if the calling PE already holds the lock.
     */
    void granule_unlock_transition(struct granule *g,
                                   enum granule_state new_state);
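
The following sketch shows a typical usage pattern of these operations,
loosely modelled on granule delegation. It assumes `find_lock_granule()`
returns NULL on failure; `addr` and the scrubbing step are illustrative:

.. code-block:: C

    struct granule *g;

    /* Lock the granule at `addr` iff it is a valid Non-secure granule. */
    g = find_lock_granule(addr, GRANULE_STATE_NS);
    if (g == NULL) {
        return -ERROR;
    }

    /* ... change the PAS of the granule and scrub its contents ... */

    /* Publish the new state and release the lock in one operation. */
    granule_unlock_transition(g, GRANULE_STATE_DELEGATED);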


Reference Counting
*******************

The reference count is implemented using the **refcount** variable within the
granule structure to keep track of the references between granules. For
example, the refcount is used to prevent changes to the attributes of a parent
granule which is referenced by child granules, i.e. a parent with a refcount
not equal to zero.

Race conditions on the refcount variable are avoided by either locking the
granule before accessing the variable or by lock-free mechanisms such as
single-copy atomic operations, along with Arm weakly-ordered
ACQUIRE/RELEASE/RELAXED memory semantics, to synchronize shared resources.

The following operations are defined on refcount:

.. code-block:: C
    :caption: **Read a refcount value**

    /*
     * Single-copy atomic read of the refcount variable with RELAXED
     * memory ordering semantics. Use this function if lock-free access
     * to the refcount is required with relaxed memory ordering
     * constraints applied at that point.
     */
    unsigned long granule_refcount_read_relaxed(struct granule *g);

    /*
     * Single-copy atomic read of the refcount variable with ACQUIRE
     * memory ordering semantics. Use this function if lock-free access
     * to the refcount is required with acquire memory ordering
     * constraints applied at that point.
     */
    unsigned long granule_refcount_read_acquire(struct granule *g);

.. code-block:: C
    :caption: **Increment a refcount value**

    /*
     * Increments the granule refcount. Must be called with the granule
     * lock held.
     */
    void __granule_get(struct granule *g);

    /*
     * Increments the granule refcount by `val`. Must be called with the
     * granule lock held.
     */
    void __granule_refcount_inc(struct granule *g, unsigned long val);

    /* Atomically increments the reference counter of the granule. */
    void atomic_granule_get(struct granule *g);

.. code-block:: C
    :caption: **Decrement a refcount value**

    /*
     * Decrements the granule refcount. Must be called with the granule
     * lock held.
     */
    void __granule_put(struct granule *g);

    /*
     * Decrements the granule refcount by `val`. Asserts if the refcount
     * would become negative. Must be called with the granule lock held.
     */
    void __granule_refcount_dec(struct granule *g, unsigned long val);

    /* Atomically decrements the reference counter of the granule. */
    void atomic_granule_put(struct granule *g);

    /*
     * Atomically decrements the reference counter of the granule.
     * Stores to memory with RELEASE semantics.
     */
    void atomic_granule_put_release(struct granule *g);

.. code-block:: C
    :caption: **Directly access refcount value**

    /*
     * Directly reads/writes the refcount variable. Must be called with
     * the granule lock held.
     */
    granule->refcount;
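
As an example, the |REC| reference-count rules described in
:ref:`locking_impl` could be implemented as sketched below. This is a sketch
only; `g_rec` is assumed to point at the candidate |REC| granule and
`enter_realm()` is a hypothetical stand-in for the real realm entry path:

.. code-block:: C

    /* Take the running-REC reference under the lock (refcount 0 -> 1). */
    granule_lock(g_rec, GRANULE_STATE_REC);
    if (granule_refcount_read_relaxed(g_rec) != 0UL) {
        granule_unlock(g_rec);
        return -ERROR; /* REC is already running on another PE */
    }
    __granule_get(g_rec);
    granule_unlock(g_rec);

    enter_realm(g_rec); /* hypothetical: run the virtual PE */

    /*
     * Release the reference lock-free with RELEASE semantics, so that
     * all prior accesses to the REC are visible before the REC appears
     * not to be running.
     */
    atomic_granule_put_release(g_rec);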

.. _locking_guidelines:

Guidelines
-----------

In order to meet the :ref:`locking_reqs` discussed above, this section
stipulates some locking and lock-free algorithm implementation guidelines for
developers.

Mutual Exclusion
*****************

The spinlock, acquire/release and atomic operations provide trivial mutual
exclusion implementations for the |RMM|. However, the following general
guidelines should be taken into consideration:

  - Appropriate deadlock avoidance techniques should be incorporated when
    using multiple locks.

  - Lock-free access to shared resources should be atomic.

  - Memory ordering constraints should be used prudently to avoid
    performance degradation. For example, on an unlocked granule (e.g. a
    REC), if there are memory operations associated with the granule prior
    to the refcount update, then the update should be done with release
    semantics. However, if there are no associated memory accesses to the
    granule prior to the refcount update, then release semantics are not
    required.


Deadlock Avoidance
******************

Deadlock avoidance is provided by defining a partial order on all objects in
the system, where the locking operation will eventually fail if the caller
tries to acquire a lock on an object in a different state than expected. This
means that no two processes will be expected to acquire locks in a different
order than the defined partial order, and we can rely on the same reasoning
for deadlock avoidance as shown by Dijkstra [EWD625]_.

To establish this partial order, the objects referenced by the |RMM| can be
classified into two categories:

#. **External**: A granule state belongs to the `external` class iff *any*
   parameter in *any* RMI command is an address of a granule which is expected
   to be in that state. The following granule states are `external`:

   - GRANULE_STATE_NS
   - GRANULE_STATE_DELEGATED
   - GRANULE_STATE_RD
   - GRANULE_STATE_REC
   - DEV_GRANULE_STATE_NS
   - DEV_GRANULE_STATE_DELEGATED

#. **Internal**: A granule state belongs to the `internal` class iff it is not
   `external`. These are objects which are referenced from another object
   after that object is locked. Each `internal` object should be referenced
   from exactly one place. The following granule states are `internal`:

   - GRANULE_STATE_RTT
   - GRANULE_STATE_DATA
   - DEV_GRANULE_STATE_MAPPED

We now state the locking guidelines for the |RMM| as:

#. Granules expected to be in an `external` state must be locked before
   locking any granules in an `internal` state (see the sketch after this
   list).

#. Granules expected to be in an `external` state must be locked in order of
   their physical address, starting with the lowest address.

#. Memory granules expected to be in an `external` state must be locked before
   locking any device memory granules in an `external` state.

#. Once a granule expected to be in an `external` state has been locked, its
   state must be checked against the expected state. If these do not match,
   the granule must be unlocked and no further granules may be locked within
   the currently-executing RMM command.

#. Granules in an `internal` state must be locked in order of state:

   - `RTT`
   - `DATA`

#. Granules in the same `internal` state must be locked in the
   :ref:`locking_impl` defined order for that specific state.

#. A granule's state can be changed iff the granule is locked and the
   reference count is zero.
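
The sketch below illustrates guidelines 1, 5 and 6 for an operation that maps
a DATA granule into a realm. It is illustrative only: error handling for the
internal granules is elided, the addresses are assumed to come from RMI
parameters, and `rtt_walk_and_lock()` is a hypothetical helper for the
root-to-leaf RTT walk:

.. code-block:: C

    struct granule *g_rd, *g_rtt, *g_data;

    /* External granule first: the RD address supplied by the caller. */
    g_rd = find_lock_granule(rd_addr, GRANULE_STATE_RD);
    if (g_rd == NULL) {
        return -ERROR;
    }

    /*
     * Internal granules next, in order of state: first the RTT, walked
     * and locked from root to leaf ...
     */
    g_rtt = rtt_walk_and_lock(g_rd, map_addr); /* hypothetical */

    /* ... then the DATA granule referenced by the locked RTT entry. */
    g_data = find_lock_granule(data_addr, GRANULE_STATE_DATA);

    /* ... perform the mapping ... */

    granule_unlock(g_data);
    granule_unlock(g_rtt);
    granule_unlock(g_rd);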

Starvation Avoidance
********************

Currently, the lock-free implementation of RMI.REC.Enter provides starvation
avoidance in the |RMM|. For the locking implementation, however, starvation
avoidance is yet to be accomplished. It can be added by means of a ticket or
MCS style locking implementation [MCS]_.
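
For reference, a ticket lock can be sketched with C11 atomics as shown below.
This is an illustrative example of the technique, not |RMM| code; waiters are
served in FIFO order of the tickets they draw, which bounds waiting time and
therefore avoids starvation:

.. code-block:: C

    #include <stdatomic.h>

    typedef struct {
        atomic_uint next;   /* next ticket to hand out */
        atomic_uint owner;  /* ticket currently being served */
    } ticketlock_t;

    static void ticketlock_acquire(ticketlock_t *l)
    {
        /* Draw a ticket, then spin until that ticket is served. */
        unsigned int t = atomic_fetch_add_explicit(&l->next, 1U,
                                                   memory_order_relaxed);

        while (atomic_load_explicit(&l->owner,
                                    memory_order_acquire) != t) {
            /* spin */
        }
    }

    static void ticketlock_release(ticketlock_t *l)
    {
        /*
         * Only the lock holder updates `owner`, so a relaxed read
         * followed by a release store is sufficient.
         */
        unsigned int cur = atomic_load_explicit(&l->owner,
                                                memory_order_relaxed);
        atomic_store_explicit(&l->owner, cur + 1U,
                              memory_order_release);
    }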

Nested Critical Sections
************************

Spinlocks provide support for nested critical sections. Processes can acquire
multiple spinlocks at the same time, as long as the locking order is not
violated.

References
----------

.. [EWD310] Dijkstra, E.W. Hierarchical ordering of sequential processes.
   EWD 310.

.. [EWD625] Dijkstra, E.W. Two starvation free solutions to a general
   exclusion problem. EWD 625.

.. [MCS] Mellor-Crummey, John M. and Scott, Michael L. Algorithms for scalable
   synchronization on shared-memory multiprocessors. ACM TOCS, Volume 9,
   Issue 1, Feb. 1991.

.. [WS2001] Stallings, W. (2001). Operating systems: Internals and design
   principles. Upper Saddle River, N.J: Prentice Hall.