# VM interface

This page provides an overview of the interface Hafnium provides to VMs. Hafnium
makes a distinction between the 'primary VM', which controls scheduling and has
more direct access to some hardware, and 'secondary VMs' which exist mostly to
provide services to the primary VM, and have a more paravirtualised interface.
The intention is that the primary VM can run a mostly unmodified operating
system (such as Linux) with the addition of a Hafnium driver which
[fulfils certain expectations](SchedulerExpectations.md), while secondary VMs
will run more specialised trusted OSes or bare-metal code which is designed with
Hafnium in mind.

The interface documented here is what is planned for the first release of
Hafnium, not necessarily what is currently implemented.

[TOC]

## CPU scheduling

The primary VM will have one vCPU for each physical CPU, and control the
scheduling.

Secondary VMs will have a configurable number of vCPUs, scheduled on arbitrary
physical CPUs at the whims of the primary VM scheduler.

All VMs will start with a single active vCPU. Subsequent vCPUs can be started
through PSCI.

## PSCI

The primary VM will be able to control the physical CPUs through the following
PSCI 1.1 calls, which will be forwarded to the underlying implementation in EL3:

* PSCI_VERSION
* PSCI_FEATURES
* PSCI_SYSTEM_OFF
* PSCI_SYSTEM_RESET
* PSCI_AFFINITY_INFO
* PSCI_CPU_SUSPEND
* PSCI_CPU_OFF
* PSCI_CPU_ON

All other PSCI calls are unsupported.

Secondary VMs will be able to control their vCPUs through the following PSCI 1.1
calls, which will be implemented by Hafnium:

* PSCI_VERSION
* PSCI_FEATURES
* PSCI_AFFINITY_INFO
* PSCI_CPU_SUSPEND
* PSCI_CPU_OFF
* PSCI_CPU_ON

All other PSCI calls are unsupported.
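
The two lists above can be read as a filter on the PSCI function ID. The
following is a minimal sketch of such a filter, not Hafnium's actual code: the
helper name `psci_call_allowed` is hypothetical, and the function ID values are
the SMC32/SMC64 IDs from the PSCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Function IDs as defined by the PSCI specification. */
#define PSCI_VERSION          UINT32_C(0x84000000)
#define PSCI_CPU_SUSPEND_32   UINT32_C(0x84000001)
#define PSCI_CPU_SUSPEND_64   UINT32_C(0xC4000001)
#define PSCI_CPU_OFF          UINT32_C(0x84000002)
#define PSCI_CPU_ON_32        UINT32_C(0x84000003)
#define PSCI_CPU_ON_64        UINT32_C(0xC4000003)
#define PSCI_AFFINITY_INFO_32 UINT32_C(0x84000004)
#define PSCI_AFFINITY_INFO_64 UINT32_C(0xC4000004)
#define PSCI_SYSTEM_OFF       UINT32_C(0x84000008)
#define PSCI_SYSTEM_RESET     UINT32_C(0x84000009)
#define PSCI_FEATURES         UINT32_C(0x8400000A)

/*
 * Hypothetical helper: returns true if the given PSCI call is in the list
 * for the calling VM type. Both VM types share most calls; only the primary
 * VM may power off or reset the whole system.
 */
bool psci_call_allowed(bool is_primary, uint32_t func)
{
    switch (func) {
    case PSCI_VERSION:
    case PSCI_FEATURES:
    case PSCI_AFFINITY_INFO_32:
    case PSCI_AFFINITY_INFO_64:
    case PSCI_CPU_SUSPEND_32:
    case PSCI_CPU_SUSPEND_64:
    case PSCI_CPU_OFF:
    case PSCI_CPU_ON_32:
    case PSCI_CPU_ON_64:
        return true;
    case PSCI_SYSTEM_OFF:
    case PSCI_SYSTEM_RESET:
        return is_primary;
    default:
        return false; /* all other PSCI calls are unsupported */
    }
}
```

The sketch only decides whether a call is accepted. For the primary VM an
accepted call would then be forwarded to EL3, while for a secondary VM it would
be handled by Hafnium itself.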

## Hardware timers

The primary VM will have access to both the physical and virtual EL1 timers
through the usual control registers (`CNT[PV]_TVAL_EL0` and `CNT[PV]_CTL_EL0`).

Secondary VMs will have access to the virtual timer only, which will be emulated
with help from the kernel driver in the primary VM.

## Interrupts

The primary VM will have direct access to control the physical GIC, and receive
all interrupts (other than anything already trapped by TrustZone). It will be
responsible for forwarding any necessary interrupts to secondary VMs. The
Interrupt Translation Service (ITS) will be disabled by Hafnium so that it
cannot be used to circumvent access controls.

Secondary VMs will have access to a simple paravirtualized interrupt controller
through two hypercalls: one to enable or disable a given virtual interrupt ID,
and one to get and acknowledge the next pending interrupt. There is no concept
of interrupt priorities or a distinction between edge and level triggered
interrupts. Secondary VMs may also inject interrupts into their own vCPUs.
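
To illustrate how little state such a paravirtualized controller needs, here is
a toy model of the per-vCPU bookkeeping. It is a sketch under assumptions, not
the Hafnium ABI: the function names, the 64-ID limit, the `INTID_NONE` value and
the lowest-ID-first order are all invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_INTIDS 64u        /* assumed limit, chosen to fit one bitmap word */
#define INTID_NONE UINT32_MAX /* assumed "nothing pending" return value */

struct vcpu_interrupts {
    uint64_t enabled; /* bit n set: virtual interrupt n is enabled */
    uint64_t pending; /* bit n set: virtual interrupt n is pending */
};

/* First call: enable or disable a given virtual interrupt ID. */
bool interrupt_enable(struct vcpu_interrupts *v, uint32_t intid, bool enable)
{
    if (intid >= NUM_INTIDS) {
        return false;
    }
    if (enable) {
        v->enabled |= UINT64_C(1) << intid;
    } else {
        v->enabled &= ~(UINT64_C(1) << intid);
    }
    return true;
}

/* Marks an interrupt pending, e.g. when forwarded by the primary VM. */
bool interrupt_inject(struct vcpu_interrupts *v, uint32_t intid)
{
    if (intid >= NUM_INTIDS) {
        return false;
    }
    v->pending |= UINT64_C(1) << intid;
    return true;
}

/*
 * Second call: get and acknowledge the next pending, enabled interrupt.
 * With no priorities, this sketch simply picks the lowest ID.
 */
uint32_t interrupt_get(struct vcpu_interrupts *v)
{
    uint64_t ready = v->enabled & v->pending;

    for (uint32_t i = 0; i < NUM_INTIDS; i++) {
        if (ready & (UINT64_C(1) << i)) {
            v->pending &= ~(UINT64_C(1) << i); /* acknowledge */
            return i;
        }
    }
    return INTID_NONE;
}
```

A disabled interrupt stays pending in this model and is only delivered once the
VM enables it.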

## Performance counters

VMs will be blocked from accessing performance counter registers (for the
performance monitor extensions described in chapter D5 of the ARMv8-A reference
manual) in production builds, to prevent them from being used as a side channel
to leak data between VMs.

Hafnium may allow VMs to use these registers in debug builds.

## Debug registers

VMs will be blocked from accessing debug registers in production builds, to
prevent them from being used to circumvent access controls.

Hafnium may allow VMs to use these registers in debug builds.

## RAS Extension registers

Secondary VMs will be blocked from using registers associated with the RAS
Extension.
Andrew Walbran | b2be7c6 | 2019-08-06 14:55:29 +0100 | [diff] [blame] | 99 | |
| 100 | ## Asynchronous message passing |
| 101 | |
| 102 | VMs will be able to send messages of up to 4 KiB to each other asynchronously, |
| 103 | with no queueing, as specified by SPCI. |
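
"No queueing" means each receiver holds at most one undelivered message, so a
send can fail if the previous message has not been consumed. The sketch below
models that behaviour with a single-slot mailbox. The type and function names
are hypothetical, and the real SPCI interface (buffer setup, error codes) is
not represented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MSG_SIZE 4096u /* 4 KiB message limit */

/* One receive slot per VM: there is no queue behind it. */
struct mailbox {
    bool full;
    uint16_t sender;
    size_t size;
    uint8_t data[MAX_MSG_SIZE];
};

enum send_result { SEND_OK, SEND_TOO_LARGE, SEND_BUSY };

/* Copies a message into the receiver's slot if it fits and the slot is free. */
enum send_result mailbox_send(struct mailbox *to, uint16_t sender,
                              const void *msg, size_t size)
{
    if (size > MAX_MSG_SIZE) {
        return SEND_TOO_LARGE;
    }
    if (to->full) {
        return SEND_BUSY; /* previous message not yet consumed */
    }
    memcpy(to->data, msg, size);
    to->sender = sender;
    to->size = size;
    to->full = true;
    return SEND_OK;
}

/* Called by the receiver once it has finished with the message. */
void mailbox_release(struct mailbox *mb)
{
    mb->full = false;
}
```

A sender that gets `SEND_BUSY` has to retry later. The design keeps the
hypervisor free of unbounded buffering.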

## Memory

VMs will statically be given access to mutually exclusive regions of the
physical address space at boot. This includes MMIO space for controlling
devices, plus a fixed amount of RAM for secondaries, and all remaining address
space to the primary. Note that this means that only one VM can control any
given page of MMIO registers for a device.

VMs may choose to donate or share their memory with other VMs at runtime. Any
given page may be shared between at most two VMs at once (including the original
owning VM). Memory which has been donated or shared may not be forcefully
reclaimed, but the VM with which it was shared may choose to return it.
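
The sharing rules can be captured as a small state machine per page. The
following sketch is one possible reading of those rules, with hypothetical
names throughout. It tracks only who owns a page and who, if anyone, it is
currently shared with.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_NONE UINT16_MAX /* assumed marker for "not shared with anyone" */

struct page_state {
    uint16_t owner;    /* VM that currently owns the page */
    uint16_t borrower; /* VM it is shared with, or VM_NONE */
};

/* Share: the owner keeps access and grants it to exactly one other VM. */
bool page_share(struct page_state *p, uint16_t from, uint16_t to)
{
    if (p->owner != from || p->borrower != VM_NONE || to == from) {
        return false; /* not the owner, or already shared with another VM */
    }
    p->borrower = to;
    return true;
}

/* Donate: ownership moves to the other VM; the donor loses access. */
bool page_donate(struct page_state *p, uint16_t from, uint16_t to)
{
    if (p->owner != from || p->borrower != VM_NONE || to == from) {
        return false;
    }
    p->owner = to;
    return true;
}

/* Return: only the VM the page was shared with can end the share. */
bool page_return(struct page_state *p, uint16_t caller)
{
    if (caller == VM_NONE || p->borrower != caller) {
        return false; /* the owner cannot forcefully reclaim */
    }
    p->borrower = VM_NONE;
    return true;
}
```

In this model a donated page can only come back if the new owner donates it
again, which matches the rule that memory is never forcefully reclaimed.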

## Cache

VMs will be blocked from using cache maintenance instructions that operate by
set/way. These operations are difficult to virtualize, and could expose the
system to side-channel attacks.

## Logging

VMs may send a character to a shared log by means of a hypercall or SMC call.
These log messages will be buffered per VM to make complete lines, then output
to a Hafnium-owned UART and saved in a shared ring buffer which may be extracted
from RAM dumps. VM IDs will be prepended to these logs.
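
The per-VM line buffering can be sketched as follows. This is an illustration
only: the `vm_log` structure, the 128-character line limit and the `VM <id>: `
prefix format are assumptions, and the UART and ring buffer are replaced by a
plain output string.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_LINE_MAX 128 /* assumed per-VM line buffer size */

struct vm_log {
    uint16_t vm_id;
    size_t len;
    char line[LOG_LINE_MAX];
};

/*
 * Adds one character to the VM's line buffer. When a newline arrives or the
 * buffer fills up, the line is written to `out` with the VM ID prepended and
 * the number of characters in the formatted line is returned. Returns 0
 * while the line is still incomplete.
 */
int vm_log_putc(struct vm_log *log, char c, char *out, size_t out_size)
{
    if (c != '\n') {
        log->line[log->len++] = c;
        if (log->len < LOG_LINE_MAX) {
            return 0;
        }
    }

    int n = snprintf(out, out_size, "VM %u: %.*s\n", (unsigned)log->vm_id,
                     (int)log->len, log->line);
    log->len = 0; /* start a fresh line */
    return n;
}
```

Buffering per VM keeps characters from different VMs from interleaving within
a line of the shared log.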

This log API is intended for use in early bringup and low-level debugging. No
sensitive data should be logged through it. Higher level logs can be sent to the
primary VM through the asynchronous message passing mechanism described above,
or through shared memory.

## Configuration

Hafnium will read configuration from a flattened device tree blob (FDT). This
may either be the same device tree used for the other details of the system or a
separate minimal one just for Hafnium. This will include at least:

* The available RAM.
* The number of secondary VMs, how many vCPUs each should have, how much
  memory to assign to each of them, and where to load their initial images.
  (Most likely the initial image will be a minimal loader supplied with
  Hafnium which will validate and load the rest of the image from the primary
  later on.)
* Which devices exist on the system, their details (MMIO regions, interrupts
  and SYSMMU details), and which VM each is assigned to.
    * A single physical device may be split into multiple logical ‘devices’
      from Hafnium’s point of view if necessary to have different VMs own
      different parts of it.
* A whitelist of which SMC calls each VM is allowed to make.

## Failure handling

If a secondary VM tries to do something it shouldn't, Hafnium will either inject
a fault or kill it and inform the primary VM. The primary VM may choose to
restart the system or to continue without the secondary VM.

If the primary VM tries to do something it shouldn't, Hafnium will either inject
a fault or restart the system.

## TrustZone communication

The primary VM will be able to communicate with a TEE running in TrustZone
either through SPCI messages or through whitelisted SMC calls, and through
shared memory.

## Other SMC calls

Apart from the PSCI calls described above and those used to communicate with
Hafnium, all SMC calls will be blocked by default. Hafnium will allow SMC
calls to be whitelisted on a per-VM, per-function ID basis, as part of the
static configuration described above. These whitelisted SMC calls will be
forwarded to the EL3 handler with the client ID (as described by the SMCCC) set
to the calling VM's ID.
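
A per-VM whitelist check with client ID tagging might look like the sketch
below. The structure and function names are hypothetical. The placement of the
client ID in bits [15:0] of the eighth argument register follows the SMCCC as
understood here and should be checked against the specification. PSCI calls
and Hafnium's own calls are assumed to be handled before this point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Argument registers of an SMC: x[0] is the function ID. */
struct smc_args {
    uint64_t x[8];
};

/* Hypothetical slice of a VM's static configuration. */
struct vm_config {
    uint16_t id;
    const uint32_t *smc_whitelist; /* allowed function IDs */
    size_t smc_whitelist_count;
};

#define SMC_CLIENT_ID_MASK UINT64_C(0xFFFF) /* assumed: bits [15:0] of x[7] */

/*
 * Returns true if the call is whitelisted for this VM, in which case the
 * client ID field is overwritten with the VM's ID so that EL3 can tell
 * callers apart. Returns false, leaving the arguments untouched, if the
 * call must be blocked.
 */
bool smc_prepare_forward(const struct vm_config *vm, struct smc_args *args)
{
    uint32_t func = (uint32_t)args->x[0];

    for (size_t i = 0; i < vm->smc_whitelist_count; i++) {
        if (vm->smc_whitelist[i] == func) {
            args->x[7] = (args->x[7] & ~SMC_CLIENT_ID_MASK) | vm->id;
            return true;
        }
    }
    return false; /* blocked by default */
}
```

Overwriting the client ID rather than trusting the value supplied by the VM
prevents one VM from impersonating another towards EL3.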