.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine if a given Root Complex supports this or not.
(See PCIe r4.0, sec 1.3.1.) Therefore, as of this writing, the kernel
only supports doing P2P when the endpoints involved are all behind the
same PCI bridge. Such devices are all in the same PCI hierarchy domain,
and the spec guarantees that all transactions within the hierarchy will
be routable; it does not, however, require routing between hierarchies.

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so there are
a few corner-case gotchas with these pages, and developers need to be
careful about what they do with them.


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (i.e.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a provider, a client and an orchestrator
  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client) and it can also make use of the CMB as
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.
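
A minimal sketch of a provider's probe routine, assuming a hypothetical
device that keeps its P2P-capable memory in BAR 4 (the BAR number, the
``foo_`` naming and the use of the whole BAR are illustrative; exact
signatures may differ between kernel versions)::

  static int foo_probe(struct pci_dev *pdev,
                       const struct pci_device_id *id)
  {
          int ret;

          /* Create struct pages for all of BAR 4 */
          ret = pci_p2pdma_add_resource(pdev, 4,
                                        pci_resource_len(pdev, 4), 0);
          if (ret)
                  return ret;

          /* Let orchestrator drivers find and use this memory */
          pci_p2pmem_publish(pdev, true);

          return 0;
  }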

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.


Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions. In some
situations, it may be more appropriate to use a flag to indicate that a
given request is P2P memory and map it appropriately. It is important to
ensure that struct pages that back P2P memory stay out of code that
does not have support for them, as other code may treat the pages as
regular memory, which may not be appropriate.
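
As a sketch, such a conditional mapping helper might look like the
following (the ``foo_`` wrapper is hypothetical and assumes the
scatterlist is homogeneous, i.e. backed entirely by P2P or entirely by
regular memory)::

  static int foo_map_sg(struct device *dev, struct scatterlist *sg,
                        int nents, enum dma_data_direction dir)
  {
          /* P2P pages must go through the P2P mapping routine... */
          if (is_pci_p2pdma_page(sg_page(sg)))
                  return pci_p2pdma_map_sg(dev, sg, nents, dir);

          /* ...everything else is mapped (and unmapped) as usual */
          return dma_map_sg(dev, sg, nents, dir);
  }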


Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use, it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients
will be chosen first. If more than one provider is an equal distance away,
one is chosen at random (a truly random, not merely arbitrary, selection).
This function returns the PCI device of the provider with a reference
taken, so when it is no longer needed it should be released with
:c:func:`pci_dev_put()`.
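
For instance, a hypothetical orchestrator fragment selecting a provider
for a single client might look like this (``client_pdev`` is assumed to
be the client's ``struct pci_dev``; some kernel versions instead take an
array of clients via ``*_many()`` variants)::

  struct pci_dev *provider;

  /* Find the closest published P2P memory usable by the client */
  provider = pci_p2pmem_find(&client_pdev->dev);
  if (!provider)
          return -ENODEV;

  /* ... allocate from and use the provider's memory ... */

  /* Drop the reference pci_p2pmem_find() took */
  pci_dev_put(provider);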

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating scatter-gather lists with P2P memory.
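
Continuing the sketch above, allocating and freeing a scatter-gather
list backed by the provider's memory might look as follows (the length
and error handling are illustrative)::

  struct scatterlist *sgl;
  unsigned int nents;

  /* Allocate an SGL backed by the provider's P2P memory */
  sgl = pci_p2pmem_alloc_sgl(provider, &nents, PAGE_SIZE);
  if (!sgl)
          return -ENOMEM;

  /* ... map it with pci_p2pdma_map_sg() and run the transaction ... */

  pci_p2pmem_free_sgl(provider, sgl);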

Struct Page Caveats
-------------------

Driver writers should be very careful not to pass these special
struct pages to code that isn't prepared for them. At this time, the
kernel interfaces do not have any checks for ensuring this. This
obviously precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be
important, and ioreadX(), iowriteX() and friends should not be necessary.


P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export: