=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it. The core is careful to
avoid asking about anything that is migrating. This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios, rather than requests, it's easy for the policy
to get fooled by many small bios. For this reason the core target
issues periodic ticks to the policy. It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
per tick. The core generates ticks by watching bios complete, which
lets it estimate when the io scheduler has allowed the ios to run.


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect::

	'sequential_threshold <#nr_sequential_ios>'
	'random_threshold <#nr_random_ios>'
	'read_promote_adjustment <value>'
	'write_promote_adjustment <value>'
	'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads. smq also does not have any cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target. Doing so will cause all of the
mq policy's hints to be dropped. Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
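
Reloading the table to switch policies might look like the following
sketch. The device names, the mapped device name 'blah', and the
geometry are all hypothetical; the actual dmsetup calls are shown as
comments since they require real devices and root privileges::

```shell
# Hypothetical devices and geometry -- adjust for your system.
META=/dev/sdb
CACHE=/dev/sdc
ORIGIN=/dev/sdd
ORIGIN_SECTORS=268435456      # e.g. from: blockdev --getsz "$ORIGIN"
BLOCK_SIZE=512                # cache block size in 512-byte sectors

# Same geometry as the existing table, but with the smq policy
# (0 feature args, 0 policy args).
TABLE="0 $ORIGIN_SECTORS cache $META $CACHE $ORIGIN $BLOCK_SIZE 0 smq 0"
echo "$TABLE"

# To apply to an existing cache device named 'blah':
#   dmsetup suspend blah
#   dmsetup reload blah --table "$TABLE"
#   dmsetup resume blah
```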

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory; 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers. It avoids storing an explicit hit count for each block. It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All this means smq uses ~25 bytes per cache block. Still a lot of
memory, but a substantial improvement nonetheless.
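
To put those per-block figures in context, here is a rough estimate
for a hypothetical 1 TiB cache with 256 KiB cache blocks (the cache
size and block size are made-up examples; the 88-byte and ~25-byte
figures are from the text above)::

```shell
CACHE_BYTES=$((1 << 40))        # hypothetical 1 TiB cache device
BLOCK_BYTES=$((256 * 1024))     # hypothetical 256 KiB cache block
NR_BLOCKS=$((CACHE_BYTES / BLOCK_BYTES))

# Policy metadata, per the per-cache-block figures quoted above.
MQ_MB=$((NR_BLOCKS * 88 / 1024 / 1024))
SMQ_MB=$((NR_BLOCKS * 25 / 1024 / 1024))
echo "$NR_BLOCKS blocks: mq ~${MQ_MB} MiB, smq ~${SMQ_MB} MiB"
```

So for this example geometry mq needs roughly 352 MiB of policy
metadata where smq needs roughly 100 MiB.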

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)). This meant the bottom
levels generally had the most entries, and the top ones had very
few. Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above. The overall
ordering is a side effect of this stochastic process. With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.
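
The swap-on-hit idea can be illustrated with a toy sketch (plain
shell; the entry names are made up, and real levels hold many entries
rather than single variables)::

```shell
# Two adjacent multiqueue levels, reduced to one entry each:
# 'upper_lru' stands for the least recently used entry of the level
# above; 'lower_entry' is an entry in the level below.
lower_entry=B
upper_lru=X

# Entry 'B' is hit: rather than bumping a hit count, swap it with the
# LRU entry of the level above, promoting it one level.
tmp=$upper_lru
upper_lru=$lower_entry
lower_entry=$tmp

echo "lower now holds: $lower_entry, upper LRU now holds: $upper_lru"
```

Because a hit always trades places with the coldest entry of the next
level up, repeated hits drift an entry upwards without any counter
being stored.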

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block. For a
different block to get promoted to the cache its hit count had to
exceed the lowest count currently in the cache. This meant it could
take a long time for the cache to adapt between varying IO patterns.

smq doesn't maintain hit counts, so a lot of this problem just goes
away. In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote. If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels. This lets it adapt to new IO patterns very quickly.

Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.

Examples
========

The syntax for a table is::

	cache <metadata dev> <cache dev> <origin dev> <block size>
	<#feature_args> [<feature arg>]*
	<policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is::

	dmsetup message <mapped device> 0 sequential_threshold 1024
	dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup::

	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
	/dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"

This creates a 128GB mapped device named 'blah' with the sequential
threshold set to 1024 and the random threshold set to 8.
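
As a quick sanity check on the numbers in that table line (shell
arithmetic; DM table fields are in 512-byte sectors)::

```shell
SECTORS=268435456             # origin length from the table line
SECTOR_BYTES=512

# Origin size: 268435456 sectors * 512 bytes = 128 GiB.
GIB=$((SECTORS * SECTOR_BYTES / 1024 / 1024 / 1024))
echo "origin: ${GIB} GiB"

# The cache block size of 512 sectors works out to 256 KiB.
BLOCK_KIB=$((512 * SECTOR_BYTES / 1024))
echo "cache block: ${BLOCK_KIB} KiB"
```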