==================
Partial Parity Log
==================

Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
addressed by PPL is that after a dirty shutdown, the parity of a particular
stripe may become inconsistent with the data on the other member disks. If the
array is also in a degraded state, there is no way to recalculate that parity,
because one of the disks is missing. This can lead to silent data corruption
when rebuilding the array or using it as degraded - data calculated from
parity for array blocks that have not been touched by a write request during
the unclean shutdown can be incorrect. This condition is known as the RAID5
Write Hole. Because of this, md by default does not allow starting a dirty
degraded array.
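
As a minimal illustration of the problem (one byte per chunk, a hypothetical
two-data-disk plus parity layout, not the driver's code), stale parity yields
silently wrong data when used to rebuild a missing member::

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t d0 = 0xA5, d1 = 0x3C;
        uint8_t parity = d0 ^ d1;       /* consistent parity */

        /* A write updates d0 on disk, but the crash happens before the
         * matching parity update is written out. */
        uint8_t d0_new = 0x5A;

        /* If d1's disk is now missing, it is rebuilt from the surviving
         * members using the stale parity: */
        uint8_t d1_rebuilt = d0_new ^ parity;

        printf("expected 0x%02x, rebuilt 0x%02x\n", d1, d1_rebuilt);
        assert(d1_rebuilt != d1);       /* silent data corruption */
        return 0;
    }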

Partial parity for a write operation is the XOR of the stripe data chunks not
modified by this write. It is just enough data to recover from the write hole.
XORing partial parity with the modified chunks produces parity for the stripe
that is consistent with its state before the write operation, regardless of
which chunk writes have completed. If one of the data disks not modified by
the write is missing, this updated parity can be used to recover its contents.
PPL recovery is also performed when starting an array after an unclean
shutdown even if all disks are available, eliminating the need to resync the
array. Because of this, using a write-intent bitmap and PPL together is not
supported.
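
A sketch of the partial parity calculation and recovery with byte-sized
chunks (the names and layout are illustrative only, not the driver's data
structures)::

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };   /* data chunks */

        /* A write is about to modify chunks 1 and 2; partial parity is
         * the XOR of the chunks it does not touch. */
        uint8_t pp = d[0] ^ d[3];

        /* Dirty shutdown: the write to chunk 1 reached the disk, the
         * writes to chunk 2 and to parity did not. */
        uint8_t d1_disk = 0xAA;     /* new data       */
        uint8_t d2_disk = d[2];     /* still old data */

        /* Recovery: XOR the partial parity with the modified chunks as
         * found on disk to obtain parity consistent with them. */
        uint8_t parity = pp ^ d1_disk ^ d2_disk;

        /* An unmodified member (here chunk 0) can now be rebuilt
         * correctly even if its disk is missing. */
        uint8_t d0_rebuilt = parity ^ d1_disk ^ d2_disk ^ d[3];
        assert(d0_rebuilt == d[0]);
        return 0;
    }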

When handling a write request, PPL writes the partial parity before the new
data and parity are dispatched to the disks. PPL is a distributed log - it is
stored in the metadata area of the array member drives, on the parity drive of
the particular stripe. It does not require a dedicated journaling drive. Write
performance is reduced by up to 30%-40%, but it scales with the number of
drives in the array and there is no single journaling drive to become a
bottleneck or a single point of failure.
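
The write ordering described above can be sketched as follows (plain
variables stand in for the member drives; in the driver the PPL entry is a
write to the parity drive's metadata area that must complete before the data
and parity writes are submitted)::

    #include <stdint.h>

    #define CHUNKS 4

    static uint8_t data[CHUNKS];    /* data chunks of one stripe    */
    static uint8_t parity;          /* parity chunk                 */
    static uint8_t ppl_entry;       /* PPL area on the parity drive */

    static void stripe_write(int chunk, uint8_t new_val)
    {
        /* 1. Partial parity: XOR of the chunks not being written. */
        ppl_entry = 0;
        for (int i = 0; i < CHUNKS; i++)
            if (i != chunk)
                ppl_entry ^= data[i];

        /* 2. The PPL entry is written to the parity drive here, and the
         *    driver waits for that write to complete. */

        /* 3. Only then are the new data and new parity dispatched. */
        data[chunk] = new_val;
        parity = 0;
        for (int i = 0; i < CHUNKS; i++)
            parity ^= data[i];
    }

    int main(void)
    {
        stripe_write(1, 0xAB);
        return (int)(parity ^ ppl_entry ^ 0xAB);  /* 0 if consistent */
    }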

Unlike raid5-cache, the other md solution for closing the write hole, PPL is
not a true journal. It does not protect against losing in-flight data, only
against silent data corruption. If a disk that was being written to (a dirty
disk of the stripe) is lost, no PPL recovery is performed for that stripe
(parity is not updated), so it is possible to have arbitrary data in the
written part of the stripe. In such a case the behavior is the same as in
plain raid5.

PPL is available for md version-1 metadata and external (specifically IMSM)
metadata arrays. It can be enabled using the mdadm option
--consistency-policy=ppl.
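
For example, a new array can be created with PPL enabled (the device names
below are just placeholders)::

    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
          --consistency-policy=ppl /dev/sda1 /dev/sdb1 /dev/sdc1

Recent versions of mdadm can also switch an existing array with
"mdadm --grow /dev/md0 --consistency-policy=ppl".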

There is a limitation of a maximum of 64 disks in the array for PPL. This
keeps the data structures and the implementation simple. RAID5 arrays with
that many disks are unlikely anyway, due to the high risk of multiple disk
failures, so this restriction should not be a limitation in real life.