    MD: raid5 trim support · 620125f2
    Shaohua Li authored
    
    Discard for raid4/5/6 has a limitation. If a discard request is
    small, we discard only one disk, but we still need to calculate
    parity and write the parity disk. To calculate parity correctly,
    zero_after_discard must be guaranteed. Even when it is, we discard
    one disk but write another, which makes the parity disks wear out
    fast. This doesn't make sense. So an efficient discard for raid4/5/6
    should discard all data disks and parity disks, which requires the
    write pattern to be (A, A+chunk_size, A+chunk_size*2...). If A's size
    is smaller than chunk_size, such a pattern is almost impossible in
    practice. So in this patch, I only handle the case where A's size
    equals chunk_size. That is, a discard request must be aligned to the
    stripe size, and its size must be a multiple of the stripe size.
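    As an illustration of the alignment rule, here is a minimal C
    sketch, not the patch's code. It assumes sizes are counted in
    512-byte sectors and that one full stripe covers one chunk on every
    data disk (raid_disks minus 1 for raid4/5, minus 2 for raid6). The
    helper names stripe_data_sectors and is_full_stripe_discard are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: sectors of data covered by one full stripe,
 * i.e. one chunk on every data disk (parity disks excluded).
 */
static uint64_t stripe_data_sectors(uint64_t chunk_sectors, unsigned data_disks)
{
    return chunk_sectors * data_disks;
}

/*
 * Hypothetical check: a discard qualifies only if it starts on a stripe
 * boundary and spans a whole number of stripes, so that every data chunk
 * and the matching parity chunk(s) can be discarded together.
 */
static bool is_full_stripe_discard(uint64_t start, uint64_t nr_sectors,
                                   uint64_t chunk_sectors, unsigned data_disks)
{
    uint64_t stripe = stripe_data_sectors(chunk_sectors, data_disks);

    assert(stripe > 0);
    return nr_sectors > 0 && start % stripe == 0 && nr_sectors % stripe == 0;
}
```

    For example, with 128-sector chunks and 3 data disks a stripe covers
    384 sectors, so a discard of sectors [768, 1920) qualifies, while one
    starting at sector 128 or only 256 sectors long does not.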
    
    Since we can only handle requests with a specific alignment and size
    (or the part of a request that fits whole stripes), we can't
    guarantee zero_after_discard even if zero_after_discard is true for
    the low-level drives.
    
    The block layer doesn't send down correctly aligned requests even
    when the correct discard alignment is set, so I must filter them out.
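    Filtering out the misaligned parts amounts to rounding the start of
    the range up, and its end down, to stripe boundaries. A minimal C
    sketch under the same sector-unit assumption; clip_to_full_stripes is
    a hypothetical name, and in this sketch the unaligned head and tail
    are simply not discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical filter: shrink the discard range [start, start + nr_sectors)
 * to the largest sub-range that begins and ends on stripe boundaries.
 * Returns false if no whole stripe fits inside the request.
 */
static bool clip_to_full_stripes(uint64_t start, uint64_t nr_sectors,
                                 uint64_t stripe_sectors,
                                 uint64_t *out_start, uint64_t *out_sectors)
{
    assert(stripe_sectors > 0);

    uint64_t end = start + nr_sectors; /* exclusive end */
    /* Round the start up to the next stripe boundary. */
    uint64_t first = (start + stripe_sectors - 1) / stripe_sectors * stripe_sectors;
    /* Round the end down to the previous stripe boundary. */
    uint64_t last = end / stripe_sectors * stripe_sectors;

    if (first >= last)
        return false;

    *out_start = first;
    *out_sectors = last - first;
    return true;
}
```

    For a 384-sector stripe, a request for sectors [100, 1100) is clipped
    to [384, 768), and a request for [10, 510) contains no whole stripe
    and is dropped entirely.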
    
    For raid4/5/6 parity calculation, if the data is 0, the parity is 0.
    So if zero_after_discard is true for all disks, the data is
    consistent after discard. Otherwise, data might be lost. Consider this
    scenario: discard a stripe, then write data to one disk and write the
    parity disk. The stripe could still be inconsistent at that point,
    depending on whether the new parity is calculated from the other data
    disks or from the parity disk. If a disk then breaks, we can't restore
    it. So in this patch, we only enable discard support if all disks
    have zero_after_discard.
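    The scenario above can be reproduced with XOR parity on single-byte
    blocks. This C sketch (hypothetical names, 3 data disks, one parity
    byte) compares the two usual ways of computing new parity on a
    partial write: read-modify-write, which uses the old parity, and
    reconstruct-write, which uses only the other data disks. The two agree
    only if the stripe was consistent before the write, which holds after
    discard when every disk, parity included, reads back zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DATA_DISKS 3 /* hypothetical stripe width for the example */

/* RAID4/5 parity: XOR of all data blocks (one byte per block here). */
static uint8_t xor_parity(const uint8_t d[DATA_DISKS])
{
    uint8_t p = 0;
    for (int i = 0; i < DATA_DISKS; i++)
        p ^= d[i];
    return p;
}

/* Read-modify-write: new parity from old parity, old data, new data. */
static uint8_t rmw_parity(uint8_t old_p, uint8_t old_d, uint8_t new_d)
{
    return old_p ^ old_d ^ new_d;
}

/* Reconstruct-write: new parity from the data blocks only. */
static uint8_t rcw_parity(const uint8_t d[DATA_DISKS], int idx, uint8_t new_d)
{
    uint8_t tmp[DATA_DISKS];
    for (int i = 0; i < DATA_DISKS; i++)
        tmp[i] = (i == idx) ? new_d : d[i];
    return xor_parity(tmp);
}

/*
 * Given what the disks read back after a discard (d[] and p), write new_d
 * to data disk idx and report whether both parity strategies agree, i.e.
 * whether the result is consistent regardless of which one is chosen.
 */
static bool consistent_after_write(const uint8_t d[DATA_DISKS], uint8_t p,
                                   int idx, uint8_t new_d)
{
    assert(idx >= 0 && idx < DATA_DISKS);
    return rmw_parity(p, d[idx], new_d) == rcw_parity(d, idx, new_d);
}
```

    With an all-zero stripe, both methods yield 0x5a for a write of 0x5a
    to disk 0. If instead the data disks kept {0x11, 0x22, 0x00} while the
    parity disk returned zeros, read-modify-write gives 0x4b and
    reconstruct-write gives 0x78, so the parity no longer protects the
    stripe.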
    
    If discard fails on one disk, we face a similar inconsistency. The
    patch makes discard follow the same path as a normal write request.
    If discard fails, a resync will be scheduled to make the data
    consistent. The extra writes aren't good, but data consistency is
    more important.
    
    If a subsequent read/write request hits the raid5 cache of a
    discarded stripe, the discarded dev page should be zero-filled so the
    data is consistent. This patch always zeroes the dev page of a
    discarded stripe. This isn't optimal, because a discard request
    doesn't need such a payload. The next patch will avoid it.
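    The zero-fill step can be modeled with a cached page and a memset.
    This is a hypothetical sketch: struct dev_page, PAGE_BYTES and
    cache_discard are illustrative names, not md's structures. After a
    discard the cached copy is zeroed and marked up to date, so cache hits
    return the same zeros the drives are assumed to return under
    zero_after_discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 4096 /* assumed page size for the example */

/* Hypothetical model of one cached device page in a stripe. */
struct dev_page {
    unsigned char data[PAGE_BYTES];
    bool uptodate; /* cached contents match what the disk should return */
};

/*
 * On discard, zero the cached page and mark it up to date, so a later
 * read or partial write that hits the cache sees zeros, matching the
 * disk contents promised by zero_after_discard.
 */
static void cache_discard(struct dev_page *pg)
{
    assert(pg != NULL);
    memset(pg->data, 0, sizeof(pg->data));
    pg->uptodate = true;
}
```

    The memset over every discarded page is the payload cost that the
    commit message says the next patch will avoid.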
    Signed-off-by: Shaohua Li <shli@fusionio.com>
    Signed-off-by: NeilBrown <neilb@suse.de>