    raid5: add basic stripe log · f6bed0ef
    Shaohua Li authored
    This introduces a simple log for raid5. Data/parity writes to the raid
    array are first written to the log and then to the raid array disks. If
    a crash happens, we can recover data from the log. This can speed up
    raid resync and fix the write hole issue.
    
    The log structure is pretty simple. Data/meta data is stored in block
    units, which are generally 4k. There is only one type of meta data block.
    A meta data block can track 3 types of data: stripe data, stripe
    parity and flush blocks. The MD superblock points to the last valid
    meta data block. Each meta data block has a checksum/seq number, so
    recovery can scan the log correctly. We store a checksum of the stripe
    data/parity in the meta data block, so meta data and stripe data/parity
    can be written to the log disk together. Otherwise, the meta data write
    would have to wait until the stripe data/parity write has finished.
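
    As an illustration of how a checksum/seq pair lets recovery decide
    whether a meta data block is valid, here is a minimal sketch. The
    struct layout, the magic value, the names and the Adler-32 checksum are
    all assumptions made for this example; they are not the on-disk format
    or the checksum used by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_BLOCK_SIZE 4096u        /* assumption: the "generally 4k" block unit */
#define LOG_META_MAGIC 0x6c6f6721u  /* hypothetical magic value */

/* Hypothetical header at the start of a meta data block. */
struct log_meta_header {
    uint32_t magic;
    uint32_t checksum;  /* over the whole block, computed with this field zeroed */
    uint64_t seq;       /* increases by one per meta data block */
    uint32_t used;      /* bytes of the block that hold header + records */
    uint32_t pad;
};

/* Adler-32, used here only as a stand-in checksum. */
static uint32_t toy_checksum(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Write the header into the block and store the checksum last. */
static void meta_block_seal(uint8_t *block, uint64_t seq, uint32_t used)
{
    struct log_meta_header h = { LOG_META_MAGIC, 0, seq, used, 0 };

    assert(used >= sizeof(h) && used <= LOG_BLOCK_SIZE);
    memcpy(block, &h, sizeof(h));
    h.checksum = toy_checksum(block, LOG_BLOCK_SIZE);
    memcpy(block, &h, sizeof(h));
}

/* Recovery-side check: magic, expected sequence number, then checksum. */
static int meta_block_valid(const uint8_t *block, uint64_t expected_seq)
{
    uint8_t copy[LOG_BLOCK_SIZE];
    struct log_meta_header h;
    uint32_t stored;

    memcpy(&h, block, sizeof(h));
    if (h.magic != LOG_META_MAGIC || h.seq != expected_seq)
        return 0;
    stored = h.checksum;
    h.checksum = 0;
    memcpy(copy, block, LOG_BLOCK_SIZE);
    memcpy(copy, &h, sizeof(h));
    return toy_checksum(copy, LOG_BLOCK_SIZE) == stored;
}
```

    In this sketch a recovery scan would stop at the first block for which
    meta_block_valid() returns 0, treating it as the end of the log.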
    
    For stripe data, the meta data block records the stripe data sector and
    size. Currently the size is always 4k. This meta data record could be
    simpler if we only wanted to fix the write hole (e.g., we could record the
    data of a stripe's different disks together), but this format can be
    extended to support caching in the future, which must record data
    address/size.
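
    A stripe data record of this kind can be sketched as follows. The
    struct, its field widths and the helper fill_data_records() are
    hypothetical names for illustration only; the sketch assumes 512-byte
    sectors and one record per 4k of data, with checksums supplied by the
    caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512u
#define LOG_PAGE_SIZE    4096u
#define SECTORS_PER_PAGE (LOG_PAGE_SIZE / SECTOR_SIZE)  /* 8 sectors per 4k */

/* Hypothetical record describing one 4k chunk of stripe data in the log. */
struct data_record {
    uint64_t sector;    /* where the data belongs on the raid array */
    uint32_t size;      /* length in sectors; always 8 (4k) in this sketch */
    uint32_t checksum;  /* checksum of the 4k data */
};

/*
 * Emit one record per 4k page starting at start_sector.
 * Returns the number of records written (at most out_cap).
 */
static size_t fill_data_records(uint64_t start_sector, size_t nr_pages,
                                const uint32_t *page_checksums,
                                struct data_record *out, size_t out_cap)
{
    size_t n = nr_pages < out_cap ? nr_pages : out_cap;

    assert(page_checksums != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].sector = start_sector + (uint64_t)i * SECTORS_PER_PAGE;
        out[i].size = SECTORS_PER_PAGE;
        out[i].checksum = page_checksums[i];
    }
    return n;
}
```

    Keeping an explicit sector and size in every record is what leaves room
    for the caching extension mentioned above, at the cost of a slightly
    larger record than a pure write-hole fix would need.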
    
    For stripe parity, the meta data block records the stripe sector. Its
    size should be 4k (for raid5) or 8k (for raid6). We always store the P
    parity first. This format should work for caching too.
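
    The parity sizing rule can be written down as a small helper. The
    function names below are hypothetical, and the sketch assumes that the
    second 4k of a raid6 parity payload holds the Q parity.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_PAGE_SIZE 4096u

/*
 * Bytes of parity logged per stripe: P only for raid5, P then Q for raid6.
 * Returns 0 for levels this sketch does not cover.
 */
static size_t parity_payload_bytes(int raid_level)
{
    switch (raid_level) {
    case 5:
        return LOG_PAGE_SIZE;
    case 6:
        return 2 * LOG_PAGE_SIZE;
    default:
        return 0;
    }
}

/* Byte offset of P (index 0) or Q (index 1) inside the parity payload. */
static size_t parity_offset(int raid_level, int parity_index)
{
    assert(parity_index >= 0);
    assert((size_t)(parity_index + 1) * LOG_PAGE_SIZE <=
           parity_payload_bytes(raid_level));
    return (size_t)parity_index * LOG_PAGE_SIZE;
}
```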
    
    A flush block indicates that a stripe is on the raid array disks. Fixing
    the write hole doesn't need this type of meta data; it's for the caching
    extension.

    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: NeilBrown <neilb@suse.com>
raid5.h 23.4 KB