    Btrfs: make raid6 rebuild retry more · 8810f751
    Liu Bo authored
    There is a scenario in which the rebuild process can fail to return
    good content.  Suppose that all disks can be read without I/O errors,
    but the content read from one of them does not match its checksum.
    Currently, for raid6, btrfs retries at most twice:
    
    - the 1st retry rebuilds from all the other stripes, which ends up
      being a raid5-style xor rebuild;
    - if the 1st retry fails, the 2nd retry deliberately fails parity p
      so that a raid6-style rebuild (from parity q) is done.
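    The 1st retry can be sketched in C as a raid5-style xor rebuild. This
    is a minimal illustration, not btrfs code: the stripe count, stripe
    size and the names `compute_p` and `xor_rebuild` are hypothetical.
    Because P is the xor of all data stripes, xoring P with every
    surviving data stripe leaves exactly the missing one, provided the
    survivors are themselves intact.

```c
#include <stdint.h>
#include <string.h>

#define NDATA 3   /* hypothetical: 3 data stripes plus one P stripe */
#define LEN 4     /* hypothetical: 4 bytes per stripe */

/* P is the byte-wise xor of all data stripes. */
static void compute_p(uint8_t d[NDATA][LEN], uint8_t p[LEN]) {
    memset(p, 0, LEN);
    for (int i = 0; i < NDATA; i++)
        for (int k = 0; k < LEN; k++)
            p[k] ^= d[i][k];
}

/* Raid5-style rebuild of data stripe a: xor P with every other data stripe. */
static void xor_rebuild(uint8_t d[NDATA][LEN], const uint8_t p[LEN], int a,
                        uint8_t out[LEN]) {
    memcpy(out, p, LEN);
    for (int i = 0; i < NDATA; i++)
        if (i != a)
            for (int k = 0; k < LEN; k++)
                out[k] ^= d[i][k];
}
```

    If another data stripe is silently corrupted as well, its error is
    xored straight into the result, which is why the rebuilt block can
    still fail its checksum.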
    
    However, another non-parity stripe may also hold corrupted content,
    in which case neither retry can return correct content, and users
    will see this as data loss.  More seriously, if the loss hits an
    important internal btree root, the filesystem may refuse to mount.
    
    This extends btrfs to do more retries, where each retry fails exactly
    one additional stripe.  Since raid6 can tolerate 2 disk failures, if
    there is at most one more corrupted stripe besides the one being
    recovered, one of these retries will always succeed.
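    The extended retry scheme can be modelled with a small raid6
    simulation. This is a sketch under stated assumptions, not the btrfs
    implementation: the geometry (`NDATA`, `LEN`), the FNV-1a checksum
    standing in for the btrfs data checksum, the helper names and the
    order in which extra stripes are failed are all hypothetical. P is
    the xor of the data stripes and Q is the sum of g^i * D_i over
    GF(2^8) with generator g = 2 and polynomial 0x11d, the usual raid6
    construction. Each attempt treats the stripe being recovered plus
    exactly one other stripe as failed and keeps the first rebuild whose
    checksum matches.

```c
#include <stdint.h>
#include <string.h>

#define NDATA 4            /* hypothetical: 4 data stripes */
#define LEN 8              /* hypothetical: 8 bytes per stripe */
#define P_IDX NDATA        /* index of the P (xor) parity stripe */
#define Q_IDX (NDATA + 1)  /* index of the Q parity stripe */

/* Multiply in GF(2^8) modulo the raid6 polynomial 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* g^e with generator g = 2. */
static uint8_t gf_pow2(int e) {
    uint8_t r = 1;
    while (e-- > 0) r = gf_mul(r, 2);
    return r;
}

/* Brute-force multiplicative inverse; a must be non-zero. */
static uint8_t gf_inv(uint8_t a) {
    for (int b = 1; b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1) return (uint8_t)b;
    return 0;
}

/* Hypothetical stand-in for the data checksum: 32-bit FNV-1a. */
static uint32_t csum(const uint8_t *d) {
    uint32_t h = 2166136261u;
    for (int k = 0; k < LEN; k++) { h ^= d[k]; h *= 16777619u; }
    return h;
}

/* P = xor of all data stripes, Q = sum of g^i * D_i. */
static void compute_pq(uint8_t s[][LEN]) {
    memset(s[P_IDX], 0, LEN);
    memset(s[Q_IDX], 0, LEN);
    for (int i = 0; i < NDATA; i++)
        for (int k = 0; k < LEN; k++) {
            s[P_IDX][k] ^= s[i][k];
            s[Q_IDX][k] ^= gf_mul(gf_pow2(i), s[i][k]);
        }
}

/* Rebuild data stripe a into out, treating stripe b as failed too. */
static void rebuild(uint8_t s[][LEN], int a, int b, uint8_t *out) {
    for (int k = 0; k < LEN; k++) {
        uint8_t p = s[P_IDX][k], q = s[Q_IDX][k];
        for (int i = 0; i < NDATA; i++) {
            if (i == a || i == b) continue;
            p ^= s[i][k];                     /* leaves D_a (^ D_b) */
            q ^= gf_mul(gf_pow2(i), s[i][k]); /* leaves g^a D_a (^ g^b D_b) */
        }
        if (b == Q_IDX) {                     /* raid5-style xor rebuild */
            out[k] = p;
        } else if (b == P_IDX) {              /* rebuild from Q alone */
            out[k] = gf_mul(q, gf_inv(gf_pow2(a)));
        } else {                              /* two data stripes lost */
            uint8_t gb = gf_pow2(b);
            out[k] = gf_mul((uint8_t)(gf_mul(gb, p) ^ q),
                            gf_inv((uint8_t)(gf_pow2(a) ^ gb)));
        }
    }
}

/* Try up to max_tries rebuilds of stripe a (the order is an assumption);
 * return the number of attempts used, or -1 if none matched. */
static int recover(uint8_t s[][LEN], int a, uint32_t want, int max_tries,
                   uint8_t *out) {
    int order[NDATA + 1], n = 0;
    order[n++] = Q_IDX;                   /* 1st: plain xor rebuild */
    order[n++] = P_IDX;                   /* 2nd: fail P, rebuild from Q */
    for (int i = NDATA - 1; i >= 0; i--)  /* then fail one data stripe each */
        if (i != a) order[n++] = i;
    for (int t = 0; t < n && t < max_tries; t++) {
        rebuild(s, a, order[t], out);
        if (csum(out) == want) return t + 1;
    }
    return -1;
}
```

    Calling `recover()` with `max_tries = 2` models the old behaviour
    (xor rebuild, then fail P) and gives up when a second data stripe is
    corrupted, while `max_tries = NDATA + 1` eventually fails the
    corrupted stripe and rebuilds the correct block. Counting the
    original read, that is at most NDATA + 2 attempts, one per disk in
    the array.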
    
    In the worst case btrfs retries as many times as there are disks in
    the raid6 array, but since such a scenario is very rare in practice,
    this is still acceptable.

    Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
    Signed-off-by: David Sterba <dsterba@suse.com>