    btrfs: output affected files when relocation fails · b9a9a850
    Qu Wenruo authored
    [PROBLEM]
    When relocation fails (mostly due to checksum mismatch), we only get
    very cryptic error messages like:
    
      BTRFS info (device dm-4): relocating block group 13631488 flags data
      BTRFS warning (device dm-4): csum failed root -9 ino 257 off 0 csum 0x373e1ae3 expected csum 0x98757625 mirror 1
      BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
      BTRFS info (device dm-4): balance: ended with status: -5
    
    The end user has to decipher the above messages and use various tools to
    locate the affected files and find a way to fix the problem (mostly
    deleting the file).  This is not easy even for experienced
    developers, let alone end users.
    
    [SCRUB IS DOING BETTER]
    By contrast, scrub provides much better error messages:
    
      BTRFS error (device dm-4): unable to fixup (regular) error at logical 13631488 on dev /dev/mapper/test-scratch1 physical 13631488
      BTRFS warning (device dm-4): checksum error at logical 13631488 on dev /dev/mapper/test-scratch1, physical 13631488, root 5, inode 257, offset 0, length 4096, links 1 (path: file)
      BTRFS info (device dm-4): scrub: finished on devid 1 with status: 0
    
    These messages point the end user directly at the affected files.
    
    [IMPROVEMENT]
    Instead of the generic data checksum error messages, which do a poor
    job for data reloc inodes, this patch introduces a scrub-like
    solution based on backref walking.
    
    When a sector of a data reloc inode fails its checksum, we follow this
    workflow:
    
    - Get the real logical bytenr
      For a data reloc inode, the file offset is the offset inside the
      block group.
      Thus the real logical bytenr is @file_off + @block_group->start.
    
    - Do an extent type check
      If it is a tree block, handling is much easier: just go through
      all the tree block backrefs.
    
    - Do a backref walk and inode path resolution for data extents
      This is mostly the same as scrub, but unfortunately we cannot
      reuse the same function, because the output format is different.
    
    The new output is more user-friendly:
    
      BTRFS info (device dm-4): relocating block group 13631488 flags data
      BTRFS warning (device dm-4): csum failed root -9 ino 257 off 0 logical 13631488 csum 0x373e1ae3 expected csum 0x98757625 mirror 1
      BTRFS warning (device dm-4): checksum error at logical 13631488 mirror 1 root 5 inode 257 offset 0 length 4096 links 1 (path: file)
      BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
      BTRFS info (device dm-4): balance: ended with status: -5

    Reviewed-by: Anand Jain <anand.jain@oracle.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>