1. 25 Jul, 2024 2 commits
    • btrfs: fix corrupt read due to bad offset of a compressed extent map · de9f46cb
      Filipe Manana authored
      If we attempt to insert a compressed extent map that has a range that
      overlaps another extent map we have in the inode's extent map tree, we
      can end up with an incorrect offset after adjusting the new extent map at
      merge_extent_mapping() because we don't update the extent map's offset.
      
      For example consider the following scenario:
      
      1) We have a file extent item for a compressed extent covering the file
         range [108K, 144K) and currently there's no corresponding extent map
         in the inode's extent map tree;
      
      2) The inode's size is 141K;
      
      3) We have an encoded write (compressed) into the file range [120K, 128K),
         which overlaps the existing file extent item. The encoded write creates
         a matching extent map, adds it to the inode's extent map tree and
         creates an ordered extent for it.
      
         Note that the corresponding file extent item is added to the subvolume
         tree only when the ordered extent completes (when executing
         btrfs_finish_one_ordered());
      
      4) We have a write into the file range [160K, 164K).
      
         This write increases the i_size of the file, and there's a hole
         between the current i_size (141K) and the start offset of this write,
         and since the old i_size is in the middle of the block [140K, 144K),
         we have to write zeroes to the range [141K, 144K) (3072 bytes) and
         therefore dirty that page.
      
         We then call btrfs_set_extent_delalloc() with a start offset of 140K.
         We then end up at btrfs_find_new_delalloc_bytes() which will call
         btrfs_get_extent() for the range [140K, 144K);
      
      5) The btrfs_get_extent() doesn't find any extent map in the inode's
         extent map tree covering the range [140K, 144K), so it searches the
         subvolume tree for any file extent items covering that range.
      
         There it finds the file extent item for the range [108K, 144K),
         creates a compressed extent map for that range and then calls
         btrfs_add_extent_mapping() with that extent map and passes the
         range [140K, 144K) via the "start" and "len" parameters;
      
      6) The call to add_extent_mapping() done by btrfs_add_extent_mapping()
         fails with -EEXIST because there's an extent map, created at step 3
         for the [120K, 128K) range, that overlaps with the range of the
         given extent map ([108K, 144K)).

         Then it does a lookup for the extent map from step 3 and calls
         merge_extent_mapping() to adjust the input extent map ([108K, 144K)).
         That adjusts the extent map to a start offset of 128K and a length
         of 16K (starting just after the extent map from step 3), but it does
         not update the offset field of the extent map, leaving it with a value
         of zero instead of updating to a value of 20K (128K - 108K = 20K).
      
         As a result any read for the range [128K, 144K) can return
         incorrect data since we read from a wrong section of the extent (unless
         both the correct and incorrect ranges happen to have the same data).
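
      The adjustment described in step 6 can be sketched in plain C as
      follows. This is a simplified illustration, not the kernel's
      merge_extent_mapping(): struct em_sketch, trim_front() and the
      update_offset switch are hypothetical names, and only the three
      fields relevant to the example are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_1K 1024ULL

/* Hypothetical, simplified stand-in for an extent map (assumption). */
struct em_sketch {
	uint64_t start;   /* file offset where the mapping begins */
	uint64_t len;     /* number of file bytes covered */
	uint64_t offset;  /* offset into the extent's data */
};

/*
 * Trim the front of 'em' so that it begins at 'new_start'.
 * With update_offset == false this mimics the behaviour described in
 * step 6 for a compressed extent map: start and len move, offset does not.
 * With update_offset == true the offset advances by the same amount.
 * The caller must pass new_start >= em->start.
 */
static void trim_front(struct em_sketch *em, uint64_t new_start,
		       bool update_offset)
{
	uint64_t start_diff = new_start - em->start;

	em->start = new_start;
	em->len -= start_diff;
	if (update_offset)
		em->offset += start_diff;
}
```

      With the values from the scenario (start 108K, length 36K, offset 0),
      trimming the front to 128K gives a length of 16K. The offset stays at 0
      when update_offset is false and becomes 20K when it is true.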
      
      So fix this by changing merge_extent_mapping() to update the extent map's
      offset even if it's compressed. Also add a test case to the self tests.

      This didn't happen before the patchset that does big changes in the extent
      map structure (which includes the commit in the Fixes tag below) because
      we kept track of the original start offset in the extent map (member
      "orig_start") so we could always calculate the correct offset by
      subtracting that offset from the start offset.
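
      For comparison, the older scheme mentioned above can be modelled like
      this, assuming a hypothetical struct old_em_sketch that keeps only the
      two fields involved. Because the offset is derived rather than stored,
      moving the start forward cannot leave it stale.

```c
#include <stdint.h>

/* Hypothetical model of the older layout (assumption): the original
 * start is stored and the offset is computed on demand. */
struct old_em_sketch {
	uint64_t start;       /* current start of the mapping in the file */
	uint64_t orig_start;  /* file offset where the extent originally began */
};

/* Offset into the extent, derived by subtraction. */
static uint64_t derived_offset(const struct old_em_sketch *em)
{
	return em->start - em->orig_start;
}
```

      For a mapping trimmed to start at 128K with orig_start still at 108K,
      derived_offset() yields 20K.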
      
      A test case for fstests that triggered this problem using send/receive
      with compressed writes will be added soon.
      
      Fixes: 3d2ac992 ("btrfs: introduce new members for extent_map")
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: validate dref root and objectid · f333a3c7
      Qu Wenruo authored
      [CORRUPTION]
      There is a bug report that btrfs flips RO due to a corruption in the
      extent tree. The involved dump looks like this:
      
       	item 188 key (402811572224 168 4096) itemoff 14598 itemsize 79
       		extent refs 3 gen 3678544 flags 1
       		ref#0: extent data backref root 13835058055282163977 objectid 281473384125923 offset 81432576 count 1
       		ref#1: shared data backref parent 1947073626112 count 1
       		ref#2: shared data backref parent 1156030103552 count 1
       BTRFS critical (device vdc1: state EA): unable to find ref byte nr 402811572224 parent 0 root 265 owner 28703026 offset 81432576 slot 189
       BTRFS error (device vdc1: state EA): failed to run delayed ref for logical 402811572224 num_bytes 4096 type 178 action 2 ref_mod 1: -2
      
      [CAUSE]
      The corrupted entry is ref#0 of item 188.
      The root number 13835058055282163977 is beyond the upper limit for root
      items (the current limit is 1 << 48), and the objectid also looks
      suspicious.
      
      Only the offset and count are correct.
      
      [ENHANCEMENT]
      Although it's still unknown why so many bytes were corrupted
      randomly, we can still enhance the tree-checker for data backrefs by:
      
      - Validate the root value
        For now there should only be 3 types of roots that can have data backrefs:
        * subvolume trees
        * data reloc trees
        * root tree
          Only for v1 space cache
      
      - Validate the objectid value
        The objectid should be a valid inode number.
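
      The two checks listed above could look roughly like this in standalone
      C. This is a sketch, not the tree-checker code: the SK_* constants and
      sk_* helpers are hypothetical names, the numeric values reflect my
      reading of the btrfs on-disk format and should be treated as
      assumptions, and the 1 << 48 root limit is taken from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed well-known objectids (hypothetical SK_ names). */
#define SK_ROOT_TREE_OBJECTID        1ULL
#define SK_FS_TREE_OBJECTID          5ULL
#define SK_FIRST_FREE_OBJECTID       256ULL
#define SK_LAST_FREE_OBJECTID        ((uint64_t)-256LL)
#define SK_DATA_RELOC_TREE_OBJECTID  ((uint64_t)-9LL)
/* Upper limit for root ids, as stated in the commit message. */
#define SK_ROOT_ID_LIMIT             (1ULL << 48)

/* Subvolume trees: the default fs tree or an id in the free range. */
static bool sk_is_subvolume_root(uint64_t rootid)
{
	return rootid == SK_FS_TREE_OBJECTID ||
	       (rootid >= SK_FIRST_FREE_OBJECTID && rootid < SK_ROOT_ID_LIMIT);
}

/* Only subvolume trees, the data reloc tree and the root tree
 * (for v1 space cache) may own a data backref. */
static bool sk_is_valid_dref_root(uint64_t rootid)
{
	return sk_is_subvolume_root(rootid) ||
	       rootid == SK_DATA_RELOC_TREE_OBJECTID ||
	       rootid == SK_ROOT_TREE_OBJECTID;
}

/* The objectid must fall in the range usable for inode numbers. */
static bool sk_is_valid_dref_objectid(uint64_t objectid)
{
	return objectid >= SK_FIRST_FREE_OBJECTID &&
	       objectid <= SK_LAST_FREE_OBJECTID;
}
```

      In this sketch the root value from the dump, 13835058055282163977,
      is rejected by sk_is_valid_dref_root(), while root 265 from the error
      message is accepted. A range-only objectid check like the one above
      would not by itself flag the dumped objectid.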
      
      Hopefully we can catch such problems in the future with the new checkers.

      Reported-by: Kai Krakow <hurikhan77@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAMthOuPjg5RDT-G_LXeBBUUtzt3cq=JywF+D1_h+JYxe=WKp-Q@mail.gmail.com/#t
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 19 Jul, 2024 1 commit
    • btrfs: change BTRFS_MOUNT_* flags to 64bit type · c3ece6b7
      Qu Wenruo authored
      Currently the BTRFS_MOUNT_* flags already go beyond 32 bits. This is
      going to cause compilation errors on some 32 bit systems, as their
      unsigned long is only 32 bits long, so the flag
      BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors.
      
      Fix the problem by:
      
      - Migrate all existing BTRFS_MOUNT_* flags to unsigned long long
      - Migrate all mount option related variables to unsigned long long
        * btrfs_fs_info::mount_opt
        * btrfs_fs_context::mount_opt
        * mount_opt parameter of btrfs_check_options()
        * old_opts parameter of btrfs_remount_begin()
        * old_opts parameter of btrfs_remount_cleanup()
        * mount_opt parameter of btrfs_check_mountopts_zoned()
        * mount_opt and opt parameters of check_ro_option()
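
      The effect of the type change can be illustrated with a small
      standalone sketch. The SK_MOUNT_* names, their bit positions and the
      sk_* helpers are hypothetical and chosen only to show a flag that
      needs more than 32 bits; they are not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical flags (assumption): one low bit and one bit above 31. */
#define SK_MOUNT_FLAG_LOW   (1ULL << 0)
#define SK_MOUNT_FLAG_HIGH  (1ULL << 32)  /* does not fit in 32 bits */

/* Option word stored as unsigned long long, which is at least 64 bits
 * wide on every conforming C implementation. */
struct sk_fs_info {
	unsigned long long mount_opt;
};

/* Returns non-zero when 'flag' is set in the option word. */
static int sk_test_opt(const struct sk_fs_info *info, unsigned long long flag)
{
	return (info->mount_opt & flag) != 0;
}

/* Models keeping the option word in a 32-bit variable: the conversion
 * is well defined and silently drops every bit above bit 31. */
static uint32_t sk_truncate_to_32(unsigned long long opts)
{
	return (uint32_t)opts;
}
```

      Here sk_truncate_to_32(SK_MOUNT_FLAG_HIGH) evaluates to 0, which is
      the kind of loss a 32-bit unsigned long would cause, while the
      unsigned long long field keeps the flag intact.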
      
      Fixes: 32e62165 ("btrfs: introduce new "rescue=ignoresuperflags" mount option")
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 11 Jul, 2024 37 commits