1. 03 Jan, 2023 6 commits
    • btrfs: always report error in run_one_delayed_ref() · 39f501d6
      Qu Wenruo authored
      Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but
      when end users hit such a problem, there is little chance that
      btrfs_debug() is enabled.  This leaves very little useful info for
      debugging.
      
      This patch will:
      
      - Add extra info to the error report, including:
        * logical bytenr
        * num_bytes
        * type
        * action
        * ref_mod
      
      - Replace the btrfs_debug() with btrfs_err()
      
      - Move the error reporting into run_one_delayed_ref()
        This avoids a use-after-free, since @node can be freed in the caller.
      
      This error should be triggered at most once.

      If run_one_delayed_ref() fails, we print the error message and the
      whole call chain errors out:
      
      btrfs_run_delayed_refs()
      `- __btrfs_run_delayed_refs()
         `- btrfs_run_delayed_refs_for_head()
            `- run_one_delayed_ref()
      
      We then abort the current transaction in btrfs_run_delayed_refs().
      If we have to run delayed refs for the aborted transaction,
      run_one_delayed_ref() will just clean up the refs and do nothing else,
      thus no new error messages will be output.

      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
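The reason for moving the report into the callee can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct ref_node`, `run_one_ref()` and `run_and_free()` are hypothetical stand-ins, not the kernel's structures or functions, and the message text is made up for the example. The point is that the callee formats the message while the node is still valid, so the caller is free to release it afterwards.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a delayed ref node; not the kernel struct. */
struct ref_node {
	unsigned long long bytenr;
	unsigned long long num_bytes;
	int type;
	int action;
	int ref_mod;
};

/* Hypothetical callee: fails for a zero-length ref and reports it itself. */
static int run_one_ref(const struct ref_node *node, char *msg, size_t len)
{
	int ret = node->num_bytes == 0 ? -5 : 0;	/* -5 mimics -EIO */

	if (ret < 0) {
		/* Report here, while the node is guaranteed to be valid. */
		snprintf(msg, len,
			 "failed to run ref for logical %llu num_bytes %llu type %d action %d ref_mod %d: %d",
			 node->bytenr, node->num_bytes, node->type,
			 node->action, node->ref_mod, ret);
	}
	return ret;
}

/* Hypothetical caller: owns the node and frees it in every case. */
static int run_and_free(struct ref_node *node, char *msg, size_t len)
{
	int ret = run_one_ref(node, msg, len);

	free(node);	/* the node must not be dereferenced after this point */
	return ret;
}
```

If the caller formatted the message after `free(node)`, it would read freed memory, which is the use-after-free the commit message refers to.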
    • btrfs: handle case when repair happens with dev-replace · d73a27b8
      Qu Wenruo authored
      [BUG]
      There is a bug report that a BUG_ON() in btrfs_repair_io_failure()
      (originally repair_io_failure() in v6.0 kernel) got triggered when
      replacing an unreliable disk:
      
        BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3
        kernel BUG at fs/btrfs/extent_io.c:2380!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G           OE      6.0.0-5-amd64 #1  Debian 6.0.10-2
        Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
        Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
        RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
        Call Trace:
         <TASK>
         clean_io_failure+0x14d/0x180 [btrfs]
         end_bio_extent_readpage+0x412/0x6e0 [btrfs]
         ? __switch_to+0x106/0x420
         process_one_work+0x1c7/0x380
         worker_thread+0x4d/0x380
         ? rescuer_thread+0x3a0/0x3a0
         kthread+0xe9/0x110
         ? kthread_complete_and_exit+0x20/0x20
         ret_from_fork+0x22/0x30
      
      [CAUSE]
      
      Before the BUG_ON(), we first got some read errors from the replace
      target.  Note the mirror number: 3 is beyond RAID1 duplication, thus
      the read came from the replace target device.

      Then at the BUG_ON() location, we are trying to write the repaired
      sectors back to the failed device.
      
      The check looks like this:
      
      		ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
      				      &map_length, &bioc, mirror_num);
      		if (ret)
      			goto out_counter_dec;
      		BUG_ON(mirror_num != bioc->mirror_num);
      
      But inside btrfs_map_block(), we can modify bioc->mirror_num especially
      for dev-replace:
      
      	if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
      	    !need_full_stripe(op) && dev_replace->tgtdev != NULL) {
      		ret = get_extra_mirror_from_replace(fs_info, logical, *length,
      						    dev_replace->srcdev->devid,
      						    &mirror_num,
      					    &physical_to_patch_in_first_stripe);
      		patch_the_first_stripe_for_dev_replace = 1;
      	}
      
      Thus if we're repairing the replace target device, we're going to
      trigger that BUG_ON().
      
      But in reality, the read failure from the replace target device may
      simply mean that the replace hasn't reached the range we're reading
      yet.  We are reading garbage, but with the replace running, the range
      will be properly filled later.

      Thus in that case, we don't need to do anything but let the replace
      routine handle it.
      
      [FIX]
      Instead of a BUG_ON(), just skip the repair if we're repairing the
      dev-replace target device.

      Reported-by: 小太 <nospam@kota.moe>
      Link: https://lore.kernel.org/linux-btrfs/CACsxjPYyJGQZ+yvjzxA1Nn2LuqkYqTCcUH43S=+wXhyf8S00Ag@mail.gmail.com/
      CC: stable@vger.kernel.org # 6.0+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
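The "skip instead of BUG_ON()" idea can be modelled in a few lines of userspace C. This is a toy sketch, not the kernel code: `struct map_ctx`, `map_block()` and `should_repair()` are hypothetical names, and redirecting the extra mirror to mirror 1 is an arbitrary choice made only for illustration. What it shows is that a mapping step which rewrites the mirror number is a normal, expected outcome during dev-replace, so the caller treats a mismatch as "nothing to repair" rather than as a fatal condition.

```c
#include <stdbool.h>

/* Hypothetical, simplified mapping state; not the kernel's structures. */
struct map_ctx {
	int num_stripes;	/* mirrors backed by regular devices */
	bool replace_running;	/* a dev-replace is in progress */
};

/*
 * Toy stand-in for the mapping step: a request for the extra mirror
 * (num_stripes + 1) during dev-replace is redirected to another mirror
 * (mirror 1 here, chosen arbitrarily for illustration).
 */
static int map_block(const struct map_ctx *ctx, int mirror_num)
{
	if (ctx->replace_running && mirror_num == ctx->num_stripes + 1)
		return 1;
	return mirror_num;
}

/* Returns true if a repair write should be issued, false if it is skipped. */
static bool should_repair(const struct map_ctx *ctx, int mirror_num)
{
	int mapped = map_block(ctx, mirror_num);

	/* A mismatch means the dev-replace target: leave it to the replace. */
	if (mapped != mirror_num)
		return false;
	return true;
}
```

With two stripes and a running replace, mirrors 1 and 2 are repaired as usual, while a failure reported for mirror 3 is quietly skipped.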
    • btrfs: fix off-by-one in delalloc search during lseek · 2f2e84ca
      Filipe Manana authored
      During lseek, when searching for delalloc in a range that represents a
      hole and that range has a length of 1 byte, we end up not doing the actual
      delalloc search in the inode's io tree, resulting in not correctly
      reporting the offset with data or a hole. This actually only happens when
      the start offset is 0 because with any other start offset we round it down
      by sector size.
      
      Reproducer:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt/sdc
      
        $ xfs_io -f -c "pwrite -q 0 1" /mnt/sdc/foo
      
        $ xfs_io -c "seek -d 0" /mnt/sdc/foo
        Whence   Result
        DATA	   EOF
      
      It should have reported an offset of 0 instead of EOF.
      
      Fix this by updating btrfs_find_delalloc_in_range() and count_range_bits()
      to deal with inclusive ranges properly. These functions are already
      supposed to work with inclusive end offsets, they just got it wrong in a
      couple places due to off-by-one mistakes.
      
      A test case for fstests will be added later.

      Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram@gmail.com/
      Fixes: b6e83356 ("btrfs: make hole and data seeking a lot more efficient")
      CC: stable@vger.kernel.org # 6.1
      Tested-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
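The off-by-one described above comes down to how an inclusive range is measured and compared. The following is a minimal sketch under that assumption; `range_len_inclusive()` and `range_needs_search()` are hypothetical helpers, not the functions changed by the patch. With an inclusive end offset, the 1-byte range starting at 0 is `[0, 0]`, so any guard written as `start < end` treats it as empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length of the inclusive byte range [start, end]. */
static uint64_t range_len_inclusive(uint64_t start, uint64_t end)
{
	return end - start + 1;
}

/*
 * Hypothetical search guard. With an inclusive end, a range where
 * start == end still covers one byte and must be searched.
 */
static bool range_needs_search(uint64_t start, uint64_t end)
{
	return start <= end;	/* "start < end" would skip 1-byte ranges */
}
```

For the reproducer's file, the range to search is `[0, 0]`: its length is 1 and the guard must let the search run.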
    • btrfs: fix false alert on bad tree level check · 1d854e4f
      Qu Wenruo authored
      [BUG]
      There is a bug report that on a RAID0 NVMe btrfs system, under heavy
      write load the filesystem can flip RO randomly.
      
      Extra debugging shows that some tree blocks fail their level checks,
      and if that happens in the critical path of a transaction, we abort
      the transaction:
      
        BTRFS error (device nvme0n1p3): level verify failed on logical 5446121209856 mirror 1 wanted 0 found 1
        BTRFS error (device nvme0n1p3: state A): Transaction aborted (error -5)
        BTRFS: error (device nvme0n1p3: state A) in btrfs_finish_ordered_io:3343: errno=-5 IO failure
        BTRFS info (device nvme0n1p3: state EA): forced readonly
      
      [CAUSE]
      The reporter has already bisected to commit 947a6299 ("btrfs: move
      tree block parentness check into validate_extent_buffer()").
      
      Extra debugging also shows that btrfs_tree_parent_check can be filled
      with all zeros in the following call trace:
      
        submit_one_bio+0xd4/0xe0
        submit_extent_page+0x142/0x550
        read_extent_buffer_pages+0x584/0x9c0
        ? __pfx_end_bio_extent_readpage+0x10/0x10
        ? folio_unlock+0x1d/0x50
        btrfs_read_extent_buffer+0x98/0x150
        read_tree_block+0x43/0xa0
        read_block_for_search+0x266/0x370
        btrfs_search_slot+0x351/0xd30
        ? lock_is_held_type+0xe8/0x140
        btrfs_lookup_csum+0x63/0x150
        btrfs_csum_file_blocks+0x197/0x6c0
        ? sched_clock_cpu+0x9f/0xc0
        ? lock_release+0x14b/0x440
        ? _raw_read_unlock+0x29/0x50
        btrfs_finish_ordered_io+0x441/0x860
        btrfs_work_helper+0xfe/0x400
        ? lock_is_held_type+0xe8/0x140
        process_one_work+0x294/0x5b0
        worker_thread+0x4f/0x3a0
        ? __pfx_worker_thread+0x10/0x10
        kthread+0xf5/0x120
        ? __pfx_kthread+0x10/0x10
        ret_from_fork+0x2c/0x50
      
      Currently we only copy the btrfs_tree_parent_check structure into bbio
      at read_extent_buffer_pages() after we have assembled the bbio.
      
      But as shown above, submit_extent_page() itself can already submit the
      bbio, leaving bbio->parent_check uninitialized and causing the false
      alert.
      
      [FIX]
      Instead of copying @check into bbio after bbio is assembled, we pass
      @check in btrfs_bio_ctrl::parent_check, and copy the content of
      parent_check in submit_one_bio() for metadata read.
      
      This way we should be able to pass the needed info for metadata endio
      verification, and fix the false alert.

      Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/
      Fixes: 947a6299 ("btrfs: move tree block parentness check into validate_extent_buffer()")
      Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
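The shape of this fix, carrying the check in the control structure and copying it at the single submission point, can be sketched as follows. The types and functions here (`struct parent_check`, `struct bio`, `struct bio_ctrl`, `submit_one()`, `add_page()`) are trimmed-down hypothetical stand-ins and do not mirror the kernel definitions. The sketch only demonstrates that an early submission triggered while adding a page still ends up with the check filled in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the structures named above. */
struct parent_check {
	int level;
	bool has_level;
};

struct bio {
	struct parent_check parent_check;
	bool submitted;
};

struct bio_ctrl {
	struct bio *bio;
	const struct parent_check *parent_check;
};

/* Every submission goes through here, so the copy cannot be missed. */
static void submit_one(struct bio_ctrl *ctrl)
{
	if (ctrl->bio == NULL)
		return;
	if (ctrl->parent_check != NULL)
		ctrl->bio->parent_check = *ctrl->parent_check;
	ctrl->bio->submitted = true;
	ctrl->bio = NULL;
}

/* Adding a page may submit early, e.g. when the bio is already full. */
static void add_page(struct bio_ctrl *ctrl, struct bio *bio, bool bio_full)
{
	ctrl->bio = bio;
	if (bio_full)
		submit_one(ctrl);
}
```

Copying the check only after the assembly loop returns would miss the bio that `add_page()` submitted on its own, which is the situation the call trace shows.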
    • btrfs: add error message for metadata level mismatch · 77177ed1
      Qu Wenruo authored
      From a recent regression report, we found that after commit 947a6299
      ("btrfs: move tree block parentness check into
      validate_extent_buffer()") if we have a level mismatch (false alert
      though), there is no error message at all.
      
      This makes later debugging harder.  This patch will add the proper error
      message for such a case.
      
      Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix ASSERT em->len condition in btrfs_get_extent · 946c2923
      Tanmay Bhushan authored
      The assertion condition is supposed to verify the em->len value, which
      we expect to be the same as the sectorsize.
      
      Fixes: a196a894 ("btrfs: do not reset extent map members for inline extents read")
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Tanmay Bhushan <007047221b@gmail.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 20 Dec, 2022 3 commits
    • btrfs: fix fscrypt name leak after failure to join log transaction · fee4c199
      Filipe Manana authored
      When logging a new name, we don't expect to fail joining a log transaction
      since we know at least one of the inodes was logged before in the current
      transaction. However if we fail for some unexpected reason, we end up not
      freeing the fscrypt name we previously allocated. So fix that by freeing
      the name in case we failed to join a log transaction.
      
      Fixes: ab3c5c18 ("btrfs: setup qstr from dentrys using fscrypt helper")
      Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
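A leak of this kind is typically closed by routing the unexpected failure through the same cleanup label as the success path. The sketch below is a generic userspace illustration and not the btrfs code: `alloc_name()`, `free_name()`, `log_new_name()` and the `live_names` counter are all hypothetical, and the `join_fails` flag merely simulates the failure to join a log transaction.

```c
#include <stdlib.h>
#include <string.h>

static int live_names;	/* names that are allocated and not yet freed */

static char *alloc_name(const char *src)
{
	char *name = malloc(strlen(src) + 1);

	if (name != NULL) {
		strcpy(name, src);
		live_names++;
	}
	return name;
}

static void free_name(char *name)
{
	if (name != NULL) {
		free(name);
		live_names--;
	}
}

/* Hypothetical logging step; join_fails simulates the unexpected failure. */
static int log_new_name(const char *src, int join_fails)
{
	char *name = alloc_name(src);
	int ret = 0;

	if (name == NULL)
		return -12;	/* mimics -ENOMEM */
	if (join_fails) {
		ret = -5;	/* mimics -EIO */
		goto out;	/* an early "return ret" here would leak name */
	}
	/* ... the name would be used for logging here ... */
out:
	free_name(name);
	return ret;
}
```

After either outcome, `live_names` is back to zero, which is the property the fix restores for the failure path.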
    • btrfs: scrub: fix uninitialized return value in recover_scrub_rbio · e7fc357e
      Josef Bacik authored
      Commit 75b47033 ("btrfs: raid56: migrate recovery and scrub recovery
      path to use error_bitmap") introduced an uninitialized return variable.
      
      This can be caught by gcc 12.1 by -Wmaybe-uninitialized:
      
        CC [M]  fs/btrfs/raid56.o
      fs/btrfs/raid56.c: In function ‘scrub_rbio’:
      fs/btrfs/raid56.c:2801:15: warning: ‘ret’ may be used uninitialized [-Wmaybe-uninitialized]
       2801 |         ret = recover_scrub_rbio(rbio);
            |               ^~~~~~~~~~~~~~~~~~~~~~~~
      fs/btrfs/raid56.c:2649:13: note: ‘ret’ was declared here
       2649 |         int ret;
      
      The warning is disabled by default so we haven't caught that.
      
      Due to the bug the raid56 scrub fstests have been failing since the
      patch was merged, so initialize that.
      
      Fixes: 75b47033 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap")
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
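The class of bug fixed here is a return variable that is assigned only on error paths. A minimal sketch, assuming a loop of that shape: `check_sectors()` is a hypothetical function and not the body of recover_scrub_rbio(). Without the initializer, the all-good case returns an indeterminate value, which is what `-Wmaybe-uninitialized` warns about.

```c
#include <stddef.h>

/*
 * Hypothetical loop in the shape of the bug: ret is only assigned on the
 * error path, so it must start at 0 for the all-good case.
 */
static int check_sectors(const int *sector_errors, size_t count)
{
	int ret = 0;	/* without this, the success path returns garbage */

	for (size_t i = 0; i < count; i++) {
		if (sector_errors[i]) {
			ret = -5;	/* mimics -EIO */
			break;
		}
	}
	return ret;
}
```

Building such code with `-Wmaybe-uninitialized` enabled is one way to catch a missing initializer before it shows up as a test failure.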
    • btrfs: fix resolving backrefs for inline extent followed by prealloc · 560840af
      Boris Burkov authored
      If a file consists of an inline extent followed by a regular or prealloc
      extent, then a legitimate attempt to resolve a logical address in the
      non-inline region will result in add_all_parents reading the invalid
      offset field of the inline extent. If the inline extent item is placed
      in the leaf eb such that it is the first item, attempting to access the
      offset field will not only be meaningless, it will go past the end of
      the eb and cause this panic:
      
        [17.626048] BTRFS warning (device dm-2): bad eb member end: ptr 0x3fd4 start 30834688 member offset 16377 size 8
        [17.631693] general protection fault, probably for non-canonical address 0x5088000000000: 0000 [#1] SMP PTI
        [17.635041] CPU: 2 PID: 1267 Comm: btrfs Not tainted 5.12.0-07246-g75175d5adc74-dirty #199
        [17.637969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        [17.641995] RIP: 0010:btrfs_get_64+0xe7/0x110
        [17.649890] RSP: 0018:ffffc90001f73a08 EFLAGS: 00010202
        [17.651652] RAX: 0000000000000001 RBX: ffff88810c42d000 RCX: 0000000000000000
        [17.653921] RDX: 0005088000000000 RSI: ffffc90001f73a0f RDI: 0000000000000001
        [17.656174] RBP: 0000000000000ff9 R08: 0000000000000007 R09: c0000000fffeffff
        [17.658441] R10: ffffc90001f73790 R11: ffffc90001f73788 R12: ffff888106afe918
        [17.661070] R13: 0000000000003fd4 R14: 0000000000003f6f R15: cdcdcdcdcdcdcdcd
        [17.663617] FS:  00007f64e7627d80(0000) GS:ffff888237c80000(0000) knlGS:0000000000000000
        [17.666525] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [17.668664] CR2: 000055d4a39152e8 CR3: 000000010c596002 CR4: 0000000000770ee0
        [17.671253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [17.673634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [17.676034] PKRU: 55555554
        [17.677004] Call Trace:
        [17.677877]  add_all_parents+0x276/0x480
        [17.679325]  find_parent_nodes+0xfae/0x1590
        [17.680771]  btrfs_find_all_leafs+0x5e/0xa0
        [17.682217]  iterate_extent_inodes+0xce/0x260
        [17.683809]  ? btrfs_inode_flags_to_xflags+0x50/0x50
        [17.685597]  ? iterate_inodes_from_logical+0xa1/0xd0
        [17.687404]  iterate_inodes_from_logical+0xa1/0xd0
        [17.689121]  ? btrfs_inode_flags_to_xflags+0x50/0x50
        [17.691010]  btrfs_ioctl_logical_to_ino+0x131/0x190
        [17.692946]  btrfs_ioctl+0x104a/0x2f60
        [17.694384]  ? selinux_file_ioctl+0x182/0x220
        [17.695995]  ? __x64_sys_ioctl+0x84/0xc0
        [17.697394]  __x64_sys_ioctl+0x84/0xc0
        [17.698697]  do_syscall_64+0x33/0x40
        [17.700017]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [17.701753] RIP: 0033:0x7f64e72761b7
        [17.709355] RSP: 002b:00007ffefb067f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [17.712088] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64e72761b7
        [17.714667] RDX: 00007ffefb067fb0 RSI: 00000000c0389424 RDI: 0000000000000003
        [17.717386] RBP: 00007ffefb06d188 R08: 000055d4a390d2b0 R09: 00007f64e7340a60
        [17.719938] R10: 0000000000000231 R11: 0000000000000246 R12: 0000000000000001
        [17.722383] R13: 0000000000000000 R14: 00000000c0389424 R15: 000055d4a38fd2a0
        [17.724839] Modules linked in:
      
      Fix the bug by detecting the inline extent item in add_all_parents and
      skipping to the next extent item.
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
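The fix described above, detecting an inline extent and moving on before touching its location fields, can be illustrated with a simplified model. `struct file_extent`, `enum extent_type` and `count_refs()` are hypothetical and do not reflect the on-disk item layout or the real add_all_parents() logic. The inline entry deliberately carries a value that would match if it were read, to show that the type check has to come first.

```c
#include <stddef.h>
#include <stdint.h>

enum extent_type { EXTENT_INLINE, EXTENT_REG, EXTENT_PREALLOC };

/* Hypothetical flattened file extent item; not the on-disk format. */
struct file_extent {
	enum extent_type type;
	uint64_t disk_bytenr;	/* meaningless for inline extents */
};

/* Count items that reference the extent at wanted_bytenr. */
static size_t count_refs(const struct file_extent *items, size_t n,
			 uint64_t wanted_bytenr)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		/* Inline extents hold data in place: skip before reading. */
		if (items[i].type == EXTENT_INLINE)
			continue;
		if (items[i].disk_bytenr == wanted_bytenr)
			found++;
	}
	return found;
}
```

In the real item format the inline case is worse than a wrong match, because the field does not exist at all for inline extents and reading it can run past the end of the buffer, as the panic above shows.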
  3. 15 Dec, 2022 5 commits
  4. 05 Dec, 2022 26 commits