    btrfs: handle case when repair happens with dev-replace · d73a27b8
    Qu Wenruo authored
    [BUG]
    There is a bug report that a BUG_ON() in btrfs_repair_io_failure()
    (originally repair_io_failure() in v6.0 kernel) got triggered when
    replacing an unreliable disk:
    
      BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3
      kernel BUG at fs/btrfs/extent_io.c:2380!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G           OE      6.0.0-5-amd64 #1  Debian 6.0.10-2
      Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021
      Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
      RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs]
      Call Trace:
       <TASK>
       clean_io_failure+0x14d/0x180 [btrfs]
       end_bio_extent_readpage+0x412/0x6e0 [btrfs]
       ? __switch_to+0x106/0x420
       process_one_work+0x1c7/0x380
       worker_thread+0x4d/0x380
       ? rescuer_thread+0x3a0/0x3a0
       kthread+0xe9/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
    
    [CAUSE]
    
    Before the BUG_ON(), we got some read errors from the replace target
    first. Note the mirror number (3, which is beyond RAID1 duplication,
    thus the read came from the replace target device).
    
    Then at the BUG_ON() location, we are trying to write the repaired
    sectors back to the failed device.
    
    The check looks like this:
    
    		ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
    				      &map_length, &bioc, mirror_num);
    		if (ret)
    			goto out_counter_dec;
    		BUG_ON(mirror_num != bioc->mirror_num);
    
    But btrfs_map_block() itself can modify bioc->mirror_num, in particular
    for dev-replace:
    
    	if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
    	    !need_full_stripe(op) && dev_replace->tgtdev != NULL) {
    		ret = get_extra_mirror_from_replace(fs_info, logical, *length,
    						    dev_replace->srcdev->devid,
    						    &mirror_num,
    					    &physical_to_patch_in_first_stripe);
    		patch_the_first_stripe_for_dev_replace = 1;
    	}
    
    Thus if we're repairing the replace target device, we're going to
    trigger that BUG_ON().
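    
    As a rough illustration only, the mismatch can be modelled in standalone
    C. The struct and function names below are hypothetical and are not
    kernel APIs; the model assumes that a request for mirror
    num_stripes + 1 is redirected to the mirror of the replace source
    device, as the snippet above suggests:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the chunk mapping state. */
struct model_map {
	int num_stripes;      /* 2 for RAID1 */
	bool replace_ongoing; /* a dev-replace is currently running */
	int srcdev_mirror;    /* 1-based mirror of the replace source device */
};

/*
 * Model of the mirror number a btrfs_map_block()-like lookup reports back.
 * A request for mirror num_stripes + 1 (the replace target) is redirected
 * to the mirror that holds the replace source device.
 */
static int model_mapped_mirror(const struct model_map *map, int mirror_num)
{
	if (map->replace_ongoing && mirror_num == map->num_stripes + 1)
		return map->srcdev_mirror;
	return mirror_num;
}

/* True when a check like BUG_ON(mirror_num != bioc->mirror_num) would fire. */
static bool model_old_check_fires(const struct model_map *map, int mirror_num)
{
	return mirror_num != model_mapped_mirror(map, mirror_num);
}
```

    In this model, a RAID1 chunk (num_stripes == 2) with a running replace
    reports a different mirror for a request on mirror 3, so the old check
    fires, while requests for mirrors 1 and 2 pass through unchanged.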
    
    But in reality, a read failure from the replace target device may just
    mean that the replace hasn't reached the range we're reading yet, so
    we're reading garbage. With the replace still running, the range will
    be properly filled later.
    
    Thus in that case, we don't need to do anything but let the replace
    routine handle it.
    
    [FIX]
    Instead of a BUG_ON(), just skip the repair if we're repairing the
    dev-replace target device.
    
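    The decision described above could be sketched as follows. This is a
    standalone illustration, not the patch itself: the enum and function
    names are hypothetical, and the only assumption is that a mismatch
    between the requested and the mapped mirror number identifies the
    dev-replace target:

```c
/* Hypothetical outcome of the repair decision. */
enum repair_action {
	REPAIR_WRITEBACK, /* write the good copy back to the failed mirror */
	REPAIR_SKIP       /* leave the range to the running dev-replace */
};

/*
 * requested_mirror: mirror number the failed read was attributed to.
 * mapped_mirror:    mirror number reported back by the write mapping.
 *
 * A mismatch is treated as "this is the dev-replace target", in which
 * case the repair is skipped instead of crashing.
 */
static enum repair_action decide_repair(int requested_mirror, int mapped_mirror)
{
	if (requested_mirror != mapped_mirror)
		return REPAIR_SKIP;
	return REPAIR_WRITEBACK;
}
```

    With the values from the report (read attributed to mirror 3, mapping
    redirected to a lower mirror), the sketch returns REPAIR_SKIP, which
    corresponds to bailing out quietly where the BUG_ON() used to be.
    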
    Reported-by: 小太 <nospam@kota.moe>
    Link: https://lore.kernel.org/linux-btrfs/CACsxjPYyJGQZ+yvjzxA1Nn2LuqkYqTCcUH43S=+wXhyf8S00Ag@mail.gmail.com/
    CC: stable@vger.kernel.org # 6.0+
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
bio.c 10.4 KB