    btrfs: zoned: prevent allocation from previous data relocation BG · 343d8a30
    Naohiro Aota authored
    After commit 5f0addf7 ("btrfs: zoned: use dedicated lock for data
    relocation"), we observe IO errors on e.g. btrfs/232 like below.
    
      [09.0][T4038707] WARNING: CPU: 3 PID: 4038707 at fs/btrfs/extent-tree.c:2381 btrfs_cross_ref_exist+0xfc/0x120 [btrfs]
      <snip>
      [09.9][T4038707] Call Trace:
      [09.5][T4038707]  <TASK>
      [09.3][T4038707]  run_delalloc_nocow+0x7f1/0x11a0 [btrfs]
      [09.6][T4038707]  ? test_range_bit+0x174/0x320 [btrfs]
      [09.2][T4038707]  ? fallback_to_cow+0x980/0x980 [btrfs]
      [09.3][T4038707]  ? find_lock_delalloc_range+0x33e/0x3e0 [btrfs]
      [09.5][T4038707]  btrfs_run_delalloc_range+0x445/0x1320 [btrfs]
      [09.2][T4038707]  ? test_range_bit+0x320/0x320 [btrfs]
      [09.4][T4038707]  ? lock_downgrade+0x6a0/0x6a0
      [09.2][T4038707]  ? orc_find.part.0+0x1ed/0x300
      [09.5][T4038707]  ? __module_address.part.0+0x25/0x300
      [09.0][T4038707]  writepage_delalloc+0x159/0x310 [btrfs]
      <snip>
      [09.4][    C3] sd 10:0:1:0: [sde] tag#2620 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
      [09.5][    C3] sd 10:0:1:0: [sde] tag#2620 Sense Key : Illegal Request [current]
      [09.9][    C3] sd 10:0:1:0: [sde] tag#2620 Add. Sense: Unaligned write command
      [09.5][    C3] sd 10:0:1:0: [sde] tag#2620 CDB: Write(16) 8a 00 00 00 00 00 02 f3 63 87 00 00 00 2c 00 00
      [09.4][    C3] critical target error, dev sde, sector 396041272 op 0x1:(WRITE) flags 0x800 phys_seg 3 prio class 0
      [09.9][    C3] BTRFS error (device dm-1): bdev /dev/mapper/dml_102_2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
    
    The IO errors occur when we allocate a regular extent in the previous
    data relocation block group.
    
    On zoned btrfs, we use a dedicated block group to relocate a data
    extent. Thus, we allocate relocating data extents (pre-alloc) only from
    the dedicated block group and vice versa. Once the free space in the
    dedicated block group gets tight, a relocating extent may not fit into
    the block group. In that case, we need to switch the dedicated block
    group to the next one. The previous one is then freed up for
    allocating regular extents. That BG no longer has enough room for the
    relocating extent, but it still has room for a smaller extent. This is
    where the problem happens. If we allocate a regular extent while nocow
    IOs for the relocation are still ongoing, we issue WRITE IOs (for
    relocation) and ZONE APPEND IOs (for the regular writes) at the same
    time. These mixed IOs confuse the write pointer and cause the
    unaligned write errors.
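
    The failure mode above can be illustrated with a small userspace
    model. This is only a sketch, not kernel code: struct zone,
    zone_write(), zone_append() and demo_mixed_io() are hypothetical names
    made up for illustration. It assumes the usual zoned-device rule that a
    regular WRITE must start at the zone's write pointer, while a ZONE
    APPEND lands wherever the write pointer currently is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone {
	uint64_t wp;	/* write pointer: next writable sector */
	uint64_t end;	/* first sector past the zone */
};

/* Regular WRITE: the caller picks the sector, which must equal wp. */
static bool zone_write(struct zone *z, uint64_t sector, uint64_t len)
{
	assert(z->wp <= z->end);
	if (sector != z->wp || len > z->end - z->wp)
		return false;	/* a real device reports an unaligned write */
	z->wp += len;
	return true;
}

/*
 * ZONE APPEND: the device picks the sector (the current wp).
 * Returns the sector written, or UINT64_MAX if the zone is full.
 */
static uint64_t zone_append(struct zone *z, uint64_t len)
{
	uint64_t at = z->wp;

	assert(z->wp <= z->end);
	if (len > z->end - z->wp)
		return UINT64_MAX;
	z->wp += len;
	return at;
}

/*
 * The relocation extent's location is fixed ahead of time (pre-alloc),
 * so its WRITE targets the write pointer as it was at that moment.
 * Returns true if that WRITE still succeeds.
 */
static bool demo_mixed_io(bool interleave_append)
{
	struct zone z = { .wp = 100, .end = 256 };
	uint64_t reloc_sector = z.wp;	/* chosen at pre-allocation time */

	if (interleave_append)
		zone_append(&z, 8);	/* regular write sneaks in first */

	return zone_write(&z, reloc_sector, 16);
}
```

    In this model, demo_mixed_io(false) succeeds, while
    demo_mixed_io(true) fails because the append has already moved the
    write pointer past the sector the relocation WRITE was aimed at.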
    
    This commit introduces a new bit 'zoned_data_reloc_ongoing' to the
    btrfs_block_group. We set this bit before releasing the dedicated block
    group, and no extents are allocated from a block group that has this
    bit set. This bit is similar to setting block_group->ro, but differs
    in that it still allows nocow writes to start.
    
    Once all the nocow IO for relocation is done (hooked from
    btrfs_finish_ordered_io), we reset the bit to release the block group for
    further allocation.
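
    The lifecycle of the bit described above could be sketched roughly as
    follows. This is a simplified userspace model under stated
    assumptions, not the code of this patch: only the field name
    zoned_data_reloc_ongoing is taken from the commit message. struct
    block_group, the pending_reloc_ios counter and all three helpers are
    hypothetical, and the kernel may well detect "all nocow IO is done" in
    a different way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct btrfs_block_group. */
struct block_group {
	uint64_t free_bytes;
	bool ro;
	bool zoned_data_reloc_ongoing;	/* name from the commit message */
	unsigned int pending_reloc_ios;	/* hypothetical bookkeeping */
};

/* Regular allocation skips read-only BGs and BGs with the bit set. */
static bool can_allocate_regular(const struct block_group *bg,
				 uint64_t bytes)
{
	if (bg->ro || bg->zoned_data_reloc_ongoing)
		return false;
	return bytes <= bg->free_bytes;
}

/* Set the bit before the BG stops being the dedicated relocation BG. */
static void release_reloc_bg(struct block_group *bg)
{
	bg->zoned_data_reloc_ongoing = true;
}

/*
 * Called when one relocation nocow IO completes. The bit is cleared
 * only after the last outstanding IO, reopening the BG for allocation.
 */
static void finish_reloc_io(struct block_group *bg)
{
	assert(bg->pending_reloc_ios > 0);
	if (--bg->pending_reloc_ios == 0)
		bg->zoned_data_reloc_ongoing = false;
}
```

    Unlike ro, the bit in this model blocks only new allocations. It
    places no restriction on the relocation writes that are already
    pointed at the block group, and the block group becomes usable again
    as soon as the last of them completes.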
    
    Fixes: c2707a25 ("btrfs: zoned: add a dedicated data relocation block group")
    CC: stable@vger.kernel.org # 5.16+
    Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
extent-tree.c 167 KB