    btrfs: do not start relocation until in progress drops are done · b4be6aef
    Josef Bacik authored
    We hit a bug with a recovering relocation on mount for one of our file
    systems in production.  I reproduced this locally by injecting errors
    into snapshot delete with balance running at the same time.  This
    presented as an error while looking up an extent item:
    
      WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
      CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
      RIP: 0010:lookup_inline_extent_backref+0x647/0x680
      RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
      RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
      R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
      R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
      FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
      Call Trace:
       <TASK>
       insert_inline_extent_backref+0x46/0xd0
       __btrfs_inc_extent_ref.isra.0+0x5f/0x200
       ? btrfs_merge_delayed_refs+0x164/0x190
       __btrfs_run_delayed_refs+0x561/0xfa0
       ? btrfs_search_slot+0x7b4/0xb30
       ? btrfs_update_root+0x1a9/0x2c0
       btrfs_run_delayed_refs+0x73/0x1f0
       ? btrfs_update_root+0x1a9/0x2c0
       btrfs_commit_transaction+0x50/0xa50
       ? btrfs_update_reloc_root+0x122/0x220
       prepare_to_merge+0x29f/0x320
       relocate_block_group+0x2b8/0x550
       btrfs_relocate_block_group+0x1a6/0x350
       btrfs_relocate_chunk+0x27/0xe0
       btrfs_balance+0x777/0xe60
       balance_kthread+0x35/0x50
       ? btrfs_balance+0xe60/0xe60
       kthread+0x16b/0x190
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x22/0x30
       </TASK>
    
    Normally snapshot deletion and relocation are excluded from running at
    the same time by the fs_info->cleaner_mutex.  However, if we had a
    pending balance waiting to get the ->cleaner_mutex, and a snapshot
    deletion was running, and then the box crashed, we would come up in a
    state where we have a half-deleted snapshot.
    
    Again, in the normal case the snapshot deletion needs to complete before
    relocation can start, but in this case relocation could very well start
    before the snapshot deletion completes, as we simply add the root to the
    dead roots list and wait for the next time the cleaner runs to clean up
    the snapshot.
    
    Fix this by setting a bit on the fs_info if any DEAD_ROOT has a pending
    drop_progress key, since that means we were in the middle of the drop
    operation when the box went down.  Balance then waits until this bit is
    cleared before starting up again.
    
    If the only DEAD_ROOTs are ones without drop_progress set, then we're
    safe to start balance right away, as we'll be properly protected by the
    cleaner_mutex.
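
    The gating described above can be modelled in user space roughly as
    follows.  This is only a sketch of the idea, not the kernel code: the
    struct and function names (dead_root, fs_state, scan_dead_roots,
    finish_drop, balance_may_start) are hypothetical, and where the kernel
    would block balance until the bit clears, this model just reports
    whether balance may start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dead root found at mount (not the kernel struct). */
struct dead_root {
    bool has_drop_progress; /* a drop_progress key was recorded: drop was interrupted */
};

/* Hypothetical stand-in for the relevant fs_info state. */
struct fs_state {
    bool unfinished_drops;  /* the "bit on the fs_info" */
    size_t nr_unfinished;   /* interrupted drops still outstanding */
};

/* Mount: set the flag if any dead root was in the middle of a drop. */
void scan_dead_roots(struct fs_state *fs, const struct dead_root *roots, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (roots[i].has_drop_progress) {
            fs->nr_unfinished++;
            fs->unfinished_drops = true;
        }
    }
}

/* Cleaner: one interrupted drop has completed; clear the flag after the last one. */
void finish_drop(struct fs_state *fs)
{
    assert(fs->nr_unfinished > 0);
    if (--fs->nr_unfinished == 0)
        fs->unfinished_drops = false;
}

/* Balance: relocation may start only once no interrupted drop remains. */
bool balance_may_start(const struct fs_state *fs)
{
    return !fs->unfinished_drops;
}
```

    With two interrupted drops, balance_may_start() stays false until
    finish_drop() has been called twice.  With only dead roots that never
    recorded drop_progress, it is true immediately, which matches the case
    where the cleaner_mutex alone is sufficient.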
    
    CC: stable@vger.kernel.org # 5.10+
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
root-tree.c 14.4 KB