• Filipe Manana's avatar
    Btrfs: fix race between block group creation and their cache writeout · ff1f8250
    Filipe Manana authored
    So creating a block group has 2 distinct phases:
    
    Phase 1 - creates the btrfs_block_group_cache item and adds it to the
    rbtree fs_info->block_group_cache_tree and to the corresponding list
    space_info->block_groups[];
    
    Phase 2 - adds the block group item to the extent tree and corresponding
    items to the chunk tree.
    
    The first phase adds the block_group_cache_item to a list of pending block
    groups in the transaction handle, and phase 2 happens when
    btrfs_end_transaction() is called against the transaction handle.
    
    It happens that once phase 1 completes, other concurrent tasks that use
    their own transaction handle, but points to the same running transaction
    (struct btrfs_trans_handle->transaction), can use this block group for
    space allocations and therefore mark it dirty. Dirty block groups are
    tracked in a list belonging to the currently running transaction (struct
    btrfs_transaction) and not in the transaction handle (btrfs_trans_handle).
    
    This is a problem because once a task calls btrfs_commit_transaction(),
    it calls btrfs_start_dirty_block_groups() which will see all dirty block
    groups and attempt to start their writeout, including those that are
    still attached to the transaction handle of some concurrent task that
    hasn't called btrfs_end_transaction() yet - which means those block
    groups haven't gone through phase 2 yet and therefore when
    write_one_cache_group() is called, it won't find the block group items
    in the extent tree and abort the current transaction with -ENOENT,
    turning the fs into readonly mode and require a remount.
    
    Fix this by ignoring -ENOENT when looking for block group items in the
    extent tree when we attempt to start the writeout of the block group
    caches outside the critical section of the transaction commit. We will
    try again later during the critical section and if there we still don't
    find the block group item in the extent tree, we then abort the current
    transaction.
    
    This issue happened twice, once while running fstests btrfs/067 and once
    for btrfs/078, which produced the following trace:
    
    [ 3278.703014] WARNING: CPU: 7 PID: 18499 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
    [ 3278.707329] BTRFS: Transaction aborted (error -2)
    (...)
    [ 3278.731555] Call Trace:
    [ 3278.732396]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
    [ 3278.733860]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
    [ 3278.735312]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
    [ 3278.736874]  [<ffffffffa03ada6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
    [ 3278.738302]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
    [ 3278.739520]  [<ffffffffa03ada6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
    [ 3278.741222]  [<ffffffffa03b9e56>] write_one_cache_group+0xae/0xbf [btrfs]
    [ 3278.742797]  [<ffffffffa03c487b>] btrfs_start_dirty_block_groups+0x170/0x2b2 [btrfs]
    [ 3278.744492]  [<ffffffffa03d309c>] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
    [ 3278.746084]  [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf
    [ 3278.747249]  [<ffffffffa03e5660>] btrfs_sync_file+0x313/0x387 [btrfs]
    [ 3278.748744]  [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4
    [ 3278.749958]  [<ffffffff81435b54>] ? ret_from_sys_call+0x1d/0x58
    [ 3278.751218]  [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e
    [ 3278.754197]  [<ffffffff8117ae54>] do_fsync+0x34/0x4e
    [ 3278.755192]  [<ffffffff8117b07c>] SyS_fsync+0x10/0x14
    [ 3278.756236]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
    [ 3278.757366] ---[ end trace 9a4d4df4969709aa ]---
    
    Fixes: 1bbc621e ("Btrfs: allow block group cache writeout
                          outside critical section in commit")
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    ff1f8250
extent-tree.c 272 KB