    btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock · b72c3aba
    Qu Wenruo authored
    [BUG]
    For a certain crafted image whose csum root leaf has a missing backref,
    triggering a write with data csum can cause a deadlock, accompanied by
    the following kernel WARN_ON():
    
      WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
      CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
      Workqueue: btrfs-endio-write btrfs_endio_write_helper
      RIP: 0010:btrfs_tree_lock+0x3e2/0x400
      Call Trace:
       btrfs_alloc_tree_block+0x39f/0x770
       __btrfs_cow_block+0x285/0x9e0
       btrfs_cow_block+0x191/0x2e0
       btrfs_search_slot+0x492/0x1160
       btrfs_lookup_csum+0xec/0x280
       btrfs_csum_file_blocks+0x2be/0xa60
       add_pending_csums+0xaf/0xf0
       btrfs_finish_ordered_io+0x74b/0xc90
       finish_ordered_fn+0x15/0x20
       normal_work_helper+0xf6/0x500
       btrfs_endio_write_helper+0x12/0x20
       process_one_work+0x302/0x770
       worker_thread+0x81/0x6d0
       kthread+0x180/0x1d0
       ret_from_fork+0x35/0x40
    
    [CAUSE]
    That crafted image has a missing backref for the csum tree root leaf.
    When we try to allocate a new tree block, since there is no
    EXTENT/METADATA_ITEM for the csum tree root, btrfs considers its bytenr
    a free slot and uses it.
    
    The extent tree of the image looks like:
    
      Normal image                      |       This fuzzed image
      ----------------------------------+--------------------------------
      BG 29360128                       | BG 29360128
       One empty slot                   |  One empty slot
      29364224: backref to UUID tree    | 29364224: backref to UUID tree
       Two empty slots                  |  Two empty slots
      29376512: backref to CSUM tree    |  One empty slot (bad type) <<<
      29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
      ...                               | ...
    
    Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs tries to
    allocate a tree block, it treats that bytenr as a valid free slot.
    
    And for finish_ordered_write, when we need to insert a csum, we try to
    CoW the csum tree root.
    
    By accident, the empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K and
    BG_OFFSET + 12K are already used by tree block CoW for other trees, so
    the next empty slot is BG_OFFSET + 16K, which should hold the backref
    for the CSUM tree.
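
The offsets above can be checked with a little arithmetic. The sketch below assumes 4 KiB tree blocks, which is inferred from the 4096-byte spacing of the bytenrs in the extent tree listing and not stated in the commit; `MODEL_NODESIZE` and `model_slot_bytenr()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Assumption: 4 KiB tree blocks, inferred from the spacing of the bytenrs
 * shown in the extent tree listing (29364224, 29376512, 29380608).
 */
#define MODEL_NODESIZE 4096ULL

/* Hypothetical helper: bytenr of the n-th tree block slot in a block group. */
static uint64_t model_slot_bytenr(uint64_t bg_offset, unsigned int slot)
{
	return bg_offset + (uint64_t)slot * MODEL_NODESIZE;
}
```

With the block group at 29360128, slot 4 (BG_OFFSET + 16K) evaluates to 29376512, the bytenr of the csum tree root.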
    
    But due to the bad type, btrfs can't recognize it, still considers it
    an empty slot, and will try to use it for csum tree CoW.
    
    Then in the following call trace, we will try to lock the new tree
    block, which turns out to be the old csum tree root which is already
    locked:
    
    btrfs_search_slot() called on csum tree root, which is at 29376512
    |- btrfs_cow_block()
       |- btrfs_set_lock_block()
       |  |- Now locks tree block 29376512 (old csum tree root)
       |- __btrfs_cow_block()
          |- btrfs_alloc_tree_block()
             |- btrfs_reserve_extent()
                | Now it returns tree block 29376512, which the extent tree
                | shows as an empty slot, but it's already held by the csum tree
                |- btrfs_init_new_buffer()
                   |- btrfs_tree_lock()
                      | Triggers WARN_ON(eb->lock_owner == current->pid)
                      |- wait_event()
                         Wait for the lock owner to release the lock, but it's
                         locked by ourselves, so it will deadlock
    
    [FIX]
    This patch adds a check of lock_owner against current->pid in
    btrfs_init_new_buffer(), so the above deadlock can be avoided.
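
As a rough illustration of the idea, the following is a userspace model and not the kernel code: before taking the lock on a freshly allocated buffer, compare its recorded owner with the caller and bail out instead of waiting. `struct model_eb`, `enum model_status` and `model_init_new_buffer()` are hypothetical stand-ins; only the `lock_owner` versus current pid comparison comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-in for the extent buffer fields that matter here. */
struct model_eb {
	uint64_t start;  /* bytenr of the tree block */
	int lock_owner;  /* pid holding the lock, 0 if unlocked */
};

enum model_status {
	MODEL_OK = 0,
	MODEL_ERR_SELF_LOCKED = -1, /* caller already holds this buffer */
	MODEL_WOULD_BLOCK = -2,     /* held by someone else; a real lock would wait */
};

/* Model of the extra check: refuse to lock a buffer we already own. */
static enum model_status model_init_new_buffer(struct model_eb *eb,
					       int current_pid)
{
	if (eb->lock_owner == current_pid)
		return MODEL_ERR_SELF_LOCKED; /* would deadlock on ourselves */
	if (eb->lock_owner != 0)
		return MODEL_WOULD_BLOCK;
	eb->lock_owner = current_pid; /* take the lock */
	return MODEL_OK;
}
```

In this model, a buffer at 29376512 whose `lock_owner` is 41 yields `MODEL_ERR_SELF_LOCKED` when pid 41 tries to initialize it again, which corresponds to the csum tree CoW case described above.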
    
    Since such a problem can only happen in a crafted image, we will still
    trigger a kernel warning for the later aborted transaction, but with a
    somewhat more meaningful warning message.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
    Reported-by: Xu Wen <wen.xu@gatech.edu>
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
extent-tree.c 300 KB