    btrfs: do not do delalloc reservation under page lock · f4b1363c
    Josef Bacik authored
    We ran into a deadlock in production with the fixup worker.  The stack
    traces were as follows:
    
    Thread responsible for the writeout, waiting on the page lock
    
      [<0>] io_schedule+0x12/0x40
      [<0>] __lock_page+0x109/0x1e0
      [<0>] extent_write_cache_pages+0x206/0x360
      [<0>] extent_writepages+0x40/0x60
      [<0>] do_writepages+0x31/0xb0
      [<0>] __writeback_single_inode+0x3d/0x350
      [<0>] writeback_sb_inodes+0x19d/0x3c0
      [<0>] __writeback_inodes_wb+0x5d/0xb0
      [<0>] wb_writeback+0x231/0x2c0
      [<0>] wb_workfn+0x308/0x3c0
      [<0>] process_one_work+0x1e0/0x390
      [<0>] worker_thread+0x2b/0x3c0
      [<0>] kthread+0x113/0x130
      [<0>] ret_from_fork+0x35/0x40
      [<0>] 0xffffffffffffffff
    
    Thread of the fixup worker, which is holding the page lock
    
      [<0>] start_delalloc_inodes+0x241/0x2d0
      [<0>] btrfs_start_delalloc_roots+0x179/0x230
      [<0>] btrfs_alloc_data_chunk_ondemand+0x11b/0x2e0
      [<0>] btrfs_check_data_free_space+0x53/0xa0
      [<0>] btrfs_delalloc_reserve_space+0x20/0x70
      [<0>] btrfs_writepage_fixup_worker+0x1fc/0x2a0
      [<0>] normal_work_helper+0x11c/0x360
      [<0>] process_one_work+0x1e0/0x390
      [<0>] worker_thread+0x2b/0x3c0
      [<0>] kthread+0x113/0x130
      [<0>] ret_from_fork+0x35/0x40
      [<0>] 0xffffffffffffffff
    
    Thankfully the stars have to align just right to hit this.  First you
    have to end up in the fixup worker, which is tricky by itself (my
    reproducer does DIO reads into an mmap'ed region, so not a common
    operation).  Then you have to have less than a page size of free data
    space and 0 unallocated space so you go down the "commit the transaction
    to free up pinned space" path.  In our case this was caused by a
    balance that happened to be running on the host.  Then you get this
    deadlock: the fixup worker, still holding the page lock, sits in the
    delalloc flush while the writeout thread waits for that same lock.
    
    I'm still in the process of trying to force the deadlock to happen on
    demand, but I've hit other issues.  I can still trigger the fixup worker
    path itself, so this patch has been tested in that regard and the
    normal case is fine.
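
The ordering problem behind the two traces can be sketched in user space.
This is only a toy model under stated assumptions, not the kernel code: a
pthread mutex stands in for the page lock, and the hypothetical helper
reserve_space_model() stands in for a reservation that has to flush
delalloc and therefore needs the page lock itself.  In the real deadlock
the flush is performed by a separate writeback thread; here it is collapsed
into one thread and uses trylock so the sketch reports the conflict instead
of hanging.  All function names below are made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for the page lock; NOT the kernel's lock_page(). */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical model of a space reservation that must flush delalloc.
 * The flush needs the page lock; trylock is used so that a conflict is
 * reported as "false" rather than blocking forever.
 */
static bool reserve_space_model(void)
{
	if (pthread_mutex_trylock(&page_lock) != 0)
		return false;	/* the real code would block here */
	pthread_mutex_unlock(&page_lock);
	return true;
}

/* Problematic ordering: reserve while the page lock is already held. */
static bool fixup_reserve_under_lock(void)
{
	bool ok;

	pthread_mutex_lock(&page_lock);
	ok = reserve_space_model();
	pthread_mutex_unlock(&page_lock);
	return ok;
}

/* Ordering named in the subject line: reserve first, then lock the page. */
static bool fixup_reserve_before_lock(void)
{
	bool ok = reserve_space_model();

	pthread_mutex_lock(&page_lock);
	/* ... the page would be fixed up here ... */
	pthread_mutex_unlock(&page_lock);
	return ok;
}
```

In this model fixup_reserve_under_lock() fails because the flush cannot
obtain the lock its caller holds, while fixup_reserve_before_lock()
succeeds because nothing is held during the reservation.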
    
    Fixes: 87826df0 ("btrfs: delalloc for page dirtied out-of-band in fixup worker")
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>