    xfs: don't block in busy flushing when freeing extents · 8ebbf262
    Dave Chinner authored
    If the current transaction holds a busy extent and we are trying to
    allocate a new extent to fix up the free list, we can deadlock if
    the AG is entirely empty except for the busy extent held by the
    transaction.
    
    This can occur at runtime when processing an XEFI with multiple extents
    in this path:
    
    __schedule+0x22f at ffffffff81f75e8f
    schedule+0x46 at ffffffff81f76366
    xfs_extent_busy_flush+0x69 at ffffffff81477d99
    xfs_alloc_ag_vextent_size+0x16a at ffffffff8141711a
    xfs_alloc_ag_vextent+0x19b at ffffffff81417edb
    xfs_alloc_fix_freelist+0x22f at ffffffff8141896f
    xfs_free_extent_fix_freelist+0x6a at ffffffff8141939a
    __xfs_free_extent+0x99 at ffffffff81419499
    xfs_trans_free_extent+0x3e at ffffffff814a6fee
    xfs_extent_free_finish_item+0x24 at ffffffff814a70d4
    xfs_defer_finish_noroll+0x1f7 at ffffffff81441407
    xfs_defer_finish+0x11 at ffffffff814417e1
    xfs_itruncate_extents_flags+0x13d at ffffffff8148b7dd
    xfs_inactive_truncate+0xb9 at ffffffff8148bb89
    xfs_inactive+0x227 at ffffffff8148c4f7
    xfs_fs_destroy_inode+0xb8 at ffffffff81496898
    destroy_inode+0x3b at ffffffff8127d2ab
    do_unlinkat+0x1d1 at ffffffff81270df1
    do_syscall_64+0x40 at ffffffff81f6b5f0
    entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8200007c
    
    This can also happen in log recovery when processing an EFI
    with multiple extents through this path:
    
    context_switch() kernel/sched/core.c:3881
    __schedule() kernel/sched/core.c:5111
    schedule() kernel/sched/core.c:5186
    xfs_extent_busy_flush() fs/xfs/xfs_extent_busy.c:598
    xfs_alloc_ag_vextent_size() fs/xfs/libxfs/xfs_alloc.c:1641
    xfs_alloc_ag_vextent() fs/xfs/libxfs/xfs_alloc.c:828
    xfs_alloc_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:2362
    xfs_free_extent_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:3029
    __xfs_free_extent() fs/xfs/libxfs/xfs_alloc.c:3067
    xfs_trans_free_extent() fs/xfs/xfs_extfree_item.c:370
    xfs_efi_recover() fs/xfs/xfs_extfree_item.c:626
    xlog_recover_process_efi() fs/xfs/xfs_log_recover.c:4605
    xlog_recover_process_intents() fs/xfs/xfs_log_recover.c:4893
    xlog_recover_finish() fs/xfs/xfs_log_recover.c:5824
    xfs_log_mount_finish() fs/xfs/xfs_log.c:764
    xfs_mountfs() fs/xfs/xfs_mount.c:978
    xfs_fs_fill_super() fs/xfs/xfs_super.c:1908
    mount_bdev() fs/super.c:1417
    xfs_fs_mount() fs/xfs/xfs_super.c:1985
    legacy_get_tree() fs/fs_context.c:647
    vfs_get_tree() fs/super.c:1547
    do_new_mount() fs/namespace.c:2843
    do_mount() fs/namespace.c:3163
    ksys_mount() fs/namespace.c:3372
    __do_sys_mount() fs/namespace.c:3386
    __se_sys_mount() fs/namespace.c:3383
    __x64_sys_mount() fs/namespace.c:3383
    do_syscall_64() arch/x86/entry/common.c:296
    entry_SYSCALL_64() arch/x86/entry/entry_64.S:180
    
    To avoid this deadlock, we should not block in
    xfs_extent_busy_flush() if we hold a busy extent in the current
    transaction.
    
    Now that the EFI processing code can handle requeuing a partially
    completed EFI, we can detect this situation in
    xfs_extent_busy_flush() and return -EAGAIN rather than going to
    sleep forever. The -EAGAIN gets propagated back out to the
    xfs_trans_free_extent() context, where the EFD is populated and the
    transaction is rolled, thereby moving the busy extents into the CIL.
    
    At this point, we can retry the extent free operation with a
    clean transaction. If we hit the same "all free extents are busy"
    situation when trying to fix up the free list, we can safely call
    xfs_extent_busy_flush() and wait for the busy extents to resolve
    and wake us. At this point, the allocation search can make progress
    again and we can fix up the free list.
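
The retry flow described above can be sketched as a small userspace C
model. This is only an illustration under stated assumptions, not the
kernel code: struct toy_trans, struct toy_ag and every toy_*() function
are hypothetical names invented for this sketch, the "wait" for busy
extents is modelled as resolving them instantly, and the EFD/requeue
handling is collapsed into a single roll-and-retry step.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a transaction; not struct xfs_trans. */
struct toy_trans {
	int busy_held;	/* busy extents owned by this transaction */
	int rolls;	/* number of times the transaction was rolled */
};

/* Hypothetical stand-in for an allocation group. */
struct toy_ag {
	int busy_in_cil;	/* busy extents already handed to the CIL */
	int free_extents;	/* free extents that are not busy */
};

/*
 * Model of the flush decision: if the caller still holds busy extents,
 * sleeping could never be woken, so return -EAGAIN instead. Otherwise
 * "wait", modelled here as the CIL's busy extents resolving at once.
 */
static int toy_busy_flush(struct toy_trans *tp, struct toy_ag *ag)
{
	if (tp->busy_held > 0)
		return -EAGAIN;
	ag->free_extents += ag->busy_in_cil;
	ag->busy_in_cil = 0;
	return 0;
}

/* Model of fixing up the free list: consumes one non-busy free extent. */
static int toy_fix_freelist(struct toy_trans *tp, struct toy_ag *ag)
{
	if (ag->free_extents == 0) {
		int error = toy_busy_flush(tp, ag);

		if (error)
			return error;
	}
	if (ag->free_extents == 0)
		return -ENOSPC;
	ag->free_extents--;
	return 0;
}

/* Model of a transaction roll: held busy extents move into the CIL. */
static void toy_trans_roll(struct toy_trans *tp, struct toy_ag *ag)
{
	ag->busy_in_cil += tp->busy_held;
	tp->busy_held = 0;
	tp->rolls++;
}

/* Free one extent; on -EAGAIN, roll and retry with a clean transaction. */
static int toy_free_extent(struct toy_trans *tp, struct toy_ag *ag)
{
	int error = toy_fix_freelist(tp, ag);

	if (error == -EAGAIN) {
		toy_trans_roll(tp, ag);
		error = toy_fix_freelist(tp, ag);
	}
	if (error)
		return error;
	tp->busy_held++;	/* the extent just freed is now busy */
	return 0;
}

/* The "AG is empty except for our own busy extent" case. */
static int toy_demo(void)
{
	struct toy_trans tp = { .busy_held = 1 };
	struct toy_ag ag = { 0 };
	int error = toy_free_extent(&tp, &ag);

	assert(error == 0);
	return tp.rolls;
}
```

In this model the first attempt fails with -EAGAIN because the
transaction still owns the only busy extent; after one roll the flush is
allowed to wait, the extent becomes usable, and the retry succeeds.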
    
    This deadlock was first reported by Chandan in mid-2021, but I
    couldn't make myself understood during review, and didn't have time
    to fix it myself.
    
    It was reported again in March 2023, and again I have found myself
    unable to explain the complexities of the solution needed during
    review.
    
    As such, I don't have hours more time to waste trying to get the
    fix written the way it needs to be written, so I'm just doing it
    myself. This patchset is largely based on Wengang Wang's last patch,
    but with all the unnecessary stuff removed, split up into multiple
    patches and cleaned up somewhat.

    Reported-by: Chandan Babu R <chandanrlinux@gmail.com>
    Reported-by: Wengang Wang <wen.gang.wang@oracle.com>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_alloc.c 104 KB