    xfs: fix intermittent hang during quotacheck · f0c2d7d2
    Darrick J. Wong authored
    Every now and then, I see the following hang during mount time
    quotacheck when running fstests.  Turning on KASAN seems to make it
    happen somewhat more frequently.  I've edited the backtrace for brevity.
    
    XFS (sdd): Quotacheck needed: Please wait.
    XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
    CPU: 0 PID: 1831409 Comm: mount Tainted: G        W         5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
    RIP: 0010:assfail+0x46/0x4a [xfs]
    Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24
    RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0
    RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a
    R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20
    R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000
    FS:  00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0
    Call Trace:
     <TASK>
     xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
     get_tree_bdev+0x175/0x280
     vfs_get_tree+0x1a/0x80
     path_mount+0x6f5/0xaa0
     __x64_sys_mount+0x103/0x140
     do_syscall_64+0x2b/0x80
     entry_SYSCALL_64_after_hwframe+0x46/0xb0
    
    I /think/ this can happen if xfs_qm_flush_one is racing with
    xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has
    taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the
    dquot buffer, let alone queued it to the delwri list.  In this case,
    flush_one will fail to get the dquot flush lock but can still lock the
    incore buffer, and xfs_buf_delwri_pushbuf will then trip over this
    ASSERT, which checks that the buffer is on a delwri list.  The hang
    results because _delwri_submit_buffers ignores non-DELWRI_Q buffers,
    which means that xfs_buf_iowait waits forever for an IO that has not yet
    been scheduled.
    
    AFAICT, a reasonable solution here is to detect a dquot buffer that is
    not on a DELWRI list, drop it, and return -EAGAIN to try the flush
    again.  It's not /that/ big of a deal if quotacheck writes the dquot
    buffer repeatedly before we even set QUOTA_CHKD.
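
The check described above can be sketched as a small user-space model. This is only an illustration, not the kernel change itself: `model_buf`, `MODEL_DELWRI_Q`, `model_pushbuf` and `model_flush_one` are hypothetical stand-ins for the real XFS structures and functions, and the model assumes that the "could not get the flush lock" path always asks the caller to retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "queued on a delwri list" buffer flag. */
#define MODEL_DELWRI_Q 0x1u

/* Hypothetical stand-in for an incore buffer; not struct xfs_buf. */
struct model_buf {
    unsigned int flags;   /* MODEL_DELWRI_Q when queued for delayed write */
    int writes;           /* how many times the buffer was pushed out */
};

/* Pushing is only valid for a queued buffer, as in the ASSERT above. */
void model_pushbuf(struct model_buf *bp)
{
    assert(bp->flags & MODEL_DELWRI_Q);
    bp->flags &= ~MODEL_DELWRI_Q;
    bp->writes++;
}

/*
 * Model of the flush step. got_flush_lock says whether this caller won
 * the dquot flush lock. If it did not, another thread owns the flush and
 * may not have queued the buffer yet.
 */
int model_flush_one(struct model_buf *bp, bool got_flush_lock)
{
    if (got_flush_lock) {
        /* Normal path: this caller flushes and queues the buffer. */
        bp->flags |= MODEL_DELWRI_Q;
        return 0;
    }

    /* Proposed check: buffer not queued yet, so drop it and retry. */
    if (!(bp->flags & MODEL_DELWRI_Q))
        return -EAGAIN;

    /* Buffer is queued: push it out, then retry (modelling assumption). */
    model_pushbuf(bp);
    return -EAGAIN;
}
```

In this model, a caller that loses the flush lock and finds the buffer unqueued gets -EAGAIN without ever reaching the assertion, which is the behaviour the proposed fix aims for.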

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>