• Darrick J. Wong's avatar
    xfs: fix xfs_inodegc_stop racing with mod_delayed_work · 2254a739
    Darrick J. Wong authored
    syzbot reported this warning from the faux inodegc shrinker that tries
    to kick off inodegc work:
    
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
    RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444
    Call Trace:
     __queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
     mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
     xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
     xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
     do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
     shrink_slab+0x175/0x660 mm/vmscan.c:1013
     shrink_one+0x502/0x810 mm/vmscan.c:5343
     shrink_many mm/vmscan.c:5394 [inline]
     lru_gen_shrink_node mm/vmscan.c:5511 [inline]
     shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
     kswapd_shrink_node mm/vmscan.c:7262 [inline]
     balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
     kswapd+0x677/0xd60 mm/vmscan.c:7712
     kthread+0x2e8/0x3a0 kernel/kthread.c:376
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
    
    This warning corresponds to this code in __queue_work:
    
    	/*
    	 * For a draining wq, only works from the same workqueue are
    	 * allowed. The __WQ_DESTROYING helps to spot the issue that
    	 * queues a new work item to a wq after destroy_workqueue(wq).
    	 */
    	if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
    		     WARN_ON_ONCE(!is_chained_work(wq))))
    		return;
    
    For this to trip, we must have a thread draining the inodedgc workqueue
    and a second thread trying to queue inodegc work to that workqueue.
    This can happen if freezing or a ro remount race with reclaim poking our
    faux inodegc shrinker and another thread dropping an unlinked O_RDONLY
    file:
    
    Thread 0	Thread 1	Thread 2
    
    xfs_inodegc_stop
    
    				xfs_inodegc_shrinker_scan
    				xfs_is_inodegc_enabled
    				<yes, will continue>
    
    xfs_clear_inodegc_enabled
    xfs_inodegc_queue_all
    <list empty, do not queue inodegc worker>
    
    		xfs_inodegc_queue
    		<add to list>
    		xfs_is_inodegc_enabled
    		<no, returns>
    
    drain_workqueue
    <set WQ_DRAINING>
    
    				llist_empty
    				<no, will queue list>
    				mod_delayed_work_on(..., 0)
    				__queue_work
    				<sees WQ_DRAINING, kaboom>
    
    In other words, everything between the access to inodegc_enabled state
    and the decision to poke the inodegc workqueue requires some kind of
    coordination to avoid the WQ_DRAINING state.  We could perhaps introduce
    a lock here, but we could also try to eliminate WQ_DRAINING from the
    picture.
    
    We could replace the drain_workqueue call with a loop that flushes the
    workqueue and queues workers as long as there is at least one inode
    present in the per-cpu inodegc llists.  We've disabled inodegc at this
    point, so we know that the number of queued inodes will eventually hit
    zero as long as xfs_inodegc_start cannot reactivate the workers.
    
    There are four callers of xfs_inodegc_start.  Three of them come from the
    VFS with s_umount held: filesystem thawing, failed filesystem freezing,
    and the rw remount transition.  The fourth caller is mounting rw (no
    remount or freezing possible).
    
    There are three callers ofs xfs_inodegc_stop.  One is unmounting (no
    remount or thaw possible).  Two of them come from the VFS with s_umount
    held: fs freezing and ro remount transition.
    
    Hence, it is correct to replace the drain_workqueue call with a loop
    that drains the inodegc llists.
    
    Fixes: 6191cf3a ("xfs: flush inodegc workqueue tasks before cancel")
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    2254a739
xfs_icache.c 56.6 KB