• Filipe Manana's avatar
    Btrfs: fix race setting up and completing qgroup rescan workers · bacff03b
    Filipe Manana authored
    commit 13fc1d27 upstream.
    
    There is a race between setting up a qgroup rescan worker and completing
    a qgroup rescan worker that can lead to callers of the qgroup rescan wait
    ioctl to either not wait for the rescan worker to complete or to hang
    forever due to missing wake ups. The following diagram shows a sequence
    of steps that illustrates the race.
    
            CPU 1                                                         CPU 2                                  CPU 3
    
     btrfs_ioctl_quota_rescan()
      btrfs_qgroup_rescan()
       qgroup_rescan_init()
        mutex_lock(&fs_info->qgroup_rescan_lock)
        spin_lock(&fs_info->qgroup_lock)
    
        fs_info->qgroup_flags |=
          BTRFS_QGROUP_STATUS_FLAG_RESCAN
    
        init_completion(
          &fs_info->qgroup_rescan_completion)
    
        fs_info->qgroup_rescan_running = true
    
        mutex_unlock(&fs_info->qgroup_rescan_lock)
        spin_unlock(&fs_info->qgroup_lock)
    
        btrfs_init_work()
         --> starts the worker
    
                                                            btrfs_qgroup_rescan_worker()
                                                             mutex_lock(&fs_info->qgroup_rescan_lock)
    
                                                             fs_info->qgroup_flags &=
                                                               ~BTRFS_QGROUP_STATUS_FLAG_RESCAN
    
                                                             mutex_unlock(&fs_info->qgroup_rescan_lock)
    
                                                             starts transaction, updates qgroup status
                                                             item, etc
    
                                                                                                               btrfs_ioctl_quota_rescan()
                                                                                                                btrfs_qgroup_rescan()
                                                                                                                 qgroup_rescan_init()
                                                                                                                  mutex_lock(&fs_info->qgroup_rescan_lock)
                                                                                                                  spin_lock(&fs_info->qgroup_lock)
    
                                                                                                                  fs_info->qgroup_flags |=
                                                                                                                    BTRFS_QGROUP_STATUS_FLAG_RESCAN
    
                                                                                                                  init_completion(
                                                                                                                    &fs_info->qgroup_rescan_completion)
    
                                                                                                                  fs_info->qgroup_rescan_running = true
    
                                                                                                                  mutex_unlock(&fs_info->qgroup_rescan_lock)
                                                                                                                  spin_unlock(&fs_info->qgroup_lock)
    
                                                                                                                  btrfs_init_work()
                                                                                                                   --> starts another worker
    
                                                             mutex_lock(&fs_info->qgroup_rescan_lock)
    
                                                             fs_info->qgroup_rescan_running = false
    
                                                             mutex_unlock(&fs_info->qgroup_rescan_lock)
    
    							 complete_all(&fs_info->qgroup_rescan_completion)
    
    Before the rescan worker started by the task at CPU 3 completes, if
    another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
    because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
    fs_info->qgroup_flags, which is expected and correct behaviour.
    
    However if other task calls btrfs_ioctl_quota_rescan_wait() before the
    rescan worker started by the task at CPU 3 completes, it will return
    immediately without waiting for the new rescan worker to complete,
    because fs_info->qgroup_rescan_running is set to false by CPU 2.
    
    This race is making test case btrfs/171 (from fstests) to fail often:
    
      btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
    #      --- tests/btrfs/171.out     2018-09-16 21:30:48.505104287 +0100
    #      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad      2019-09-19 02:01:36.938486039 +0100
    #      @@ -1,2 +1,3 @@
    #       QA output created by 171
    #      +ERROR: quota rescan failed: Operation now in progress
    #       Silence is golden
    #      ...
    #      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad'  to see the entire diff)
    
    That is because the test calls the btrfs-progs commands "qgroup quota
    rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
    calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
    commands 'qgroup assign' and 'qgroup remove' often call the rescan start
    ioctl after calling the qgroup assign ioctl,
    btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
    for a rescan worker to complete.
    
    Another problem the race can cause is missing wake ups for waiters,
    since the call to complete_all() happens outside a critical section and
    after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
    diagram above, if we have a waiter for the first rescan task (executed
    by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
    if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
    before it calls complete_all() against
    fs_info->qgroup_rescan_completion, the task at CPU 3 calls
    init_completion() against fs_info->qgroup_rescan_completion which
    re-initilizes its wait queue to an empty queue, therefore causing the
    rescan worker at CPU 2 to call complete_all() against an empty queue,
    never waking up the task waiting for that rescan worker.
    
    Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
    fs_info->qgroup_rescan_running to false in the same critical section,
    delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
    call to complete_all() in that same critical section. This gives the
    protection needed to avoid rescan wait ioctl callers not waiting for a
    running rescan worker and the lost wake ups problem, since setting that
    rescan flag and boolean as well as initializing the wait queue is done
    already in a critical section delimited by that mutex (at
    qgroup_rescan_init()).
    
    Fixes: 57254b6e ("Btrfs: add ioctl to wait for qgroup rescan completion")
    Fixes: d2c609b8 ("btrfs: properly track when rescan worker is running")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    bacff03b
qgroup.c 86 KB