1. 24 Sep, 2019 4 commits
    • Filipe Manana's avatar
      Btrfs: fix race setting up and completing qgroup rescan workers · 13fc1d27
      Filipe Manana authored
      There is a race between setting up a qgroup rescan worker and completing
      a qgroup rescan worker that can lead to callers of the qgroup rescan wait
      ioctl to either not wait for the rescan worker to complete or to hang
      forever due to missing wake ups. The following diagram shows a sequence
      of steps that illustrates the race.
      
              CPU 1                                                         CPU 2                                  CPU 3
      
       btrfs_ioctl_quota_rescan()
        btrfs_qgroup_rescan()
         qgroup_rescan_init()
          mutex_lock(&fs_info->qgroup_rescan_lock)
          spin_lock(&fs_info->qgroup_lock)
      
          fs_info->qgroup_flags |=
            BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
          init_completion(
            &fs_info->qgroup_rescan_completion)
      
          fs_info->qgroup_rescan_running = true
      
          mutex_unlock(&fs_info->qgroup_rescan_lock)
          spin_unlock(&fs_info->qgroup_lock)
      
          btrfs_init_work()
           --> starts the worker
      
                                                              btrfs_qgroup_rescan_worker()
                                                               mutex_lock(&fs_info->qgroup_rescan_lock)
      
                                                               fs_info->qgroup_flags &=
                                                                 ~BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
                                                               mutex_unlock(&fs_info->qgroup_rescan_lock)
      
                                                               starts transaction, updates qgroup status
                                                               item, etc
      
                                                                                                                 btrfs_ioctl_quota_rescan()
                                                                                                                  btrfs_qgroup_rescan()
                                                                                                                   qgroup_rescan_init()
                                                                                                                    mutex_lock(&fs_info->qgroup_rescan_lock)
                                                                                                                    spin_lock(&fs_info->qgroup_lock)
      
                                                                                                                    fs_info->qgroup_flags |=
                                                                                                                      BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
                                                                                                                    init_completion(
                                                                                                                      &fs_info->qgroup_rescan_completion)
      
                                                                                                                    fs_info->qgroup_rescan_running = true
      
                                                                                                                    mutex_unlock(&fs_info->qgroup_rescan_lock)
                                                                                                                    spin_unlock(&fs_info->qgroup_lock)
      
                                                                                                                    btrfs_init_work()
                                                                                                                     --> starts another worker
      
                                                               mutex_lock(&fs_info->qgroup_rescan_lock)
      
                                                               fs_info->qgroup_rescan_running = false
      
                                                               mutex_unlock(&fs_info->qgroup_rescan_lock)
      
      							 complete_all(&fs_info->qgroup_rescan_completion)
      
      Before the rescan worker started by the task at CPU 3 completes, if
      another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
      because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
      fs_info->qgroup_flags, which is expected and correct behaviour.
      
      However if other task calls btrfs_ioctl_quota_rescan_wait() before the
      rescan worker started by the task at CPU 3 completes, it will return
      immediately without waiting for the new rescan worker to complete,
      because fs_info->qgroup_rescan_running is set to false by CPU 2.
      
      This race is making test case btrfs/171 (from fstests) to fail often:
      
        btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
            --- tests/btrfs/171.out     2018-09-16 21:30:48.505104287 +0100
            +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad      2019-09-19 02:01:36.938486039 +0100
            @@ -1,2 +1,3 @@
             QA output created by 171
            +ERROR: quota rescan failed: Operation now in progress
             Silence is golden
            ...
            (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad'  to see the entire diff)
      
      That is because the test calls the btrfs-progs commands "qgroup quota
      rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
      calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
      commands 'qgroup assign' and 'qgroup remove' often call the rescan start
      ioctl after calling the qgroup assign ioctl,
      btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
      for a rescan worker to complete.
      
      Another problem the race can cause is missing wake ups for waiters,
      since the call to complete_all() happens outside a critical section and
      after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
      diagram above, if we have a waiter for the first rescan task (executed
      by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
      if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
      before it calls complete_all() against
      fs_info->qgroup_rescan_completion, the task at CPU 3 calls
      init_completion() against fs_info->qgroup_rescan_completion which
      re-initilizes its wait queue to an empty queue, therefore causing the
      rescan worker at CPU 2 to call complete_all() against an empty queue,
      never waking up the task waiting for that rescan worker.
      
      Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
      fs_info->qgroup_rescan_running to false in the same critical section,
      delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
      call to complete_all() in that same critical section. This gives the
      protection needed to avoid rescan wait ioctl callers not waiting for a
      running rescan worker and the lost wake ups problem, since setting that
      rescan flag and boolean as well as initializing the wait queue is done
      already in a critical section delimited by that mutex (at
      qgroup_rescan_init()).
      
      Fixes: 57254b6e ("Btrfs: add ioctl to wait for qgroup rescan completion")
      Fixes: d2c609b8 ("btrfs: properly track when rescan worker is running")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      13fc1d27
    • Filipe Manana's avatar
      Btrfs: fix missing error return if writeback for extent buffer never started · 0607eb1d
      Filipe Manana authored
      If lock_extent_buffer_for_io() fails, it returns a negative value, but its
      caller btree_write_cache_pages() ignores such error. This means that a
      call to flush_write_bio(), from lock_extent_buffer_for_io(), might have
      failed. We should make btree_write_cache_pages() notice such error values
      and stop immediatelly, making sure filemap_fdatawrite_range() returns an
      error to the transaction commit path. A failure from flush_write_bio()
      should also result in the endio callback end_bio_extent_buffer_writepage()
      being invoked, which sets the BTRFS_FS_*_ERR bits appropriately, so that
      there's no risk a transaction or log commit doesn't catch a writeback
      failure.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0607eb1d
    • Dennis Zhou's avatar
      btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer · eb5b64f1
      Dennis Zhou authored
      Before, if a eb failed to write out, we would end up triggering a
      BUG_ON(). As of f4340622 ("btrfs: extent_io: Move the BUG_ON() in
      flush_write_bio() one level up"), we no longer BUG_ON(), so we should
      make life consistent and add back the unwritten bytes to
      dirty_metadata_bytes.
      
      Fixes: f4340622 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
      CC: stable@vger.kernel.org # 5.2+
      Reviewed-by: default avatarFilipe Manana <fdmanana@kernel.org>
      Signed-off-by: default avatarDennis Zhou <dennis@kernel.org>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      eb5b64f1
    • Filipe Manana's avatar
      Btrfs: fix selftests failure due to uninitialized i_mode in test inodes · 9f7fec0b
      Filipe Manana authored
      Some of the self tests create a test inode, setup some extents and then do
      calls to btrfs_get_extent() to test that the corresponding extent maps
      exist and are correct. However btrfs_get_extent(), since the 5.2 merge
      window, now errors out when it finds a regular or prealloc extent for an
      inode that does not correspond to a regular file (its ->i_mode is not
      S_IFREG). This causes the self tests to fail sometimes, specially when
      KASAN, slub_debug and page poisoning are enabled:
      
        $ modprobe btrfs
        modprobe: ERROR: could not insert 'btrfs': Invalid argument
      
        $ dmesg
        [ 9414.691648] Btrfs loaded, crc32c=crc32c-intel, debug=on, assert=on, integrity-checker=on, ref-verify=on
        [ 9414.692655] BTRFS: selftest: sectorsize: 4096  nodesize: 4096
        [ 9414.692658] BTRFS: selftest: running btrfs free space cache tests
        [ 9414.692918] BTRFS: selftest: running extent only tests
        [ 9414.693061] BTRFS: selftest: running bitmap only tests
        [ 9414.693366] BTRFS: selftest: running bitmap and extent tests
        [ 9414.696455] BTRFS: selftest: running space stealing from bitmap to extent tests
        [ 9414.697131] BTRFS: selftest: running extent buffer operation tests
        [ 9414.697133] BTRFS: selftest: running btrfs_split_item tests
        [ 9414.697564] BTRFS: selftest: running extent I/O tests
        [ 9414.697583] BTRFS: selftest: running find delalloc tests
        [ 9415.081125] BTRFS: selftest: running find_first_clear_extent_bit test
        [ 9415.081278] BTRFS: selftest: running extent buffer bitmap tests
        [ 9415.124192] BTRFS: selftest: running inode tests
        [ 9415.124195] BTRFS: selftest: running btrfs_get_extent tests
        [ 9415.127909] BTRFS: selftest: running hole first btrfs_get_extent test
        [ 9415.128343] BTRFS critical (device (efault)): regular/prealloc extent found for non-regular inode 256
        [ 9415.131428] BTRFS: selftest: fs/btrfs/tests/inode-tests.c:904 expected a real extent, got 0
      
      This happens because the test inodes are created without ever initializing
      the i_mode field of the inode, and neither VFS's new_inode() nor the btrfs
      callback btrfs_alloc_inode() initialize the i_mode. Initialization of the
      i_mode is done through the various callbacks used by the VFS to create
      new inodes (regular files, directories, symlinks, tmpfiles, etc), which
      all call btrfs_new_inode() which in turn calls inode_init_owner(), which
      sets the inode's i_mode. Since the tests only uses new_inode() to create
      the test inodes, the i_mode was never initialized.
      
      This always happens on a VM I used with kasan, slub_debug and many other
      debug facilities enabled. It also happened to someone who reported this
      on bugzilla (on a 5.3-rc).
      
      Fix this by setting i_mode to S_IFREG at btrfs_new_test_inode().
      
      Fixes: 6bf9e4bd ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204397Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9f7fec0b
  2. 09 Sep, 2019 36 commits