• Filipe Manana's avatar
    btrfs: check if a log root exists before locking the log_mutex on unlink · e7a79811
    Filipe Manana authored
    This brings back an optimization that commit e678934c ("btrfs:
    Remove unnecessary check from join_running_log_trans") removed, but in
    a different form. So it's almost equivalent to a revert.
    
    That commit removed an optimization where we avoid locking a root's
    log_mutex when there is no log tree created in the current transaction.
    The affected code path is triggered through unlink operations.
    
    That commit was based on the assumption that the optimization was not
    necessary because we used to have the following checks when the patch
    was authored:
    
      int btrfs_del_dir_entries_in_log(...)
      {
            (...)
            if (dir->logged_trans < trans->transid)
                return 0;
    
            ret = join_running_log_trans(root);
            (...)
       }
    
       int btrfs_del_inode_ref_in_log(...)
       {
            (...)
            if (inode->logged_trans < trans->transid)
                return 0;
    
            ret = join_running_log_trans(root);
            (...)
       }
    
    However before that patch was merged, another patch was merged first which
    replaced those checks because they were buggy.
    
    That other patch corresponds to commit 803f0f64 ("Btrfs: fix fsync
    not persisting dentry deletions due to inode evictions"). The assumption
    that if the logged_trans field of an inode had a smaller value then the
    current transaction's generation (transid) meant that the inode was not
    logged in the current transaction was only correct if the inode was not
    evicted and reloaded in the current transaction. So the corresponding bug
    fix changed those checks and replaced them with the following helper
    function:
    
      static bool inode_logged(struct btrfs_trans_handle *trans,
                               struct btrfs_inode *inode)
      {
            if (inode->logged_trans == trans->transid)
                    return true;
    
            if (inode->last_trans == trans->transid &&
                test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags) &&
                !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
                    return true;
    
            return false;
      }
    
    So if we have a subvolume without a log tree in the current transaction
    (because we had no fsyncs), every time we unlink an inode we can end up
    trying to lock the log_mutex of the root through join_running_log_trans()
    twice, once for the inode being unlinked (by btrfs_del_inode_ref_in_log())
    and once for the parent directory (with btrfs_del_dir_entries_in_log()).
    
    This means if we have several unlink operations happening in parallel for
    inodes in the same subvolume, and the those inodes and/or their parent
    inode were changed in the current transaction, we end up having a lot of
    contention on the log_mutex.
    
    The test robots from intel reported a -30.7% performance regression for
    a REAIM test after commit e678934c ("btrfs: Remove unnecessary check
    from join_running_log_trans").
    
    So just bring back the optimization to join_running_log_trans() where we
    check first if a log root exists before trying to lock the log_mutex. This
    is done by checking for a bit that is set on the root when a log tree is
    created and removed when a log tree is freed (at transaction commit time).
    
    Commit e678934c ("btrfs: Remove unnecessary check from
    join_running_log_trans") was merged in the 5.4 merge window while commit
    803f0f64 ("Btrfs: fix fsync not persisting dentry deletions due to
    inode evictions") was merged in the 5.3 merge window. But the first
    commit was actually authored before the second commit (May 23 2019 vs
    June 19 2019).
    Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
    Link: https://lore.kernel.org/lkml/20200611090233.GL12456@shao2-debian/
    Fixes: e678934c ("btrfs: Remove unnecessary check from join_running_log_trans")
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    e7a79811
ctree.h 117 KB