    xfs: lockless buffer lookup · 298f3422
    Dave Chinner authored
    Now that we have a standalone fast path for buffer lookup, we can
    easily convert it to use rcu lookups. When we continually hammer the
    buffer cache with trylock lookups, we end up with a huge amount of
    lock contention on the per-ag buffer hash locks:
    
    -   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
       - 92.67% xfs_inodegc_worker
          - 92.13% xfs_inode_unlink
             - 91.52% xfs_inactive_ifree
                - 85.63% xfs_read_agi
                   - 85.61% xfs_trans_read_buf_map
                      - 85.59% xfs_buf_read_map
                         - xfs_buf_get_map
                            - 85.55% xfs_buf_find
                               - 72.87% _raw_spin_lock
                                  - do_raw_spin_lock
                                       71.86% __pv_queued_spin_lock_slowpath
                               - 8.74% xfs_buf_rele
                                  - 7.88% _raw_spin_lock
                                     - 7.88% do_raw_spin_lock
                                          7.63% __pv_queued_spin_lock_slowpath
                               - 1.70% xfs_buf_trylock
                                  - 1.68% down_trylock
                                     - 1.41% _raw_spin_lock_irqsave
                                        - 1.39% do_raw_spin_lock
                                             __pv_queued_spin_lock_slowpath
                               - 0.76% _raw_spin_unlock
                                    0.75% do_raw_spin_unlock
    
    This is basically hammering the pag->pag_buf_lock from lots of CPUs
    doing trylocks at the same time. Most of the buffer trylock
    operations ultimately fail after we've done the lookup, so we're
    really hammering the buf hash lock whilst making no progress.
    
    We can also see significant spinlock traffic on the same lock just
    under normal operation when lots of tasks are accessing metadata
    from the same AG, so let's avoid all this by converting the lookup
    fast path to leverage the rhashtable's ability to do rcu protected
    lookups.
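    
    As an illustration, the rcu protected fast path boils down to a
    rhashtable lookup inside an rcu read-side critical section. The sketch
    below is not the exact patch - the function name and error returns are
    illustrative, while pag_buf_hash, bm_bn and xfs_buf_hash_params mirror
    the existing buffer cache code:
    
    /*
     * Lockless lookup sketch: rhashtable_lookup() must be called with the
     * rcu read lock held, and the returned buffer can only be used inside
     * the read section until we have taken a hold on it.
     */
    static int
    xfs_buf_lookup_sketch(
            struct xfs_perag        *pag,
            struct xfs_buf_map      *cmap,
            struct xfs_buf          **bpp)
    {
            struct xfs_buf          *bp;
    
            rcu_read_lock();
            bp = rhashtable_lookup(&pag->pag_buf_hash, &cmap->bm_bn,
                                   xfs_buf_hash_params);
            if (!bp) {
                    rcu_read_unlock();
                    return -ENOENT;         /* cache miss - take the slow path */
            }
            /* hold count check goes here - see the next sketch */
            rcu_read_unlock();
            *bpp = bp;
            return 0;
    }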
    
    We avoid races with the buffer release path by using
    atomic_inc_not_zero() on the buffer hold count. Any buffer that is
    in the LRU will have a non-zero count, thereby allowing the lockless
    fast path to be taken in most cache hit situations. If the buffer
    hold count is zero, then it is likely going through the release path,
    so in that case we fall back to the existing lookup miss slow path.
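    
    Slotted into the sketch above, the race check replaces the bare
    cache-hit return. Here b_hold is the existing atomic hold count and
    -ENOENT is simply one way of telling the caller to fall back:
    
            if (!bp || !atomic_inc_not_zero(&bp->b_hold)) {
                    /*
                     * Zero hold count: the buffer is already on its way
                     * through the release path, so treat this as a cache
                     * miss and let the caller take the locked slow path.
                     */
                    rcu_read_unlock();
                    return -ENOENT;
            }
            rcu_read_unlock();
            *bpp = bp;
            return 0;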
    
    The slow path will then do an atomic lookup and insert under the
    buffer hash lock and hence serialise correctly against buffer
    release freeing the buffer.
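    
    A sketch of that slow path, assuming new_bp is the freshly allocated
    buffer and using rhashtable_lookup_get_insert_fast() to make the
    lookup and insert atomic under the existing pag_buf_lock; error
    handling is trimmed and the surrounding names are illustrative:
    
            spin_lock(&pag->pag_buf_lock);
            bp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
                            &new_bp->b_rhash_head, xfs_buf_hash_params);
            if (IS_ERR(bp)) {
                    spin_unlock(&pag->pag_buf_lock);
                    return PTR_ERR(bp);
            }
            if (bp) {
                    /*
                     * Found an existing buffer. Taking a hold here is safe
                     * because release must also take pag_buf_lock before it
                     * can remove the buffer from the hash and free it.
                     */
                    atomic_inc(&bp->b_hold);
                    spin_unlock(&pag->pag_buf_lock);
                    /* ... free new_bp and return the existing buffer ... */
                    return 0;
            }
            /* new_bp is now in the hash and visible to rcu lookups */
            spin_unlock(&pag->pag_buf_lock);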
    
    The use of rcu protected lookups means that buffer handles now need
    to be freed by RCU callbacks (same as inodes). We still free the
    buffer pages before the RCU callback - we won't be trying to access
    them at all on a buffer that has zero references - but we need the
    buffer handle itself to be present for the entire rcu protected read
    side to detect a zero hold count correctly.
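    
    A minimal sketch of that deferred free, assuming an rcu_head embedded
    in struct xfs_buf (b_rcu is a hypothetical field name here),
    xfs_buf_free_pages() standing in for the existing page teardown, and
    the existing xfs_buf_cache slab for the handle itself:
    
    static void
    xfs_buf_free_callback(
            struct rcu_head         *head)
    {
            struct xfs_buf          *bp = container_of(head, struct xfs_buf, b_rcu);
    
            kmem_cache_free(xfs_buf_cache, bp);
    }
    
    static void
    xfs_buf_free(
            struct xfs_buf          *bp)
    {
            /* the data pages go immediately - no rcu reader touches them */
            xfs_buf_free_pages(bp);
    
            /*
             * The handle must survive until the grace period ends so that
             * concurrent rcu readers can still observe the zero hold count.
             */
            call_rcu(&bp->b_rcu, xfs_buf_free_callback);
    }
    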
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>