1. 27 May, 2020 3 commits
    • Dave Chinner's avatar
      xfs: separate read-only variables in struct xfs_mount · b0dff466
      Dave Chinner authored
      Seeing massive cpu usage from xfs_agino_range() on one machine;
      instruction level profiles look similar to another machine running
      the same workload, only one machine is consuming 10x as much CPU as
      the other and going much slower. The only real difference between
      the two machines is core count per socket. Both are running
      identical 16p/16GB virtual machine configurations
      
      Machine A:
      
        25.83%  [k] xfs_agino_range
        12.68%  [k] __xfs_dir3_data_check
         6.95%  [k] xfs_verify_ino
         6.78%  [k] xfs_dir2_data_entry_tag_p
         3.56%  [k] xfs_buf_find
         2.31%  [k] xfs_verify_dir_ino
         2.02%  [k] xfs_dabuf_map.constprop.0
         1.65%  [k] xfs_ag_block_count
      
      And takes around 13 minutes to remove 50 million inodes.
      
      Machine B:
      
        13.90%  [k] __pv_queued_spin_lock_slowpath
         3.76%  [k] do_raw_spin_lock
         2.83%  [k] xfs_dir3_leaf_check_int
         2.75%  [k] xfs_agino_range
         2.51%  [k] __raw_callee_save___pv_queued_spin_unlock
         2.18%  [k] __xfs_dir3_data_check
         2.02%  [k] xfs_log_commit_cil
      
      And takes around 5m30s to remove 50 million inodes.
      
      Suspect is cacheline contention on m_sectbb_log which is used in one
      of the macros in xfs_agino_range. This is a read-only variable but
      shares a cacheline with m_active_trans which is a global atomic that
      gets bounced all around the machine.
      
      The workload is trying to run hundreds of thousands of transactions
      per second and hence cacheline contention will be occurring on this
      atomic counter. Hence xfs_agino_range() is likely just be an
      innocent bystander as the cache coherency protocol fights over the
      cacheline between CPU cores and sockets.
      
      On machine A, this rearrangement of the struct xfs_mount
      results in the profile changing to:
      
         9.77%  [kernel]  [k] xfs_agino_range
         6.27%  [kernel]  [k] __xfs_dir3_data_check
         5.31%  [kernel]  [k] __pv_queued_spin_lock_slowpath
         4.54%  [kernel]  [k] xfs_buf_find
         3.79%  [kernel]  [k] do_raw_spin_lock
         3.39%  [kernel]  [k] xfs_verify_ino
         2.73%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
      
      Vastly less CPU usage in xfs_agino_range(), but still 3x the amount
      of machine B and still runs substantially slower than it should.
      
      Current rm -rf of 50 million files:
      
      		vanilla		patched
      machine A	13m20s		6m42s
      machine B	5m30s		5m02s
      
      It's an improvement, hence indicating that separation and further
      optimisation of read-only global filesystem data is worthwhile, but
      it clearly isn't the underlying issue causing this specific
      performance degradation.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      b0dff466
    • Dave Chinner's avatar
      xfs: reduce free inode accounting overhead · f18c9a90
      Dave Chinner authored
      Shaokun Zhang reported that XFS was using substantial CPU time in
      percpu_count_sum() when running a single threaded benchmark on
      a high CPU count (128p) machine from xfs_mod_ifree(). The issue
      is that the filesystem is empty when the benchmark runs, so inode
      allocation is running with a very low inode free count.
      
      With the percpu counter batching, this means comparisons when the
      counter is less that 128 * 256 = 32768 use the slow path of adding
      up all the counters across the CPUs, and this is expensive on high
      CPU count machines.
      
      The summing in xfs_mod_ifree() is only used to fire an assert if an
      underrun occurs. The error is ignored by the higher level code.
      Hence this is really just debug code and we don't need to run it
      on production kernels, nor do we need such debug checks to return
      error values just to trigger an assert.
      
      Finally, xfs_mod_icount/xfs_mod_ifree are only called from
      xfs_trans_unreserve_and_mod_sb(), so get rid of them and just
      directly call the percpu_counter_add/percpu_counter_compare
      functions. The compare functions are now run only on debug builds as
      they are internal to ASSERT() checks and so only compiled in when
      ASSERTs are active (CONFIG_XFS_DEBUG=y or CONFIG_XFS_WARN=y).
      Reported-by: default avatarShaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      f18c9a90
    • Dave Chinner's avatar
      xfs: gut error handling in xfs_trans_unreserve_and_mod_sb() · dc3ffbb1
      Dave Chinner authored
      xfs: gut error handling in xfs_trans_unreserve_and_mod_sb()
      
      From: Dave Chinner <dchinner@redhat.com>
      
      The error handling in xfs_trans_unreserve_and_mod_sb() is largely
      incorrect - rolling back the changes in the transaction if only one
      counter underruns makes all the other counters incorrect. We still
      allow the change to proceed and committing the transaction, except
      now we have multiple incorrect counters instead of a single
      underflow.
      
      Further, we don't actually report the error to the caller, so this
      is completely silent except on debug kernels that will assert on
      failure before we even get to the rollback code.  Hence this error
      handling is broken, untested, and largely unnecessary complexity.
      
      Just remove it.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      dc3ffbb1
  2. 19 May, 2020 22 commits
  3. 13 May, 2020 3 commits
  4. 08 May, 2020 12 commits