1. 18 Apr, 2008 10 commits
    • [XFS] Use atomics for iclog reference counting · 155cc6b7
      David Chinner authored
      Now that we update the log tail LSN less frequently on transaction
      completion, we pass the contention straight to the global log state lock
      (l_iclog_lock) during transaction completion.
      
      We currently have to take this lock to decrement the iclog reference
      count. There is a reference count on each iclog, so we need to take the
      global lock for all refcount changes.
      
      When large numbers of processes are all doing small transactions, the iclog
      reference counts will be quite high, and the state change that absolutely
      requires the l_iclog_lock is the exception rather than the norm.
      
      Change the reference counting on the iclogs to use atomic_inc/dec so that
      we can use atomic_dec_and_lock during transaction completion and avoid the
      need for grabbing the l_iclog_lock for every reference count decrement
      except the one that matters - the last.
      
      SGI-PV: 975671
      SGI-Modid: xfs-linux-melb:xfs-kern:30505a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
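
The decrement-and-lock pattern described in this commit can be sketched in userspace C. This is a minimal illustration, not the kernel code: it emulates the behaviour of atomic_dec_and_lock with C11 atomics and a pthread mutex, and every name ending in _sketch is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an iclog: only the reference count matters here. */
struct iclog_sketch {
	atomic_int refcnt;
};

/* Stand-in for the global log state lock (l_iclog_lock in the commit). */
static pthread_mutex_t iclog_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/*
 * Drop one reference. Returns true with *lock held only when the count
 * reached zero. Otherwise returns false, and the lock was either never
 * taken (fast path) or has already been released (slow path).
 */
static bool dec_and_lock_sketch(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: lock-free decrement while we are not the last holder. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
		/* A failed CAS reloaded 'old'; retry with the fresh value. */
	}

	/* Slow path: we may be the last holder, so serialise on the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;	/* count is now zero, caller owns the lock */
	pthread_mutex_unlock(lock);
	return false;
}

/* Release a reference; returns true if this call dropped the last one. */
static bool iclog_release_sketch(struct iclog_sketch *ic)
{
	/* Releasing without holding a reference is a caller bug. */
	assert(atomic_load(&ic->refcnt) > 0);

	if (!dec_and_lock_sketch(&ic->refcnt, &iclog_lock_sketch))
		return false;
	/* Last reference: the state change that needs the lock would go here. */
	pthread_mutex_unlock(&iclog_lock_sketch);
	return true;
}
```

With a count of three, the first two releases never touch the mutex; only the third takes it, which is the contention reduction the commit message describes.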
    • [XFS] Prevent AIL lock contention during transaction completion · b589334c
      David Chinner authored
      When hundreds of processors attempt to commit transactions at the same
      time, they can contend on the AIL lock when updating the tail LSN held in
      the in-core log structure.
      
      At the moment, the tail LSN is only needed when actually writing out an
      iclog, so it really does not need to be updated on every single
      transaction completion - only those that result in switching iclogs and
      flushing them to disk.
      
      The result is that we reduce the number of times we need to grab the AIL
      lock and the log grant lock by up to two orders of magnitude on large
      processor count machines. The problem was previously hidden by AIL lock
      contention while walking the AIL list; that contention was recently
      resolved, which uncovered this issue.
      
      SGI-PV: 975671
      SGI-Modid: xfs-linux-melb:xfs-kern:30504a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Use xfs_inode_clean() in more places · 33540408
      David Chinner authored
      Remove open-coded checks for whether the inode is clean and replace
      them with an inlined function.
      
      SGI-PV: 977461
      SGI-Modid: xfs-linux-melb:xfs-kern:30503a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
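
The refactoring in this commit amounts to gathering a repeated condition into one inline predicate. The sketch below shows the general shape only; the structure layout and field names are assumptions made for illustration and are not taken from the XFS source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log item: a non-zero mask means some fields are dirty. */
struct log_item_sketch {
	unsigned int dirty_fields;
};

/* Hypothetical inode with just the state the predicate inspects. */
struct clean_inode_sketch {
	struct log_item_sketch *log_item;	/* NULL if never logged */
	bool core_updated;			/* core changed outside the log */
};

/* One place to ask "is this inode clean?" instead of open-coding it. */
static inline bool inode_clean_sketch(const struct clean_inode_sketch *ip)
{
	assert(ip != NULL);
	return (ip->log_item == NULL || ip->log_item->dirty_fields == 0) &&
	       !ip->core_updated;
}
```

Callers then test `inode_clean_sketch(ip)` rather than repeating the compound condition, so a later change to what "clean" means is made once.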
    • [XFS] Remove the xfs_icluster structure · bad55843
      David Chinner authored
      Remove the xfs_icluster structure and replace it with a radix tree lookup.
      
      We don't need to keep a list of inodes in each cluster around anymore as
      we can look them up quickly when we need to. The only time we need to do
      this now is during inode writeback.
      
      Factor the inode cluster writeback code out of xfs_iflush and convert it
      to use radix_tree_gang_lookup() instead of walking a list of inodes built
      when we first read in the inodes.
      
      This removes 3 pointers from each xfs_inode structure and the xfs_icluster
      structure per inode cluster. Hence we reduce the cache footprint of the
      xfs_inodes by 5-10%, depending on cluster sparseness.
      
      To be truly efficient we need a radix_tree_gang_lookup_range() call to
      stop searching once we are past the end of the cluster instead of trying
      to find a full cluster's worth of inodes.
      
      Before (ia64):
      
      $ cat /sys/slab/xfs_inode/object_size
      536
      
      After:
      
      $ cat /sys/slab/xfs_inode/object_size
      512
      
      SGI-PV: 977460
      SGI-Modid: xfs-linux-melb:xfs-kern:30502a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
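
The lookup-by-range idea in this commit can be illustrated in plain C. The kernel's radix tree is not available in userspace, so this sketch assumes a sorted array of in-core inode numbers as a stand-in; the function names and the power-of-two cluster size are assumptions. It also shows why a range-limited lookup would help: the gang lookup may return entries past the end of the cluster, which then have to be discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the radix tree: sorted inode numbers currently in core. */
struct inode_index_sketch {
	const uint64_t *inos;	/* sorted ascending */
	size_t count;
};

/* Gang lookup: copy up to max_items entries with ino >= first into out[]. */
static size_t gang_lookup_sketch(const struct inode_index_sketch *idx,
				 uint64_t first, uint64_t *out,
				 size_t max_items)
{
	size_t i = 0, found = 0;

	while (i < idx->count && idx->inos[i] < first)
		i++;
	while (i < idx->count && found < max_items)
		out[found++] = idx->inos[i++];
	return found;
}

/*
 * Collect the in-core inodes that share a cluster with 'ino' into out[],
 * which must have room for inodes_per_cluster entries.
 */
static size_t cluster_inodes_sketch(const struct inode_index_sketch *idx,
				    uint64_t ino, uint64_t inodes_per_cluster,
				    uint64_t *out)
{
	uint64_t mask, first, batch[64];
	size_t n, i, kept = 0;

	/* This sketch only handles power-of-two clusters of up to 64 inodes. */
	assert(inodes_per_cluster != 0 && inodes_per_cluster <= 64 &&
	       (inodes_per_cluster & (inodes_per_cluster - 1)) == 0);

	mask = ~(inodes_per_cluster - 1);
	first = ino & mask;

	n = gang_lookup_sketch(idx, first, batch, (size_t)inodes_per_cluster);
	for (i = 0; i < n; i++) {
		/* Entries beyond the cluster end are wasted lookup work. */
		if ((batch[i] & mask) != first)
			break;
		out[kept++] = batch[i];
	}
	return kept;
}
```

For a sparse cluster, most of the batch can belong to later clusters, which is the inefficiency a range-bounded gang lookup would avoid.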
    • [XFS] Don't block pdflush when writing back inodes · a3f74ffb
      David Chinner authored
      When pdflush is writing back inodes, it can get stuck on inode cluster
      buffers that are currently under I/O. This occurs when we write data to
      multiple inodes in the same inode cluster at the same time.
      
      Effectively, delayed allocation marks the inode dirty during the data
      writeback. Hence if the inode cluster was flushed during the writeback of
      the first inode, the writeback of the second inode will block waiting for
      the inode cluster write to complete before writing it again for the newly
      dirtied inode.
      
      Basically, we want to prevent this from happening so we don't block pdflush
      and slow down all of writeback. Hence we introduce a non-blocking async
      inode flush flag that pdflush uses. If this flag is set, we use
      non-blocking operations (e.g. try locks) wherever we can to avoid
      blocking or extra I/O being issued.
      
      SGI-PV: 970925
      SGI-Modid: xfs-linux-melb:xfs-kern:30501a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
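
The try-lock approach described in this commit can be sketched with a pthread mutex standing in for the cluster buffer lock. The flag name, the structure, and the EAGAIN return value are all assumptions for illustration; the commit message does not give the real flag name or return convention.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical flag: ask the flush to skip work rather than block. */
#define FLUSH_ASYNC_NOBLOCK_SKETCH 0x1

/* Hypothetical inode cluster buffer; the lock is held while under I/O. */
struct cluster_buf_sketch {
	pthread_mutex_t lock;
	int writes_issued;
};

/*
 * Flush a cluster buffer. Returns 0 on success, or EAGAIN when the
 * non-blocking flag is set and the buffer is currently busy.
 */
static int flush_cluster_sketch(struct cluster_buf_sketch *bp, int flags)
{
	assert(bp != NULL);

	if (flags & FLUSH_ASYNC_NOBLOCK_SKETCH) {
		/* Background writeback: give up instead of waiting. */
		if (pthread_mutex_trylock(&bp->lock) != 0)
			return EAGAIN;
	} else {
		/* Synchronous callers are allowed to wait for the buffer. */
		pthread_mutex_lock(&bp->lock);
	}

	bp->writes_issued++;	/* stand-in for issuing the write */
	pthread_mutex_unlock(&bp->lock);
	return 0;
}
```

A background flusher that gets EAGAIN can move on to other inodes and revisit this one on a later pass, so one busy buffer does not stall the rest of writeback.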
    • [XFS] Factor xfs_itobp() and xfs_inotobp(). · 4ae29b43
      David Chinner authored
      The only difference between the functions is that one passes an inode for
      the lookup and the other passes an inode number. However, they don't do the
      same validity checking or set all the same state on the returned buffer,
      yet they should.
      
      Factor the functions into a common implementation.
      
      SGI-PV: 970925
      SGI-Modid: xfs-linux-melb:xfs-kern:30500a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] Fix regression due to refcache removal · e9a56b7c
      Lachlan McIlroy authored
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30490a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
    • [XFS] Remove the xfs_refcache · 163d3686
      Donald Douwsma authored
      Remove the xfs_refcache; it was only needed while we were still
      building for 2.4 kernels.
      
      SGI-PV: 971186
      SGI-Modid: xfs-linux-melb:xfs-kern:30472a
      Signed-off-by: Donald Douwsma <donaldd@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
    • [XFS] make inode reclaim synchronise with xfs_iflush_done() · 461aa8a2
      Lachlan McIlroy authored
      On a forced shutdown, xfs_finish_reclaim() will skip flushing the inode.
      If the inode flush lock is not already held and there is an outstanding
      xfs_iflush_done() then we might free the inode prematurely. By acquiring
      and releasing the flush lock we will synchronise with xfs_iflush_done().
      
      SGI-PV: 909874
      SGI-Modid: xfs-linux-melb:xfs-kern:30468a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
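
The acquire-then-release synchronisation described in this commit can be sketched with a POSIX semaphore standing in for the inode flush lock. This is an illustration under assumptions: the structure and function names are hypothetical, and the sketch models only the ordering guarantee, not the XFS reclaim path itself.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode: the flush lock is held from flush issue to completion. */
struct reclaim_inode_sketch {
	sem_t flush_lock;	/* binary semaphore, 1 when no flush is in flight */
	bool flush_done;	/* set by the completion handler */
};

/* Completion handler (stand-in for xfs_iflush_done): finish, then unlock. */
static void *flush_done_sketch(void *arg)
{
	struct reclaim_inode_sketch *ip = arg;

	ip->flush_done = true;
	sem_post(&ip->flush_lock);
	return NULL;
}

/*
 * Reclaim side: acquire and immediately release the flush lock. Once this
 * returns, any completion that was outstanding has finished, so the inode
 * can be freed. Returns the completion state observed after the wait.
 */
static bool reclaim_wait_sketch(struct reclaim_inode_sketch *ip)
{
	int rc = sem_wait(&ip->flush_lock);

	assert(rc == 0);
	(void)rc;
	sem_post(&ip->flush_lock);
	return ip->flush_done;
}
```

The lock is taken only as a barrier here: nothing is done while holding it, but acquiring it cannot succeed until the completion handler has released it.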
    • [XFS] actually check error returned by xfs_flush_pages, clean up and · e12070a5
      Niv Sardi authored
      bail out if it fails.
      
      SGI-PV: 973041
      SGI-Modid: xfs-linux-melb:xfs-kern:30462a
      Signed-off-by: Niv Sardi <xaiki@sgi.com>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  2. 17 Apr, 2008 2 commits
  3. 16 Apr, 2008 28 commits