1. 29 Nov, 2012 3 commits
    • Dave Chinner's avatar
      xfs: fix stray dquot unlock when reclaiming dquots · b870553c
      Dave Chinner authored
      When we fail to get a dquot lock during reclaim, we jump to an error
      handler that unlocks the dquot. This is wrong as we didn't lock the
      dquot, and unlocking it means who-ever is holding the lock has had
      it silently taken away, and hence it results in a lock imbalance.
      
      Found by inspection while modifying the code for the numa-lru
      patchset. This fixes a random hang I've been seeing on xfstest 232
      for the past several months.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      b870553c
    • Dave Chinner's avatar
      xfs: fix direct IO nested transaction deadlock. · 437a255a
      Dave Chinner authored
      The direct IO path can do a nested transaction reservation when
      writing past the EOF. The first transaction is the append
      transaction for setting the filesize at IO completion, but we can
      also need a transaction for allocation of blocks. If the log is low
      on space due to reservations and small log, the append transaction
      can be granted after wating for space as the only active transaction
      in the system. This then attempts a reservation for an allocation,
      which there isn't space in the log for, and the reservation sleeps.
      The result is that there is nothing left in the system to wake up
      all the processes waiting for log space to come free.
      
      The stack trace that shows this deadlock is relatively innocuous:
      
       xlog_grant_head_wait
       xlog_grant_head_check
       xfs_log_reserve
       xfs_trans_reserve
       xfs_iomap_write_direct
       __xfs_get_blocks
       xfs_get_blocks_direct
       do_blockdev_direct_IO
       __blockdev_direct_IO
       xfs_vm_direct_IO
       generic_file_direct_write
       xfs_file_dio_aio_writ
       xfs_file_aio_write
       do_sync_write
       vfs_write
      
      This was discovered on a filesystem with a log of only 10MB, and a
      log stripe unit of 256k whih increased the base reservations by
      512k. Hence a allocation transaction requires 1.2MB of log space to
      be available instead of only 260k, and so greatly increased the
      chance that there wouldn't be enough log space available for the
      nested transaction to succeed. The key to reproducing it is this
      mkfs command:
      
      mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV
      
      The test case was a 1000 fsstress processes running with random
      freeze and unfreezes every few seconds. Thanks to Eryu Guan
      (eguan@redhat.com) for writing the test that found this on a system
      with a somewhat unique default configuration....
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarAndrew Dahl <adahl@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      437a255a
    • Dave Chinner's avatar
      xfs: byte range granularity for XFS_IOC_ZERO_RANGE · ef9d8733
      Dave Chinner authored
      XFS_IOC_ZERO_RANGE simply does not work properly for non page cache
      aligned ranges. Neither test 242 or 290 exercise this correctly, so
      the behaviour is completely busted even though the tests pass.
      
      Fix it to support full byte range granularity as was originally
      intended for this ioctl.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      ef9d8733
  2. 26 Nov, 2012 2 commits
  3. 20 Nov, 2012 2 commits
    • Christoph Hellwig's avatar
      xfs: add CRC checks to the log · 0e446be4
      Christoph Hellwig authored
      Implement CRCs for the log buffers.  We re-use a field in
      struct xlog_rec_header that was used for a weak checksum of the
      log buffer payload in debug builds before.
      
      The new checksumming uses the crc32c checksum we will use elsewhere
      in XFS, and also protects the record header and addition cycle data.
      
      Due to this there are some interesting changes in xlog_sync, as we
      need to do the cycle wrapping for the split buffer case much earlier,
      as we would touch the buffer after generating the checksum otherwise.
      
      The CRC calculation is always enabled, even for non-CRC filesystems,
      as adding this CRC does not change the log format. On non-CRC
      filesystems, only issue an alert if a CRC mismatch is found and
      allow recovery to continue - this will act as an indicator that
      log recovery problems are a result of log corruption. On CRC enabled
      filesystems, however, log recovery will fail.
      
      Note that existing debug kernels will write a simple checksum value
      to the log, so the first time this is run on a filesystem taht was
      last used on a debug kernel it will through CRC mismatch warning
      errors. These can be ignored.
      
      Initially based on a patch from Dave Chinner, then modified
      significantly by Christoph Hellwig.  Modified again by Dave Chinner
      to get to this version.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      0e446be4
    • Christoph Hellwig's avatar
      xfs: add CRC infrastructure · bc02e869
      Christoph Hellwig authored
       - add a mount feature bit for CRC enabled filesystems
       - add some helpers for generating and verifying the CRCs
       - add a copy_uuid helper
      
      The checksumming helpers are loosely based on similar ones in sctp,
      all other bits come from Dave Chinner.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      bc02e869
  4. 16 Nov, 2012 22 commits
  5. 14 Nov, 2012 5 commits
  6. 13 Nov, 2012 6 commits
    • Dave Chinner's avatar
      xfs: make growfs initialise the AGFL header · de497688
      Dave Chinner authored
      For verification purposes, AGFLs need to be initialised to a known
      set of values. For upcoming CRC changes, they are also headers that
      need to be initialised. Currently, growfs does neither for the AGFLs
      - it ignores them completely. Add initialisation of the AGFL to be
      full of invalid block numbers (NULLAGBLOCK) to put the
      infrastructure in place needed for CRC support.
      
      Includes a comment clarification from Jeff Liu.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      de497688
    • Dave Chinner's avatar
      xfs: growfs: use uncached buffers for new headers · fd23683c
      Dave Chinner authored
      When writing the new AG headers to disk, we can't attach write
      verifiers because they have a dependency on the struct xfs-perag
      being attached to the buffer to be fully initialised and growfs
      can't fully initialise them until later in the process.
      
      The simplest way to avoid this problem is to use uncached buffers
      for writing the new headers. These buffers don't have the xfs-perag
      attached to them, so it's simple to detect in the write verifier and
      be able to skip the checks that need the xfs-perag.
      
      This enables us to attach the appropriate buffer ops to the buffer
      and hence calculate CRCs on the way to disk. IT also means that the
      buffer is torn down immediately, and so the first access to the AG
      headers will re-read the header from disk and perform full
      verification of the buffer. This way we also can catch corruptions
      due to problems that went undetected in growfs.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      fd23683c
    • Dave Chinner's avatar
      xfs: use btree block initialisation functions in growfs · b64f3a39
      Dave Chinner authored
      Factor xfs_btree_init_block() to be independent of the btree cursor,
      and use the function to initialise btree blocks in the growfs code.
      This makes adding support for different format btree blocks simple.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by Rich Johnston <rjohnston@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      b64f3a39
    • Dave Chinner's avatar
      xfs: add more attribute tree trace points. · ee73259b
      Dave Chinner authored
      Added when debugging recent attribute tree problems to more finely
      trace code execution through the maze of twisty passages that makes
      up the attr code.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      ee73259b
    • Dave Chinner's avatar
      xfs: drop buffer io reference when a bad bio is built · 37eb17e6
      Dave Chinner authored
      Error handling in xfs_buf_ioapply_map() does not handle IO reference
      counts correctly. We increment the b_io_remaining count before
      building the bio, but then fail to decrement it in the failure case.
      This leads to the buffer never running IO completion and releasing
      the reference that the IO holds, so at unmount we can leak the
      buffer. This leak is captured by this assert failure during unmount:
      
      XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273
      
      This is not a new bug - the b_io_remaining accounting has had this
      problem for a long, long time - it's just very hard to get a
      zero length bio being built by this code...
      
      Further, the buffer IO error can be overwritten on a multi-segment
      buffer by subsequent bio completions for partial sections of the
      buffer. Hence we should only set the buffer error status if the
      buffer is not already carrying an error status. This ensures that a
      partial IO error on a multi-segment buffer will not be lost. This
      part of the problem is a regression, however.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      37eb17e6
    • Dave Chinner's avatar
      xfs: fix broken error handling in xfs_vm_writepage · 7bf7f352
      Dave Chinner authored
      When we shut down the filesystem, it might first be detected in
      writeback when we are allocating a inode size transaction. This
      happens after we have moved all the pages into the writeback state
      and unlocked them. Unfortunately, if we fail to set up the
      transaction we then abort writeback and try to invalidate the
      current page. This then triggers are BUG() in block_invalidatepage()
      because we are trying to invalidate an unlocked page.
      
      Fixing this is a bit of a chicken and egg problem - we can't
      allocate the transaction until we've clustered all the pages into
      the IO and we know the size of it (i.e. whether the last block of
      the IO is beyond the current EOF or not). However, we don't want to
      hold pages locked for long periods of time, especially while we lock
      other pages to cluster them into the write.
      
      To fix this, we need to make a clear delineation in writeback where
      errors can only be handled by IO completion processing. That is,
      once we have marked a page for writeback and unlocked it, we have to
      report errors via IO completion because we've already started the
      IO. We may not have submitted any IO, but we've changed the page
      state to indicate that it is under IO so we must now use the IO
      completion path to report errors.
      
      To do this, add an error field to xfs_submit_ioend() to pass it the
      error that occurred during the building on the ioend chain. When
      this is non-zero, mark each ioend with the error and call
      xfs_finish_ioend() directly rather than building bios. This will
      immediately push the ioends through completion processing with the
      error that has occurred.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      7bf7f352