1. 04 Sep, 2014 4 commits
  2. 02 Sep, 2014 6 commits
    • Zheng Liu's avatar
      ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu authored
      This commit adds some statictics in extent status tree shrinker.  The
      purpose to add these is that we want to collect more details when we
      encounter a stall caused by extent status tree shrinker.  Here we count
      the following statictics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      eb68d0e2
    • Zheng Liu's avatar
      ext4: improve extents status tree trace point · e963bb1d
      Zheng Liu authored
      This commit improves the trace point of extents status tree.  We rename
      trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
      in ext4_es_scan() and we can not identify them from the result.
      
      Further this commit fixes a variable name in trace point in order to
      keep consistency with others.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e963bb1d
    • Seunghun Lee's avatar
      ext4: fix comments about get_blocks · d91bd2c1
      Seunghun Lee authored
      get_blocks is renamed to get_block.
      Signed-off-by: default avatarSeunghun Lee <waydi1@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      d91bd2c1
    • Darrick J. Wong's avatar
      ext4: enable block_validity by default · 45f1a9c3
      Darrick J. Wong authored
      Enable by default the block_validity feature, which checks for
      collisions between newly allocated blocks and critical system
      metadata.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      45f1a9c3
    • Theodore Ts'o's avatar
      jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint() · 88fe1acb
      Theodore Ts'o authored
      __wait_cp_io() is only called by jbd2_log_do_checkpoint().  Fold it in
      to make it a bit easier to understand.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      88fe1acb
    • Theodore Ts'o's avatar
      jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() · be1158cc
      Theodore Ts'o authored
      __process_buffer() is only called by jbd2_log_do_checkpoint(), and it
      had a very complex locking protocol where it would be called with the
      j_list_lock, and sometimes exit with the lock held (if the return code
      was 0), or release the lock.
      
      This was confusing both to humans and to smatch (which erronously
      complained that the lock was taken twice).
      
      Folding __process_buffer() to the caller allows us to simplify the
      control flow, making the resulting function easier to read and reason
      about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150
      bytes (over 4% of the text size).
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      be1158cc
  3. 01 Sep, 2014 12 commits
    • Theodore Ts'o's avatar
      ext4: rename ext4_ext_find_extent() to ext4_find_extent() · ed8a1a76
      Theodore Ts'o authored
      Make the function name less redundant.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ed8a1a76
    • Theodore Ts'o's avatar
      ext4: reuse path object in ext4_move_extents() · 3bdf14b4
      Theodore Ts'o authored
      Reuse the path object in ext4_move_extents() so we don't unnecessarily
      free and reallocate it.
      
      Also clean up the get_ext_path() wrapper so that it has the same
      semantics of freeing the path object on error as ext4_ext_find_extent().
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      3bdf14b4
    • Theodore Ts'o's avatar
      ext4: reuse path object in ext4_ext_shift_extents() · ee4bd0d9
      Theodore Ts'o authored
      Now that the semantics of ext4_ext_find_extent() are much cleaner,
      it's safe and more efficient to reuse the path object across the
      multiple calls to ext4_ext_find_extent() in ext4_ext_shift_extents().
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ee4bd0d9
    • Theodore Ts'o's avatar
      ext4: teach ext4_ext_find_extent() to realloc path if necessary · 10809df8
      Theodore Ts'o authored
      This adds additional safety in case for some reason we end reusing a
      path structure which isn't big enough for current depth of the inode.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      10809df8
    • Theodore Ts'o's avatar
      ext4: allow a NULL argument to ext4_ext_drop_refs() · b7ea89ad
      Theodore Ts'o authored
      Teach ext4_ext_drop_refs() to accept a NULL argument, much like
      kfree().  This allows us to drop a lot of checks to make sure path is
      non-NULL before calling ext4_ext_drop_refs().
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      b7ea89ad
    • Theodore Ts'o's avatar
      ext4: call ext4_ext_drop_refs() from ext4_ext_find_extent() · 523f431c
      Theodore Ts'o authored
      In nearly all of the calls to ext4_ext_find_extent() where the caller
      is trying to recycle the path object, ext4_ext_drop_refs() gets called
      to release the buffer heads before the path object gets overwritten.
      To simplify things for the callers, and to avoid the possibility of a
      memory leak, make ext4_ext_find_extent() responsible for dropping the
      buffers.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      523f431c
    • Theodore Ts'o's avatar
      ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code · dfe50809
      Theodore Ts'o authored
      Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(),
      ext4_split_extent(), ext4_convert_unwritten_extents_endio().
      
      This requires fixing all of their callers to potentially
      ext4_ext_find_extent() to free the struct ext4_ext_path object in case
      of an error, and there are interlocking dependencies all the way up to
      ext4_ext_map_blocks(), ext4_swap_extents(), and
      ext4_ext_remove_space().
      
      Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it
      is no longer necessary.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      dfe50809
    • Theodore Ts'o's avatar
      ext4: drop EXT4_EX_NOFREE_ON_ERR in convert_initialized_extent() · 4f224b8b
      Theodore Ts'o authored
      Transfer responsibility of freeing struct ext4_ext_path on error to
      ext4_ext_find_extent().
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      4f224b8b
    • Theodore Ts'o's avatar
      ext4: collapse ext4_convert_initialized_extents() · e8b83d93
      Theodore Ts'o authored
      The function ext4_convert_initialized_extents() is only called by a
      single function --- ext4_ext_convert_initalized_extents().  Inline the
      code and get rid of the unnecessary bits in order to simplify the code.
      
      Rename ext4_ext_convert_initalized_extents() to
      convert_initalized_extents() since it's a static function that is
      actually only used in a single caller, ext4_ext_map_blocks().
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e8b83d93
    • Theodore Ts'o's avatar
      ext4: teach ext4_ext_find_extent() to free path on error · 705912ca
      Theodore Ts'o authored
      Right now, there are a places where it is all to easy to leak memory
      on an error path, via a usage like this:
      
      	struct ext4_ext_path *path = NULL
      
      	while (...) {
      		...
      		path = ext4_ext_find_extent(inode, block, path, 0);
      		if (IS_ERR(path)) {
      			/* oops, if path was non-NULL before the call to
      			   ext4_ext_find_extent, we've leaked it!  :-(  */
      			...
      			return PTR_ERR(path);
      		}
      		...
      	}
      
      Unfortunately, there some code paths where we are doing the following
      instead:
      
      	path = ext4_ext_find_extent(inode, block, orig_path, 0);
      
      and where it's important that we _not_ free orig_path in the case
      where ext4_ext_find_extent() returns an error.
      
      So change the function signature of ext4_ext_find_extent() so that it
      takes a struct ext4_ext_path ** for its third argument, and by
      default, on an error, it will free the struct ext4_ext_path, and then
      zero out the struct ext4_ext_path * pointer.  In order to avoid
      causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes
      ext4_ext_find_extent() to use the original behavior of forcing the
      caller to deal with freeing the original path pointer on the error
      case.
      
      The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this
      allows for a gentle transition and makes the patches easier to verify.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      
      		
      705912ca
    • Theodore Ts'o's avatar
      ext4: fix accidental flag aliasing in ext4_map_blocks flags · bd30d702
      Theodore Ts'o authored
      Commit b8a86845 introduced an accidental flag aliasing between
      EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN.
      
      Fortunately, this didn't introduce any untorward side effects --- we
      got lucky.  Nevertheless, fix this and leave a warning to hopefully
      avoid this from happening in the future.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      bd30d702
    • Theodore Ts'o's avatar
      ext4: fix ZERO_RANGE bug hidden by flag aliasing · 713e8dde
      Theodore Ts'o authored
      We accidently aliased EXT4_EX_NOCACHE and EXT4_GET_CONVERT_UNWRITTEN
      falgs, which apparently was hiding a bug that was unmasked when this
      flag aliasing issue was addressed (see the subsequent commit).  The
      reproduction case was:
      
         fsx -N 10000 -l 500000 -r 4096 -t 4096 -w 4096 -Z -R -W /vdb/junk
      
      ... which would cause fsx to report corruption in the data file.
      
      The fix we have is a bit of an overkill, but I'd much rather be
      conservative for now, and we can optimize ZERO_RANGE_FL handling
      later.  The fact that we need to zap the extent_status cache for the
      inode is unfortunate, but correctness is far more important than
      performance.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      713e8dde
  4. 31 Aug, 2014 4 commits
  5. 30 Aug, 2014 6 commits
  6. 29 Aug, 2014 8 commits
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · d4f03186
      Linus Torvalds authored
      Pull ext4 bugfixes from Ted Ts'o:
       "Ext4 bug fixes for 3.17, to provide better handling of memory
        allocation failures, and to fix some journaling bugs involving
        journal checksums and FALLOC_FL_ZERO_RANGE"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix same-dir rename when inline data directory overflows
        jbd2: fix descriptor block size handling errors with journal_csum
        jbd2: fix infinite loop when recovering corrupt journal blocks
        ext4: update i_disksize coherently with block allocation on error path
        ext4: fix transaction issues for ext4_fallocate and ext_zero_range
        ext4: fix incorect journal credits reservation in ext4_zero_range
        ext4: move i_size,i_disksize update routines to helper function
        ext4: fix BUG_ON in mb_free_blocks()
        ext4: propagate errors up to ext4_find_entry()'s callers
      d4f03186
    • Linus Torvalds's avatar
      Merge tag 'dm-3.17-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · ef13c8af
      Linus Torvalds authored
      Pull device mapper fix from Mike Snitzer:
       "Fix a 3.17-rc1 regression introduced by switching the DM crypt target
        to using per-bio data"
      
      * tag 'dm-3.17-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm crypt: fix access beyond the end of allocated space
      ef13c8af
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 522a15db
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "A smaller collection of fixes that have come up since the initial
        merge window pull request.  This contains:
      
         - error handling cleanup and support for larger than 16 byte cdbs in
           sg_io() from Christoph.  The latter just matches what bsg and
           friends support, sg_io() got left out in the merge.
      
         - an option for brd to expose partitions in /proc/partitions.  They
           are hidden by default for compat reasons.  From Dmitry Monakhov.
      
         - a few blk-mq fixes from me - killing a dead/unused flag, fix for
           merging happening even if turned off, and correction of a few
           comments.
      
         - removal of unnecessary ->owner setting in systemace.  From Michal
           Simek.
      
         - two related fixes for a problem with nesting freezing of queues in
           blk-mq.  One from Ming Lei removing an unecessary freeze operation,
           and another from Tejun fixing the nesting regression introduced in
           the merge window.
      
         - fix for a BUG_ON() at bio_endio time when protection info is
           attached and the IO has an error.  From Sagi Grimberg.
      
         - two scsi_ioctl bug fixes for regressions with scsi-mq from Tony
           Battersby.
      
         - a cfq weight update fix and subsequent comment update from Toshiaki
           Makita"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        cfq-iosched: Add comments on update timing of weight
        cfq-iosched: Fix wrong children_weight calculation
        block: fix error handling in sg_io
        fix regression in SCSI_IOCTL_SEND_COMMAND
        scsi-mq: fix requests that use a separate CDB buffer
        block: support > 16 byte CDBs for SG_IO
        block: cleanup error handling in sg_io
        brd: add ram disk visibility option
        block: systemace: Remove .owner field for driver
        blk-mq: blk_mq_freeze_queue() should allow nesting
        blk-mq: correct a few wrong/bad comments
        block: Fix BUG_ON when pi errors occur
        blk-mq: don't allow merges if turned off for the queue
        blk-mq: get rid of unused BLK_MQ_F_SHOULD_SORT flag
        blk-mq: fix WARNING "percpu_ref_kill() called more than once!"
      522a15db
    • Will Deacon's avatar
      alpha: io: implement relaxed accessor macros for writes · 9e36c633
      Will Deacon authored
      write{b,w,l,q}_relaxed are implemented by some architectures in order to
      permit memory-mapped I/O writes with weaker barrier semantics than the
      non-relaxed variants.
      
      This patch implements these write macros for Alpha, in the same vein as
      the relaxed read macros, which are already implemented.
      Acked-by: default avatarRichard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e36c633
    • Michael Cree's avatar
    • Darrick J. Wong's avatar
      ext4: fix same-dir rename when inline data directory overflows · d80d448c
      Darrick J. Wong authored
      When performing a same-directory rename, it's possible that adding or
      setting the new directory entry will cause the directory to overflow
      the inline data area, which causes the directory to be converted to an
      extent-based directory.  Under this circumstance it is necessary to
      re-read the directory when deleting the old dirent because the "old
      directory" context still points to i_block in the inode table, which
      is now an extent tree root!  The delete fails with an FS error, and
      the subsequent fsck complains about incorrect link counts and
      hardlinked directories.
      
      Test case (originally found with flat_dir_test in the metadata_csum
      test program):
      
      # mkfs.ext4 -O inline_data /dev/sda
      # mount /dev/sda /mnt
      # mkdir /mnt/x
      # touch /mnt/x/changelog.gz /mnt/x/copyright /mnt/x/README.Debian
      # sync
      # for i in /mnt/x/*; do mv $i $i.longer; done
      # ls -la /mnt/x/
      total 0
      -rw-r--r-- 1 root root 0 Aug 25 12:03 changelog.gz.longer
      -rw-r--r-- 1 root root 0 Aug 25 12:03 copyright
      -rw-r--r-- 1 root root 0 Aug 25 12:03 copyright.longer
      -rw-r--r-- 1 root root 0 Aug 25 12:03 README.Debian.longer
      
      (Hey!  Why are there four files now??)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d80d448c
    • Darrick J. Wong's avatar
      jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Darrick J. Wong authored
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1%, to convert a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reported-by: default avatarTR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
    • Darrick J. Wong's avatar
      jbd2: fix infinite loop when recovering corrupt journal blocks · 022eaa75
      Darrick J. Wong authored
      When recovering the journal, don't fall into an infinite loop if we
      encounter a corrupt journal block.  Instead, just skip the block and
      return an error, which fails the mount and thus forces the user to run
      a full filesystem fsck.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      022eaa75