1. 27 Oct, 2011 4 commits
    • ext4: optimize memmove lengths in extent/index insertions · 80e675f9
      Eric Gouriou authored
      ext4_ext_insert_extent() (respectively ext4_ext_insert_index())
      was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine
      how many entries needed to be moved beyond the insertion point.
      In practice this means that (320 - I) * 24 bytes were memmove()'d
      when I is the insertion point, rather than (#entries - I) * 24 bytes.
      
      This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead
      to only move existing entries. The code flow is also simplified
      slightly to highlight similarities and reduce code duplication in
      the insertion logic.
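
The idea can be sketched in user space as follows. This is a minimal illustration, not the kernel code: struct toy_extent and toy_insert_extent() are hypothetical names, and the node is modeled as a plain array with `count` live entries out of `capacity` slots.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an extent entry; NOT the kernel's struct ext4_extent. */
struct toy_extent {
    unsigned int block;  /* first logical block covered */
    unsigned int len;    /* number of blocks covered */
};

/*
 * Insert *ex at index pos in a node holding *count live entries out of
 * `capacity` slots. Only the (*count - pos) live entries behind the
 * insertion point are shifted, never (capacity - pos).
 * Returns the number of entries moved, or -1 if the node is full or
 * pos is out of range.
 */
int toy_insert_extent(struct toy_extent *node, size_t *count, size_t capacity,
                      size_t pos, const struct toy_extent *ex)
{
    size_t to_move;

    if (*count >= capacity || pos > *count)
        return -1;

    to_move = *count - pos;
    if (to_move > 0)
        memmove(&node[pos + 1], &node[pos], to_move * sizeof(*node));

    node[pos] = *ex;
    (*count)++;
    return (int)to_move;
}
```

Returning the number of entries moved mirrors the "move %d" unit change described in the ext_debug() note of this commit message.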
      
      This patch reduces system CPU consumption by over 25% on a 4kB
      synchronous append DIO write workload when used with the
      pre-2.6.39 x86_64 memmove() implementation. With the much faster
      2.6.39 memmove() implementation we still see a decrease in
      system CPU usage between 2% and 7%.
      
      Note that the ext_debug() output changes with this patch, splitting
      some log information between entries. Users of the ext_debug() output
      should note that the "move %d" units changed from reporting the number
      of bytes moved to reporting the number of entries moved.
      Signed-off-by: Eric Gouriou <egouriou@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: optimize ext4_ext_convert_to_initialized() · 6f91bc5f
      Eric Gouriou authored
      This patch introduces a fast path in ext4_ext_convert_to_initialized()
      for the case when the conversion can be performed by transferring
      the newly initialized blocks from the uninitialized extent into
      an adjacent initialized extent. Doing so removes the expensive
      invocations of memmove() which occur during extent insertion and
      the subsequent merge.
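
A rough user-space sketch of such a fast path is shown below. It is an assumption-laden illustration rather than the patch itself: struct toy_ext, toy_convert_fast_path() and the max_len parameter are hypothetical, and the exact eligibility checks in the kernel may differ.

```c
#include <stdbool.h>

/* Toy extent; NOT the kernel's on-disk layout. */
struct toy_ext {
    unsigned int lblk;        /* first logical block */
    unsigned long long pblk;  /* first physical block */
    unsigned int len;         /* number of blocks */
    bool uninit;              /* true if the extent is uninitialized */
};

/*
 * Try to initialize the first n blocks of uninitialized extent `ex` by
 * growing the initialized extent `prev` that sits directly in front of it.
 * No entry is inserted or removed, so nothing needs to be memmove()'d.
 * Returns false when the fast path does not apply; the caller would then
 * fall back to the regular split/insert path (not shown).
 */
bool toy_convert_fast_path(struct toy_ext *prev, struct toy_ext *ex,
                           unsigned int write_lblk, unsigned int n,
                           unsigned int max_len)
{
    if (prev->uninit || !ex->uninit)
        return false;
    /* The write must start at the head of ex and leave part of ex behind. */
    if (write_lblk != ex->lblk || n == 0 || n >= ex->len)
        return false;
    /* prev and ex must be adjacent both logically and physically. */
    if (prev->lblk + prev->len != ex->lblk)
        return false;
    if (prev->pblk + prev->len != ex->pblk)
        return false;
    /* The grown extent must not exceed the maximum extent length. */
    if (n > max_len || prev->len > max_len - n)
        return false;

    prev->len += n;   /* initialized extent absorbs the new blocks */
    ex->lblk += n;    /* uninitialized extent shrinks from the front */
    ex->pblk += n;
    ex->len -= n;
    return true;
}
```

For an append workload into preallocated space, each write lands at the head of the remaining uninitialized extent, which is the situation this check accepts.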
      
      In practice this should be the common case for clients performing
      append writes into files pre-allocated via
      fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
      direct IO and when using a suboptimal implementation of memmove()
      (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
      consumption by 32%.
      
      Two new trace points are added to ext4_ext_convert_to_initialized()
      to offer visibility into its operations. No exit trace point has
      been added due to the multiplicity of return points. This can be
      revisited once the upstream cleanup is backported.
      Signed-off-by: Eric Gouriou <egouriou@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd/jbd2: factor out common functions from the jbd[2] header files · 44606672
      Thomas Gleixner authored
      The state bits and the lock functions of jbd and jbd2 are
      identical.  Share them.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: fix build when CONFIG_BUG is not enabled · 44705754
      Randy Dunlap authored
      Fix build error when CONFIG_BUG is not enabled:
      
      fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN'
      
      by changing __WARN() to WARN_ON(), as suggested by
      Arnaud Lacombe <lacombar@gmail.com>.
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Arnaud Lacombe <lacombar@gmail.com>
  2. 26 Oct, 2011 10 commits
  3. 25 Oct, 2011 3 commits
    • ext4: prevent stack overrun in ext4_file_open · cf803903
      Darrick J. Wong authored
      In ext4_file_open, the filesystem records the mountpoint of the first
      file that is opened after mounting the filesystem.  It does this by
      allocating a 64-byte stack buffer, calling d_path() to grab the mount
      point through which this file was accessed, and then memcpy()ing 64
      bytes into the superblock's s_last_mounted field, starting from the
      return value of d_path(), which is stored as "cp".  However, if cp >
      buf (which it frequently is since path components are prepended
      starting at the end of buf) then we can end up copying stack data into
      the superblock.
      
      Writing stack variables into the superblock doesn't sound like a great
      idea, so use strlcpy instead.  Andi Kleen suggested using strlcpy
      instead of strncpy.
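
The difference between the two copies can be illustrated in user space. This is only a sketch under stated assumptions: toy_d_path() is a hypothetical stand-in that mimics how d_path() builds the string at the end of the buffer, and toy_strlcpy() is a local helper written in the spirit of strlcpy(), since that function is not available in every C library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for d_path(): the path is built at the END of buf
 * and a pointer to its first character is returned, so the result is
 * usually greater than buf. Returns NULL if the path does not fit.
 */
char *toy_d_path(const char *path, char *buf, size_t buflen)
{
    size_t len = strlen(path) + 1;  /* include the terminating NUL */
    char *start;

    if (len > buflen)
        return NULL;
    start = buf + buflen - len;
    memcpy(start, path, len);
    return start;
}

/*
 * Bounded copy in the spirit of strlcpy(): copies at most dstsize - 1
 * characters, always NUL-terminates, and returns strlen(src).
 */
size_t toy_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Record the mount point without reading past the end of the stack buffer. */
int toy_record_last_mounted(char *last_mounted, size_t field_size,
                            const char *path)
{
    char buf[64];
    char *cp = toy_d_path(path, buf, sizeof(buf));

    if (cp == NULL)
        return -1;
    /* A fixed memcpy(last_mounted, cp, 64) would run past buf + 64 here. */
    toy_strlcpy(last_mounted, cp, field_size);
    return 0;
}
```

The bounded copy stops at the string's NUL terminator, so nothing beyond the path itself reaches the destination field.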
      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: update EOFBLOCKS flag on fallocate properly · a4e5d88b
      Dmitry Monakhov authored
      EXT4_EOFBLOCKS_FL should be updated when fallocate is called without
      FALLOC_FL_KEEP_SIZE. Currently this happens only if a new extent was
      allocated.
      
      TESTCASE:
      fallocate test_file -n -l4096
      fallocate test_file -l4096
      The last fallocate command has updated the size but kept
      EXT4_EOFBLOCKS_FL set, and fsck will complain about that.
      
      Also remove the ping-pong in ext4_fallocate() in the case of new extents,
      where ext4_ext_map_blocks() clears the EOFBLOCKS bit and
      ext4_falloc_update_inode() later restores it.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove messy logic from ext4_ext_rm_leaf · 750c9c47
      Dmitry Monakhov authored
      - Both callers (truncate and punch_hole) have already aligned the left
        end point, so the split logic is no longer needed here.
      - Remove dead duplicated code.
      - Call ext4_ext_dirty only after eh_entries has been updated; otherwise
        the entries update is lost. This regression was caused by d583fb87;
        see test case 266 in xfstests
        (http://patchwork.ozlabs.org/patch/120872).
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 22 Oct, 2011 1 commit
    • ext4: cleanup ext4_ext_grow_indepth code · 1939dd84
      Dmitry Monakhov authored
      The current code gives the impression that growing the tree in depth is
      very complicated and that some mythical paths and blocks are involved.
      In fact, growing in depth is a relatively simple procedure:
       1) Just create new meta block and copy root data to that block.
       2) Convert root from extent to index if old depth == 0
       3) Update root block pointer
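
These steps can be sketched on a toy in-memory tree. The sketch makes several simplifying assumptions: struct toy_node and toy_grow_indepth() are hypothetical, malloc() stands in for allocating a new metadata block, and steps 2 and 3 collapse into rewriting the root as a single index entry.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_ROOT_SLOTS 4

/* Toy tree node; NOT the kernel's extent header and entry layout. */
struct toy_node {
    int depth;                                  /* 0 means a leaf of extents */
    int count;                                  /* number of live entries */
    unsigned int keys[TOY_ROOT_SLOTS];          /* first logical block per entry */
    struct toy_node *children[TOY_ROOT_SLOTS];  /* used only when depth > 0 */
};

/* Grow the tree by one level. Returns 0 on success, -1 on allocation failure. */
int toy_grow_indepth(struct toy_node *root)
{
    struct toy_node *blk = malloc(sizeof(*blk));
    unsigned int first_key;
    int i;

    if (blk == NULL)
        return -1;

    /* 1) Create the new meta block and copy the root data into it. */
    memcpy(blk, root, sizeof(*blk));

    /* 2) The root now holds index entries instead of its old contents. */
    first_key = root->count > 0 ? root->keys[0] : 0;
    for (i = 0; i < TOY_ROOT_SLOTS; i++) {
        root->keys[i] = 0;
        root->children[i] = NULL;
    }

    /* 3) Point the single root entry at the new block and bump the depth. */
    root->keys[0] = first_key;
    root->children[0] = blk;
    root->count = 1;
    root->depth = blk->depth + 1;
    return 0;
}
```

After the call the root has one entry and free slots again, while all previous entries live one level down in the new block.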
      
      This patch does:
       - Reorganize the code to make it more self-explanatory
       - Do not pass the path parameter to new_meta_block(), in order to
         provoke allocation from the inode's group, because the top-level
         block should sit closer to its inode than to the leaf data block.
      
         [ This happens anyway, due to logic in mballoc; we should drop
           the path parameter from new_meta_block() entirely.  -- tytso ]
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 21 Oct, 2011 1 commit
  6. 20 Oct, 2011 2 commits
  7. 18 Oct, 2011 7 commits
  8. 17 Oct, 2011 1 commit
  9. 08 Oct, 2011 6 commits
  10. 06 Oct, 2011 2 commits
  11. 09 Sep, 2011 3 commits
    • ext4: attempt to fix race in bigalloc code path · 5356f261
      Aditya Kali authored
      Currently, there is a race between delayed-allocation writes and
      writeback when the bigalloc feature is in use. To determine which
      blocks in a cluster are under delayed allocation, we were checking
      buffer_delay(bh). However, the writeback code path clears this bit
      without any synchronization, which resulted in a race and an ext4
      warning similar to:
      
      EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
      		reserved data blocks
      
      The race existed in two places:
      (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called
          from the writeback code path.
      (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
          buffer_delay(bh) is set).
      
      To fix (1), this patch introduces a new buffer_head state bit -
      BH_Da_Mapped.  This bit is set under the protection of
      EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
      allocated blocks during the writeout time. We can now reliably check
      for this bit inside ext4_find_delalloc_range() to determine whether
      the reservation for the blocks has already been claimed or not.
      
      To fix (2), it was necessary to set buffer_delay(bh) under the
      protection of i_data_sem.  So, I extracted the very beginning of
      ext4_map_blocks into a new function - ext4_da_map_blocks() - and
      performed the required setting of the BH_Delay bit and the quota
      reservation under the protection of i_data_sem.  These two fixes make
      the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
      thus removing the race.
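
The locking discipline can be sketched in user space as below. This is a heavily simplified model and not the patch: the struct and function names are hypothetical, a pthread mutex stands in for i_data_sem (a read-write semaphore in the kernel), and two booleans stand in for the BH_Delay and BH_Da_Mapped buffer state bits.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy inode: the mutex models EXT4_I(inode)->i_data_sem. */
struct toy_inode {
    pthread_mutex_t i_data_sem;
    int reserved;            /* blocks reserved for delayed allocation */
};

/* Toy buffer state: models buffer_delay(bh) and buffer_da_mapped(bh). */
struct toy_bh {
    bool delay;
    bool da_mapped;
};

/* Write path: take the reservation and set the delay bit under the lock. */
void toy_da_reserve(struct toy_inode *inode, struct toy_bh *bh)
{
    pthread_mutex_lock(&inode->i_data_sem);
    if (!bh->delay && !bh->da_mapped) {
        inode->reserved++;
        bh->delay = true;
    }
    pthread_mutex_unlock(&inode->i_data_sem);
}

/* Writeback path: claim the reservation and record that fact under the lock. */
void toy_writeback_map(struct toy_inode *inode, struct toy_bh *bh)
{
    pthread_mutex_lock(&inode->i_data_sem);
    if (bh->delay) {
        inode->reserved--;
        bh->delay = false;
        bh->da_mapped = true;
    }
    pthread_mutex_unlock(&inode->i_data_sem);
}

/* Lookup: has the reservation for this buffer already been claimed? */
bool toy_reservation_claimed(struct toy_inode *inode, struct toy_bh *bh)
{
    bool claimed;

    pthread_mutex_lock(&inode->i_data_sem);
    claimed = bh->da_mapped;
    pthread_mutex_unlock(&inode->i_data_sem);
    return claimed;
}
```

Because every transition of the two bits happens while the lock is held, a reader holding the same lock never observes a buffer whose reservation is counted twice or not at all.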
      
      Tested: I was able to reproduce the problem by running 'dd' and
      'fsync' in parallel. Also, xfstests sometimes used to reproduce this
      race. After the fix both my test and xfstests were successful and no
      race (warning message) was observed.
      
      Google-Bug-Id: 4997027
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: add some tracepoints in ext4/extents.c · d8990240
      Aditya Kali authored
      This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
      ext4/inode.c.
      
      Tested: Built and ran the kernel and verified that these tracepoints work.
      Also ran xfstests.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters() · df55c99d
      Theodore Ts'o authored
      Rename the function so it is more clear what is going on.  Also rename
      the various variables so it's clearer what's happening.
      
      Also fix a missing blocks-to-clusters conversion when reading the
      number of reserved blocks for root.
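
The unit mismatch can be illustrated with a small sketch. The helper names and parameters are hypothetical, cluster_bits is assumed to be log2 of the number of blocks per cluster, and rounding up to whole clusters is a choice made for this example rather than a statement about the kernel's conversion macro.

```c
#include <stdint.h>

/* Convert a block count to clusters, rounding up to whole clusters. */
uint64_t toy_blocks_to_clusters(uint64_t blocks, unsigned int cluster_bits)
{
    uint64_t per_cluster = (uint64_t)1 << cluster_bits;

    return (blocks + per_cluster - 1) >> cluster_bits;
}

/*
 * Free-space check carried out entirely in cluster units. The root
 * reservation is stored in blocks, so it must be converted before it is
 * added to the other cluster counts. Returns 1 if the request fits.
 */
int toy_has_free_clusters(uint64_t free_clusters, uint64_t dirty_clusters,
                          uint64_t root_reserved_blocks, uint64_t nclusters,
                          unsigned int cluster_bits)
{
    uint64_t root_clusters =
        toy_blocks_to_clusters(root_reserved_blocks, cluster_bits);
    uint64_t needed = nclusters + dirty_clusters + root_clusters;

    return free_clusters >= needed;
}
```

With 16 blocks per cluster, a root reservation of 4096 blocks counts as 256 clusters; using the raw block count instead would overstate the reservation sixteenfold.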
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>