1. 17 Sep, 2009 1 commit
    • ext4: Fix the alloc on close after a truncate heuristic · 5534fb5b
      Theodore Ts'o authored
      In an attempt to avoid doing an unneeded flush after opening a
      (previously non-existent) file with O_CREAT|O_TRUNC, the code only
      triggered the heuristic if ei->disksize was non-zero.  Turns out that
      the VFS doesn't call ->truncate() if the file doesn't exist, and
      ei->disksize is always zero even if the file previously existed.  So
      remove the test, since it isn't necessary and in fact disabled the
      heuristic.
      
      Thanks to Clemens Eisserer for reporting that he was seeing problems with files
      written using kwrite and eclipse after sudden crashes caused by a
      buggy Intel video driver.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 16 Sep, 2009 1 commit
  3. 17 Sep, 2009 1 commit
    • ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags · 1b9c12f4
      Theodore Ts'o authored
      EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
      and the hex value assigned to it collides with FS_DIRECTIO_FL (which
      is also stored in i_flags).  There's no reason for the
      EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
      i_state instead.
      
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
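The separation described in the commit above can be sketched as follows. This is a minimal illustration, not the kernel code: `struct inode_sketch`, the `SKETCH_*` constants, and their values are hypothetical stand-ins chosen only to show an in-memory state bit kept apart from persistent flag bits.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define SKETCH_DIRECTIO_FL   0x00100000u  /* persistent flag, kept in 'flags' */
#define SKETCH_STATE_MIGRATE 0x00000001u  /* in-memory only, kept in 'state' */

struct inode_sketch {
    uint32_t flags;  /* bits that may be written to disk */
    uint32_t state;  /* bits that never leave memory */
};

/* Mark the inode as migrating without touching the persistent flags. */
static void begin_migrate(struct inode_sketch *in)
{
    in->state |= SKETCH_STATE_MIGRATE;
}

/* Clear the in-memory migration marker. */
static void end_migrate(struct inode_sketch *in)
{
    in->state &= ~SKETCH_STATE_MIGRATE;
}

/* A reader of the persistent flags is unaffected by the migration state. */
static int is_directio(const struct inode_sketch *in)
{
    return (in->flags & SKETCH_DIRECTIO_FL) != 0;
}
```

Because the two fields are distinct, a bit value shared between the two namespaces can no longer be misread as a persistent flag.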
  4. 16 Sep, 2009 5 commits
    • ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Eric Sandeen authored
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, ext4 bitmap allocator 
      must limit blocks to < 2^32
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original, % UINT_MAX,
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
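The truncation problem and the goal adjustment described above can be sketched in user-space C. The names `fsblk_t`, `store_in_indirect_slot()`, `clamp_goal()` and `block_fits()` are hypothetical; the modulo step follows the "% UINT_MAX" wording of the commit message and is not claimed to match the kernel code line for line.

```c
#include <stdint.h>

typedef uint64_t fsblk_t;  /* hypothetical stand-in for a 64-bit block number */

/* An indirect-block slot holds only 32 bits, so higher bits are lost. */
static uint32_t store_in_indirect_slot(fsblk_t blk)
{
    return (uint32_t)blk;
}

/* Bring a goal block into the range below UINT32_MAX. */
static fsblk_t clamp_goal(fsblk_t goal)
{
    if (goal >= UINT32_MAX)
        goal %= UINT32_MAX;
    return goal;
}

/* Check a caller could make before storing a block number in a slot. */
static int block_fits(fsblk_t blk)
{
    return blk < UINT32_MAX;
}
```

A block number such as 2^32 + 5 would be stored as 5 in an indirect slot, which is the corruption the patch guards against; after `clamp_goal()` the search starts inside the representable range instead.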
    • ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT · c40ce3c9
      Akira Fujita authored
      If the logical block offset of the original file passed to
      EXT4_IOC_MOVE_EXT differs from the donor file's,
      a calculation error occurs in ext4_calc_swap_extents(),
      and the wrong block is exchanged between the original file and the donor file.
      As a result, we hit ext4_error() in check_block_validity().
      To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
      add checks to mext_calc_swap_extents() and handle it as an error,
      since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT.
      Reported-by: Peng Tao <bergwolf@gmail.com>
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Add null extent check to ext_get_path · 347fa6f1
      Akira Fujita authored
      The path structure returned by ext4_ext_find_extent() may point
      to a null extent, because the structure of the extent tree can
      change while data blocks are being exchanged in
      ext4_move_extents().
      As a solution, the patch adds a null extent check
      to ext_get_path().
      Reported-by: Peng Tao <bergwolf@gmail.com>
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Replace BUG_ON() with ext4_error() in move_extents.c · 2147b1a6
      Akira Fujita authored
      Replace BUG_ON calls with a call to ext4_error()
      to print an error message if EXT4_IOC_MOVE_EXT fails
      for some reason.  This will help with debugging.
      Ted pointed this out, thanks.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Replace get_ext_path macro with an inline function · e8505970
      Akira Fujita authored
      Replace get_ext_path macro with an inline function,
      since this macro looks like a function call but its arguments
      get modified. Ted pointed this out, thanks.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
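The hazard named in the commit above, a macro that looks like a function call but assigns to its arguments, can be sketched like this. `struct path_sketch`, `find_path()`, `GET_PATH()` and `get_path()` are hypothetical names for illustration, and the -1 error value is an assumption; this is not the code from move_extents.c.

```c
#include <stddef.h>

struct path_sketch {
    int depth;
};

/* Hypothetical lookup: returns NULL when no entry has the wanted depth. */
static struct path_sketch *find_path(struct path_sketch *table, size_t n, int want)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].depth == want)
            return &table[i];
    return NULL;
}

/*
 * Macro form: 'p' and 'err' are assigned to, although the call site
 * GET_PATH(p, table, n, want, err) reads like pass-by-value.
 */
#define GET_PATH(p, table, n, want, err)           \
    do {                                           \
        (p) = find_path((table), (n), (want));     \
        (err) = (p) ? 0 : -1;                      \
    } while (0)

/*
 * Inline function form: the output parameter is an explicit pointer
 * and the error is the return value, so the modification is visible.
 */
static inline int get_path(struct path_sketch **p, struct path_sketch *table,
                           size_t n, int want)
{
    *p = find_path(table, n, want);
    return *p ? 0 : -1;
}
```

With the function form a caller writes `err = get_path(&p, table, n, want);`, and the `&p` makes it clear at the call site that `p` will be written.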
  5. 15 Sep, 2009 1 commit
  6. 11 Sep, 2009 3 commits
    • ext4: Fix initialization of s_flex_groups · 7ad9bb65
      Theodore Ts'o authored
      The s_flex_groups array should have been initialized using atomic_add
      to sum up the free counts from the block groups that make up a
      flex_bg.  By using atomic_set, the value of the s_flex_groups array
      was set to the values of the last block group in the flex_bg.  
      
      The impact of this bug is that the block and inode allocation
      algorithms might not pick the best flex_bg for new allocation.
      
      Thanks to Damien Guibouret for pointing out this problem!
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
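The difference between setting and adding can be sketched with C11 atomics in user space. The function names and the `g / per_flex` mapping from block group to flex group are assumptions made for this illustration; the kernel uses its own atomic_t helpers and group mapping.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Buggy pattern: each store overwrites the previous one, so a flex
 * group ends up holding only the count of its last block group.
 */
static void init_with_set(atomic_int *flex, const int *free_counts,
                          size_t ngroups, size_t per_flex)
{
    for (size_t g = 0; g < ngroups; g++)
        atomic_store(&flex[g / per_flex], free_counts[g]);
}

/*
 * Fixed pattern: each block group's count is added, so a flex group
 * holds the sum over all of its block groups.
 */
static void init_with_add(atomic_int *flex, const int *free_counts,
                          size_t ngroups, size_t per_flex)
{
    for (size_t g = 0; g < ngroups; g++)
        atomic_fetch_add(&flex[g / per_flex], free_counts[g]);
}
```

For four block groups with free counts 10, 20, 30, 40 and two groups per flex group, the first version leaves 20 and 40, while the second yields 30 and 70, provided the counters start at zero.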
    • ext4: Always set dx_node's fake_dirent explicitly. · 1f7bebb9
      Andreas Schlick authored
      When ext4_dx_add_entry() has to split an index node, it has to ensure that
      name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
      won't recognise it as an intermediate htree node and consider the htree to
      be corrupted.
      Signed-off-by: Andreas Schlick <schlick@lavabit.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Fix async commit mode to be safe by using a barrier · 0e3d2a63
      Theodore Ts'o authored
      Previously the journal_async_commit mount option was equivalent to
      using barrier=0 (and just as unsafe).  This patch fixes it so that we
      eliminate the barrier before the commit block (by not using ordered
      mode), and explicitly issue an empty barrier bio after writing the
      commit block.  Because of the journal checksum, it is safe to do this;
      if the journal blocks are not all written before a power failure, the
      checksum in the commit block will prevent the last transaction from
      being replayed.
      
      Using the fs_mark benchmark, using journal_async_commit shows a 50%
      improvement:
      
      FSUse%        Count         Size    Files/sec     App Overhead
           8         1000        10240         30.5            28242
      
      vs.
      
      FSUse%        Count         Size    Files/sec     App Overhead
           8         1000        10240         45.8            28620
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 10 Sep, 2009 5 commits
  8. 12 Sep, 2009 1 commit
  9. 10 Sep, 2009 1 commit
    • ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}() · c7acb4c1
      Theodore Ts'o authored
      When ext4 is using a journal, a metadata block which is deallocated
      must be passed into the journal layer so it can be dropped from the
      current transaction and/or revoked.  This is done by calling the
      functions ext4_journal_forget() and ext4_journal_revoke(), which call
      jbd2_journal_forget(), and jbd2_journal_revoke(), respectively.
      
      Since jbd2_journal_forget() and jbd2_journal_revoke() call
      bforget(), if ext4 is not using a journal, ext4_journal_forget() and
      ext4_journal_revoke() must call bforget() to avoid a dirty metadata
      block overwriting a block after it has been reallocated and reused for
      another inode's data block.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. 08 Sep, 2009 1 commit
  11. 10 Sep, 2009 1 commit
  12. 06 Sep, 2009 3 commits
  13. 16 Sep, 2009 1 commit
  14. 06 Sep, 2009 1 commit
  15. 05 Sep, 2009 1 commit
  16. 17 Sep, 2009 1 commit
    • ext4: fix tracepoint format string warnings · a3710fd1
      Theodore Ts'o authored
      Unlike on some other architectures ino_t is an unsigned int on s390.
      So add an explicit cast to avoid lots of compile warnings:
      
      In file included from include/trace/ftrace.h:285,
                       from include/trace/define_trace.h:61,
                       from include/trace/events/ext4.h:711,
                       from fs/ext4/super.c:50:
      include/trace/events/ext4.h: In function 'ftrace_raw_output_ext4_free_inode':
      include/trace/events/ext4.h:12: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
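The explicit cast mentioned above can be illustrated in user-space C. `format_inode()` is a hypothetical helper, not the tracepoint macro itself; it only shows how casting to `unsigned long` makes the argument match `%lu` regardless of how wide `ino_t` is on a given architecture.

```c
#include <stdio.h>
#include <sys/types.h>

/* Format an inode number portably: the cast makes the argument match %lu. */
static int format_inode(char *buf, size_t len, ino_t ino)
{
    return snprintf(buf, len, "ino %lu", (unsigned long)ino);
}
```

Without the cast, a platform where `ino_t` is `unsigned int` produces the format warning quoted in the commit message.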
  17. 05 Sep, 2009 1 commit
  18. 01 Sep, 2009 1 commit
  19. 31 Aug, 2009 1 commit
    • ext4: Restore wbc->range_start in ext4_da_writepages() · de89de6e
      Theodore Ts'o authored
      To solve a lock inversion problem, we implement part of the
      range_cyclic algorithm in ext4_da_writepages().  (See commit 2acf2c26
      for more details.)
      
      As part of that change wbc->range_start was modified by ext4's
      writepages function, which causes its callers to get confused since
      they aren't expecting the filesystem to modify it.  The simplest fix
      is to save and restore wbc->range_start in ext4_da_writepages.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
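The save-and-restore fix can be sketched as follows. `struct wb_control_sketch` and `writepages_sketch()` are hypothetical stand-ins; the loop merely simulates a writer that advances `range_start` internally, which is an assumption about the shape of the problem rather than the body of ext4_da_writepages().

```c
/* Hypothetical stand-in for the caller-owned writeback control block. */
struct wb_control_sketch {
    long long range_start;
    long long range_end;
};

/*
 * Uses range_start as a moving cursor internally, but puts the
 * caller's value back before returning. Returns the number of
 * simulated pages written.
 */
static int writepages_sketch(struct wb_control_sketch *wbc, long long cyclic_index)
{
    long long saved_start = wbc->range_start;  /* remember the caller's value */
    int pages = 0;

    wbc->range_start = cyclic_index;           /* internal use only */
    while (wbc->range_start <= wbc->range_end) {
        wbc->range_start++;                    /* cursor advances per page */
        pages++;
    }

    wbc->range_start = saved_start;            /* caller sees no change */
    return pages;
}
```

The caller can then keep relying on `range_start` after the call, which is the expectation the commit message says was being violated.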
  20. 17 Sep, 2009 1 commit
  21. 30 Aug, 2009 1 commit
  22. 29 Aug, 2009 1 commit
  23. 28 Aug, 2009 1 commit
    • ext4: fix extent sanity checking code with AGGRESSIVE_TEST · 55ad63bf
      Theodore Ts'o authored
      The extents sanity-checking code depends on the ext4_ext_space_*()
      functions returning the maximum allowable size for eh_max; however,
      when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the
      extent tree handling code, this prevents a normally created ext4
      filesystem from being mounted with the errors:
      
      Aug 26 15:43:50 bsd086 kernel: [   96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0)
      Aug 26 15:43:50 bsd086 kernel: [   96.070526] EXT4-fs (sda8): no journal found
      
      Bug reported by Akira Fujita.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  24. 26 Aug, 2009 3 commits
    • ext4: use ext4_grpblk_t more extensively · a36b4498
      Eric Sandeen authored
      unsigned short is potentially too small to track blocks within
      a group; today it is safe due to restrictions in e2fsprogs but
      we have _lo / _hi bits for group blocks with the intent to go
      up to 32 bits, so clean this up now.
      
      There are many more places where we use unsigned/int/unsigned int
      to contain a group block but this should at least fix all the
      short types.
      
      I added a few comments to the struct ext4_group_info definition
      as well.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: use variables not types in sizeofs() for allocations · 1927805e
      Eric Sandeen authored
      Precursor to changing some types; to keep things in sync, it 
      seems better to allocate/memset based on the size of the 
      variables we are using rather than on some disconnected 
      basic type like "unsigned short"
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
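The idiom the commit above adopts can be shown in a small user-space sketch. `struct group_info_sketch` and `alloc_counters()` are hypothetical names, and `malloc`/`memset` stand in for the kernel allocators; the point is only that the size expression is derived from the variable, not from a separately spelled type.

```c
#include <stdlib.h>
#include <string.h>

struct group_info_sketch {
    unsigned short *counters;  /* element type may change later */
    size_t n;
};

/*
 * Allocate and zero n counters. sizeof(*gi->counters) follows the
 * field's declared type, so widening the field needs no change here.
 */
static int alloc_counters(struct group_info_sketch *gi, size_t n)
{
    gi->counters = malloc(n * sizeof(*gi->counters));
    if (!gi->counters)
        return -1;
    memset(gi->counters, 0, n * sizeof(*gi->counters));
    gi->n = n;
    return 0;
}
```

Had the sizes been written as `sizeof(unsigned short)`, a later change of the field to a wider type would silently under-allocate.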
    • ext4: Add missing unlock_new_inode() call in extent migration code · a8526e84
      Aneesh Kumar K.V authored
      We need to unlock the new inode before iput.  This patch fixes the
      following warning when calling chattr +e to migrate a file to use
      extents.  It also fixes problems when e4defrag attempts to
      defragment an inode.
      
      [  470.400044] ------------[ cut here ]------------
      [  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
      [  470.400072] Hardware name: N/A
      .....
      ...
      [  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
      [  470.400359] Call Trace:
      [  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
      [  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
      [  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
      [  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
      [  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
      [  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
      [  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
      [  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
      [  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
      [  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
      [  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
      [  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
      [  470.400557] ---[ end trace ab85723542352dac ]---
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  25. 18 Aug, 2009 2 commits
    • ext4: Add feature set check helper for mount & remount paths · a13fb1a4
      Eric Sandeen authored
      A user reported that although his root ext4 filesystem was mounting
      fine, other filesystems would not mount, with the:
      
      "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"
      
      error on his 32-bit box built without CONFIG_LBDAF.  This is because
      the test at mount time for this situation was not being re-checked
      on remount, and the normal boot process makes an ro->rw transition,
      so this was being missed.
      
      Refactor to make a common helper function to test the filesystem
      features against the type of mount request (RO vs. RW) so that we 
      stay consistent.
      
      Addresses Red-Hat-Bugzilla: #517650
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
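The refactoring described above, one helper shared by the mount and remount paths, might look roughly like this. All names (`FEAT_HUGE_FILE_SKETCH`, `features_ok()`, `mount_sketch()`, `remount_sketch()`), the flag value, and the -1 error code are hypothetical; the sketch only shows the shape of the idea, not the actual ext4 helper.

```c
/* Hypothetical feature bit for a filesystem that contains huge files. */
#define FEAT_HUGE_FILE_SKETCH 0x1u

/*
 * Single place that decides whether the feature set is compatible
 * with the requested mount type. Returns 1 if acceptable, 0 if not.
 */
static int features_ok(unsigned features, int read_only, int have_lbdaf)
{
    if (read_only)
        return 1;  /* this check only restricts read-write mounts */
    if ((features & FEAT_HUGE_FILE_SKETCH) && !have_lbdaf)
        return 0;  /* huge files cannot be handled read-write here */
    return 1;
}

/* Both entry points call the same helper, so they cannot drift apart. */
static int mount_sketch(unsigned features, int read_only, int have_lbdaf)
{
    return features_ok(features, read_only, have_lbdaf) ? 0 : -1;
}

static int remount_sketch(unsigned features, int read_only, int have_lbdaf)
{
    return features_ok(features, read_only, have_lbdaf) ? 0 : -1;
}
```

In this sketch a read-only mount of such a filesystem succeeds, and the later ro->rw remount is refused by the same test instead of slipping through.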
    • simplify some logic in ext4_mb_normalize_request · 38877f4e
      Eric Sandeen authored
      While reading through some of the mballoc code it seems that a couple
      spots in the size normalization function could be streamlined.
      
      The test for non-overlapping PAs can be or'd for the start & end
      conditions, and the tests for adjacent PAs can be else-if'd - 
      it's essentially independently testing:
      
      	if (A + B <= C)
      		...
      	if (A > C)
      		...
      
      These cannot both be true so it seems like the else-if might
      be slightly more efficient and/or informative.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
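The claim that the two tests are mutually exclusive, so that an else-if gives the same result, can be checked with a small sketch. The functions `independent_tests()` and `chained_tests()` are hypothetical, and the sketch assumes unsigned operands where `a + b` does not wrap around.

```c
/* Original shape: both conditions are always evaluated. */
static int independent_tests(unsigned a, unsigned b, unsigned c)
{
    int result = 0;

    if (a + b <= c)
        result = 1;  /* range ends at or before c */
    if (a > c)
        result = 2;  /* range starts after c */
    return result;
}

/* Simplified shape: the second test is skipped when the first holds. */
static int chained_tests(unsigned a, unsigned b, unsigned c)
{
    int result = 0;

    if (a + b <= c)
        result = 1;
    else if (a > c)
        result = 2;
    return result;
}
```

If `a + b <= c` holds then `a <= c`, so `a > c` is false; the two functions therefore return the same value for every input that satisfies the no-wraparound assumption.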