1. 06 Jan, 2009 2 commits
    • ext4: Use new buffer_head flag to check uninit group bitmaps initialization · 2ccb5fb9
      Aneesh Kumar K.V authored
      For an uninit block group, the on-disk bitmap is not initialized.  That
      implies we cannot depend on the uptodate flag on the bitmap
      buffer_head to determine bitmap validity.  Use a new buffer_head flag
      which is set after we properly initialize the bitmap.  This also
      prevents (re-)initializing the uninit group bitmap every time we call
      ext4_read_block_bitmap().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
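The idea of tracking bitmap validity with a separate flag can be sketched in user-space C. This is only an illustration: `struct bitmap_buf`, `bitmap_inited`, `init_calls` and this `read_block_bitmap()` are hypothetical stand-ins, not the kernel's buffer_head API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BITMAP_BYTES 16

/* Hypothetical stand-in for a buffer_head that holds a block bitmap. */
struct bitmap_buf {
    bool uptodate;       /* buffer contents match what is on disk */
    bool bitmap_inited;  /* the "new flag": in-memory bitmap is valid */
    unsigned char data[BITMAP_BYTES];
    int init_calls;      /* counts initializations, for illustration only */
};

/* Return the bitmap, initializing an uninit group's bitmap at most once. */
static unsigned char *read_block_bitmap(struct bitmap_buf *bh, bool group_uninit)
{
    assert(bh != NULL);

    if (bh->bitmap_inited)
        return bh->data;   /* already valid: never re-initialize */

    if (group_uninit) {
        /* The on-disk bitmap is meaningless; build it in memory. */
        memset(bh->data, 0, sizeof(bh->data));
        bh->init_calls++;
    }
    /* Otherwise real code would read the bitmap from disk here. */
    bh->uptodate = true;
    bh->bitmap_inited = true;
    return bh->data;
}
```

In this sketch a second call for the same group returns the bitmap untouched, so bits set by an allocation in between are not wiped out.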
    • ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() · 39341867
      Aneesh Kumar K.V authored
      We need to make sure we update the inode bitmap and clear
      EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
      ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
      whether to initialize the inode bitmap each time it is called.
      (introduced by commit c806e68f.)
      
      ext4_read_inode_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
      	ext4_init_inode_bitmap(sb, bh, block_group, desc);
      
      and ext4_new_inode does
      if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                         ino, inode_bitmap_bh->b_data))
      		   ......
      		   ...
      spin_lock(sb_bgl_lock(sbi, group));
      
      gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
      i.e., on allocation we update the bitmap, then take the sb_bgl_lock
      and clear the EXT4_BG_INODE_UNINIT flag.  A parallel
      ext4_read_inode_bitmap() can therefore zero out the bitmap between
      the ext4_set_bit_atomic() and the spin_lock(sb_bgl_lock(..)) above.
      
      The race results in the user-visible errors below:
      EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
      EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
      EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
      # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
      ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
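The fix described in this commit, updating the bitmap and clearing the uninit flag inside one critical section, can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `struct group`, the C11 `atomic_flag` spinlock standing in for sb_bgl_lock, and the flag value are not the kernel's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define BG_INODE_UNINIT 0x1u   /* hypothetical flag bit for this sketch */
#define BITMAP_BYTES    32

struct group {
    atomic_flag lock;                    /* stands in for sb_bgl_lock */
    uint16_t flags;
    unsigned char bitmap[BITMAP_BYTES];
};

static void group_lock(struct group *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
}

static void group_unlock(struct group *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

static void group_setup(struct group *g)
{
    atomic_flag_clear(&g->lock);
    g->flags = BG_INODE_UNINIT;
    memset(g->bitmap, 0xff, sizeof(g->bitmap));   /* uninitialized junk */
}

/* Reader: zero the bitmap only while the group is still marked uninit. */
static void read_inode_bitmap(struct group *g)
{
    group_lock(g);
    if (g->flags & BG_INODE_UNINIT)
        memset(g->bitmap, 0, sizeof(g->bitmap));
    group_unlock(g);
}

/* Allocator: set the bit AND clear the flag under the same lock, so a
 * reader can never see "bit set, group still uninit" and zero the bitmap.
 * Returns 0 on success, -1 if the inode bit was already set. */
static int new_inode(struct group *g, unsigned int ino)
{
    unsigned char mask = (unsigned char)(1u << (ino % 8));
    int ret = -1;

    assert(ino < BITMAP_BYTES * 8);
    group_lock(g);
    if (!(g->bitmap[ino / 8] & mask)) {
        g->bitmap[ino / 8] |= mask;
        g->flags &= (uint16_t)~BG_INODE_UNINIT;
        ret = 0;
    }
    group_unlock(g);
    return ret;
}
```

With both updates in one critical section, a later `read_inode_bitmap()` sees the flag already cleared and leaves the allocated bit alone.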
  2. 04 Jan, 2009 1 commit
  3. 06 Jan, 2009 3 commits
    • ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0
      Aneesh Kumar K.V authored
      Rename the fields holding the lower bits with a _lo suffix and add
      helpers to access the values.  Also rename bg_itable_unused_hi
      to bg_pad as in e2fsprogs.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
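A helper that combines a _lo and a _hi field into one count might look roughly like the sketch below. The struct layout, the function names and the `has_hi` parameter are hypothetical, and the on-disk little-endian conversion that kernel code would need is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: host-endian fields only. */
struct group_desc {
    uint16_t free_blocks_count_lo;   /* low 16 bits of the count */
    uint16_t free_blocks_count_hi;   /* high 16 bits, if the format has them */
};

/* Read the free block count; has_hi says whether the high field is valid. */
static uint32_t free_blocks_count(const struct group_desc *gd, int has_hi)
{
    uint32_t count = gd->free_blocks_count_lo;

    assert(gd != 0);
    if (has_hi)
        count |= (uint32_t)gd->free_blocks_count_hi << 16;
    return count;
}

/* Store the free block count, splitting it across the two fields. */
static void set_free_blocks_count(struct group_desc *gd, uint32_t count,
                                  int has_hi)
{
    gd->free_blocks_count_lo = (uint16_t)(count & 0xffffu);
    if (has_hi)
        gd->free_blocks_count_hi = (uint16_t)(count >> 16);
}
```

Routing every access through helpers like these keeps callers unaware of whether the high half exists.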
    • ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V authored
      We need to make sure we update the block bitmap and clear
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      Now on the block allocation side we do
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      i.e., on allocation we update the bitmap, then take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag.  A parallel
      ext4_read_block_bitmap() can therefore zero out the bitmap between
      the mb_set_bits() and the spin_lock(sb_bgl_lock(..)) above.
      
      The race results in the user-visible errors below:
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V authored
      The mballoc code likes to call ext4_error() while it is holding locked
      block groups.  This can cause a scheduling-in-atomic-context BUG.  We
      can't just unlock the block group and relock it after/if ext4_error()
      returns, since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 24 Nov, 2008 1 commit
    • ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V authored
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for a filesystem blocksize
      smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k page size,
      etc.).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
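The block size limit mentioned in this message can be checked with a little arithmetic. The sketch below assumes that one page of the buddy cache covers blocks_per_page / 2 groups (which matches the "less than PAGE_SIZE/2" condition above) and that lockdep offers 8 lock subclasses; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of lock subclasses lockdep distinguishes (assumed to be 8). */
#define MAX_LOCKDEP_SUBCLASSES 8u

/* Groups whose alloc_sem would be taken for one page, assuming each
 * group needs two blocks in the page. */
static unsigned int groups_per_page(unsigned int page_size,
                                    unsigned int block_size)
{
    unsigned int groups;

    assert(block_size != 0 && block_size <= page_size);
    groups = (page_size / block_size) / 2;
    return groups ? groups : 1;
}

/* True if using the loop index as subclass stays within lockdep's limit. */
static bool lockdep_subclasses_suffice(unsigned int page_size,
                                       unsigned int block_size)
{
    return groups_per_page(page_size, block_size) <= MAX_LOCKDEP_SUBCLASSES;
}
```

Under these assumptions a 1k block size with 4k pages needs 2 subclasses, while 1k blocks with 32k pages or 2k blocks with 64k pages would need 16.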
  5. 06 Jan, 2009 1 commit
  6. 06 Nov, 2008 1 commit
  7. 25 Nov, 2008 1 commit
  8. 06 Jan, 2009 2 commits
  9. 22 Nov, 2008 1 commit
  10. 05 Nov, 2008 2 commits
  11. 06 Jan, 2009 1 commit
  12. 04 Nov, 2008 1 commit
  13. 04 Jan, 2009 1 commit
  14. 17 Dec, 2008 1 commit
  15. 26 Nov, 2008 1 commit
    • jbd2: improve jbd2 fsync batching · e07f7183
      Josef Bacik authored
      This patch removes the static sleep time in favor of a more
      self-optimizing approach where we measure the average amount of time
      it takes to commit a transaction to disk and the amount of time a
      transaction has been running.  If somebody does a sync write or an
      fsync(), traditionally we would sleep for 1 jiffy, which depending on
      the value of HZ could be a significant amount of time compared to how
      long it takes to commit a transaction to the underlying storage.  With
      this patch, instead of sleeping for a jiffy, we check whether the
      amount of time this transaction has been running is less than the
      average commit time, and if it is, we sleep for the delta using
      schedule_hrtimeout() to get a higher-precision sleep time.  This
      greatly benefits high-end storage, where you could otherwise end up
      sleeping for longer than it takes to commit the transaction and
      therefore sit idle instead of committing it.  Keeping the sleep time
      to a minimum ensures that you are always doing something.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
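The decision described here, sleeping only for the gap between the transaction's running time and the average commit time, reduces to a few lines. The sketch below is an illustration, not the jbd2 code: the function names are hypothetical, and the 3:1 weighting of the running average is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* How long (ns) to wait for more writers to join the transaction:
 * the delta to the average commit time, or 0 if already past it. */
static uint64_t batch_sleep_ns(uint64_t running_ns, uint64_t avg_commit_ns)
{
    if (running_ns < avg_commit_ns)
        return avg_commit_ns - running_ns;
    return 0;
}

/* Fold a new commit duration into a running average.
 * The 3:1 weighting is an assumption made for this sketch. */
static uint64_t update_avg_commit_ns(uint64_t avg_ns, uint64_t sample_ns)
{
    assert(avg_ns <= UINT64_MAX / 4);   /* keep the arithmetic in range */
    if (avg_ns == 0)
        return sample_ns;               /* first measurement */
    return (sample_ns + 3 * avg_ns) / 4;
}
```

On fast storage the average commit time is small, so the computed sleep shrinks with it instead of staying pinned at one jiffy.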
  16. 06 Jan, 2009 3 commits
    • ext4: Don't overwrite allocation_context ac_status · 032115fc
      Aneesh Kumar K.V authored
      We can call ext4_mb_check_limits even after successfully allocating
      the requested blocks.  In that case, make sure we don't overwrite
      ac_status if it already has the status AC_STATUS_FOUND.  This fixes
      the lockdep warning:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28-rc6-autokern1 #1
      ---------------------------------------------
      fsstress/11948 is trying to acquire lock:
       (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
      .....
      
      stack backtrace:
      .....
       [<c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
      .....
      
      but task is already holding lock:
       (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
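The guard this commit describes amounts to an early return when the status is already "found". The following is a simplified sketch: the struct, its fields and the limit check are hypothetical, and only the AC_STATUS_FOUND name is taken from the message above.

```c
#include <assert.h>

/* Status values for this sketch; only the names matter here. */
enum ac_status {
    AC_STATUS_CONTINUE = 1,
    AC_STATUS_FOUND    = 2,
    AC_STATUS_BREAK    = 3
};

/* Hypothetical, stripped-down allocation context. */
struct alloc_ctx {
    enum ac_status status;
    unsigned int scanned;       /* extents examined so far */
    unsigned int max_to_scan;   /* give up after this many */
};

/* Stop scanning once the limit is hit, but never downgrade a success. */
static void check_limits(struct alloc_ctx *ac)
{
    assert(ac != 0);

    if (ac->status == AC_STATUS_FOUND)
        return;                         /* keep the successful result */

    if (ac->scanned > ac->max_to_scan)
        ac->status = AC_STATUS_BREAK;
}
```

Without the early return, a caller that had already allocated its blocks could see the status flip back and go on to load another group's buddy, which is the kind of path the lockdep report above points at.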
    • ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() · fde4d95a
      Theodore Ts'o authored
      This removes annoying blank syslog entries emitted by ext4_error() or
      ext4_warning(), since these functions add their own newline.
      Signed-off-by: Nick Warne <nick@ukfsn.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: Add barrier not supported test to journal_wait_on_commit_record · fd98496f
      Theodore Ts'o authored
      Xen doesn't report that barriers are not supported until buffer I/O is
      reported as completed, instead of when the buffer I/O is submitted.
      Add a check and a fallback codepath to journal_wait_on_commit_record()
      to detect this case, so that attempts to mount ext4 filesystems on
      LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
      journal on device XXX"; "Remounting filesystem read-only" error.
      
      Thanks to Andreas Sundstrom for reporting this issue.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  17. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar authored
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates a 2G file and does
      reads and writes on it, with a buffer size of 256K, using O_DIRECT
      for all file opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  18. 17 Dec, 2008 1 commit
  19. 27 Nov, 2008 1 commit
  20. 26 Nov, 2008 1 commit
  21. 25 Nov, 2008 2 commits
  22. 06 Jan, 2009 2 commits
  23. 05 Nov, 2008 1 commit
    • ext4: tone down ext4_da_writepages warnings · 2a21e37e
      Theodore Ts'o authored
      If the filesystem has errors, ext4_da_writepages() will return a *lot*
      of errors, including lots and lots of stack dumps.  While it's true
      that we are dropping user data on the floor, which is unfortunate, the
      stack dumps aren't helpful, and they tend to obscure the true original
      root cause of the problem.  So in the case where the filesystem has
      aborted, return an EROFS right away.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  24. 12 Dec, 2008 1 commit
    • ext4: remove do_blk_alloc() · 97df5d15
      Theodore Ts'o authored
      The convenience function do_blk_alloc() is a static function with only
      one caller, so fold it into ext4_new_meta_blocks() to simplify the
      code and to make it easier to understand.
      
      To save more stack space, if count is a null pointer in
      ext4_new_meta_blocks() assume that caller wanted a single block (and
      if there is an error, no blocks were allocated).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
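The "null count means one block" convention can be shown with a toy allocator. This is a sketch under loud assumptions: `new_meta_blocks()` here is a hypothetical user-space function over a fixed 1024-block range, and it only demonstrates the calling convention, not how ext4 allocates blocks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOTAL_BLOCKS 1024ULL   /* size of the toy device (assumption) */

/* Toy allocator: hands out [goal, goal + n) if it fits on the device.
 * A NULL count means the caller wants exactly one block.  On error,
 * *errp is set, *count (if given) is 0, and no blocks are allocated. */
static unsigned long long new_meta_blocks(unsigned long long goal,
                                          unsigned long *count, int *errp)
{
    unsigned long want = count ? *count : 1;

    assert(errp != NULL);

    if (goal >= TOTAL_BLOCKS || want > TOTAL_BLOCKS - goal) {
        *errp = -ENOSPC;
        if (count)
            *count = 0;
        return 0;
    }

    *errp = 0;
    return goal;   /* first block of the allocated range */
}
```

A caller that needs a single block can pass NULL and avoid keeping a count variable on its own stack.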
  25. 07 Dec, 2008 1 commit
    • ext4: remove ext4_new_meta_block() · cfe82c85
      Theodore Ts'o authored
      There were only two callers of the function ext4_new_meta_block(),
      which was just a very simple wrapper function around
      ext4_new_meta_blocks().  Change those two callers to call
      ext4_new_meta_blocks() directly, to save code and stack space usage.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  26. 02 Jan, 2009 1 commit
  27. 06 Jan, 2009 1 commit
  28. 06 Dec, 2008 1 commit
  29. 28 Oct, 2008 2 commits
  30. 29 Oct, 2008 1 commit