1. 06 Jan, 2009 5 commits
  2. 05 Jan, 2009 1 commit
  3. 04 Jan, 2009 2 commits
  4. 06 Jan, 2009 3 commits
    • ext4: Remove code to create the journal inode · c3191067
      Theodore Ts'o authored
      This code has been obsolete for quite some time, since the supported
      method for adding a journal inode is to use tune2fs (or to create a
      new filesystem with a journal via mke2fs or mkfs.ext4).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: provide function to release metadata pages under memory pressure · c39a7f84
      Toshiyuki Okajima authored
      Pages in the page cache belonging to ext4 data files are released via
      the ext4_releasepage() function specified in the ext4 inode's
      address_space_ops.  However, metadata blocks (such as indirect blocks,
      directory blocks, etc.) are managed via the block device
      address_space_ops, and they cannot be released by
      try_to_free_buffers() if they have a journal head attached to them.
      
      To address this, we supply a release_metadata function which calls
      jbd2_journal_try_to_free_buffers() to free the metadata, and which is
      called by the block device's blkdev_releasepage() function.
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
    • ext3: provide function to release metadata pages under memory pressure · 6b082b53
      Toshiyuki Okajima authored
      Pages in the page cache belonging to ext3 data files are released via
      the ext3_releasepage() function specified in the ext3 inode's
      address_space_ops.  However, metadata blocks (such as indirect blocks,
      directory blocks, etc.) are managed via the block device
      address_space_ops, and they cannot be released by
      try_to_free_buffers() if they have a journal head attached to them.
      
      To address this, we supply a try_to_free_pages() function which calls
      journal_try_to_free_buffers() to free the metadata, and which is
      called by the block device's blkdev_releasepage() function.
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
  5. 03 Jan, 2009 1 commit
  6. 06 Jan, 2009 6 commits
  7. 04 Jan, 2009 1 commit
  8. 06 Jan, 2009 3 commits
    • ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0
      Aneesh Kumar K.V authored
      Rename the fields holding the lower 16 bits with the suffix _lo and
      add helpers to access the full values. Also rename bg_itable_unused_hi
      to bg_pad, as in e2fsprogs.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27
      Aneesh Kumar K.V authored
      We need to make sure we update the block bitmap and clear the
      EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
      ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
      whether to initialize the block bitmap each time it is called
      (introduced by commit c806e68f), and this can race with block
      allocations in ext4_mb_mark_diskspace_used().
      
      ext4_read_block_bitmap does:
      
      spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
      if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	ext4_init_block_bitmap(sb, bh, block_group, desc);
      
      On the block allocation side, we do:
      
      mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
      			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
      ....
      spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
      if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
      	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
      
      i.e., on allocation we update the bitmap, then take the sb_bgl_lock
      and clear the EXT4_BG_BLOCK_UNINIT flag. A parallel
      ext4_read_block_bitmap() can therefore zero out the bitmap between
      the mb_set_bits() call above and spin_lock(sb_bgl_lock(...)).
      
      The race results in the following user-visible errors:
      EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
      EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f
      Aneesh Kumar K.V authored
      The mballoc code likes to call ext4_error() while it is holding locked
      block groups.  This can cause a "scheduling while atomic" BUG.  We
      can't just unlock the block group and relock it after/if ext4_error()
      returns, since that might result in race conditions in the case where
      the filesystem is set to continue after finding errors.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  9. 24 Nov, 2008 1 commit
    • ext4: Fix lockdep recursive locking warning · b7be019e
      Aneesh Kumar K.V authored
      In ext4_mb_init_group(), if the filesystem block size is less than
      PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
      groups in a loop.  We need to allow for this by using
      down_write_nested() and passing in the loop index as a lock subclass
      number.  This works because no other code path needs to take multiple
      alloc_sem's.  Note that lockdep will fail for filesystem block sizes
      smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with
      a 32k page size, or a 2k filesystem blocksize with a 64k page size,
      etc.).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. 06 Jan, 2009 1 commit
  11. 06 Nov, 2008 1 commit
  12. 25 Nov, 2008 1 commit
  13. 06 Jan, 2009 2 commits
  14. 22 Nov, 2008 1 commit
  15. 05 Nov, 2008 2 commits
  16. 06 Jan, 2009 1 commit
  17. 04 Nov, 2008 1 commit
  18. 04 Jan, 2009 1 commit
  19. 17 Dec, 2008 1 commit
  20. 26 Nov, 2008 1 commit
    • jbd2: improve jbd2 fsync batching · e07f7183
      Josef Bacik authored
      This patch removes the static sleep time in favor of a more
      self-optimizing approach, where we measure the average amount of time
      it takes to commit a transaction to disk and the amount of time a
      transaction has been running.  Traditionally, if somebody did a sync
      write or an fsync(), we would sleep for one jiffy, which, depending on
      the value of HZ, could be a significant amount of time compared to how
      long it takes to commit a transaction to the underlying storage.  With
      this patch, instead of sleeping for a jiffy, we check whether the
      amount of time this transaction has been running is less than the
      average commit time, and if it is, we sleep for the delta using
      schedule_hrtimeout() to get a higher-precision sleep.  This greatly
      benefits high-end storage, where a fixed one-jiffy sleep could last
      longer than the commit itself and leave the device sitting idle.
      Keeping the sleep time to a minimum lets the transaction be committed
      sooner, so the storage is always doing something useful.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  21. 06 Jan, 2009 3 commits
    • ext4: Don't overwrite allocation_context ac_status · 032115fc
      Aneesh Kumar K.V authored
      We can call ext4_mb_check_limits even after successfully allocating
      the requested blocks.  In that case, make sure we don't overwrite
      ac_status if it already has the status AC_STATUS_FOUND.  This fixes
      the lockdep warning:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28-rc6-autokern1 #1
      ---------------------------------------------
      fsstress/11948 is trying to acquire lock:
       (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
      .....
      
      stack backtrace:
      .....
       [<c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
      .....
      
      but task is already holding lock:
       (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() · fde4d95a
      Theodore Ts'o authored
      This removes annoying blank syslog entries emitted by ext4_error() or
      ext4_warning(), since these functions add their own newline.
      Signed-off-by: Nick Warne <nick@ukfsn.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: Add barrier not supported test to journal_wait_on_commit_record · fd98496f
      Theodore Ts'o authored
      Xen doesn't report that barriers are not supported until buffer I/O is
      reported as completed, rather than when the buffer I/O is submitted.
      Add a check and a fallback codepath to journal_wait_on_commit_record()
      to detect this case, so that attempts to mount ext4 filesystems on
      LVM/devicemapper devices on Xen guests don't fail with "Aborting
      journal on device XXX" and "Remounting filesystem read-only" errors.
      
      Thanks to Andreas Sundstrom for reporting this issue.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  22. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar authored
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since then I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It turns out that running without a journal is
      slightly faster for almost everything.
      
      We ran
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates a 2G file and does
      reads and writes on it with a buffer size of 256K, using O_DIRECT for
      all file opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>