1. 10 Feb, 2008 4 commits
    • Eric Sandeen's avatar
      ext4: allocate struct ext4_allocation_context from a kmem cache · 256bdb49
      Eric Sandeen authored
      struct ext4_allocation_context is rather large, and this bloats
      the stack of many functions which use it.  Allocating it from
      a named slab cache will alleviate this.
      
      For example, with this change (on top of the noinline patch sent earlier):
      
      -ext4_mb_new_blocks		200
      +ext4_mb_new_blocks		 40
      
      -ext4_mb_free_blocks		344
      +ext4_mb_free_blocks		168
      
      -ext4_mb_release_inode_pa	216
      +ext4_mb_release_inode_pa	 40
      
      -ext4_mb_release_group_pa	192
      +ext4_mb_release_group_pa	 24
      
      Most of these stack-allocated structs are actually used only for
      mballoc history; and in those cases often a smaller struct would do.
      So changing that may be another way around it, at least for those
      functions, if preferred.  For now, in those cases where the ac
      is only for history, an allocation failure simply skips the history
      recording, and does not cause any other failures.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarMingming Cao <cmm@us.ibm.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      
      256bdb49
    • Dave Kleikamp's avatar
      JBD2: Clear buffer_ordered flag for barried IO request on success · c4e35e07
      Dave Kleikamp authored
      In JBD2 jbd2_journal_write_commit_record(), clear the buffer_ordered
      flag for the bh after barried IO has succeed. This prevents later, if
      the same buffer head were submitted to the underlying device, which has
      been reconfigured to not support barrier request, the JBD2 commit code
      could treat it as a normal IO (without barrier).
      
      This is a port from JBD/ext3 fix from Neil Brown.
      
      More details from Neil:
      
      Some devices - notably dm and md - can change their behaviour in
      response to BIO_RW_BARRIER requests.  They might start out accepting
      such requests but on reconfiguration, they find out that they cannot
      any more. JBD2 deal with this by always testing if BIO_RW_BARRIER
      requests fail with EOPNOTSUPP, and retrying the write
      requests without the barrier (probably after waiting for any pending
      writes to complete).
      
      However there is a bug in the handling this in JBD2 for ext4 .
      
      When ext4/JBD2 to submit a BIO_RW_BARRIER request,
      it sets the buffer_ordered flag on the buffer head.
      If the request completes successfully, the flag STAYS SET.
      
      Other code might then write the same buffer_head after the device has
      been reconfigured to not accept barriers.  This write will then fail,
      but the "other code" is not ready to handle EOPNOTSUPP errors and the
      error will be treated as fatal.
      
      Cc:  Neil Brown <neilb@suse.de>
      Signed-off-by: default avatarDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: default avatarMingming Cao <cmm@us.ibm.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      c4e35e07
    • Jan Kara's avatar
      ext4: Fix Direct I/O locking · 7fb5409d
      Jan Kara authored
      We cannot start transaction in ext4_direct_IO() and just let it last
      during the whole write because dio_get_page() acquires mmap_sem which
      ranks above transaction start (e.g. because we have dependency chain
      mmap_sem->PageLock->journal_start, or because we update atime while
      holding mmap_sem) and thus deadlocks could happen. We solve the problem
      by starting a transaction separately for each ext4_get_block() call.
      
      We *could* have a problem that we allocate a block and before its data
      are written out the machine crashes and thus we expose stale data. But
      that does not happen because for hole-filling generic code falls back to
      buffered writes and for file extension, we add inode to orphan list and
      thus in case of crash, journal replay will truncate inode back to the
      original size.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarMingming Cao <cmm@us.ibm.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      7fb5409d
    • Aneesh Kumar K.V's avatar
      ext4: Fix circular locking dependency with migrate and rm. · 8009f9fb
      Aneesh Kumar K.V authored
      In order to prevent a circular locking dependency when an unlink
      operation is racing with an ext4 migration, we delay taking i_data_sem
      until just before switch the inode format, and use i_mutex to prevent
      writes and truncates during the first part of the migration operation.
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMingming Cao <cmm@us.ibm.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      8009f9fb
  2. 06 Feb, 2008 1 commit
    • Eric Sandeen's avatar
      allow in-inode EAs on ext4 root inode · 0040d987
      Eric Sandeen authored
      The ext3 root inode was treated specially with respect
      to in-inode extended attributes, for reasons detailed
      in the removed comment below.  The first mkfs-created
      inodes would not get extra_i_size or the EXT3_STATE_XATTR
      flag set in ext3_read_inode, which disallowed reading or
      setting in-inode EAs on the root.
      
      However, in ext4, ext4_mark_inode_dirty calls
      ext4_expand_extra_isize for all inodes; once this is done
      EAs may be placed in the root ext4 inode body.
      
      But for reasons above, it won't be found after a reboot.
      
      testcase:
      
      setfattr -n user.name -v value mntpt/
      setfattr -n user.name2 -v value2 mntpt/
      umount mntpt/; remount mntpt/
      getfattr -d mntpt/
      
      name2/value2 has gone missing; debugfs shows it in the
      inode body, but it is not found there by getattr.
      
      The following fixes it up; newer mkfs appears to properly
      zero the inodes, so this workaround isn't needed for ext4.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      0040d987
  3. 10 Feb, 2008 1 commit
  4. 05 Feb, 2008 3 commits
  5. 01 Feb, 2008 1 commit
  6. 05 Feb, 2008 1 commit
    • Mingming Cao's avatar
      jbd2: Add error check to journal_wait_on_commit_record to avoid oops · b048d846
      Mingming Cao authored
      The buffer head pointer passed to journal_wait_on_commit_record() could
      be NULL if the previous journal_submit_commit_record() failed or journal
      has already aborted.
      
      Looking at the jbd2 debug messages, before the oops happened, the jbd2
      is aborted due to trying to access the next log block beyond the end
      of device. This might be caused by using a corrupted image.
      
      We need to check the error returns from journal_submit_commit_record()
      and avoid calling journal_wait_on_commit_record() in the failure case.
      
      This addresses Kernel Bugzilla #9849
      Signed-off-by: default avatarMingming Cao <cmm@us.ibm.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      b048d846
  7. 09 Feb, 2008 29 commits