1. 02 Aug, 2010 40 commits
    • ext4: don't scan/accumulate more pages than mballoc will allocate · 342c35e4
      Eric Sandeen authored
      commit c445e3e0 upstream (as of v2.6.34-git13)
      
      There was a bug reported on RHEL5 that a 10G dd on a 12G box
      had a very, very slow sync after that.
      
      At issue was the loop in write_cache_pages scanning all the way
      to the end of the 10G file, even though the subsequent call
      to mpage_da_submit_io would only actually write a smallish amount; then
      we went back to the write_cache_pages loop ... wasting tons of time
      in calling __mpage_da_writepage for thousands of pages we would
      just revisit (many times) later.
      
      Upstream it's not such a big issue for sys_sync because we get
      to the loop with a much smaller nr_to_write, which limits the loop.
      
      However, talking with Aneesh he realized that fsync upstream still
      gets here with a very large nr_to_write and we face the same problem.
      
      This patch makes mpage_add_bh_to_extent stop the loop after we've
      accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately
      causes the write_cache_pages loop to break.
      
      Repeating the test with a dirty_ratio of 80 (to leave something for
      fsync to do), I don't see huge IO performance gains, but the reduction
      in cpu usage is striking: 80% usage with stock, and 2% with the
      below patch.  Instrumenting the loop in write_cache_pages clearly
      shows that we are wasting time here.
      
      Eventually we need to change mpage_da_map_pages() to also submit its I/O
      to the block layer, subsuming mpage_da_submit_io(), and then change it
      to call ext4_get_blocks() multiple times.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
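The early exit described in this commit can be modelled in user space. This is a minimal sketch, not the kernel code: `struct scan_state`, `add_page()`, `scan_dirty_pages()` and `MAX_ACCUMULATED_PAGES` are hypothetical names, and only the 2048-page limit and the idea of an `io_done` flag come from the commit message.

```c
#include <stddef.h>

/* Limit taken from the commit message; the macro name is hypothetical. */
#define MAX_ACCUMULATED_PAGES 2048

struct scan_state {
	size_t accumulated; /* pages gathered into the current extent */
	int io_done;        /* set to 1 to ask the scan loop to stop */
};

/* Stand-in for mpage_add_bh_to_extent(): take one page, flag when full. */
static void add_page(struct scan_state *s)
{
	s->accumulated++;
	if (s->accumulated >= MAX_ACCUMULATED_PAGES)
		s->io_done = 1;
}

/* Stand-in for the write_cache_pages() loop: returns pages visited. */
static size_t scan_dirty_pages(struct scan_state *s, size_t nr_dirty)
{
	size_t visited = 0;

	while (visited < nr_dirty && !s->io_done) {
		add_page(s);
		visited++;
	}
	return visited;
}
```

With the cap in place, a scan over millions of dirty pages visits only as many pages as one allocation pass can use, instead of walking to the end of the file on every pass.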
    • ext4: stop issuing discards if not supported by device · 7a5fb6dc
      Eric Sandeen authored
      commit a30eec2a upstream (as of v2.6.34-git13)
      
      Turn off issuance of discard requests if the device does
      not support it - similar to the action we take for barriers.
      This will save a little computation time if a non-discardable
      device is mounted with -o discard, and also makes it obvious
      that it's not doing what was asked at mount time ...
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
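The behaviour described in this commit, giving up on discards once the device reports that it does not support them, might look roughly like the following. This is only a sketch under stated assumptions: `struct mount_model`, `maybe_discard()` and the callback type are hypothetical, and the use of `-EOPNOTSUPP` as the "not supported" signal is an assumption, not something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a mount with the "-o discard" option. */
struct mount_model {
	bool discard;        /* true while discards are still enabled */
	int discards_issued; /* how many requests reached the device */
};

/* Hypothetical device hook: returns 0 or a negative errno. */
typedef int (*discard_fn)(unsigned long long start, unsigned long long len);

/* A device that cannot discard (assumed to report -EOPNOTSUPP). */
static int discard_not_supported(unsigned long long start,
				 unsigned long long len)
{
	(void)start;
	(void)len;
	return -EOPNOTSUPP;
}

/* Issue a discard unless the option has already been switched off. */
static void maybe_discard(struct mount_model *m, discard_fn dev_discard,
			  unsigned long long start, unsigned long long len)
{
	if (!m->discard)
		return;
	m->discards_issued++;
	if (dev_discard(start, len) == -EOPNOTSUPP)
		m->discard = false; /* stop trying from now on */
}
```

After the first failed request the flag is cleared, so later calls return immediately without building a request.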
    • ext4: don't return to userspace after freezing the fs with a mutex held · e6afda19
      Eric Sandeen authored
      commit 6b0310fb upstream (as of v2.6.34-git13)
      
      ext4_freeze() used jbd2_journal_lock_updates() which takes
      the j_barrier mutex, and then returns to userspace.  The
      kernel does not like this:
      
      ================================================
      [ BUG: lock held when returning to user space! ]
      ------------------------------------------------
      lvcreate/1075 is leaving the kernel with locks still held!
      1 lock held by lvcreate/1075:
       #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
      jbd2_journal_lock_updates+0xe1/0xf0
      
      Use vfs_check_frozen() added to ext4_journal_start_sb() and
      ext4_force_commit() instead.
      
      Addresses-Red-Hat-Bugzilla: #568503
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: check s_log_groups_per_flex in online resize code · ad86a230
      Eric Sandeen authored
      commit 42007efd upstream (as of v2.6.34-git13)
      
      If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out,
      and every other access to this first tests s_log_groups_per_flex;
      same thing needs to happen in resize or we'll wander off into
      a null pointer when doing an online resize of the file system.
      
      Thanks to Christoph Biedl, who came up with the trivial testcase:
      
      # truncate --size 128M fsfile
      # mkfs.ext3 -F fsfile
      # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile
      # e2fsck -yDf -C0 fsfile
      # truncate --size 132M fsfile
      # losetup /dev/loop0 fsfile
      # mount /dev/loop0 mnt
      # resize2fs -p /dev/loop0
      
      https://bugzilla.kernel.org/show_bug.cgi?id=13549
      Reported-by: Alessandro Polverini <alex@nibbles.it>
      Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
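The guard this commit describes (test s_log_groups_per_flex before touching s_flex_groups[]) can be illustrated with a small model. This is a sketch, not the resize code: the struct layouts and `add_free_blocks()` are hypothetical, and only the rule "skip the flex array when groups_per_flex < 2" comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical per-flex-group counters. */
struct flex_counts {
	long free_blocks;
};

/* Hypothetical slice of the superblock info. */
struct sb_info_model {
	unsigned int log_groups_per_flex; /* 0 means 1 group per flex */
	struct flex_counts *flex_groups;  /* may be NULL when unused */
};

/*
 * Credit n free blocks to the flex group that owns 'group'.
 * Returns 1 if the flex counters were updated, 0 if they were skipped.
 */
static int add_free_blocks(struct sb_info_model *sbi, unsigned int group,
			   long n)
{
	if (sbi->log_groups_per_flex == 0)
		return 0; /* array was never filled out; do not touch it */

	sbi->flex_groups[group >> sbi->log_groups_per_flex].free_blocks += n;
	return 1;
}
```

Without the early return, the first case would dereference a NULL array, which matches the failure the commit message describes.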
    • ext4: fix quota accounting in case of fallocate · 6c6671bf
      Dmitry Monakhov authored
      commit 35121c98 upstream (as of v2.6.34-git13)
      
      allocated_meta_data is already included in 'used' variable.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat mode · a625298d
      Christian Borntraeger authored
      commit b684b2ee upstream (as of v2.6.34-git13)
      
      I have an x86_64 kernel with i386 userspace. e4defrag fails on the
      EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat
      case. It seems that struct move_extent is compat safe; only types
      with fixed widths are used:
      {
              __u32 reserved;         /* should be zero */
              __u32 donor_fd;         /* donor file descriptor */
              __u64 orig_start;       /* logical start offset in block for orig */
              __u64 donor_start;      /* logical start offset in block for donor */
              __u64 len;              /* block length to be moved */
              __u64 moved_len;        /* moved block length */
      };
      
      Let's just wire up EXT4_IOC_MOVE_EXT for the compat case.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      CC: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
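The compat-safety argument in this commit can be checked at compile time. The sketch below mirrors the quoted struct with `<stdint.h>` types in user space (`struct move_extent_model` is a hypothetical name, standing in for the kernel's `__u32`/`__u64` version) and asserts that the layout has no padding, which is what makes it identical for 32-bit and 64-bit callers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the struct quoted in the commit message. */
struct move_extent_model {
	uint32_t reserved;    /* should be zero */
	uint32_t donor_fd;    /* donor file descriptor */
	uint64_t orig_start;  /* logical start offset in block for orig */
	uint64_t donor_start; /* logical start offset in block for donor */
	uint64_t len;         /* block length to be moved */
	uint64_t moved_len;   /* moved block length */
};

/* Two 32-bit fields fill the first 8 bytes, so the 64-bit fields start
 * at offset 8 regardless of whether the ABI aligns them to 4 or 8. */
static_assert(offsetof(struct move_extent_model, orig_start) == 8,
	      "unexpected padding before orig_start");
static_assert(offsetof(struct move_extent_model, moved_len) == 32,
	      "unexpected padding before moved_len");
static_assert(sizeof(struct move_extent_model) == 40,
	      "struct size differs from the sum of its fields");
```

If a pointer or a `long` were added to the struct, these assertions would start to differ between i386 and x86_64 builds, and the ioctl would need a translation layer.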
    • ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy() · 60986a16
      Jing Zhang authored
      commit e39e07fd upstream (as of v2.6.34-git13)
      
      This function cleans up after ext4_mb_load_buddy(), so the renaming
      makes the code clearer.
      Signed-off-by: Jing Zhang <zj.barak@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Remove unnecessary call to ext4_get_group_desc() in mballoc · 99e25a99
      Jing Zhang authored
      commit 62e823a2 upstream (as of v2.6.34-git13)
      Signed-off-by: Jing Zhang <zj.barak@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: fix memory leaks in error path handling of ext4_ext_zeroout() · caf4fd5d
      Jing Zhang authored
      commit b720303d upstream (as of v2.6.34-git13)
      
      When EIO occurs after bio is submitted, there is no memory free
      operation for bio, which results in memory leakage. And there is also
      no check against bio_alloc() for bio.
      Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: Jing Zhang <zj.barak@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: check missed return value in ext4_sync_file() · ee025869
      Dmitry Monakhov authored
      commit 0671e704 upstream (as of v2.6.34-git13)
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Issue the discard operation *before* releasing the blocks to be reused · 43a1669a
      Theodore Ts'o authored
      commit b90f6870 upstream (as of v2.6.34-rc6)
      
      Otherwise, we can end up having data corruption because the blocks
      could get reused and then discarded!
      
      https://bugzilla.kernel.org/show_bug.cgi?id=15579
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix buffer head leaks after calls to ext4_get_inode_loc() · adaf14be
      Curt Wohlgemuth authored
      commit fd2dd9fb upstream (as of v2.6.34-rc6)
      
      Calls to ext4_get_inode_loc() return with a reference to a buffer
      head in iloc->bh.  The callers of this function in ext4_write_inode()
      when in no journal mode and in ext4_xattr_fiemap() don't release the
      buffer head after using it.
      
      Addresses-Google-Bug: #2548165
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix possible lost inode write in no journal mode · e3001492
      Curt Wohlgemuth authored
      commit 8b472d73 upstream (as of v2.6.34-rc6)
      
      In the no-journal case, ext4_write_inode() will fetch the bh and call
      sync_dirty_buffer() on it.  However, if the bh has already been
      written and the bh reclaimed for some other purpose, AND if the inode
      is the only one in the inode table block in use, then
      ext4_get_inode_loc() will not read the inode table block from disk,
      but as an optimization, fill the block with zeros, assuming that its
      caller will copy in the on-disk version of the inode.  This is not
      done by ext4_write_inode(), so the contents of the inode can simply
      get lost.  The fix is to use __ext4_get_inode_loc() with in_mem set to
      0, instead of ext4_get_inode_loc().  Long term the API needs to be
      fixed so it's obvious why the latter is not safe.
      
      Addresses-Google-Bug: #2526446
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fixed inode allocator to correctly track a flex_bg's used_dirs · 84b29e4a
      Eric Sandeen authored
      commit c4caae25 upstream (as of v2.6.34-rc3)
      
      When used_dirs was introduced for the flex_groups struct, it looks
      like the accounting was not put into place properly, in some places
      manipulating free_inodes rather than used_dirs.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • Jan Kara's avatar
    • ext4: Code cleanup for EXT4_IOC_MOVE_EXT ioctl · 77795ad5
      Akira Fujita authored
      commit c437b273 upstream (as of v2.6.33-git11)
      
      a) Fix sparse warning in ext4_ioctl()
      b) Remove unneeded variable in mext_leaf_block()
      c) Fix spelling typo in mext_check_arguments()
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix the NULL reference in double_down_write_data_sem() · bff3bb48
      Akira Fujita authored
      commit 7247c0ca upstream (as of v2.6.33-git11)
      
      If EXT4_IOC_MOVE_EXT ioctl is called with NULL donor_fd, fget() in
      ext4_ioctl() gets inappropriate file structure for donor; so we need
      to do this check earlier, before calling double_down_write_data_sem().
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix insertion point of extent in mext_insert_across_blocks() · 68285a49
      Akira Fujita authored
      commit 5fd5249a upstream (as of v2.6.33-git11)
      
      If the leaf node has space for 2 extents or fewer and the EXT4_IOC_MOVE_EXT
      ioctl is called with a file offset beyond the range the 2nd extent
      covers, mext_insert_across_blocks() always tries to insert the extent into
      the first extent.  As a result, the file gets corrupted because of
      wrong extent order.  The patch fixes this problem.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: make "offset" consistent in ext4_check_dir_entry() · 2b8aa5ed
      Toshiyuki Okajima authored
      commit b8b8afe2 upstream (as of v2.6.33-git11)
      
      The callers of ext4_check_dir_entry() usually pass in the "file
      offset" (ext4_readdir, htree_dirblock_to_tree, search_dirblock,
      ext4_dx_find_entry, empty_dir), but a few callers (add_dirent_to_buf,
      ext4_delete_entry) only pass in the buffer offset.
      
      To accommodate those last two (which would be hard to fix otherwise),
      this patch changes ext4_check_dir_entry() to print the physical block
      number and the relative offset as well as the passed-in offset.
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Handle non empty on-disk orphan link · 44914757
      Dmitry Monakhov authored
      commit 6e3617e5 upstream (as of v2.6.33-git11)
      
      In case of truncate errors we explicitly remove inode from in-core
      orphan list via orphan_del(NULL, inode) without modifying the on-disk list.
      
      But later on, the same inode may be inserted in the orphan list again
      which will result in the on-disk linked list getting corrupted.  If the inode's
      i_dtime contains a valid value, then skip the on-disk list modification.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: explicitly remove inode from orphan list after failed direct io · 2c20b117
      Dmitry Monakhov authored
      commit da1dafca upstream (as of v2.6.33-git11)
      
      Otherwise non-empty orphan list will be triggered on umount.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: fix error handling in migrate · 338ae6b7
      Dmitry Monakhov authored
      commit f39490bc upstream (as of v2.6.33-git11)
      
      Set i_nlink to zero for the temporary inode from the very beginning.
      Otherwise we may fail to start a new journal handle, and this
      inode will be unreferenced but with i_nlink == 1.
      Since we hold an inode reference, it cannot be pruned.

      Also add a missed journal_start retval check.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix fencepost error in choosing group vs file preallocation. · cc4e25d1
      Tao Ma authored
      commit cc483f10 upstream (as of v2.6.33-git11)
      
      The ext4 multiblock allocator decides whether to use group or file
      preallocation based on the file size.  When the file size reaches
      s_mb_stream_request (default is 16 blocks), it changes to use a
      file-specific preallocation. This is cool, but it has a tiny problem.
      
      See a simple script:
      mkfs.ext4 -b 1024 /dev/sda8 1000000
      mount -t ext4 -o nodelalloc /dev/sda8 /mnt/ext4
      for((i=0;i<5;i++))
      do
      cat /mnt/4096>>/mnt/ext4/a	#4096 is a file with 4096 characters.
      cat /mnt/4096>>/mnt/ext4/b
      done
      debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
      
      And you get
      BLOCKS:
      (0-14):8705-8719, (15):2356, (16-19):8465-8468
      
      So there are 3 extents, a bit strange for the lonely 15th logical
      block.  As we write the 16th block, we choose file preallocation in
      ext4_mb_group_or_file, but in ext4_mb_normalize_request, we meet with
      the 16*1024 range, so no preallocation will be carried. file b then
      reserves the space after '2356', so when we write block 16, we start from
      another part.

      This patch just changes the check in ext4_mb_group_or_file, so
      that for the lonely 15 we will still use group preallocation.
      After the patch, we will get:
      debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
      BLOCKS:
      (0-15):8705-8720, (16-19):8465-8468
      
      Looks more sane. Thanks.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
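The fencepost in this commit can be shown with two one-line predicates. This is a sketch under an assumption: the commit message does not show the diff, so the exact comparison is inferred from the before/after block lists, and `use_file_prealloc_old()` / `use_file_prealloc_new()` are hypothetical names. The threshold of 16 blocks is the default s_mb_stream_request value quoted in the message.

```c
#include <stdbool.h>

/* Default s_mb_stream_request from the commit message, in blocks. */
#define STREAM_REQUEST_BLOCKS 16UL

/* Assumed old check: a file of exactly 16 blocks already switches to
 * file-specific preallocation, stranding logical block 15. */
static bool use_file_prealloc_old(unsigned long size_blocks)
{
	return size_blocks >= STREAM_REQUEST_BLOCKS;
}

/* Assumed new check: a file of exactly 16 blocks still uses group
 * preallocation, so blocks 0-15 stay in one extent. */
static bool use_file_prealloc_new(unsigned long size_blocks)
{
	return size_blocks > STREAM_REQUEST_BLOCKS;
}
```

Writing logical block 15 makes the file 16 blocks long, which is the single size at which the two predicates disagree.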
    • ext4: Add flag to files with blocks intentionally past EOF · 19673924
      Jiaying Zhang authored
      commit c8d46e41 upstream (as of v2.6.33-git11)
      
      fallocate() may potentially instantiate blocks past EOF, depending
      on the flags used when it is called.
      
      e2fsck currently has a test for blocks past i_size, and it
      sometimes trips up - noticeably on xfstests 013 which runs fsstress.
      
      This patch from Jiaying does fix it up - it (along with
      e2fsprogs updates and other patches recently from Aneesh) has
      survived many fsstress runs in a row.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode · 7085239d
      Curt Wohlgemuth authored
      commit 73b50c1c upstream (as of v2.6.33-git11)
      
      Calls to ext4_handle_dirty_metadata should only pass in an inode
      pointer for inode-specific metadata, and not for shared metadata
      blocks such as inode table blocks, block group descriptors, the
      superblock, etc.
      
      The BUG_ON can get tripped when updating a special device (such as a
      block device) that is opened (so that i_mapping is set in
      fs/block_dev.c) and the file system is mounted in no journal mode.
      
      Addresses-Google-Bug: #2404870
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 884aefaa
      Theodore Ts'o authored
      commit 19f5fb7a upstream (as of v2.6.33-git11)
      
      At several places we modify EXT4_I(inode)->i_state without holding
      i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
      ext4_do_update_inode, ...). These modifications are racy and we can
      lose updates to i_state. So convert handling of i_state to use bitops
      which are atomic.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
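The idea behind this commit, replacing plain read-modify-write of a flags word with atomic bit operations, can be sketched in user space with C11 atomics. This is an analogy only: the kernel uses its own set_bit()/clear_bit()/test_bit() helpers, while the sketch below uses `<stdatomic.h>`, and `struct inode_model`, the helper names and the bit numbers are all hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for the state word. */
enum {
	STATE_NEW = 0,
	STATE_XATTR = 1
};

/* Hypothetical stand-in for the per-inode state flags. */
struct inode_model {
	atomic_ulong state_flags;
};

/* Atomically set one bit; concurrent updates to other bits survive. */
static void set_state_flag(struct inode_model *ino, int bit)
{
	atomic_fetch_or(&ino->state_flags, 1UL << bit);
}

/* Atomically clear one bit. */
static void clear_state_flag(struct inode_model *ino, int bit)
{
	atomic_fetch_and(&ino->state_flags, ~(1UL << bit));
}

/* Read one bit. */
static bool test_state_flag(struct inode_model *ino, int bit)
{
	return (atomic_load(&ino->state_flags) & (1UL << bit)) != 0;
}
```

A non-atomic `flags |= mask` compiles to a load, an OR and a store; two threads doing that at once can each store a value that lacks the other's bit, which is the lost update the commit message refers to.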
    • ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag · 6595162b
      Aneesh Kumar K.V authored
      commit 1296cc85 upstream (as of v2.6.33-rc6)
      
      We should update reserve space if it is delalloc buffer
      and that is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag.
      So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of
      EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE
      
      [ Stable note: This fixes a corruption caused by the following
        reproduction case:
      
        rm -f $TEST_FN
        touch $TEST_FN
        fallocate -n -o 656712 -l 858907 $TEST_FN
        dd if=/dev/zero of=$TEST_FN conv=notrunc bs=1 seek=1011020 count=36983
        sync
        dd if=/dev/zero of=$TEST_FN conv=notrunc bs=1 seek=332121 count=24005
        dd if=/dev/zero of=$TEST_FN conv=notrunc bs=1 seek=1040179 count=93319
      
        If the filesystem is then unmounted and e2fsck run forced, the
        i_blocks field for the file $TEST_FN will be found to be incorrect. ]
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix quota accounting error with fallocate · 8b121392
      Aneesh Kumar K.V authored
      commit 5f634d06 upstream (as of v2.6.33-rc6)
      
      When we fallocate a region of the file which we had recently written,
      and which is still in the page cache marked as delayed allocated blocks,
      we need to make sure we don't do the quota update on the writepage path.
      This is because the needed quota update would have already been done
      by fallocate.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Handle -EDQUOT error on write · d285d892
      Aneesh Kumar K.V authored
      commit 1db91382 upstream (as of v2.6.33-rc6)
      
      We need to release the journal before we do a write_inode.  Otherwise
      we could deadlock.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Calculate metadata requirements more accurately · c9b83238
      Theodore Ts'o authored
      commit 9d0be502 upstream (as of v2.6.33-rc3)
      
      In the past, ext4_calc_metadata_amount(), and its sub-functions
      ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
      badly over-estimated the number of metadata blocks that might be
      required for delayed allocation blocks.  This didn't matter as much
      when functions which managed the reserved metadata blocks were more
      aggressive about dropping reserved metadata blocks as delayed
      allocation blocks were written, but unfortunately they were too
      aggressive.  This was fixed in commit 0637c6f4, but as a result the
      over-estimation by ext4_calc_metadata_amount() would lead to reserving
      2-3 times the number of pending delayed allocation blocks as
      potentially required metadata blocks.  So if there is 1 megabyte of
      blocks which have not yet been allocated, up to 3 megabytes of
      space would get reserved out of the user's quota and from the file
      system free space pool until all of the inode's data blocks have been
      allocated.
      
      This commit addresses this problem by much more accurately estimating
      the number of metadata blocks that will be required.  It will still
      somewhat over-estimate the number of blocks needed, since it must make
      a worst case estimate not knowing which physical blocks will be
      needed, but it is much more accurate than before.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Fix accounting of reserved metadata blocks · 5d11fc79
      Theodore Ts'o authored
      commit ee5f4d9c upstream (as of v2.6.33-rc3)
      
      Commit 0637c6f4 had a typo which caused the reserved metadata blocks to
      not be released correctly.   Fix this.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Patch up how we claim metadata blocks for quota purposes · beb62f7a
      Theodore Ts'o authored
      commit 0637c6f4 upstream (as of v2.6.33-rc3)
      
      As reported in Kernel Bugzilla #14936, commit d21cd8f1 triggered a BUG
      in the function ext4_da_update_reserve_space() found in
      fs/ext4/inode.c.  The root cause of this BUG() was the fact
      that ext4_calc_metadata_amount() can severely over-estimate how many
      metadata blocks will be needed, especially when using direct
      block-mapped files.
      
      In addition, it can also badly *under* estimate how much space is
      needed, since ext4_calc_metadata_amount() assumes that the blocks are
      contiguous, and this is not always true.  If the application is
      writing blocks to a sparse file, the number of metadata blocks
      necessary can be severely underestimated by the functions
      ext4_da_reserve_space(), ext4_da_update_reserve_space() and
      ext4_da_release_space().  This was the cause of the dq_claim_space
      reports found on kerneloops.org.
      
      Unfortunately, doing this right means that we need to massively
      over-estimate the amount of free space needed.  So in some cases we
      may need to force the inode to be written to disk asynchronously in
      order to avoid spurious quota failures.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=14936
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: Ensure zeroout blocks have no dirty metadata · 13a4fbba
      Aneesh Kumar K.V authored
      commit 515f41c3 upstream (as of v2.6.33-rc3)
      
      This fixes a bug (found by Curt Wohlgemuth) in which new blocks
      returned from an extent created with ext4_ext_zeroout() can have dirty
      metadata still associated with them.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: return correct wbc.nr_to_write in ext4_da_writepages · fe018ae1
      Richard Kennedy authored
      commit 2faf2e19 upstream (as of v2.6.33-rc3)
      
      When ext4_da_writepages increases the nr_to_write in writeback_control
      then it must always re-base the return value.  Originally there was a
      (misguided) attempt to prevent wbc.nr_to_write from going negative.  In
      fact, it's necessary to allow nr_to_write to be negative so that
      wb_writeback() can correctly calculate how many pages were actually
      written.
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
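The re-basing arithmetic this commit describes can be worked through with a small model. This is a sketch, not the ext4 code: `struct wbc_model`, `writepages_model()` and `pages_written_by_caller()` are hypothetical, and the way the budget is raised (to a fixed `desired` value) is an assumption made for illustration.

```c
/* Hypothetical stand-in for struct writeback_control. */
struct wbc_model {
	long nr_to_write; /* remaining budget; may legitimately go negative */
};

/*
 * Model of a writepages call that raises the caller's budget to
 * 'desired', writes 'pages_written' pages, then re-bases the budget.
 */
static void writepages_model(struct wbc_model *wbc, long desired,
			     long pages_written)
{
	long bump = 0;

	if (wbc->nr_to_write < desired) {
		bump = desired - wbc->nr_to_write;
		wbc->nr_to_write = desired;
	}

	wbc->nr_to_write -= pages_written;

	/* Always undo the bump, even if the result drops below zero. */
	wbc->nr_to_write -= bump;
}

/* How the caller recovers the page count from its original budget. */
static long pages_written_by_caller(long budget, const struct wbc_model *wbc)
{
	return budget - wbc->nr_to_write;
}
```

With a budget of 1024, a bump to 8192 and 3000 pages written, the re-based value is -1976, and 1024 - (-1976) gives back exactly 3000. Clamping the value at zero would make the caller believe only 1024 pages were written.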
    • ext4: Eliminate potential double free on error path · b387026e
      Julia Lawall authored
      commit d3533d72 upstream (as of v2.6.33-rc3)
      
      b_entry_name and buffer are initially NULL, are initialized within a loop
      to the result of calling kmalloc, and are freed at the bottom of this loop.
      The loop contains gotos to cleanup, which also frees b_entry_name and
      buffer.  Some of these gotos are before the reinitializations of
      b_entry_name and buffer.  To maintain the invariant that b_entry_name and
      buffer are NULL at the top of the loop, and thus acceptable arguments to
      kfree, these variables are now set to NULL after the kfrees.
      
      This seems to be the simplest solution.  A more complicated solution
      would be to introduce more labels in the error handling code at the end of
      the function.
      
      A simplified version of the semantic match that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier E;
      expression E1;
      iterator I;
      statement S;
      @@
      
      *kfree(E);
      ... when != E = E1
          when != I(E,...) S
          when != &E
      *kfree(E);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
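The pattern this commit describes, resetting a pointer to NULL after freeing it inside a loop so that a shared cleanup label stays safe, can be shown in user space with malloc()/free() standing in for kmalloc()/kfree(). The function `process_entries()` and its `fail_at` parameter are hypothetical and exist only to trigger the error path.

```c
#include <stdlib.h>

/*
 * Process nr_entries items, allocating a scratch buffer for each one.
 * If fail_at is a valid index, simulate an error at that iteration
 * before the buffer is reallocated. Returns 0 on success, -1 on error.
 */
static int process_entries(int nr_entries, int fail_at)
{
	char *buffer = NULL;
	int err = 0;
	int i;

	for (i = 0; i < nr_entries; i++) {
		if (i == fail_at) {
			err = -1;
			goto cleanup; /* buffer must be NULL here */
		}

		buffer = malloc(64);
		if (!buffer) {
			err = -1;
			goto cleanup;
		}

		/* ... use buffer ... */

		free(buffer);
		buffer = NULL; /* restore the invariant for the next pass */
	}

cleanup:
	free(buffer); /* free(NULL) is a no-op, like kfree(NULL) */
	return err;
}
```

Without the `buffer = NULL` line, an error on the second iteration would jump to `cleanup` with a stale pointer and free the same allocation twice.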
    • ext4, jbd2: Add barriers for file systems with external journals · 7589529d
      Theodore Ts'o authored
      commit cc3e1bea upstream (as of v2.6.33-rc3)
      
      This is a bit complicated because we are trying to optimize when we
      send barriers to the fs data disk.  We could just throw in an extra
      barrier to the data disk whenever we send a barrier to the journal
      disk, but that's not always strictly necessary.
      
      We only need to send a barrier during a commit when there are data
      blocks which must be written out due to an inode written in
      ordered mode, or if fsync() depends on the commit to force data blocks
      to disk.  Finally, before we drop transactions from the beginning of
      the journal during a checkpoint operation, we need to guarantee that
      any blocks that were flushed out to the data disk are firmly on the
      rust platter before we drop the transaction from the journal.
      
      Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ext4: replace BUG() with return -EIO in ext4_ext_get_blocks · b235a77c
      Surbhi Palande authored
      commit 034fb4c9 upstream (as of v2.6.33-rc3)
      
      This patch fixes the Kernel BZ #14286.  When the address of an extent
      corresponding to a valid block is corrupted, a -EIO should be reported
      instead of a BUG().  This situation should not normally occur
      except in the case of a corrupted filesystem.  If however it does,
      then the system should not panic directly but depending on the mount
      time options appropriate action should be taken. If the mount options
      so permit, the I/O should be gracefully aborted by returning a -EIO.
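A minimal user-space sketch of the idea, assuming a simplified extent record. `struct extent`, `map_extent` and the range check are hypothetical and much simpler than the real checks in ext4; the point is only that a corrupted address is reported to the caller as -EIO rather than halting the system.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified extent: a run of len blocks starting at start. */
struct extent {
	unsigned long long start;
	unsigned int len;
};

/*
 * Return 0 and store the physical block in *out, or -EIO if the extent
 * points outside the filesystem (a sign of on-disk corruption).
 */
static int map_extent(const struct extent *ex, unsigned long long fs_blocks,
		      unsigned long long *out)
{
	if (ex->len == 0 || ex->start >= fs_blocks ||
	    ex->len > fs_blocks - ex->start)
		return -EIO;	/* report corruption instead of aborting */

	*out = ex->start;
	return 0;
}
```

The caller can then decide, according to its own error policy, whether to continue, remount read-only, or stop.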
      
      http://bugzilla.kernel.org/show_bug.cgi?id=14286
      Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      b235a77c
    • Dmitry Monakhov's avatar
      ext4: Fix potential quota deadlock · e1d532a2
      Dmitry Monakhov authored
      commit d21cd8f1 upstream (as of v2.6.33-rc2)
      
      We have to delay vfs_dq_claim_space() until allocation context destruction.
      Currently we have the following call trace:
      ext4_mb_new_blocks()
        /* task is already holding ac->alloc_semp */
       ->ext4_mb_mark_diskspace_used
          ->vfs_dq_claim_space()  /*  acquire dqptr_sem here. Possible deadlock */
       ->ext4_mb_release_context() /* drop ac->alloc_semp here */
      
      Let's move quota claiming to ext4_da_update_reserve_space()
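The reordering can be sketched in user space as follows. This is not the ext4 code: the flag `alloc_sem_held` stands in for ac->alloc_semp, `claim_space` for vfs_dq_claim_space(), and all function names and counters here are hypothetical. The allocation path only records how much to claim; the claim itself runs after the allocator lock has been dropped.

```c
#include <assert.h>

static int alloc_sem_held;		/* stand-in for ac->alloc_semp */
static int claimed_under_alloc_sem;	/* set if the bad ordering ever happens */
static long claimed_blocks;

/* Stand-in for the quota claim, which takes dqptr_sem in the real code. */
static void claim_space(long nblocks)
{
	if (alloc_sem_held)
		claimed_under_alloc_sem = 1;
	claimed_blocks += nblocks;
}

/* Old ordering: claim while the allocator lock is still held. */
static void new_blocks_old(long nblocks)
{
	alloc_sem_held = 1;
	claim_space(nblocks);
	alloc_sem_held = 0;
}

/* New ordering: remember the amount, claim after the lock is released. */
static void new_blocks_deferred(long nblocks)
{
	long pending;

	alloc_sem_held = 1;
	pending = nblocks;	/* mark blocks used, but do not touch quota yet */
	alloc_sem_held = 0;

	claim_space(pending);
}
```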
      
       =======================================================
       [ INFO: possible circular locking dependency detected ]
       2.6.32-rc7 #18
       -------------------------------------------------------
       write-truncate-/3465 is trying to acquire lock:
        (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
      
       but task is already holding lock:
        (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #3 (&meta_group_info[i]->alloc_sem){++++..}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
              [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
              [<c029c9d3>] ext4_free_blocks+0x73/0x130
              [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
              [<c02a8087>] ext4_truncate+0x187/0x5e0
              [<c01e0f7b>] vmtruncate+0x6b/0x70
              [<c022ec02>] inode_setattr+0x62/0x190
              [<c02a2d7a>] ext4_setattr+0x25a/0x370
              [<c022ee81>] notify_change+0x151/0x340
              [<c021349d>] do_truncate+0x6d/0xa0
              [<c0221034>] may_open+0x1d4/0x200
              [<c022412b>] do_filp_open+0x1eb/0x910
              [<c021244d>] do_sys_open+0x6d/0x140
              [<c021258e>] sys_open+0x2e/0x40
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #2 (&ei->i_data_sem){++++..}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c02a5787>] ext4_get_blocks+0x47/0x450
              [<c02a74c1>] ext4_getblk+0x61/0x1d0
              [<c02a7a7f>] ext4_bread+0x1f/0xa0
              [<c02bcddc>] ext4_quota_write+0x12c/0x310
              [<c0262d23>] qtree_write_dquot+0x93/0x120
              [<c0261708>] v2_write_dquot+0x28/0x30
              [<c025d3fb>] dquot_commit+0xab/0xf0
              [<c02be977>] ext4_write_dquot+0x77/0x90
              [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
              [<c025e321>] dquot_alloc_inode+0x101/0x180
              [<c029fec2>] ext4_new_inode+0x602/0xf00
              [<c02ad789>] ext4_create+0x89/0x150
              [<c0221ff2>] vfs_create+0xa2/0xc0
              [<c02246e7>] do_filp_open+0x7a7/0x910
              [<c021244d>] do_sys_open+0x6d/0x140
              [<c021258e>] sys_open+0x2e/0x40
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0526505>] mutex_lock_nested+0x65/0x2d0
              [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
              [<c02610af>] vfs_quota_on_path+0x5f/0x70
              [<c02bc812>] ext4_quota_on+0x112/0x190
              [<c026345a>] sys_quotactl+0x44a/0x8a0
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #0 (&s->s_dquot.dqptr_sem){++++..}:
              [<c017d361>] __lock_acquire+0x1091/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c025e73b>] dquot_claim_space+0x3b/0x1b0
              [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
              [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
              [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
              [<c02a5966>] ext4_get_blocks+0x226/0x450
              [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
              [<c02a6ed6>] ext4_da_writepages+0x506/0x790
              [<c01de272>] do_writepages+0x22/0x50
              [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
              [<c01d7b9b>] filemap_flush+0x2b/0x30
              [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
              [<c029e595>] ext4_release_file+0x75/0xb0
              [<c0216b59>] __fput+0xf9/0x210
              [<c0216c97>] fput+0x27/0x30
              [<c02122dc>] filp_close+0x4c/0x80
              [<c014510e>] put_files_struct+0x6e/0xd0
              [<c01451b7>] exit_files+0x47/0x60
              [<c0146a24>] do_exit+0x144/0x710
              [<c0147028>] do_group_exit+0x38/0xa0
              [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
              [<c0102849>] do_notify_resume+0xb9/0x890
              [<c01032d2>] work_notifysig+0x13/0x21
      
       other info that might help us debug this:
      
       3 locks held by write-truncate-/3465:
        #0:  (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
        #1:  (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
        #2:  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
      
       stack backtrace:
       Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
       Call Trace:
        [<c0524cb3>] ? printk+0x1d/0x22
        [<c017ac9a>] print_circular_bug+0xca/0xd0
        [<c017d361>] __lock_acquire+0x1091/0x1260
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
        [<c0527191>] down_read+0x51/0x90
        [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
        [<c025e73b>] dquot_claim_space+0x3b/0x1b0
        [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
        [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
        [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
        [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c052712c>] ? down_write+0x8c/0xa0
        [<c02a5966>] ext4_get_blocks+0x226/0x450
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c017908b>] ? trace_hardirqs_off+0xb/0x10
        [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
        [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
        [<c01d6860>] ? find_get_pages_tag+0x0/0x180
        [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
        [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
        [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
        [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
        [<c02a6ed6>] ext4_da_writepages+0x506/0x790
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c02a69d0>] ? ext4_da_writepages+0x0/0x790
        [<c01de272>] do_writepages+0x22/0x50
        [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
        [<c01d7b9b>] filemap_flush+0x2b/0x30
        [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
        [<c029e595>] ext4_release_file+0x75/0xb0
        [<c0216b59>] __fput+0xf9/0x210
        [<c0216c97>] fput+0x27/0x30
        [<c02122dc>] filp_close+0x4c/0x80
        [<c014510e>] put_files_struct+0x6e/0xd0
        [<c01451b7>] exit_files+0x47/0x60
        [<c0146a24>] do_exit+0x144/0x710
        [<c017b163>] ? lock_release_holdtime+0x33/0x210
        [<c0528137>] ? _spin_unlock_irq+0x27/0x30
        [<c0147028>] do_group_exit+0x38/0xa0
        [<c017babb>] ? trace_hardirqs_on+0xb/0x10
        [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
        [<c0102849>] do_notify_resume+0xb9/0x890
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c017b163>] ? lock_release_holdtime+0x33/0x210
        [<c0165b50>] ? autoremove_wake_function+0x0/0x50
        [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
        [<c017babb>] ? trace_hardirqs_on+0xb/0x10
        [<c0300ba4>] ? security_file_permission+0x14/0x20
        [<c0215761>] ? vfs_write+0x131/0x190
        [<c0214f50>] ? do_sync_write+0x0/0x120
        [<c0103115>] ? sysenter_do_call+0x27/0x32
        [<c01032d2>] work_notifysig+0x13/0x21
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      e1d532a2
    • Ben Hutchings's avatar
      ethtool: Fix potential user buffer overflow for ETHTOOL_{G,S}RXFH · 2441cdd9
      Ben Hutchings authored
      commit bf988435 upstream.
      
      struct ethtool_rxnfc was originally defined in 2.6.27 for the
      ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
      fields.  It was then extended in 2.6.30 to support various additional
      commands.  These commands should have been defined to use a new
      structure, but it is too late to change that now.
      
      Since user-space may still be using the old structure definition
      for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
      additional fields, only copy the originally defined fields to and
      from user-space.
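A user-space sketch of copying only the original fields. The two structs below are simplified, hypothetical layouts (the real struct ethtool_rxnfc has different extra members), and memcpy stands in for the kernel's user-copy helpers. The copy length is computed from the end of the `data` field, so a caller that allocated only the old, shorter structure is not overrun.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old layout, as an older user-space program would see it. */
struct rxnfc_old {
	uint32_t cmd;
	uint32_t flow_type;
	uint64_t data;
};

/* Hypothetical extended layout: same prefix plus newer fields. */
struct rxnfc_new {
	uint32_t cmd;
	uint32_t flow_type;
	uint64_t data;
	uint32_t extra[8];
};

/* Bytes covered by the originally defined fields (cmd .. data). */
#define RXFH_COPY_SIZE \
	(offsetof(struct rxnfc_new, data) + sizeof(((struct rxnfc_new *)0)->data))

/* Copy to a caller buffer that may only be sizeof(struct rxnfc_old) long. */
static void copy_rxfh_to_user(void *user_buf, const struct rxnfc_new *info)
{
	memcpy(user_buf, info, RXFH_COPY_SIZE);
}
```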
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      2441cdd9
    • Corey Minyard's avatar
      USB: FTDI: Add support for the RT System VX-7 radio programming cable · 699be799
      Corey Minyard authored
      commit fcc6cb78 upstream.
      
      RT Systems has put out a bunch of ham radio cables based on the FT232RL
      chip.  Each cable type has a unique PID; this adds one for the Yaesu VX-7
      radios.
      Signed-off-by: Corey Minyard <minyard@acm.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      699be799