1. 26 May, 2011 1 commit
  2. 25 May, 2011 9 commits
    • jbd2: fix a potential leak of a journal_head on an error path · 3991b400
      Ding Dinghua authored
      Drop jh->b_jcount in the error path.
      Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: teach ext4_ext_split to calculate extents efficiently · 1b16da77
      Yongqiang Yang authored
      Make ext4_ext_split() get extents to be moved by calculating in a statement
      instead of counting in a loop.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
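The idea of replacing a counting loop with a single statement can be illustrated with a small userspace sketch. This is not the kernel code: `struct extent`, `count_by_loop()` and `count_by_arith()` are hypothetical names, and the sketch assumes both pointers refer to entries of the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one extent entry in a leaf block. */
struct extent {
    unsigned int first_block;
    unsigned int len;
};

/* Count the entries from cur to last (inclusive) by walking the array. */
static size_t count_by_loop(const struct extent *cur, const struct extent *last)
{
    size_t n = 0;

    assert(cur != NULL && last != NULL);
    while (cur <= last) {
        n++;
        cur++;
    }
    return n;
}

/* Same result in one statement, using pointer arithmetic. */
static size_t count_by_arith(const struct extent *cur, const struct extent *last)
{
    assert(cur != NULL && last != NULL);
    return cur > last ? 0 : (size_t)(last - cur) + 1;
}
```

Both functions return the same value; the second avoids touching every entry, which is presumably the kind of saving the commit is after.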
    • ext4: Convert ext4 to new truncate calling convention · ae24f28d
      Jan Kara authored
      
      Trivial conversion.  Fix up one error handling case calling vmtruncate()
      and remove the ->truncate callback. We also fix a bug where IS_IMMUTABLE
      and IS_APPEND files could not be truncated during failed writes. In fact,
      the test can be completely removed, as upper layers do the necessary
      permission checks for truncate in do_sys_[f]truncate() and may_open() anyway.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: do not normalize block requests from fallocate() · 556b27ab
      Vivek Haldar authored
      Currently, an fallocate request of size slightly larger than a power of
      2 is turned into two block requests, each a power of 2, with the extra
      blocks pre-allocated for future use. When an application calls
      fallocate, it already has an idea about how large the file may grow, so
      there is usually little benefit in reserving extra blocks on the
      preallocation list. Skipping the normalization reduces disk fragmentation.
      
      Tested: fsstress. Also verified manually that fallocate'd files are
      contiguously laid out with this change (whereas without it they begin at
      power-of-2 boundaries, leaving blocks in between). CPU usage of
      fallocate is not appreciably higher.  In a tight fallocate loop, CPU
      usage hovers between 5% and 8% with this change, and between 5% and 7%
      without it.
      
      Using a simulated file system aging program which fills the file system
      to 70%, the percentage of free extents larger than 8MB (as measured by
      e2freefrag) increased from 38.8% without this change to 69.4% with
      this change.
      Signed-off-by: Vivek Haldar <haldar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: enable "punch hole" functionality · a4bb6b64
      Allison Henderson authored
      This patch adds three new routines: "ext4_punch_hole",
      "ext4_ext_punch_hole" and "ext4_ext_check_cache".
      
      fallocate has been modified to call ext4_punch_hole when the punch hole
      flag is passed.  At the moment, we only support punching holes in
      extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole
      routine.
      
      The ext4_ext_punch_hole routine first completes all outstanding writes
      with the associated pages, and then releases them.  The data that is not
      block aligned is zeroed, and all blocks in between are punched out.
      
      The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache,
      except that it accepts an ext4_ext_cache parameter instead of an
      ext4_extent parameter.  This routine is used by ext4_ext_punch_hole to
      check whether a block is in a hole that has been cached.  The
      ext4_ext_cache parameter is necessary because the members of the
      ext4_extent structure are not large enough to hold a 32 bit value.  The
      existing ext4_ext_in_cache routine has become a wrapper to this new
      function.
      
      [ext4 punch hole patch series 5/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
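The split between "zero the unaligned data" and "punch out the blocks in between" comes down to rounding arithmetic. The following is a minimal userspace sketch of that arithmetic only, not the kernel implementation; `struct punch_plan` and `plan_punch()` are hypothetical names, and the byte range is assumed to be half-open, `[offset, offset + length)`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how a byte range maps onto blocks. */
struct punch_plan {
    uint64_t first_block;   /* first whole block inside the range */
    uint64_t nr_blocks;     /* number of whole blocks to punch out */
    uint64_t head_bytes;    /* unaligned bytes to zero before first_block */
    uint64_t tail_bytes;    /* unaligned bytes to zero after the last whole block */
};

static struct punch_plan plan_punch(uint64_t offset, uint64_t length,
                                    uint64_t blocksize)
{
    struct punch_plan p = {0, 0, 0, 0};
    uint64_t end, first, stop;

    assert(blocksize > 0);
    end = offset + length;                          /* exclusive end */
    first = (offset + blocksize - 1) / blocksize;   /* round start up */
    stop = end / blocksize;                         /* round end down */

    p.first_block = first;
    if (first >= stop) {
        /* No whole block is covered: the entire range is only zeroed. */
        p.head_bytes = length;
        return p;
    }
    p.nr_blocks = stop - first;
    p.head_bytes = first * blocksize - offset;
    p.tail_bytes = end - stop * blocksize;
    return p;
}
```

For example, with 4096-byte blocks a request for 10000 bytes at offset 1000 covers one whole block (block 1), with 3096 bytes to zero before it and 2808 bytes after it.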
    • ext4: add "punch hole" flag to ext4_map_blocks() · e861304b
      Allison Henderson authored
      This patch adds a new flag to ext4_map_blocks() that specifies the
      given range of blocks should be punched out.  Extents are first
      converted to uninitialized extents before they are punched
      out. Because punching a hole may require that the extent be split, it
      is possible that the splitting may need more blocks than are
      available.  To deal with this, use of reserved blocks is enabled to
      allow the split to proceed.
      
      The routine then returns the number of blocks successfully
      punched out.
      
      [ext4 punch hole patch series 4/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
    • ext4: punch out extents · d583fb87
      Allison Henderson authored
      This patch modifies the truncate routines to support hole punching.
      Below is a brief summary of the patch's changes:
      
      - Added "end" param to ext4_ext_rm_leaf
              This function has been modified to accept an end parameter
              which enables it to punch holes in leaves instead of just
              truncating them.
      
      - Implemented the "remove head" case in the ext4_remove_blocks routine
              This routine is used by ext4_ext_rm_leaf to remove the tail
              of an extent during a truncate.  The new ext4_ext_rm_leaf
              routine will now also use it to remove the head of an extent in the
              case that the hole covers a region of blocks at the beginning
              of an extent.
      
      - Added "end" param to ext4_ext_remove_space routine
              This function has been modified to accept an end parameter, which
              is passed through to ext4_ext_rm_leaf.
      
      [ext4 punch hole patch series 3/5 v6] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
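The difference between the "remove tail" (truncate) case and the new "remove head" case can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct range` and `trim_extent()` are hypothetical, the hole `[from, to]` is inclusive, and a hole strictly inside the extent is reported as unsupported because it would require a split first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified extent: a run of logical blocks. */
struct range {
    uint32_t start;  /* first logical block */
    uint32_t len;    /* number of blocks */
};

/*
 * Trim the blocks [from, to] (inclusive) out of *ex.
 * Returns 0 on success, -1 if the hole lies strictly inside the extent
 * (that would need a split, which this sketch does not attempt).
 */
static int trim_extent(struct range *ex, uint32_t from, uint32_t to)
{
    uint32_t last;

    assert(ex->len > 0 && from <= to);
    last = ex->start + ex->len - 1;

    if (to < ex->start || from > last)
        return 0;                       /* no overlap, nothing to do */
    if (from <= ex->start && to >= last) {
        ex->len = 0;                    /* whole extent removed */
    } else if (from > ex->start && to >= last) {
        ex->len = from - ex->start;     /* "remove tail": the truncate case */
    } else if (from <= ex->start && to < last) {
        ex->len = last - to;            /* "remove head": hole at the front */
        ex->start = to + 1;
    } else {
        return -1;                      /* hole in the middle */
    }
    return 0;
}
```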
    • ext4: add new function ext4_block_zero_page_range() · 30848851
      Allison Henderson authored
      This patch modifies the existing ext4_block_truncate_page() function,
      which was used by the truncate code path and which zeroes out block
      unaligned data, by adding a new length parameter, and renames it to
      ext4_block_zero_page_range().  This function can now be used to zero out the
      head of a block, the tail of a block, or the middle
      of a block.
      
      The ext4_block_truncate_page() function is now a wrapper to
      ext4_block_zero_page_range().
      
      [ext4 punch hole patch series 2/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
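The relationship between a general "zero a range" helper and a truncate-style wrapper can be shown with a userspace sketch on a plain byte buffer. The names `block_zero_range()` and `block_truncate()` are hypothetical and the kernel functions operate on pages and buffer heads rather than on a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero `length` bytes of a block buffer starting at byte offset `from`. */
static void block_zero_range(unsigned char *block, size_t blocksize,
                             size_t from, size_t length)
{
    assert(from <= blocksize && length <= blocksize - from);
    memset(block + from, 0, length);
}

/* Truncate-style wrapper: zero from `from` to the end of the block. */
static void block_truncate(unsigned char *block, size_t blocksize, size_t from)
{
    block_zero_range(block, blocksize, from, blocksize - from);
}
```

With the length made explicit, the same helper covers the head, the middle, or the tail of a block, and the old tail-only behaviour becomes a one-line wrapper.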
    • ext4: add flag to ext4_has_free_blocks · 55f020db
      Allison Henderson authored
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low-level function, the flag needs
      to be passed down through the functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
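A free-space check that normally protects a reserved pool, but lets flagged callers use it, might look like the sketch below. Everything here is an assumption made for illustration: the flag name `ALLOC_USE_RESERVED`, the function signature, and the simplification that the reserve is counted as part of `free_blocks`. The kernel function works on per-CPU counters and takes different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALLOC_USE_RESERVED 0x1u  /* hypothetical flag name */

static bool has_free_blocks(uint64_t free_blocks, uint64_t reserved_blocks,
                            uint64_t nblocks, unsigned int flags)
{
    /* Assumption: the reserved pool is a subset of the free blocks. */
    assert(reserved_blocks <= free_blocks);

    /* Normal requests must leave the reserved pool untouched. */
    if (free_blocks - reserved_blocks >= nblocks)
        return true;
    /* Flagged requests (e.g. an extent split for a punch) may use the reserve. */
    if ((flags & ALLOC_USE_RESERVED) && free_blocks >= nblocks)
        return true;
    return false;
}
```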
  3. 24 May, 2011 10 commits
    • ext4: reserve inodes and feature code for 'quota' feature · ae812306
      Aditya Kali authored
      I am working on a patch to add quota as a built-in feature for the ext4
      filesystem. The implementation is based on the design given at
      https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4.
      This patch reserves the inode numbers 3 and 4 for quota purposes and
      also reserves the EXT4_FEATURE_RO_COMPAT_QUOTA feature code.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: add support for multiple mount protection · c5e06d10
      Johann Lombardi authored
      Prevent an ext4 filesystem from being mounted multiple times.
      A sequence number is stored on disk and is periodically updated (every 5
      seconds by default) by a mounted filesystem.
      At mount time, we now wait for s_mmp_update_interval seconds to make sure
      that the MMP sequence does not change.
      In case of failure, the nodename, bdevname and the time at which the MMP
      block was last updated are displayed.
      Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
      Signed-off-by: Johann Lombardi <johann@whamcloud.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: ensure f_bfree returned by ext4_statfs() is non-negative · d02a9391
      Kazuya Mio authored
      I found an issue where the number of free blocks went negative:
      # stat -f /mnt/mp1/
        File: "/mnt/mp1/"
          ID: e175ccb83a872efe Namelen: 255     Type: ext2/ext3
      Block size: 4096       Fundamental block size: 4096
      Blocks: Total: 258022     Free: -15        Available: -13122
      Inodes: Total: 65536      Free: 63029
      
      f_bfree in struct statfs will go negative when the filesystem has
      few free blocks, because the number of dirty blocks is bigger than
      the number of free blocks in the following two cases.
      
      CASE 1:
      ext4_da_writepages
        mpage_da_map_and_submit
          ext4_map_blocks
            ext4_ext_map_blocks
              ext4_mb_new_blocks
                ext4_mb_diskspace_used
                  percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
              <--- interrupt statfs systemcall --->
              ext4_da_update_reserve_space
                  percpu_counter_sub(&sbi->s_dirtyblocks_counter,
                                  used + ei->i_allocated_meta_blocks);
      
      CASE 2:
      ext4_write_begin
        __block_write_begin
          ext4_map_blocks
            ext4_ext_map_blocks
              ext4_mb_new_blocks
                ext4_mb_diskspace_used
                  percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
                  <--- interrupt statfs systemcall --->
                  percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
      
      To avoid the issue, this patch ensures that f_bfree is non-negative.
      Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
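The fix amounts to clamping a difference of two counters that are updated at different moments. A minimal userspace sketch, assuming signed 64-bit snapshots of the two counters (the function name `free_blocks_for_statfs()` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Report free blocks as "free minus dirty", but never below zero.
 * The two counters are read separately, so a reader can observe the
 * free counter already decremented while the dirty counter is not yet.
 */
static uint64_t free_blocks_for_statfs(int64_t free_blocks, int64_t dirty_blocks)
{
    int64_t bfree;

    assert(free_blocks >= 0 && dirty_blocks >= 0);
    bfree = free_blocks - dirty_blocks;
    return bfree > 0 ? (uint64_t)bfree : 0;
}
```

With the values from the report above, a momentary deficit of 15 blocks would be shown as 0 free blocks rather than as -15.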
    • ext4: protect bb_first_free in ext4_trim_all_free() with group lock · 28739eea
      Lukas Czerner authored
      We should protect reading bd_info->bb_first_free with the group lock
      because otherwise we might miss some free blocks. This is not a big deal
      at all, but the change to do the right thing is really simple, so let's do
      that.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: only load buddy bitmap in ext4_trim_fs() when it is needed · 78944086
      Lukas Czerner authored
      Currently we load the buddy via ext4_mb_load_buddy() for every block
      group we go through in ext4_trim_fs(), in many cases just to find
      out that there is not enough space to be bothered with. As Amir Goldstein
      suggested, we can use the bb_free information directly from ext4_group_info.
      
      This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and instead
      gets the ext4_group_info via ext4_get_group_info() and uses the bb_free
      information directly from that. This avoids an unnecessary call to load
      the buddy in the case where the group does not have enough free space to
      trim. Loading the buddy is now moved to ext4_trim_all_free().
      
      Tested by me with xfstests 251.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: Fix comment to match the code in jbd2__journal_start() · c867516d
      Eryu Guan authored
      jbd2__journal_start() returns an ERR_PTR() value rather than NULL on
      failure.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
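The distinction matters to callers because an ERR_PTR() value is not NULL. The sketch below re-creates the ERR_PTR convention in userspace for illustration; the helper definitions are simplified approximations of the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`, and `struct handle` and `start_handle()` are hypothetical stand-ins rather than jbd2 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct handle { int refcount; };

/* Hypothetical allocator: returns ERR_PTR(-ENOMEM) on failure, never NULL. */
static struct handle *start_handle(int simulate_failure)
{
    static struct handle h;

    if (simulate_failure)
        return ERR_PTR(-ENOMEM);
    h.refcount = 1;
    return &h;
}
```

A caller that tests the result against NULL would treat the failure as success, so the check should be `IS_ERR(handle)` followed by `PTR_ERR(handle)` to recover the error code.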
    • ext4: fix waiting and sending of a barrier in ext4_sync_file() · 93628ffb
      Jan Kara authored
      jbd2_log_start_commit() returns 1 only when we really start a
      transaction.  But we also need to wait for a transaction when the
      commit is already running.  Fix this problem by waiting for
      transaction commit unconditionally (which is just a quick check if the
      transaction is already committed).
      
      Also we have to be more careful with sending of a barrier because when
      a transaction is being committed in parallel to ext4_sync_file()
      running, we cannot be sure that the barrier the journalling code sends
      happens after we wrote all the data for fsync (note that not every
      data writeout needs to trigger metadata changes, thus commit of some
      metadata changes can be running while other data is still written
      out). So use the jbd2_trans_will_send_data_barrier() helper to detect the
      common cases when we can be sure a barrier will be issued by the commit
      code and issue the barrier ourselves in the remaining cases.
      Reported-by: Edward Goggin <egoggin@vmware.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: Add function jbd2_trans_will_send_data_barrier() · bbd2be36
      Jan Kara authored
      Provide a function which returns whether a transaction with given tid
      will send a flush to the filesystem device.  The function will be used
      by ext4 to detect whether fsync needs to send a separate flush or not.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: fix sending of data flush on journal commit · 81be12c8
      Jan Kara authored
      
      In data=ordered mode, it's theoretically possible (however rare) that
      an inode is filed to a transaction's t_inode_list and a flusher thread
      writes all the data and the inode is reclaimed before the transaction
      starts to commit.  In such a case, we could erroneously omit sending a
      flush to the file system device when it is different from the journal
      device (because data can still be in disk cache only).
      
      Fix the problem by setting a flag in a transaction when some inode is added
      to it and then sending a disk flush in the commit code when the flag is set.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly · b221349f
      Yongqiang Yang authored
      To get delayed-extent information, ext4_ext_fiemap_cb() looks up the
      pagecache, so it collects information starting from a page's
      head block.
      
      If blocksize < pagesize, the beginning blocks of a page may lie
      before the request range. So ext4_ext_fiemap_cb() should proceed
      ignoring them, because they have been handled before. If no mapped
      buffer in the range is found in the 1st page, we need to look up
      the 2nd page, otherwise delayed-extents after a hole will be ignored.
      
      Without this patch, xfstests 225 will hang on ext4 with 1K blocks.
      Reported-by: Amir Goldstein <amir73il@users.sourceforge.net>
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
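Why a page's leading blocks can fall before the requested range is easiest to see from the block arithmetic. This is a hedged userspace sketch of that arithmetic only; `first_block_to_scan()` and its parameters are hypothetical and do not mirror the signature of ext4_ext_fiemap_cb().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given a page index and the first block of the request, return the first
 * block in that page that should be examined. With blocksize < pagesize a
 * page holds several blocks, and those before req_start are skipped.
 */
static uint64_t first_block_to_scan(uint64_t page_index, unsigned int page_shift,
                                    unsigned int blkbits, uint64_t req_start)
{
    uint64_t page_first;

    assert(page_shift >= blkbits);
    page_first = page_index << (page_shift - blkbits);  /* page's head block */
    return page_first < req_start ? req_start : page_first;
}
```

With 4K pages and 1K blocks (`page_shift` 12, `blkbits` 10), page 2 starts at block 8; a request starting at block 10 should therefore skip blocks 8 and 9 of that page.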
  4. 23 May, 2011 5 commits
  5. 22 May, 2011 1 commit
    • ext4: don't show mount options in /proc/mounts if there is no journal · 373cd5c5
      Theodore Ts'o authored
      After creating an ext4 file system without a journal:
      
        # mke2fs -t ext4 -O ^has_journal /dev/sda
        # mount -t ext4 /dev/sda /test
      
      the /proc/mounts will show:
      "/dev/sda /test ext4 rw,relatime,user_xattr,acl,barrier=1,data=writeback 0 0"
      which can fool users into thinking that the fs is using writeback mode.
      
      So don't set the writeback option when the journal has not been
      enabled; we don't depend on the writeback option being set, since
      ext4_should_writeback_data() in ext4_jbd2.h tests to see if the
      journal is not present before returning true.
      Reported-by: Robin Dong <sanbai@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  6. 20 May, 2011 4 commits
  7. 18 May, 2011 3 commits
  8. 16 May, 2011 2 commits
  9. 15 May, 2011 1 commit
  10. 10 May, 2011 4 commits