An error occurred fetching the project authors.
  1. 01 Oct, 2018 1 commit
    • Eric Whitney's avatar
      ext4: generalize extents status tree search functions · ad431025
      Eric Whitney authored
      Ext4 contains a few functions that are used to search for delayed
      extents or blocks in the extents status tree.  Rather than duplicate
      code to add new functions to search for extents with different status
      values, such as written or a combination of delayed and unwritten,
      generalize the existing code to search for caller-specified extents
      status values.  Also, move this code into extents_status.c where it
      is better associated with the data structures it operates upon, and
      where it can be more readily used to implement new extents status tree
      functions that might want a broader scope for i_es_lock.
      
      Three missing static specifiers in RFC version of patch reported and
      fixed by Fengguang Wu <fengguang.wu@intel.com>.
      Signed-off-by: default avatarEric Whitney <enwlinux@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ad431025
  2. 21 May, 2018 1 commit
  3. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  4. 28 Feb, 2017 1 commit
  5. 27 Apr, 2016 1 commit
  6. 10 Mar, 2016 1 commit
  7. 23 Sep, 2015 1 commit
  8. 03 May, 2015 1 commit
    • Lukas Czerner's avatar
      ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d
      Lukas Czerner authored
      Currently it is possible to lose whole file system block worth of data
      when we hit the specific interaction with unwritten and delayed extents
      in status extent tree.
      
      The problem is that when we insert delayed extent into extent status
      tree the only way to get rid of it is when we write out delayed buffer.
      However there is a limitation in the extent status tree implementation
      so that when inserting unwritten extent should there be even a single
      delayed block the whole unwritten extent would be marked as delayed.
      
      At this point, there is no way to get rid of the delayed extents,
      because there are no delayed buffers to write out. So when a we write
      into said unwritten extent we will convert it to written, but it still
      remains delayed.
      
      When we try to write into that block later ext4_da_map_blocks() will set
      the buffer new and delayed and map it to invalid block which causes
      the rest of the block to be zeroed loosing already written data.
      
      For now we can fix this by simply not allowing to set delayed status on
      written extent in the extent status tree. Also add WARN_ON() to make
      sure that we notice if this happens in the future.
      
      This problem can be easily reproduced by running the following xfs_io.
      
      xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
                -c "falloc 0 131072" \
                -c "pwrite -S 0xbb 65536 2048" \
                -c "fsync" /mnt/test/fff
      
      echo 3 > /proc/sys/vm/drop_caches
      xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
      
      This can be theoretically also reproduced by at random by running fsx,
      but it's not very reliable, though on machines with bigger page size
      (like ppc) this can be seen more often (especially xfstest generic/127)
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d2dc317d
  9. 03 Apr, 2015 1 commit
  10. 25 Nov, 2014 5 commits
    • Jan Kara's avatar
      ext4: introduce aging to extent status tree · 2be12de9
      Jan Kara authored
      Introduce a simple aging to extent status tree. Each extent has a
      REFERENCED bit which gets set when the extent is used. Shrinker then
      skips entries with referenced bit set and clears the bit. Thus
      frequently used extents have higher chances of staying in memory.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      2be12de9
    • Jan Kara's avatar
      ext4: cleanup flag definitions for extent status tree · 624d0f1d
      Jan Kara authored
      Currently flags for extent status tree are defined twice, once shifted
      and once without a being shifted. Consolidate these definitions into one
      place and make some computations automatic to make adding flags less
      error prone. Compiler should be clever enough to figure out these are
      constants and generate the same code.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      624d0f1d
    • Jan Kara's avatar
      ext4: limit number of scanned extents in status tree shrinker · dd475925
      Jan Kara authored
      Currently we scan extent status trees of inodes until we reclaim nr_to_scan
      extents. This can however require a lot of scanning when there are lots
      of delayed extents (as those cannot be reclaimed).
      
      Change shrinker to work as shrinkers are supposed to and *scan* only
      nr_to_scan extents regardless of how many extents did we actually
      reclaim. We however need to be careful and avoid scanning each status
      tree from the beginning - that could lead to a situation where we would
      not be able to reclaim anything at all when first nr_to_scan extents in
      the tree are always unreclaimable. We remember with each inode offset
      where we stopped scanning and continue from there when we next come
      across the inode.
      
      Note that we also need to update places calling __es_shrink() manually
      to pass reasonable nr_to_scan to have a chance of reclaiming anything and
      not just 1.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      dd475925
    • Jan Kara's avatar
      ext4: move handling of list of shrinkable inodes into extent status code · b0dea4c1
      Jan Kara authored
      Currently callers adding extents to extent status tree were responsible
      for adding the inode to the list of inodes with freeable extents. This
      is error prone and puts list handling in unnecessarily many places.
      
      Just add inode to the list automatically when the first non-delay extent
      is added to the tree and remove inode from the list when the last
      non-delay extent is removed.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      b0dea4c1
    • Zheng Liu's avatar
      ext4: change LRU to round-robin in extent status tree shrinker · edaa53ca
      Zheng Liu authored
      In this commit we discard the lru algorithm for inodes with extent
      status tree because it takes significant effort to maintain a lru list
      in extent status tree shrinker and the shrinker can take a long time to
      scan this lru list in order to reclaim some objects.
      
      We replace the lru ordering with a simple round-robin.  After that we
      never need to keep a lru list.  That means that the list needn't be
      sorted if the shrinker can not reclaim any objects in the first round.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      edaa53ca
  11. 02 Sep, 2014 2 commits
    • Zheng Liu's avatar
      ext4: track extent status tree shrinker delay statictics · eb68d0e2
      Zheng Liu authored
      This commit adds some statictics in extent status tree shrinker.  The
      purpose to add these is that we want to collect more details when we
      encounter a stall caused by extent status tree shrinker.  Here we count
      the following statictics:
        stats:
          the number of all objects on all extent status trees
          the number of reclaimable objects on lru list
          cache hits/misses
          the last sorted interval
          the number of inodes on lru list
        average:
          scan time for shrinking some objects
          the number of shrunk objects
        maximum:
          the inode that has max nr. of objects on lru list
          the maximum scan time for shrinking some objects
      
      The output looks like below:
        $ cat /proc/fs/ext4/sda1/es_shrinker_info
        stats:
          28228 objects
          6341 reclaimable objects
          5281/631 cache hits/misses
          586 ms last sorted interval
          250 inodes on lru list
        average:
          153 us scan time
          128 shrunk objects
        maximum:
          255 inode (255 objects, 198 reclaimable)
          125723 us max scan time
      
      If the lru list has never been sorted, the following line will not be
      printed:
          586ms last sorted interval
      If there is an empty lru list, the following lines also will not be
      printed:
          250 inodes on lru list
        ...
        maximum:
          255 inode (255 objects, 198 reclaimable)
          0 us max scan time
      
      Meanwhile in this commit a new trace point is defined to print some
      details in __ext4_es_shrink().
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      eb68d0e2
    • Zheng Liu's avatar
      ext4: improve extents status tree trace point · e963bb1d
      Zheng Liu authored
      This commit improves the trace point of extents status tree.  We rename
      trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
      in ext4_es_scan() and we can not identify them from the result.
      
      Further this commit fixes a variable name in trace point in order to
      keep consistency with others.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e963bb1d
  12. 01 Sep, 2014 2 commits
  13. 12 Jul, 2014 1 commit
    • Theodore Ts'o's avatar
      ext4: fix a potential deadlock in __ext4_es_shrink() · 3f1f9b85
      Theodore Ts'o authored
      This fixes the following lockdep complaint:
      
      [ INFO: possible circular locking dependency detected ]
      3.16.0-rc2-mm1+ #7 Tainted: G           O  
      -------------------------------------------------------
      kworker/u24:0/4356 is trying to acquire lock:
       (&(&sbi->s_es_lru_lock)->rlock){+.+.-.}, at: [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0
      
      but task is already holding lock:
       (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180
      
      which lock already depends on the new lock.
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&ei->i_es_lock);
                                     lock(&(&sbi->s_es_lru_lock)->rlock);
                                     lock(&ei->i_es_lock);
        lock(&(&sbi->s_es_lru_lock)->rlock);
      
       *** DEADLOCK ***
      
      6 locks held by kworker/u24:0/4356:
       #0:  ("writeback"){.+.+.+}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560
       #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560
       #2:  (&type->s_umount_key#22){++++++}, at: [<ffffffff811a9c74>] grab_super_passive+0x44/0x90
       #3:  (jbd2_handle){+.+...}, at: [<ffffffff812979f9>] start_this_handle+0x189/0x5f0
       #4:  (&ei->i_data_sem){++++..}, at: [<ffffffff81247062>] ext4_map_blocks+0x132/0x550
       #5:  (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180
      
      stack backtrace:
      CPU: 0 PID: 4356 Comm: kworker/u24:0 Tainted: G           O   3.16.0-rc2-mm1+ #7
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Workqueue: writeback bdi_writeback_workfn (flush-253:0)
       ffffffff8213dce0 ffff880014b07538 ffffffff815df0bb 0000000000000007
       ffffffff8213e040 ffff880014b07588 ffffffff815db3dd ffff880014b07568
       ffff880014b07610 ffff88003b868930 ffff88003b868908 ffff88003b868930
      Call Trace:
       [<ffffffff815df0bb>] dump_stack+0x4e/0x68
       [<ffffffff815db3dd>] print_circular_bug+0x1fb/0x20c
       [<ffffffff810a7a3e>] __lock_acquire+0x163e/0x1d00
       [<ffffffff815e89dc>] ? retint_restore_args+0xe/0xe
       [<ffffffff815ddc7b>] ? __slab_alloc+0x4a8/0x4ce
       [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff810a8707>] lock_acquire+0x87/0x120
       [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff8128592d>] ? ext4_es_free_extent+0x5d/0x70
       [<ffffffff815e6f09>] _raw_spin_lock+0x39/0x50
       [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff8119760b>] ? kmem_cache_alloc+0x18b/0x1a0
       [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff812869b8>] ext4_es_insert_extent+0xc8/0x180
       [<ffffffff812470f4>] ext4_map_blocks+0x1c4/0x550
       [<ffffffff8124c4c4>] ext4_writepages+0x6d4/0xd00
      	...
      Reported-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reported-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: stable@vger.kernel.org
      Cc: Zheng Liu <gnehzuil.liu@gmail.com>
      3f1f9b85
  14. 13 May, 2014 1 commit
  15. 21 Apr, 2014 1 commit
    • Lukas Czerner's avatar
      ext4: rename uninitialized extents to unwritten · 556615dc
      Lukas Czerner authored
      Currently in ext4 there is quite a mess when it comes to naming
      unwritten extents. Sometimes we call it uninitialized and sometimes we
      refer to it as unwritten.
      
      The right name for the extent which has been allocated but does not
      contain any written data is _unwritten_. Other file systems are
      using this name consistently, even the buffer head state refers to it as
      unwritten. We need to fix this confusion in ext4.
      
      This commit changes every reference to an uninitialized extent (meaning
      allocated but unwritten) to unwritten extent. This includes comments,
      function names and variable names. It even covers abbreviation of the
      word uninitialized (such as uninit) and some misspellings.
      
      This commit does not change any of the code paths at all. This has been
      confirmed by comparing md5sums of the assembly code of each object file
      after all the function names were stripped from it.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      556615dc
  16. 07 Apr, 2014 1 commit
  17. 20 Feb, 2014 2 commits
  18. 10 Sep, 2013 1 commit
    • Dave Chinner's avatar
      fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Dave Chinner authored
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarGlauber Costa <glommer@openvz.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1ab6c499
  19. 28 Aug, 2013 1 commit
    • Zheng Liu's avatar
      ext4: isolate ext4_extents.h file · d7b2a00c
      Zheng Liu authored
      After applied the commit (4a092d73), we have reduced the number of
      source files that need to #include ext4_extents.h.  But we can do
      better.
      
      This commit defines ext4_zeroout_es() in extents.c and move
      EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
      indirect.c and ioctl.c.  Meanwhile we just need to include this file in
      extent_status.c when ES_AGGRESSIVE_TEST is defined.  Otherwise, this
      commit removes a duplicated declaration in trace/events/ext4.h.
      
      After applied this patch, we just need to include ext4_extents.h file
      in {super,migrate,move_extents,extents}.c, and it is easy for us to
      define a new extent disk layout.
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      d7b2a00c
  20. 17 Aug, 2013 3 commits
    • Theodore Ts'o's avatar
      ext4: add support for extent pre-caching · 7869a4a6
      Theodore Ts'o authored
      Add a new fiemap flag which forces the all of the extents in an inode
      to be cached in the extent_status tree.  This is critically important
      when using AIO to a preallocated file, since if we need to read in
      blocks from the extent tree, the io_submit(2) system call becomes
      synchronous, and the AIO is no longer "A", which is bad.
      
      In addition, for most files which have an external leaf tree block,
      the cost of caching the information in the extent status tree will be
      less than caching the entire 4k block in the buffer cache.  So it is
      generally a win to keep the extent information cached.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      7869a4a6
    • Theodore Ts'o's avatar
      ext4: cache all of an extent tree's leaf block upon reading · 107a7bd3
      Theodore Ts'o authored
      When we read in an extent tree leaf block from disk, arrange to have
      all of its entries cached.  In nearly all cases the in-memory
      representation will be more compact than the on-disk representation in
      the buffer cache, and it allows us to get the information without
      having to traverse the extent tree for successive extents.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      107a7bd3
    • Theodore Ts'o's avatar
      ext4: use unsigned int for es_status values · 3be78c73
      Theodore Ts'o authored
      Don't use an unsigned long long for the es_status flags; this requires
      that we pass 64-bit values around which is painful on 32-bit systems.
      Instead pass the extent status flags around using the low 4 bits of an
      unsigned int, and shift them into place when we are reading or writing
      es_pblk.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      3be78c73
  21. 15 Jul, 2013 1 commit
    • Theodore Ts'o's avatar
      ext4: make the extent_status code more robust against ENOMEM failures · e15f742c
      Theodore Ts'o authored
      Some callers of ext4_es_remove_extent() and ext4_es_insert_extent()
      may not be completely robust against ENOMEM failures (or the
      consequences of reflecting ENOMEM back up to userspace may lead to
      xfstest or user application failure).
      
      To mitigate against this, when trying to insert an entry in the extent
      status tree, try to shrink the inode's extent status tree before
      returning ENOMEM.  If there are entries which don't record information
      about extents under delayed allocations, freeing one of them is
      preferable to returning ENOMEM.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      e15f742c
  22. 13 Jul, 2013 1 commit
  23. 01 Jul, 2013 1 commit
    • Zheng Liu's avatar
      ext4: improve extent cache shrink mechanism to avoid to burn CPU time · d3922a77
      Zheng Liu authored
      Now we maintain an proper in-order LRU list in ext4 to reclaim entries
      from extent status tree when we are under heavy memory pressure.  For
      keeping this order, a spin lock is used to protect this list.  But this
      lock burns a lot of CPU time.  We can use the following steps to trigger
      it.
      
        % cd /dev/shm
        % dd if=/dev/zero of=ext4-img bs=1M count=2k
        % mkfs.ext4 ext4-img
        % mount -t ext4 -o loop ext4-img /mnt
        % cd /mnt
        % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
        % for ((i=0;i<160;i++)); do cp $i /dev/null &; done
        % perf record -a -g
        % perf report
      
      This commit tries to fix this problem.  Now a new member called
      i_touch_when is added into ext4_inode_info to record the last access
      time for an inode.  Meanwhile we never need to keep a proper in-order
      LRU list.  So this can avoid to burns some CPU time.  When we try to
      reclaim some entries from extent status tree, we use list_sort() to get
      a proper in-order list.  Then we traverse this list to discard some
      entries.  In ext4_sb_info, we use s_es_last_sorted to record the last
      time of sorting this list.  When we traverse the list, we skip the inode
      that is newer than this time, and move this inode to the tail of LRU
      list.  When the head of the list is newer than s_es_last_sorted, we will
      sort the LRU list again.
      
      In this commit, we break the loop if s_extent_cache_cnt == 0 because
      that means that all extents in extent status tree have been reclaimed.
      
      Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
      changed to save a local variable in these functions.
      Reported-by: default avatarDave Hansen <dave.hansen@intel.com>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      d3922a77
  24. 03 May, 2013 1 commit
    • Yan, Zheng's avatar
      ext4: fix fio regression · e30b5dca
      Yan, Zheng authored
      We (Linux Kernel Performance project) found a regression introduced
      by commit:
      
        f7fec032 ext4: track all extent status in extent status tree
      
      The commit causes about 20% performance decrease in fio random write
      test. Profiler shows that rb_next() uses a lot of CPU time. The call
      stack is:
      
        rb_next
        ext4_es_find_delayed_extent
        ext4_map_blocks
        _ext4_get_block
        ext4_get_block_write
        __blockdev_direct_IO
        ext4_direct_IO
        generic_file_direct_write
        __generic_file_aio_write
        ext4_file_write
        aio_rw_vect_retry
        aio_run_iocb
        do_io_submit
        sys_io_submit
        system_call_fastpath
        io_submit
        td_io_getevents
        io_u_queued_complete
        thread_main
        main
        __libc_start_main
      
      The cause is that ext4_es_find_delayed_extent() doesn't have an
      upper bound, it keeps searching until a delayed extent is found.
      When there are a lots of non-delayed entries in the extent state
      tree, ext4_es_find_delayed_extent() may uses a lot of CPU time.
      Reported-by: default avatarLKP project <lkp@linux.intel.com>
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      e30b5dca
  25. 11 Mar, 2013 3 commits
  26. 02 Mar, 2013 1 commit
  27. 01 Mar, 2013 1 commit
    • Theodore Ts'o's avatar
      ext4: optimize ext4_es_shrink() · 24630774
      Theodore Ts'o authored
      When the system is under memory pressure, ext4_es_srhink() will get
      called very often.  So optimize returning the number of items in the
      file system's extent status cache by keeping a per-filesystem count,
      instead of calculating it each time by scanning all of the inodes in
      the extent status cache.
      
      Also rename the slab used for the extent status cache to be
      "ext4_extent_status" so it's obviousl the slab in question is created
      by ext4.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <gnehzuil.liu@gmail.com>
      24630774
  28. 22 Feb, 2013 1 commit
  29. 18 Feb, 2013 1 commit
    • Zheng Liu's avatar
      ext4: reclaim extents from extent status tree · 74cd15cd
      Zheng Liu authored
      Although extent status is loaded on-demand, we also need to reclaim
      extent from the tree when we are under a heavy memory pressure because
      in some cases fragmented extent tree causes status tree costs too much
      memory.
      
      Here we maintain a lru list in super_block.  When the extent status of
      an inode is accessed and changed, this inode will be move to the tail
      of the list.  The inode will be dropped from this list when it is
      cleared.  In the inode, a counter is added to count the number of
      cached objects in extent status tree.  Here only written/unwritten/hole
      extent is counted because delayed extent doesn't be reclaimed due to
      fiemap, bigalloc and seek_data/hole need it.  The counter will be
      increased as a new extent is allocated, and it will be decreased as a
      extent is freed.
      
      In this commit we use normal shrinker framework to reclaim memory from
      the status tree.  ext4_es_reclaim_extents_count() traverses the lru list
      to count the number of reclaimable extents.  ext4_es_shrink() tries to
      reclaim written/unwritten/hole extents from extent status tree.  The
      inode that has been shrunk is moved to the tail of lru list.
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan kara <jack@suse.cz>
      74cd15cd