1. 05 Aug, 2015 13 commits
  2. 04 Aug, 2015 27 commits
    • f2fs: expose f2fs_write_cache_pages · 8f46dcae
      Chao Yu authored
      If the mapping of one inode holds both dirty pages moved by gc and
      normal dirty pages, we may write them back alternately to
      discontinuous block addresses, resulting in low performance.
      
      This patch introduces f2fs_write_cache_pages, with code copied from
      write_cache_pages in mm/page-writeback.c.
      
      In this function, we refactor the flow into two steps:
      1) writeback all cold type pages.
      2) writeback all non-cold type pages.
      
      With this method, f2fs writes back dirty pages of the same
      temperature in batches, so the written blocks get more contiguous
      addresses. They can then be merged as much as possible in the f2fs
      bio cache, which also reduces the chance of submitting small IO
      from the block layer.
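
      The two-step flow can be illustrated with a small user-space model.
      This is only a sketch, not the kernel code: DirtyPage and the helper
      names are hypothetical, and "temperature switches" stands in for the
      jumps between discontinuous block addresses described above.

```python
from typing import List, NamedTuple


class DirtyPage(NamedTuple):
    index: int   # page index within the file
    cold: bool   # True if the page was dirtied by gc (cold data)


def single_pass_order(pages: List[DirtyPage]) -> List[DirtyPage]:
    """Write back strictly in index order, mixing temperatures."""
    return sorted(pages, key=lambda p: p.index)


def two_pass_order(pages: List[DirtyPage]) -> List[DirtyPage]:
    """Step 1: all cold pages; step 2: all non-cold pages."""
    by_index = sorted(pages, key=lambda p: p.index)
    cold = [p for p in by_index if p.cold]
    non_cold = [p for p in by_index if not p.cold]
    return cold + non_cold


def temperature_switches(order: List[DirtyPage]) -> int:
    """Count how often consecutive writes change temperature."""
    return sum(1 for a, b in zip(order, order[1:]) if a.cold != b.cold)


# Odd-index pages were moved by gc (cold); even-index pages are normal.
pages = [DirtyPage(i, cold=(i % 2 == 1)) for i in range(8)]
print(temperature_switches(single_pass_order(pages)))
print(temperature_switches(two_pass_order(pages)))
```

      In this model the index-ordered pass changes temperature between
      every pair of pages, while the two-pass order changes only once.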
      
      Test environment: 8 GB Nokia SD card (a very old card, but it shows
      a clear effect with this patch; with a 32 GB Kingston SD card I did
      not see much improvement).
      
      Test step:
      1. touch testfile;
      2. truncate -s 512K testfile;
      3. write all pages with odd index;
      4. trigger gc by ioctl;
      5. write all pages with even index;
      6. time fsync testfile.
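
      The steps above could be scripted roughly as follows. This is a
      sketch under assumptions: a 4096-byte page size, and a hypothetical
      trigger_gc callback standing in for step 4, since the exact ioctl
      invocation is not given here.

```python
import os
import time

PAGE_SIZE = 4096          # assumed page size
FILE_SIZE = 512 * 1024    # step 2: truncate -s 512K


def write_pages(fd: int, parity: int) -> None:
    """Dirty every page whose index has the given parity (1 = odd, 0 = even)."""
    for index in range(parity, FILE_SIZE // PAGE_SIZE, 2):
        os.pwrite(fd, b"x" * PAGE_SIZE, index * PAGE_SIZE)


def run_test(path: str, trigger_gc=lambda fd: None) -> float:
    """Run steps 1-6 and return the fsync duration in seconds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)   # step 1: touch
    try:
        os.ftruncate(fd, FILE_SIZE)                      # step 2
        write_pages(fd, 1)                               # step 3: odd pages
        trigger_gc(fd)                                   # step 4: placeholder hook
        write_pages(fd, 0)                               # step 5: even pages
        start = time.monotonic()
        os.fsync(fd)                                     # step 6: timed fsync
        return time.monotonic() - start
    finally:
        os.close(fd)
```

      Without a real gc trigger the script only exercises the write
      pattern; the timings below come from the original test, not from
      this sketch.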
      
      before:
      real	0m0.402s
      user	0m0.000s
      sys	0m0.000s
      
      after:
      real	0m0.143s
      user	0m0.004s
      sys	0m0.004s
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: correct return value of ->setxattr · 037fe70c
      Chao Yu authored
      This patch fixes the error number returned by ->setxattr, a problem
      reported by xfstests generic/026 as below:
      
      generic/026      - output mismatch
          --- tests/generic/026.out
          +++ results/generic/026.out.bad
          @@ -4,6 +4,6 @@
           1 below acl max
           acl max
           1 above acl max
          -chacl: cannot set access acl on "largeaclfile": Argument list too long
          +chacl: cannot set access acl on "largeaclfile": Numerical result out of range
           use 16 aces
           use 17 aces
          ...
      Ran: generic/026
      Failures: generic/026
      Failed 1 of 1 tests
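
      The two messages in the diff above are the standard Linux strings
      for E2BIG ("Argument list too long") and ERANGE ("Numerical result
      out of range"). A minimal sketch of the expected behaviour follows;
      check_xattr_size and MAX_VALUE_LEN are hypothetical names with an
      arbitrary limit, not the f2fs implementation.

```python
import errno
import os

MAX_VALUE_LEN = 3000   # hypothetical size limit, for illustration only


def check_xattr_size(size: int) -> int:
    """Return 0 if the value fits, else a negative errno as a kernel handler would."""
    if size > MAX_VALUE_LEN:
        # generic/026 expects E2BIG here rather than ERANGE.
        return -errno.E2BIG
    return 0


# Show how each errno is rendered by the C library on this system.
for code in (errno.E2BIG, errno.ERANGE):
    print(errno.errorcode[code], "->", os.strerror(code))
```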
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: cleanup write_orphan_inodes · bd936f84
      Chao Yu authored
      Since 'commit 4531929e ("f2fs: move grabing orphan
      pages out of protection region")', write_orphan_inodes() has grabbed
      all meta pages in a batch before using them under the spinlock, to
      avoid a long delay from grabbing meta pages while holding the
      spinlock.
      
      Now that 'commit d6c67a4f ("f2fs: revmove spin_lock for
      write_orphan_inodes")' has removed the spinlock from
      write_orphan_inodes, the issue described above no longer exists, so
      move the grab operation back to its original place for readability.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: warm up cold page after mmaped write · 5b339124
      Chao Yu authored
      With the cost-benefit method, background gc considers old sections
      with fewer valid blocks as candidate victims; the old blocks in such
      a section are treated as cold data and are later moved into a cold
      segment.
      
      But if a page under gc is touched by the user through a buffered or
      mmaped write, we should reset the page to non-cold, because it is
      likely to be updated again.
      
      So add the clearing code for the missed 'mmap' case.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT · c1c1b583
      Chao Yu authored
      When background gc is off, the only way to trigger gc is a forced gc
      executed by operations that need to grab space on disk.
      
      The condition for this is restrictive: forced gc runs only when
      there are almost no free sections left for LFS allocation. This is
      not reasonable for users who want to control when gc is triggered.
      
      This patch introduces the F2FS_IOC_GARBAGE_COLLECT interface for
      triggering garbage collection through ioctl, giving users one more
      way to trigger gc.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: maintain extent cache in separated file · a28ef1f5
      Chao Yu authored
      This patch moves extent cache related code from data.c into
      extent_cache.c. The extent cache is an independent feature and its
      code is not related to the rest of data.c, so it is better
      maintained in a separate place.
      
      There is no functional change, only several small coding style
      fixes:
      * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
      * rename misspelled word 'untill' to 'until';
      * remove unneeded 'return' in the end of f2fs_destroy_extent_tree().
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN · 3c7df87d
      Fan Li authored
      Since only parts of extents longer than F2FS_MIN_EXTENT_LEN are
      kept in the extent cache after a split, extents already shorter than
      F2FS_MIN_EXTENT_LEN do not need to be split at all.
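
      A rough model of the early exit, assuming an illustrative value for
      F2FS_MIN_EXTENT_LEN and treating "at least the minimum" as the keep
      threshold. split_extent and the (start, length) tuple layout are
      hypothetical, not the kernel data structures.

```python
from typing import List, Tuple

F2FS_MIN_EXTENT_LEN = 16   # assumed value, for illustration only

Extent = Tuple[int, int]   # (start file offset, length in blocks)


def split_extent(extent: Extent, fofs: int) -> List[Extent]:
    """Remove block `fofs` (assumed inside `extent`); return the parts worth caching."""
    start, length = extent
    if length < F2FS_MIN_EXTENT_LEN:
        # Neither half can reach the minimum length, so skip the split.
        return []
    left = (start, fofs - start)
    right = (fofs + 1, start + length - fofs - 1)
    return [part for part in (left, right) if part[1] >= F2FS_MIN_EXTENT_LEN]
```

      For example, splitting a 10-block extent returns nothing without
      computing either half, while splitting a 40-block extent at block 5
      keeps only the 34-block right part.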
      Signed-off-by: Fan Li <fanofcode.li@samsung.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to update page flag · 90d4388a
      Chao Yu authored
      This patch fixes the updating of page flags (e.g. the Uptodate and
      cold flags) in ->write_begin.
      
      Otherwise, the page stays non-uptodate when we write an entire
      page, and the cold data flag is not cleared when a gced page is
      being rewritten.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: shrink unreferenced extent_caches first · 7023a1ad
      Jaegeuk Kim authored
      If an extent_tree entry has a zero reference count, we can drop it from the
      cache with higher priority than entries that are currently referenced.
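
      The priority rule can be sketched as a victim-selection function.
      This is a simplified model: pick_victims and the inode-number to
      refcount mapping are hypothetical and do not mirror the kernel's
      data structures.

```python
from typing import Dict, List


def pick_victims(refcounts: Dict[int, int], nr_to_shrink: int) -> List[int]:
    """Choose up to nr_to_shrink trees, zero-refcount ones first."""
    unreferenced = [ino for ino, ref in refcounts.items() if ref == 0]
    referenced = [ino for ino, ref in refcounts.items() if ref > 0]
    # Referenced trees are considered only after all unreferenced ones.
    return (unreferenced + referenced)[:nr_to_shrink]
```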
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: enhance multithread performance · bb96a8d5
      Chao Yu authored
      In ->writepages, we use the writepages mutex to serialize all block
      address allocation and page submission pairs from different inodes.
      This lets the delayed dirty pages of one inode be written as
      contiguously as possible.
      
      But there is one problem: we do not submit the current cached bio
      inside the region protected by the writepages mutex, so there is a
      small chance that we submit another thread's bio, as shown below,
      which splits more bios.
      
      thread 1			thread 2
      ->writepages
        lock(writepages)
        ->write_cache_pages
        unlock(writepages)
      				  lock(writepages)
      				  ->write_cache_pages
        ->f2fs_submit_merged_bio
      				    ->writepage
      				  unlock(writepages)
      
      fs_mark-6535  [002] ....  2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288
      fs_mark-6536  [000] ....  2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096
      fs_mark-6536  [000] ....  2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096
      fs_mark-6535  [002] ....  2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096
      
      This may increase the time spent in the block layer and may cause
      larger IO latency.
      
      This patch moves the submit operation into the region protected by
      the writepages mutex to avoid bio splits under intensive concurrent
      writeback.
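
      The effect of submitting inside the locked region can be modelled
      in user space. This is a toy sketch: writepages_lock, cached_bio and
      submit_merged_bio are stand-ins for the kernel objects, and a
      Python list plays the role of the merged bio.

```python
import threading

writepages_lock = threading.Lock()   # stands in for the writepages mutex
cached_bio = []                      # shared pending bio: (thread id, page) pairs
submitted = []                       # bios handed to the "block layer"


def submit_merged_bio() -> None:
    """Flush whatever is currently cached as a single bio."""
    if cached_bio:
        submitted.append(list(cached_bio))
        cached_bio.clear()


def writepages(tid: int, nr_pages: int) -> None:
    with writepages_lock:
        for page in range(nr_pages):
            cached_bio.append((tid, page))
        # Submit before releasing the lock, so the cached bio can only
        # hold pages queued by this thread.
        submit_merged_bio()


threads = [threading.Thread(target=writepages, args=(tid, 128)) for tid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

      Because the flush happens under the lock, every submitted bio in
      this model contains the pages of exactly one thread. Moving the
      flush after the with block would allow one thread to flush a bio
      that another thread is still filling.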
      
      my test environment: virtual machine,
      intel cpu i5 2500, 8GB size memory, 4GB size ramdisk
      
      time fs_mark  -t  16  -L  1  -s  524288  -S  1  -d  /mnt/f2fs/
      
      before:
      real	0m4.244s
      user	0m0.088s
      sys	0m12.336s
      
      after:
      real	0m3.822s
      user	0m0.072s
      sys	0m10.760s
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: restrict multimedia filename · 741a7bea
      Chao Yu authored
      When testing with fs_mark, some blocks were written out as cold
      data and mixed with warm data, which split more bios.
      
      This is because fs_mark creates files with random filenames such
      as:
      
      559551ee~~~~~~~~15Z29OCC05JCKQP60JQ42MKV
      559551ee~~~~~~~~NZAZ6X8OA8LHIIP6XD0L58RM
      559551ef~~~~~~~~B15YDSWAK789HPSDZKYTW6WM
      559551f1~~~~~~~~2DAE5DPS79785BUNTFWBEMP3
      559551f1~~~~~~~~1MYDY0BKSQCJPI32Q8C514RM
      559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4
      559551f3~~~~~~~~1WF42LPRTQJNPPGR3EINKMPE
      559551f3~~~~~~~~8Y2NRK7CEPPAA02LY936PJPG
      
      They are regarded as cold files since their filenames end with a
      multimedia file extension, but this is wrong: we only match the
      tail of the filename against the extension, without checking the
      whole name.
      
      This patch fixes the format of a multimedia filename to
      "filename + '.' + extension", and marks a file as cold only if its
      filename matches that format.
      
      This reduces the probability of wrongly marking a file as cold, and
      also slightly improves fs_mark performance on f2fs.
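
      The difference between the two matching rules can be sketched as
      below. The extension list is an illustrative subset, the function
      names are hypothetical, and case-insensitive matching is inferred
      from the fs_mark names above being treated as cold.

```python
MEDIA_EXTENSIONS = ("jpg", "gif", "png", "avi", "mp4", "mp3")   # illustrative subset


def is_cold_old(name: str) -> bool:
    """Old rule: the name merely ends with a media extension."""
    lower = name.lower()
    return any(lower.endswith(ext) for ext in MEDIA_EXTENSIONS)


def is_cold_new(name: str) -> bool:
    """New rule: the name must have the form filename + '.' + extension."""
    lower = name.lower()
    return any(
        len(lower) > len(ext) + 1 and lower.endswith("." + ext)
        for ext in MEDIA_EXTENSIONS
    )
```

      With these rules, the fs_mark name ending in "MP4" is cold under the
      old check but not under the new one, while a name such as
      "movie.mp4" is cold under both.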
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • MAINTAINERS: add missed trace file for f2fs · 62d43eeb
      Chao Yu authored
      This patch adds the missing trace file to the f2fs entry in
      MAINTAINERS. This completes the list of files maintained under f2fs
      and lets get_maintainer.pl find the correct mailing list for
      patches that touch only the f2fs trace file.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: make the function check_dnode have a return type of bool and change it's name to is_alive · c1079892
      Nicholas Krause authored
      This makes the function check_dnode return bool, since it only
      ever returns one or zero, and renames it to is_alive to better
      describe its purpose of checking whether a dnode is still in use
      by the filesystem.
      Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
      [Jaegeuk Kim: change the return value check for the renamed function]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: check the largest extent at look-up time · 84bc926c
      Jaegeuk Kim authored
      Because of the extent shrinker or other -ENOMEM scenarios, there is no
      guarantee that the largest extent stays cached in the tree all the time.
      
      Instead of relying on the tree nodes, we can simply check the largest
      extent cached in the extent tree at look-up time.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use extent_cache by default · 3e72f721
      Jaegeuk Kim authored
      We don't need to handle the duplicate extent information.
      
      The integrated rule is:
       - update on-disk extent with largest one tracked by in-memory extent_cache
       - destroy extent_tree for the truncation case
       - drop per-inode extent_cache by shrinker
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add noextent_cache mount option · 7daaea25
      Jaegeuk Kim authored
      This patch adds noextent_cache mount option.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: shrink extent_cache entries · 554df79e
      Jaegeuk Kim authored
      This patch registers shrinking extent_caches.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: shrink nat_cache entries · 1b38dc8e
      Jaegeuk Kim authored
      This patch registers shrinking nat_cache entries.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce a shrinker for mounted fs · 2658e50d
      Jaegeuk Kim authored
      This patch introduces a shrinker that aims to reduce the memory footprint
      of a number of in-memory f2fs data structures.
      
      In addition, it newly adds:
       - sbi->umount_mutex to avoid data races on shrinker and put_super
       - sbi->shrinker_run_no to not revisit objects
      
      Note that the basic implementation was copied from fs/ubifs/shrinker.c
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: set cached_en after checking finally · 244f4fc1
      Jaegeuk Kim authored
      This patch relocates the cached_en assignment so that it is covered by
      spin_lock and is set only once, after all checks have completed.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: update on-disk extents even under extent_cache · cbe91923
      Jaegeuk Kim authored
      Previously, f2fs_update_extent_cache() always updated only the in-memory
      extent_cache, and the up-to-date extent was saved into the on-disk one only
      during f2fs_evict_inode.
      
      But, in the following scenario:
      
      1. mount
      2. open & write an extent X
      3. f2fs_evict_inode; on-disk extent is X
      4. open & update the extent X with Y
      5. sync; trigger checkpoint
      6. power-cut
      
      after power-on, f2fs should serve extent Y, but we have an on-disk extent X.
      
      This causes a failure on xfstests/311.
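
      The scenario can be replayed with a toy model. The Inode class and
      its fields are hypothetical; the also_on_disk flag selects between
      the previous behaviour and the behaviour named in the commit title.

```python
class Inode:
    """Toy inode holding one in-memory and one on-disk extent."""

    def __init__(self) -> None:
        self.cache_extent = None   # in-memory extent_cache
        self.disk_extent = None    # extent stored in the on-disk inode

    def update_extent(self, extent: str, also_on_disk: bool) -> None:
        self.cache_extent = extent
        if also_on_disk:
            self.disk_extent = extent   # update on-disk extent as well

    def evict(self) -> None:
        # Eviction preserves the in-memory extent into the on-disk one.
        self.disk_extent = self.cache_extent


def extent_after_power_cut(also_on_disk: bool) -> str:
    inode = Inode()
    inode.update_extent("X", also_on_disk)   # step 2: write extent X
    inode.evict()                            # step 3: on-disk extent is X
    inode.update_extent("Y", also_on_disk)   # step 4: update X with Y
    # Steps 5-6: the checkpoint persists the on-disk inode, then power is
    # lost; the in-memory cache is gone and only disk_extent survives.
    return inode.disk_extent
```

      In this model the old behaviour leaves the stale extent X on disk,
      while updating the on-disk extent on every change yields Y.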
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong block address calculation for a split extent · 7a2cb678
      Jaegeuk Kim authored
      This patch fixes wrong calculation on block address field when an extent is
      split.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: convert inline_data for various fallocate · 97a7b2c2
      Jaegeuk Kim authored
      For the newly added fallocate types, inline_data should be converted before
      handling block swapping.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid to use failed inode immediately · c9b63bd0
      Jaegeuk Kim authored
      Before iput is called, the inode number used by a bad inode can be reassigned
      to another new inode, resulting in abnormal behavior on the new inode.
      This should not happen for the new inode.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid freed stat information · eca616f8
      Jaegeuk Kim authored
      The write_checkpoint can update stat information, so we should destroy the stat
      structure after it.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to record dirty page count for symlink · 5ac9f36f
      Chao Yu authored
      Dirty pages can exist in the mapping of a newly created symlink, but previously
      we did not maintain a dirty page count for symlinks as we do for regular
      files and directories, so the count we looked up would be wrong.
      
      This patch adds the missing dirty page counting for symlinks to fix this issue.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs crypto: delete an unnecessary check before the function call "key_put" · 92859a5e
      Markus Elfring authored
      The key_put() function tests whether its argument is NULL and then
      returns immediately. Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>