1. 22 Oct, 2015 1 commit
  2. 21 Oct, 2015 1 commit
  3. 20 Oct, 2015 2 commits
  4. 13 Oct, 2015 2 commits
• f2fs: relocate the tracepoint for background_gc · 84e4214f
      Jaegeuk Kim authored
Once f2fs_gc is done, wait_ms is changed once more, so its tracepoint
should be located after that point.
Reported-by: He YunLei <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs crypto: fix racing of accessing encrypted page among different competitors · 08b39fbd
Chao Yu authored
      
We use different page caches (normally the inode's page cache for R/W
and the meta inode's page cache for GC) to cache the same physical block
belonging to an encrypted inode. Writeback of these two page caches
should be exclusive, but currently we don't handle the writeback state
well, so there may be a potential racing problem:
      
      a)
      kworker:				f2fs_gc:
       - f2fs_write_data_pages
        - f2fs_write_data_page
         - do_write_data_page
          - write_data_page
           - f2fs_submit_page_mbio
(page#1 in the inode's page cache was queued
in the f2fs bio cache, and is ready to be
written to the new blkaddr)
      					 - gc_data_segment
      					  - move_encrypted_block
      					   - pagecache_get_page
				(page#2 in the meta inode's page cache
				was cached with the invalid data
				of the physical block located at the new
				blkaddr)
      					   - f2fs_submit_page_mbio
      					(page#1 was submitted, later, page#2
      					with invalid data will be submitted)
      
      b)
      f2fs_gc:
       - gc_data_segment
        - move_encrypted_block
         - f2fs_submit_page_mbio
(page#1 in the meta inode's page cache was
queued in the f2fs bio cache, and is ready
to be written to the new blkaddr)
      					user thread:
      					 - f2fs_write_begin
      					  - f2fs_submit_page_bio
				(we submit the request to the block layer
				to update page#2 in the inode's page cache
				with the physical block located at the new
				blkaddr, so here we may read garbage
				data from the new blkaddr since GC hasn't
				written back page#1 yet)
      
This patch fixes the above potential racing problem for encrypted inodes.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
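
A minimal C sketch of the exclusion idea above, assuming the f2fs META_MAPPING()
convention for the meta inode's page cache; it mirrors the spirit of the fix,
not the exact hunk:

#include <linux/pagemap.h>
#include "f2fs.h"	/* struct f2fs_sb_info, block_t, META_MAPPING() */

/*
 * Before writing out a block that GC may also have cached under the meta
 * inode (as it does for encrypted inodes), wait for any in-flight writeback
 * of that block so the two page caches never race on the same blkaddr.
 */
static void wait_on_encrypted_block_writeback(struct f2fs_sb_info *sbi,
					      block_t blkaddr)
{
	struct page *page;

	page = find_get_page(META_MAPPING(sbi), blkaddr);
	if (!page)
		return;

	wait_on_page_writeback(page);	/* serialize the two writebacks */
	put_page(page);
}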
  5. 12 Oct, 2015 9 commits
  6. 09 Oct, 2015 25 commits
• f2fs: merge meta writes as many possible · 6066d8cd
      Jaegeuk Kim authored
This patch tries to merge as many IOs as possible when the background flusher
flushes dirty meta pages.
      
      [Before]
      
      ...
f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124320, size = 4096
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124560, size = 32768
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95720, size = 987136
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123928, size = 4096
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123944, size = 8192
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123968, size = 45056
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124064, size = 4096
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 97648, size = 1007616
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123776, size = 8192
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123800, size = 32768
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124624, size = 4096
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 99616, size = 921600
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123608, size = 4096
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123624, size = 77824
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123792, size = 4096
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123864, size = 32768
      ...
      
      [After]
      
      ...
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 92168, size = 892928
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 93912, size = 753664
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95384, size = 716800
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 96784, size = 712704
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104160, size = 364544
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104872, size = 356352
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 105568, size = 278528
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106112, size = 319488
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106736, size = 258048
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107240, size = 270336
      f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107768, size = 180224
      ...
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: introduce a periodic checkpoint flow · 60b99b48
      Jaegeuk Kim authored
This patch introduces a periodic checkpoint feature.
Note that it does not strictly enforce checkpoint trigger timing; it is only
intended to help the user experience.
The default interval is 60 seconds.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
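
A hedged sketch of the mechanism, assuming an illustrative cp_expires timestamp
kept by the caller and the 60-second default; it is not the exact patch:

#include <linux/jiffies.h>
#include "f2fs.h"	/* struct f2fs_sb_info, f2fs_sync_fs() */

#define DEF_CP_INTERVAL_MS	(60 * 1000)	/* 60s default from the patch */

/* trigger a checkpoint once the configured interval has elapsed */
static void try_periodic_checkpoint(struct f2fs_sb_info *sbi,
				    unsigned long *cp_expires)
{
	if (!time_after(jiffies, *cp_expires))
		return;

	f2fs_sync_fs(sbi->sb, true);	/* sync == true writes a checkpoint */
	*cp_expires = jiffies + msecs_to_jiffies(DEF_CP_INTERVAL_MS);
}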
• f2fs: add a tracepoint for background gc · 5c267434
      Jaegeuk Kim authored
      This patch introduces a tracepoint to monitor background gc behaviors.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: introduce background_gc=sync mount option · 6aefd93b
      Jaegeuk Kim authored
This patch introduces background_gc=sync, enabling synchronous cleaning in the
background.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: introduce a new ioctl F2FS_IOC_WRITE_CHECKPOINT · 456b88e4
      Chao Yu authored
This patch introduces a new ioctl for users who want to trigger a checkpoint
from userspace.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
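
A hedged userspace usage sketch; the ioctl number below (_IO(0xf5, 7)) follows
the f2fs ioctl numbering as an assumption and should be taken from the kernel
header when available:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>

#ifndef F2FS_IOC_WRITE_CHECKPOINT
#define F2FS_IOC_WRITE_CHECKPOINT	_IO(0xf5, 7)	/* assumed value */
#endif

int main(int argc, char **argv)
{
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);	/* any file on the f2fs mount */
	if (fd < 0 || ioctl(fd, F2FS_IOC_WRITE_CHECKPOINT) < 0) {
		perror("F2FS_IOC_WRITE_CHECKPOINT");
		return 1;
	}
	close(fd);
	return 0;
}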
• f2fs: support synchronous gc in ioctl · d530d4d8
      Chao Yu authored
This patch drops batched gc triggered through the ioctl, since the user can
easily control gc by designing a loop around ->ioctl.

We support synchronous gc by forcing FG_GC in f2fs_gc, so the user can make
sure that all blocks gced in this round are persistent on the device by the
time the ioctl returns.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
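
A hedged userspace sketch of the "loop around ->ioctl" pattern described above;
the ioctl number (_IOW(0xf5, 6, __u32)) is an assumption, so prefer the kernel
header definition if present:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>

#ifndef F2FS_IOC_GARBAGE_COLLECT
#define F2FS_IOC_GARBAGE_COLLECT	_IOW(0xf5, 6, __u32)	/* assumed */
#endif

int main(int argc, char **argv)
{
	__u32 sync = 1;		/* synchronous: forces FG_GC in f2fs_gc */
	int fd, rounds = 0;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		return 1;

	/* each call cleans one victim; stop once the kernel reports an
	 * error (e.g. nothing left to clean) */
	while (ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, &sync) == 0)
		rounds++;

	printf("completed %d synchronous GC rounds\n", rounds);
	close(fd);
	return 0;
}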
• f2fs: skip searching dirty map if dirty segment is not exist · 3342bb30
      Chao Yu authored
When searching for a victim during gc, if there are no dirty segments in the
filesystem, we still take the time to search the whole dirty segment map; this
is not needed, so it is better to skip the search in this condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
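
A hedged sketch of the shortcut, with illustrative names rather than the exact
dirty_seglist_info fields:

#include <linux/bitops.h>

/*
 * Keep a counter of dirty segments next to the dirty bitmap and bail out
 * before walking the (potentially large) bitmap when nothing is dirty.
 */
static int find_dirty_victim_sketch(const unsigned long *dirty_map,
				    unsigned int nr_dirty,
				    unsigned int total_segs)
{
	if (!nr_dirty)
		return -1;	/* no dirty segment: skip the whole search */

	return find_first_bit(dirty_map, total_segs);
}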
• f2fs: fix to avoid redundant searching in dirty map during gc · a43f7ec3
      Chao Yu authored
When doing gc, we search for a victim in the dirty map starting from the
position of the last victim; we only reset the current searching position once
we touch the end of the dirty map, and then search the whole dirty map again.
So sometimes we search the range [victim, last] twice, which is redundant.
This patch avoids that issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: use atomic64_t for extent cache hit stat · 5b7ee374
      Chao Yu authored
The hit stats of our extent cache increase all the time until remount, and we
use the atomic_t type for the stat variables, so they may easily overflow when
we query the extent cache frequently on a long-running fs.
      
      So to avoid that, this patch uses atomic64_t for hit stat variables.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
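
A hedged sketch of the type change, with illustrative field names rather than
the exact f2fs_stat_info members:

#include <linux/atomic.h>

struct extent_hit_stat_sketch {
	atomic64_t total_hit;		/* was: atomic_t, which wraps at 2^31 */
	atomic64_t read_hit;
};

static inline void stat_inc_total_hit(struct extent_hit_stat_sketch *st)
{
	atomic64_inc(&st->total_hit);	/* was: atomic_inc() */
}

static inline s64 stat_show_total_hit(struct extent_hit_stat_sketch *st)
{
	return atomic64_read(&st->total_hit);	/* was: atomic_read() */
}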
• f2fs: use vmalloc to handle -ENOMEM error · 39307a8e
      Jaegeuk Kim authored
      This patch introduces f2fs_kvmalloc to avoid -ENOMEM during mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
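
A hedged sketch of the kmalloc-then-vmalloc fallback pattern such a helper is
built on; a minimal sketch, not the exact f2fs_kvmalloc from the patch:

#include <linux/slab.h>
#include <linux/vmalloc.h>

/* try a contiguous allocation first; large mount-time tables may fail
 * there, so fall back to vmalloc instead of returning -ENOMEM */
static inline void *kvmalloc_sketch(size_t size, gfp_t flags)
{
	void *ret;

	ret = kmalloc(size, flags | __GFP_NOWARN);
	if (!ret)
		ret = vmalloc(size);	/* release with kvfree() */
	return ret;
}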
• f2fs: should get a victim from retrials · ab126cfc
      Jaegeuk Kim authored
If we do not call get_victim first, we cannot get a new victim for the retrial
path.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: fix to correct freed section number during gc · 45fe8492
      Chao Yu authored
This patch fixes the accounting of the number of sections freed by garbage
collection when triggering a foreground gc.

Besides, when a foreground gc is running on the currently selected section,
once we fail to gc one segment, it's better to abandon gcing the remaining
segments in that section, because we will select the next victim for
foreground gc anyway, so gcing the remaining segments of the previous section
only adds overhead and causes long latency for the caller.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: fix to update {m,c}time correctly when truncating larger · 345a6b2e
      Chao Yu authored
This patch fixes updating ctime and mtime correctly when truncating a file to
a larger size in ->setattr.
      
      The bug is reported by xfstest generic/313 as below:
      
      generic/313 2s ... - output mismatch (see ./results/generic/313.out.bad)
          --- tests/generic/313.out   2015-08-04 15:28:53.430798882 +0800
          +++ results/generic/313.out.bad   2015-09-28 17:04:27.294278016 +0800
          @@ -1,2 +1,4 @@
           QA output created by 313
           Silence is golden
          +ctime not updated after truncate up
          +mtime not updated after truncate up
          ...
          (Run 'diff -u tests/generic/313.out tests/generic/313.out.bad'  to see the entire diff)
      Ran: generic/313
      Failures: generic/313
      Failed 1 of 1 tests
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
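
A hedged sketch of the idea in ->setattr (CURRENT_TIME matches the kernel API
of this era); it is not the exact f2fs_setattr hunk:

#include <linux/fs.h>
#include <linux/mm.h>

/* growing i_size alone does not touch the timestamps, so bump them here */
static void truncate_up_sketch(struct inode *inode, loff_t new_size)
{
	if (new_size <= i_size_read(inode))
		return;

	truncate_setsize(inode, new_size);
	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
	mark_inode_dirty(inode);
}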
• f2fs: do not skip dentry block writes · 90b803e6
      Jaegeuk Kim authored
Previously, we skipped dentry block writes when wbc is SYNC_NONE, there is no
memory pressure, and the number of dirty pages is pretty small.

But we didn't skip normal data writes, so skipping gives no big benefit to
overall performance.
Moreover, by skipping some data writes, kworker falls into an infinite loop
trying to write blocks when many dir inodes have only one dentry block.

So, this patch removes the skipping of these writes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: remove unneeded f2fs_{,un}lock_op in do_recover_data() · 72235541
      Chao Yu authored
Protecting the recovery flow with cp_rwsem is not needed, since we already
prevent any checkpoint from being triggered by holding cp_mutex.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: fix incorrect bimodal calculation · 1d7e10d5
      Chao Yu authored
In update_sit_info, we use div_u64 to handle the 'u64 divided by u64' case,
but div_u64 can only handle a 32-bit divisor, so a u64 divisor passed to
div_u64 overflows, resulting in a wrong calculation when showing the f2fs
debug info, as below:
      
      BDF: 464, avg. vblocks: 23509
      (BDF should never exceed 100)
      
      So change to use div64_u64 to handle this case correctly.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
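
A hedged illustration of the <linux/math64.h> difference the fix relies on;
variable names are illustrative, not taken from update_sit_info():

#include <linux/math64.h>

static u64 avg_vblocks_sketch(u64 total_vblocks, u64 dirty_count)
{
	/*
	 * div_u64(total_vblocks, dirty_count) would truncate the divisor to
	 * 32 bits and produce bogus numbers (e.g. BDF > 100 above);
	 * div64_u64() divides u64 by u64 correctly.
	 */
	return dirty_count ? div64_u64(total_vblocks, dirty_count) : 0;
}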
• f2fs: introduce __try_update_largest_extent · 4abd3f5a
      Chao Yu authored
      This patch adds a new helper __try_update_largest_extent for cleanup.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: fix error handling for calls to various functions in the function recover_inline_data · 545fe421
      Nicholas Krause authored
This fixes the error handling for calls to various functions in
recover_inline_data by checking whether those functions return an error code
or the boolean value false to signal that they failed internally. If so,
recover_inline_data returns false immediately to its caller, as we cannot
continue after a failure of either truncate_inline_inode or truncate_blocks.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: disallow switch extent_cache option dynamically · 9cd81ce3
      Chao Yu authored
Switching the extent_cache option dynamically on remount may cause a
consistency issue between the extent cache and the dnode page. Fix this in
this patch by disallowing that condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: use correct flag in f2fs_map_blocks() · 46c9e141
      Chao Yu authored
We introduced F2FS_GET_BLOCK_READ in commit e2b4e2bc ("f2fs: fix
incorrect mapping for bmap"), but forgot to use this flag in the right
place; fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: fix to handle io error in ->direct_IO · f9811703
      Chao Yu authored
Here is an oops reported when testing generic/019 of xfstests:
      
       ------------[ cut here ]------------
       kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
       invalid opcode: 0000 [#1] SMP
       Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
      nf_def
       CPU: 2 PID: 25441 Comm: fio Tainted: G           O    4.3.0-rc1+ #6
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
       task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
       RIP: 0010:[<ffffffffa0784981>]  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
       RSP: 0018:ffff8803fd61f918  EFLAGS: 00010246
       RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
       RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
       RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
       R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
       FS:  00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
       Stack:
        000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
        0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
        ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
       Call Trace:
        [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
        [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
        [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
        [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
        [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
        [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
        [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
        [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
        [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
        [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
        [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
        [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
        [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
        [<ffffffff8120b913>] do_io_submit+0x373/0x630
        [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
        [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
        [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
       Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
       RIP  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
        RSP <ffff8803fd61f918>
       ---[ end trace 2e577d7f711ddb86 ]---
      
The reason is that in the generic/019 test we trigger a man-made IO error in
the block layer through debugfs; after that, prefree segments will no longer
be freed, because we always skip doing gc or checkpoint once an IO error has
occurred.

Meanwhile, fio with the aio engine generated a large number of direct IOs,
which kept allocating space in free segments until we ran out of them,
eventually resulting in a panic in new_curseg since no more free segments
could be found.

So, this patch changes ->direct_IO to return -EIO in this condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
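
A hedged sketch of the guard described above, using the existing
f2fs_cp_error()/F2FS_I_SB() helpers; the exact placement inside f2fs_direct_IO
differs in the real patch:

#include <linux/uio.h>
#include "f2fs.h"	/* f2fs_cp_error(), F2FS_I_SB() */

/* a prior IO error stops gc/checkpoint, so prefree segments are never
 * reclaimed; refuse to allocate more blocks for direct writes */
static ssize_t direct_io_guard_sketch(struct inode *inode,
				      struct iov_iter *iter)
{
	if (iov_iter_rw(iter) == WRITE &&
	    unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
		return -EIO;

	return 0;
}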
• f2fs: do in batches truncation in truncate_hole · ea58711e
      Chao Yu authored
truncate_data_blocks_range can truncate in batches, making all the related
changes (dnode page content, dnode page status, extent cache, block count)
together.

But previously, truncate_hole() always truncated one block of a dnode page at
a time by invoking truncate_data_blocks_range(,1), which makes things slow.

This patch changes truncate_hole() to truncate in batches all target blocks
inside one direct node via truncate_data_blocks_range, which makes the punch
hole operation in ->fallocate more efficient.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: optimize code of f2fs_update_extent_tree_range · 4d1fa815
      Fan Li authored
Fix two potential problems:
1. When the largest extent needs to be invalidated, it is reset in
   __drop_largest_extent, which makes the subsequent __is_extent_same always
   return false and leaves the largest extent unchanged. Now we update it
   properly.

2. When an extent is split and the latter part remains in the tree, next_en
   should be that latter part instead of the next extent of the original
   extent. This would cause a merge failure if there were an in-place update;
   although there is not, the fix still makes the code less ambiguous.

This patch also simplifies the code for invalidating extents and optimizes
the procedure that splits an extent into two.
There are a few modifications since the last patch:
1. prev_en is now updated properly.
2. More code and branches are simplified.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
• f2fs: drop largest extent by range · 41a099de
      Fan Li authored
Now we update extents by range, so fofs may not fall inside the largest
extent even when the new extent overlaps with it; add a new function to drop
the largest extent properly.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
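
A hedged sketch of the range-based drop, close in spirit to the new helper;
the struct extent_info fields (fofs, len) are assumed to follow the usual f2fs
layout:

#include "f2fs.h"	/* struct extent_info */

/*
 * Invalidate the cached largest extent whenever the updated range
 * [fofs, fofs + len) overlaps it, instead of only when fofs itself
 * happens to fall inside it.
 */
static void drop_largest_extent_sketch(struct extent_info *largest,
				       unsigned int fofs, unsigned int len)
{
	if (fofs < largest->fofs + largest->len &&
	    fofs + len > largest->fofs)
		largest->len = 0;	/* len == 0 means "no largest extent" */
}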
• f2fs: check end_io for metapages before making next checkpoint blocks · a7230d16
      Jaegeuk Kim authored
This patch avoids producing new checkpoint blocks before the previous meta
pages have been written back completely.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>