1. 30 May, 2019 2 commits
    • Chao Yu's avatar
      f2fs: fix to do sanity check on segment bitmap of LFS curseg · c854f4d6
      Chao Yu authored
      As Jungyeon Reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Reproduces
      gcc poc_13.c
      ./run.sh f2fs
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2133!
       RIP: 0010:update_sit_entry+0x35d/0x3e0
       Call Trace:
        f2fs_allocate_data_block+0x16c/0x5a0
        do_write_page+0x57/0x100
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x364/0x13a0
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0x110
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The testcase fails because that, in fuzzed image, current segment was
      allocated with LFS type, its .next_blkoff should point to an unused
      block address, but actually, its bitmap shows it's not. So during
      allocation, f2fs crash when setting bitmap.
      
      Introducing sanity_check_curseg() to check such inconsistence of
      current in-used segment.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c854f4d6
    • Jaegeuk Kim's avatar
      f2fs: add missing sysfs entries in documentation · 4d11d13e
      Jaegeuk Kim authored
      This patch cleans up documentation to cover missing sysfs entries.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4d11d13e
  2. 23 May, 2019 6 commits
    • Chao Yu's avatar
      f2fs: fix to avoid deadloop if data_flush is on · 040d2bb3
      Chao Yu authored
      As Hagbard Celine reported:
      
      [  615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
      [  615.697825]       Not tainted 5.0.15-gentoo-f2fslog #4
      [  615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  615.697827] kworker/u16:5   D    0   344      2 0x80000000
      [  615.697831] Workqueue: writeback wb_workfn (flush-259:0)
      [  615.697832] Call Trace:
      [  615.697836]  ? __schedule+0x2c5/0x8b0
      [  615.697839]  schedule+0x32/0x80
      [  615.697841]  schedule_preempt_disabled+0x14/0x20
      [  615.697842]  __mutex_lock.isra.8+0x2ba/0x4d0
      [  615.697845]  ? log_store+0xf5/0x260
      [  615.697848]  f2fs_write_data_pages+0x133/0x320
      [  615.697851]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697854]  do_writepages+0x41/0xd0
      [  615.697857]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697859]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697861]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697863]  ? up_read+0x5/0x20
      [  615.697865]  ? f2fs_do_write_data_page+0x2cb/0x940
      [  615.697867]  f2fs_balance_fs+0xe5/0x2c0
      [  615.697869]  __write_data_page+0x1c8/0x6e0
      [  615.697873]  f2fs_write_cache_pages+0x1e0/0x450
      [  615.697878]  f2fs_write_data_pages+0x14b/0x320
      [  615.697880]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697883]  do_writepages+0x41/0xd0
      [  615.697885]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697887]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697889]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697891]  f2fs_write_node_pages+0x51/0x220
      [  615.697894]  do_writepages+0x41/0xd0
      [  615.697897]  __writeback_single_inode+0x3d/0x3d0
      [  615.697899]  writeback_sb_inodes+0x1e8/0x410
      [  615.697902]  __writeback_inodes_wb+0x5d/0xb0
      [  615.697904]  wb_writeback+0x28f/0x340
      [  615.697906]  ? cpumask_next+0x16/0x20
      [  615.697908]  wb_workfn+0x33e/0x420
      [  615.697911]  process_one_work+0x1a1/0x3d0
      [  615.697913]  worker_thread+0x30/0x380
      [  615.697915]  ? process_one_work+0x3d0/0x3d0
      [  615.697916]  kthread+0x116/0x130
      [  615.697918]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  615.697921]  ret_from_fork+0x3a/0x50
      
      There is still deadloop in below condition:
      
      d A
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - f2fs_write_cache_pages
           - mutex_lock(&sbi->writepages)	-- lock once
           - __write_data_page
            - f2fs_balance_fs_bg
             - f2fs_sync_dirty_inodes
              - f2fs_write_data_pages
               - mutex_lock(&sbi->writepages)	-- lock again
      
      Thread A			Thread B
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - .cp_task = current
      				- f2fs_sync_dirty_inodes
      				 - .cp_task = current
      				 - filemap_fdatawrite
      				 - .cp_task = NULL
          - filemap_fdatawrite
           - f2fs_write_cache_pages
            - enter f2fs_balance_fs_bg since .cp_task is NULL
          - .cp_task = NULL
      
      Change as below to avoid this:
      - add condition to avoid holding .writepages mutex lock in path
      of data flush
      - introduce mutex lock sbi.flush_lock to exclude concurrent data
      flush in background.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      040d2bb3
    • Park Ju Hyung's avatar
      f2fs: always assume that the device is idle under gc_urgent · f7dfd9f3
      Park Ju Hyung authored
      This allows more aggressive discards and balancing job to be done
      under gc_urgent.
      Signed-off-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      f7dfd9f3
    • Chao Yu's avatar
      f2fs: add bio cache for IPU · 8648de2c
      Chao Yu authored
      SQLite in Wal mode may trigger sequential IPU write in db-wal file, after
      commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we
      lost the chance of merging page in inner managed bio cache, result in
      submitting more small-sized IO.
      
      So let's add temporary bio in writepages() to cache mergeable write IO as
      much as possible.
      
      Test case:
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      
      Before:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
      
      After:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8648de2c
    • Jaegeuk Kim's avatar
      f2fs: allow ssr block allocation during checkpoint=disable period · 49dd883c
      Jaegeuk Kim authored
      This patch allows to use ssr during checkpoint is disabled.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      49dd883c
    • Chao Yu's avatar
      f2fs: fix to check layout on last valid checkpoint park · 5dae2d39
      Chao Yu authored
      As Ju Hyung reported:
      
      "
      I was semi-forced today to use the new kernel and test f2fs.
      
      My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
      fix some stuffs. The live CD was using 4.15 kernel, and just mounting
      the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
      f2fs-stable merged) refused to mount with "SIT is corrupted node"
      message.
      
      I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
      repair cp_loads blocks at correct position"
      
      It spit out 140M worth of output, but at least I didn't have to run it
      twice. Everything returned "Ok" in the 2nd run.
      The new log is at
      http://arter97.com/f2fs/final
      
      After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
      f2fs-stable merged and it mounted.
      
      But, I got this:
      [    1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
      deprecated, run fsck to repair, chksum_offset: 4092
      [    1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
      [    1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
      [    1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00
      
      But after doing a reboot, the message is gone:
      [    1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
      [    1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
      [    1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda
      
      I'm not exactly sure why the kernel detected that I'm still using the
      old layout on the first boot. Maybe fsck didn't fix it properly, or
      the check from the kernel is improper.
      "
      
      Although we have rebuild the old deprecated checkpoint with new layout
      during repair, we only repair last checkpoint park, the other old one is
      remained.
      
      Once the image was mounted, we will 1) sanity check layout and 2) decide
      which checkpoint park to use according to cp_ver. So that we will print
      reported message unnecessarily at step 1), to avoid it, we simply move
      layout check into f2fs_sanity_check_ckpt() after step 2).
      Reported-by: default avatarPark Ju Hyung <qkrwngud825@gmail.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5dae2d39
    • Jaegeuk Kim's avatar
      f2fs: link f2fs quota ops for sysfile · bc88ac96
      Jaegeuk Kim authored
      This patch reverts:
      commit fb40d618 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").
      
      We were missing error handlers used in f2fs quota ops.
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      bc88ac96
  3. 14 May, 2019 32 commits