1. 14 Oct, 2020 1 commit
    • f2fs: handle errors of f2fs_get_meta_page_nofail · 86f33603
      Jaegeuk Kim authored
      The first problem is that we hit BUG_ON() in f2fs_get_sum_page() when
      f2fs_get_meta_page_nofail() returns EIO.
      
      The quick fix was to loop forever instead of returning an error, but syzbot
      caught a case where a fuzzed image drives us into that loop. It turned out we
      abused f2fs_get_meta_page_nofail(), as in the call stack below.
      
      - f2fs_fill_super
       - f2fs_build_segment_manager
        - build_sit_entries
         - get_current_sit_page
      
      INFO: task syz-executor178:6870 can't die for more than 143 seconds.
      task:syz-executor178 state:R
       stack:26960 pid: 6870 ppid:  6869 flags:0x00004006
      Call Trace:
      
      Showing all locks held in the system:
      1 lock held by khungtaskd/1179:
       #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
      1 lock held by systemd-journal/3920:
      1 lock held by in:imklog/6769:
       #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
      1 lock held by syz-executor178/6870:
       #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229
      
      Actually, we didn't have to use _nofail in this case, since the existing
      error handler can already return the error to mount(2).
      
      As a result, this patch 1) removes _nofail callers as much as possible, and
      2) deals with the error case in the last remaining caller, f2fs_get_sum_page().
      
      Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
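The idea of replacing an unbounded "nofail" loop with a bounded retry that eventually reports the error can be sketched in user-space C as follows. This is a minimal illustration, not the kernel code: `read_with_bounded_retry`, `SKETCH_MAX_IO_RETRIES`, `struct flaky_dev` and `flaky_read` are hypothetical names invented for the sketch, and the retry limit is an arbitrary assumption.

```c
#include <errno.h>

/* Hypothetical retry limit chosen for this sketch. */
#define SKETCH_MAX_IO_RETRIES 8

/* A read callback returns 0 on success or a negative errno value. */
typedef int (*sketch_read_fn)(void *ctx);

/*
 * Call read_once() and retry only while it reports -EIO, at most
 * SKETCH_MAX_IO_RETRIES times after the first attempt. The last error
 * is handed back to the caller instead of looping forever.
 */
int read_with_bounded_retry(sketch_read_fn read_once, void *ctx)
{
	int attempts = 0;
	int err;

	do {
		err = read_once(ctx);
		attempts++;
	} while (err == -EIO && attempts <= SKETCH_MAX_IO_RETRIES);

	return err;
}

/* Hypothetical device that fails a fixed number of times, then succeeds. */
struct flaky_dev {
	int failures_left;
	int calls;
};

int flaky_read(void *ctx)
{
	struct flaky_dev *dev = ctx;

	dev->calls++;
	if (dev->failures_left > 0) {
		dev->failures_left--;
		return -EIO;
	}
	return 0;
}
```

A caller on the mount path can then propagate the returned error, while a device that keeps failing no longer hangs the task.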
  2. 09 Oct, 2020 3 commits
    • f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode · d662fad1
      Chao Yu authored
      If a compressed inode has inconsistent values in the i_compress_algorithm,
      i_compr_blocks and i_log_cluster_size fields, we fail to set SBI_NEED_FSCK
      to notify fsck to repair the inode. Fix it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: reject CASEFOLD inode flag without casefold feature · f6322f3f
      Eric Biggers authored
      syzbot reported:
      
          general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
          KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
          CPU: 0 PID: 6860 Comm: syz-executor835 Not tainted 5.9.0-rc8-syzkaller #0
          Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
          RIP: 0010:utf8_casefold+0x43/0x1b0 fs/unicode/utf8-core.c:107
          [...]
          Call Trace:
           f2fs_init_casefolded_name fs/f2fs/dir.c:85 [inline]
           __f2fs_setup_filename fs/f2fs/dir.c:118 [inline]
           f2fs_prepare_lookup+0x3bf/0x640 fs/f2fs/dir.c:163
           f2fs_lookup+0x10d/0x920 fs/f2fs/namei.c:494
           __lookup_hash+0x115/0x240 fs/namei.c:1445
           filename_create+0x14b/0x630 fs/namei.c:3467
           user_path_create fs/namei.c:3524 [inline]
           do_mkdirat+0x56/0x310 fs/namei.c:3664
           do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [...]
      
      The problem is that an inode has F2FS_CASEFOLD_FL set, but the
      filesystem doesn't have the casefold feature flag set, and therefore
      super_block::s_encoding is NULL.
      
      Fix this by making sanity_check_inode() reject inodes that have
      F2FS_CASEFOLD_FL when the filesystem doesn't have the casefold feature.
      
      Reported-by: syzbot+05139c4039d0679e19ff@syzkaller.appspotmail.com
      Fixes: 2c2eb7a3 ("f2fs: Support case-insensitive file name lookups")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
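The rule described above (an inode may carry the casefold flag only if the filesystem advertises the casefold feature) can be sketched as a small predicate. The struct layouts, bit values and names below are hypothetical stand-ins for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real flag and feature bits differ. */
#define SKETCH_INODE_CASEFOLD_FL 0x1u
#define SKETCH_FEATURE_CASEFOLD  0x1u

struct sketch_sb {
	uint32_t features;	/* feature bits recorded in the superblock */
};

struct sketch_inode {
	uint32_t flags;		/* per-inode flag bits read from disk */
};

/*
 * Reject an inode whose casefold flag is set while the filesystem lacks
 * the casefold feature. In that state no encoding table is loaded, so a
 * later casefolded lookup would dereference a NULL encoding.
 */
bool sketch_inode_is_sane(const struct sketch_sb *sb,
			  const struct sketch_inode *inode)
{
	bool inode_casefolded = (inode->flags & SKETCH_INODE_CASEFOLD_FL) != 0;
	bool fs_has_casefold = (sb->features & SKETCH_FEATURE_CASEFOLD) != 0;

	return !(inode_casefolded && !fs_has_casefold);
}
```

Checking this once when the inode is read keeps the lookup path free of per-call NULL checks.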
    • f2fs: fix memory alignment to support 32bit · 48046cb5
      Jaegeuk Kim authored
      On 32-bit systems, the 64-bit key breaks memory alignment.
      This fixes the commit "f2fs: support 64-bits key in f2fs rb-tree node entry".
      Reported-by: Nicolas Chauvet <kwizart@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 29 Sep, 2020 15 commits
    • f2fs: fix slab leak of rpages pointer · adfc6943
      Jaegeuk Kim authored
      This fixes the memory leak below.
      
      [  130.157600] =============================================================================
      [  130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G        W  O     ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown()
      [  130.162742] -----------------------------------------------------------------------------
      [  130.162742]
      [  130.164979] Disabling lock debugging due to kernel taint
      [  130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200
      [  130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G    B   W  O      5.9.0-rc4+ #35
      [  130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
      [  130.171941] Call Trace:
      [  130.172528]  dump_stack+0x74/0x9a
      [  130.173298]  slab_err+0xb7/0xdc
      [  130.174044]  ? kernel_poison_pages+0xc0/0xc0
      [  130.175065]  ? on_each_cpu_cond_mask+0x48/0x90
      [  130.176096]  __kmem_cache_shutdown.cold+0x34/0x141
      [  130.177190]  kmem_cache_destroy+0x59/0x100
      [  130.178223]  f2fs_destroy_page_array_cache+0x15/0x20 [f2fs]
      [  130.179527]  f2fs_put_super+0x1bc/0x380 [f2fs]
      [  130.180538]  generic_shutdown_super+0x72/0x110
      [  130.181547]  kill_block_super+0x27/0x50
      [  130.182438]  kill_f2fs_super+0x76/0xe0 [f2fs]
      [  130.183448]  deactivate_locked_super+0x3b/0x80
      [  130.184456]  deactivate_super+0x3e/0x50
      [  130.185363]  cleanup_mnt+0x109/0x160
      [  130.186179]  __cleanup_mnt+0x12/0x20
      [  130.187003]  task_work_run+0x70/0xb0
      [  130.187841]  exit_to_user_mode_prepare+0x18f/0x1b0
      [  130.188917]  syscall_exit_to_user_mode+0x31/0x170
      [  130.189989]  do_syscall_64+0x45/0x90
      [  130.190828]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  130.191986] RIP: 0033:0x7faf868ea2eb
      [  130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01
      [  130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [  130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb
      [  130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50
      [  130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0
      [  130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50
      [  130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000
      [  130.210515] INFO: Object 0x00000000a980843a @offset=744
      [  130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297
      [  130.215030] 	__slab_alloc+0x20/0x40
      [  130.216566] 	kmem_cache_alloc+0x2a0/0x2e0
      [  130.218217] 	page_array_alloc+0x3d/0xe0 [f2fs]
      [  130.219940] 	f2fs_init_compress_ctx+0x1f/0x40 [f2fs]
      [  130.221736] 	f2fs_write_cache_pages+0x3db/0x860 [f2fs]
      [  130.223591] 	f2fs_write_data_pages+0x2c9/0x300 [f2fs]
      [  130.225414] 	do_writepages+0x43/0xd0
      [  130.226907] 	__filemap_fdatawrite_range+0xd5/0x110
      [  130.228632] 	filemap_write_and_wait_range+0x48/0xb0
      [  130.230336] 	__generic_file_write_iter+0x18a/0x1d0
      [  130.232035] 	f2fs_file_write_iter+0x226/0x550 [f2fs]
      [  130.233737] 	new_sync_write+0x113/0x1a0
      [  130.235204] 	vfs_write+0x1a6/0x200
      [  130.236579] 	ksys_write+0x67/0xe0
      [  130.237898] 	__x64_sys_write+0x1a/0x20
      [  130.239309] 	do_syscall_64+0x38/0x90
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: compress: fix to disallow enabling compress on non-empty file · 519a5a2f
      Chao Yu authored
      Compressed inodes and normal inodes have different layouts, so we should
      disallow enabling compression on a non-empty file to avoid a race condition
      while parsing and updating the inode's .i_addr array.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: Fix missing condition]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: compress: introduce cic/dic slab cache · c68d6c88
      Chao Yu authored
      Add two slab caches, "f2fs_cic_entry" and "f2fs_dic_entry", for memory
      allocation of the compress_io_ctx and decompress_io_ctx structures.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: compress: introduce page array slab cache · 31083031
      Chao Yu authored
      Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory
      allocation of the page pointer array in the compress context.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: Fix wrong memory allocation]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on segment/section count · 3a22e9ac
      Chao Yu authored
      As syzbot reported:
      
      BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
      BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
      Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878
      
      CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
       f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
       f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633
       mount_bdev+0x32e/0x3f0 fs/super.c:1417
       legacy_get_tree+0x105/0x220 fs/fs_context.c:592
       vfs_get_tree+0x89/0x2f0 fs/super.c:1547
       do_new_mount fs/namespace.c:2875 [inline]
       path_mount+0x1387/0x20a0 fs/namespace.c:3192
       do_mount fs/namespace.c:3205 [inline]
       __do_sys_mount fs/namespace.c:3413 [inline]
       __se_sys_mount fs/namespace.c:3390 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The root cause is: if segs_per_sec is larger than one, and the segment count
      in the last section is less than segs_per_sec, we hit an out-of-bounds
      memory access on sit_i->sentries[] in init_min_max_mtime().
      
      Fix this by adding a sanity check on the segment count, section count and
      segs_per_sec value in sanity_check_raw_super().
      
      Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
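One way to express the consistency requirement from the root-cause description is to demand that every section be fully populated, so that walking segs_per_sec entries per section never runs past the end of the per-segment array. The sketch below assumes that reading; the function name and parameter types are hypothetical, and the exact condition used by the kernel may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true only when the main-area segment count equals
 * total_sections * segs_per_sec, i.e. the last section is not short.
 * The product is computed in 64 bits so fuzzed values cannot wrap.
 */
bool sketch_segment_layout_is_sane(uint32_t segment_count_main,
				   uint32_t total_sections,
				   uint32_t segs_per_sec)
{
	if (segs_per_sec == 0 || total_sections == 0)
		return false;

	return (uint64_t)total_sections * segs_per_sec == segment_count_main;
}
```

For example, 5 sections of 2 segments each are consistent with 10 main segments, while 9 main segments would leave the last section short and should be rejected at mount time.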
    • f2fs: fix to check segment boundary during SIT page readahead · 6a257471
      Chao Yu authored
      As syzbot reported:
      
      kernel BUG at fs/f2fs/segment.h:657!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657
      Call Trace:
       build_sit_entries fs/f2fs/segment.c:4195 [inline]
       f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779
       f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633
       mount_bdev+0x32e/0x3f0 fs/super.c:1417
       legacy_get_tree+0x105/0x220 fs/fs_context.c:592
       vfs_get_tree+0x89/0x2f0 fs/super.c:1547
       do_new_mount fs/namespace.c:2875 [inline]
       path_mount+0x1387/0x2070 fs/namespace.c:3192
       do_mount fs/namespace.c:3205 [inline]
       __do_sys_mount fs/namespace.c:3413 [inline]
       __se_sys_mount fs/namespace.c:3390 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      @blkno in f2fs_ra_meta_pages() could exceed the maximum segment count,
      causing a panic in the subsequent sanity check in current_sit_addr(). Add a
      check to avoid this issue.
      
      Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix uninit-value in f2fs_lookup · 6d7ab88a
      Chao Yu authored
      As syzbot reported:
      
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x21c/0x280 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
       f2fs_lookup+0xe05/0x1a80 fs/f2fs/namei.c:503
       lookup_open fs/namei.c:3082 [inline]
       open_last_lookups fs/namei.c:3177 [inline]
       path_openat+0x2729/0x6a90 fs/namei.c:3365
       do_filp_open+0x2b8/0x710 fs/namei.c:3395
       do_sys_openat2+0xa88/0x1140 fs/open.c:1168
       do_sys_open fs/open.c:1184 [inline]
       __do_compat_sys_openat fs/open.c:1242 [inline]
       __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240
       __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240
       do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
       __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
       do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
       do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
       entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
      
      In f2fs_lookup(), @res_page could be used before being initialized:
      in __f2fs_find_entry(), once F2FS_I(dir)->i_current_depth has been
      fuzzed to zero, @res_page is never initialized, which causes this
      KMSAN warning. Relocate the @res_page initialization to fix this bug.
      
      Reported-by: syzbot+0eac6f0bbd558fd866d7@syzkaller.appspotmail.com
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove unneeded parameter in find_in_block() · 17f930e0
      Chao Yu authored
      We can relocate the @res_page assignment in find_in_block() to
      its caller, so the unneeded parameter can be removed as a cleanup.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong total_sections check and fsmeta check · f99ba9ad
      Wang Xiaojun authored
      The meta area is not included in the section_count computation, so the
      minimum number of total_sections is 1, and it cannot be greater than
      segment_count_main.
      
      The minimum number of meta segments is 8 (SB + 2 (CP + SIT + NAT) + SSA).
      Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
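The arithmetic behind the two bounds can be written out as a short sketch. It assumes one reading of the formula above: one segment for SB, two segments each for CP, SIT and NAT, and one for SSA, giving 1 + 2 * 3 + 1 = 8. All names are hypothetical and the functions only illustrate the ranges stated in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-area minimums, following the formula in the message. */
enum {
	SKETCH_SB_SEGS = 1,
	SKETCH_CP_SEGS = 2,
	SKETCH_SIT_SEGS = 2,
	SKETCH_NAT_SEGS = 2,
	SKETCH_SSA_SEGS = 1,
	SKETCH_MIN_META_SEGS = SKETCH_SB_SEGS + SKETCH_CP_SEGS +
			       SKETCH_SIT_SEGS + SKETCH_NAT_SEGS +
			       SKETCH_SSA_SEGS
};

/* total_sections must be at least 1 and at most segment_count_main. */
bool sketch_total_sections_ok(uint32_t total_sections,
			      uint32_t segment_count_main)
{
	return total_sections >= 1 && total_sections <= segment_count_main;
}

/* The meta area needs at least the minimum number of segments. */
bool sketch_fsmeta_ok(uint32_t fsmeta_segs)
{
	return fsmeta_segs >= SKETCH_MIN_META_SEGS;
}
```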
    • f2fs: remove duplicated code in sanity_check_area_boundary · d89f5891
      Wang Xiaojun authored
      Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count <<
      log_blocks_per_seg)".
      Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove unused check on version_bitmap · e6e42187
      Wang Xiaojun authored
      __bitmap_ptr() never returns NULL here.
      Remove the unused check.
      Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: relocate blkzoned feature check · d0660122
      Chao Yu authored
      Relocate the blkzoned feature check into parse_options(), like the
      other feature checks.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: do sanity check on zoned block device path · 07eb1d69
      Chao Yu authored
      sbi->devs is initialized only if the image enables the multiple device
      feature or the blkzoned feature. If the blkzoned feature flag was set by
      fuzzing on a non-blkzoned device, we hit the panic below:
      
      get_zone_idx fs/f2fs/segment.c:4892 [inline]
      f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline]
      f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999
      Call Trace:
       check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704
       build_sit_entries fs/f2fs/segment.c:4403 [inline]
       f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100
       f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684
       mount_bdev+0x32e/0x3f0 fs/super.c:1417
       legacy_get_tree+0x105/0x220 fs/fs_context.c:592
       vfs_get_tree+0x89/0x2f0 fs/super.c:1547
       do_new_mount fs/namespace.c:2896 [inline]
       path_mount+0x12ae/0x1e70 fs/namespace.c:3216
       do_mount fs/namespace.c:3229 [inline]
       __do_sys_mount fs/namespace.c:3437 [inline]
       __se_sys_mount fs/namespace.c:3414 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      
      Add a sanity check for inconsistency among the blkzoned flag, device
      path and device characteristics to avoid the above panic.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add trace exit in exception path · 9b664822
      Zhang Qilong authored
      The trace exit is missing in the exception path of f2fs_sync_dirty_inodes().
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: change return value of reserved_segments to unsigned int · 4470eb28
      Xiaojun Wang authored
      The type of SM_I(sbi)->reserved_segments is unsigned int,
      so change the return value to unsigned int.
      The type cast can be removed in reserved_sections as a result.
      Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 14 Sep, 2020 1 commit
  5. 11 Sep, 2020 13 commits
    • f2fs: change virtual mapping way for compression pages · 6fcaebac
      Daeho Jeong authored
      By profiling f2fs compression, I found that vmap() calls show
      unexpected spikes in execution time in our test environment, and
      they are the bottleneck of the f2fs decompression path. Replacing them
      with vm_map_ram() speeds up f2fs decompression considerably.
      
      [Verification]
      Android Pixel 3(ARM64, 6GB RAM, 128GB UFS)
      Turned on only 0-3 little cores(at 1.785GHz)
      
      dd if=/dev/zero of=dummy bs=1m count=1000
      echo 3 > /proc/sys/vm/drop_caches
      dd if=dummy of=/dev/zero bs=512k
      
      - w/o compression -
      1048576000 bytes (0.9 G) copied, 2.082554 s, 480 M/s
      1048576000 bytes (0.9 G) copied, 2.081634 s, 480 M/s
      1048576000 bytes (0.9 G) copied, 2.090861 s, 478 M/s
      
      - before patch -
      1048576000 bytes (0.9 G) copied, 7.407527 s, 135 M/s
      1048576000 bytes (0.9 G) copied, 7.283734 s, 137 M/s
      1048576000 bytes (0.9 G) copied, 7.291508 s, 137 M/s
      
      - after patch -
      1048576000 bytes (0.9 G) copied, 1.998959 s, 500 M/s
      1048576000 bytes (0.9 G) copied, 1.987554 s, 503 M/s
      1048576000 bytes (0.9 G) copied, 1.986380 s, 503 M/s
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: change return value of f2fs_disable_compressed_file to bool · 78134d03
      Daeho Jeong authored
      The returned integer is not required anywhere, so change
      the return value to bool type.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: change i_compr_blocks of inode to atomic value · c2759eba
      Daeho Jeong authored
      writepages() can be invoked concurrently for the same file by different
      threads, such as a thread fsyncing the file and a kworker kernel thread.
      So changing i_compr_blocks without protection is racy, and we need to
      protect it by making it an atomic type. Plus, we don't need a 64-bit
      value for i_compr_blocks, so we just use an atomic value, not
      atomic64.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
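The change from a plain counter to an atomic one can be illustrated in user space with C11 atomics. This is only an analogy to the kernel's atomic type: `struct sketch_inode_info` and the helper names are hypothetical, and C11 `atomic_int` stands in for the kernel primitive.

```c
#include <stdatomic.h>

/* Hypothetical per-inode state holding a 32-bit atomic block counter. */
struct sketch_inode_info {
	atomic_int compr_blocks;
};

void sketch_init(struct sketch_inode_info *fi)
{
	atomic_init(&fi->compr_blocks, 0);
}

/* Each update is a single atomic read-modify-write, so two writers
 * running concurrently cannot lose each other's increments. */
void sketch_add_compr_blocks(struct sketch_inode_info *fi, int blocks)
{
	atomic_fetch_add(&fi->compr_blocks, blocks);
}

void sketch_sub_compr_blocks(struct sketch_inode_info *fi, int blocks)
{
	atomic_fetch_sub(&fi->compr_blocks, blocks);
}

int sketch_read_compr_blocks(struct sketch_inode_info *fi)
{
	return atomic_load(&fi->compr_blocks);
}
```

With a plain `int`, two threads doing `count += n` at the same time could both read the old value and one update would be lost; the atomic operations rule that out without taking a lock.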
    • f2fs: trace: fix typo · 32c0fec1
      Chao Yu authored
      Fixes a typo from 'compreesed' to 'compressed'.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: ignore compress mount option on image w/o compression feature · 69c0dd29
      Chao Yu authored
      Ignore the compress mount option on such an image to stay consistent
      with the behavior when the option is passed to a kernel w/o compression
      feature, so that mount does not fail in that condition.
      Reported-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Documentation edits/fixes · ca313c82
      Randy Dunlap authored
      Correct grammar and spelling.
      
      Drop duplicate section for resize.f2fs.
      
      Change one occurrence of F2fs to F2FS for consistency.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Chao Yu <yuchao0@huawei.com>
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: allocate proper size memory for zstd decompress · 0e2b7385
      Chao Yu authored
      As 5kft <5kft@5kft.org> reported:
      
       kworker/u9:3: page allocation failure: order:9, mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
       CPU: 3 PID: 8168 Comm: kworker/u9:3 Tainted: G         C        5.8.3-sunxi #trunk
       Hardware name: Allwinner sun8i Family
       Workqueue: f2fs_post_read_wq f2fs_post_read_work
       [<c010d6d5>] (unwind_backtrace) from [<c0109a55>] (show_stack+0x11/0x14)
       [<c0109a55>] (show_stack) from [<c056d489>] (dump_stack+0x75/0x84)
       [<c056d489>] (dump_stack) from [<c0243b53>] (warn_alloc+0xa3/0x104)
       [<c0243b53>] (warn_alloc) from [<c024473b>] (__alloc_pages_nodemask+0xb87/0xc40)
       [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>] (kmalloc_order+0x19/0x38)
       [<c02267c5>] (kmalloc_order) from [<c02267fd>] (kmalloc_order_trace+0x19/0x90)
       [<c02267fd>] (kmalloc_order_trace) from [<c047c665>] (zstd_init_decompress_ctx+0x21/0x88)
       [<c047c665>] (zstd_init_decompress_ctx) from [<c047e9cf>] (f2fs_decompress_pages+0x97/0x228)
       [<c047e9cf>] (f2fs_decompress_pages) from [<c045d0ab>] (__read_end_io+0xfb/0x130)
       [<c045d0ab>] (__read_end_io) from [<c045d141>] (f2fs_post_read_work+0x61/0x84)
       [<c045d141>] (f2fs_post_read_work) from [<c0130b2f>] (process_one_work+0x15f/0x3b0)
       [<c0130b2f>] (process_one_work) from [<c0130e7b>] (worker_thread+0xfb/0x3e0)
       [<c0130e7b>] (worker_thread) from [<c0135c3b>] (kthread+0xeb/0x10c)
       [<c0135c3b>] (kthread) from [<c0100159>]
      
      zstd may allocate a large amount of memory for {,de}compression, which may
      cause file copy failures on low-end devices with very little memory.
      
      For decompression, let's allocate memory sized for the current file's
      cluster size instead of the maximum cluster size.
      Reported-by: 5kft <5kft@5kft.org>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
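The saving from sizing by the file's own cluster rather than the maximum can be seen with a little arithmetic. The sketch below assumes a 4 KiB page and an upper bound of 8 on the log2 cluster size; both constants and all names are assumptions made for illustration, and the real code derives a zstd workspace size from such a window rather than using it directly.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE            4096u	/* assumed page size */
#define SKETCH_MAX_LOG_CLUSTER_SIZE 8u		/* assumed upper bound */

/* Bytes covered by one cluster of 2^log_cluster_size pages. */
size_t sketch_cluster_bytes(unsigned int log_cluster_size)
{
	return (size_t)SKETCH_PAGE_SIZE << log_cluster_size;
}

/*
 * Window to size decompression memory for: derived from the file's own
 * cluster size, clamped to the assumed maximum.
 */
size_t sketch_decompress_window(unsigned int log_cluster_size)
{
	if (log_cluster_size > SKETCH_MAX_LOG_CLUSTER_SIZE)
		log_cluster_size = SKETCH_MAX_LOG_CLUSTER_SIZE;

	return sketch_cluster_bytes(log_cluster_size);
}
```

Under these assumptions a file with a 4-page cluster needs a 16 KiB window, whereas sizing for the maximum would ask for 1 MiB, the kind of high-order allocation that fails in the report above.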
    • f2fs: change compr_blocks of superblock info to 64bit · ae999bb9
      Daeho Jeong authored
      The current compr_blocks of superblock info is not a 64-bit value. We
      accumulate each inode's i_compr_blocks count into this value, and
      those are 64-bit values, so we need to change it to a 64-bit value.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add block address limit check to compressed file · 4eda1682
      Daeho Jeong authored
      Add a block address range check for the compressed file case and
      avoid calling get_data_block_bmap() for compressed files.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: check position in move range ioctl · aad1383c
      Dan Robertson authored
      When the move range ioctl is used, check the input and output positions and
      ensure that they are non-negative values. Without this check,
      f2fs_get_dnode_of_data() may hit a memory bug.
      Signed-off-by: Dan Robertson <dan@dlrobertson.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
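The validation described above amounts to rejecting negative offsets before they are turned into block indexes. A minimal sketch, with a hypothetical function name and signed 64-bit offsets standing in for the ioctl arguments:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 when both positions are usable, or -EINVAL when either is
 * negative. A negative offset converted to an unsigned page index would
 * become a huge value and lead to an out-of-range lookup.
 */
int sketch_check_move_range(int64_t pos_in, int64_t pos_out)
{
	if (pos_in < 0 || pos_out < 0)
		return -EINVAL;

	return 0;
}
```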
    • f2fs: correct statistic of APP_DIRECT_IO/APP_DIRECT_READ_IO · 335cac8b
      Jack Qiu authored
      We miss updating APP_DIRECT_IO/APP_DIRECT_READ_IO when receiving async DIO.
      For example: fio -filename=/data/test.0 -bs=1m -ioengine=libaio -direct=1
      		-name=fill -size=10m -numjobs=1 -iodepth=32 -rw=write
      Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Simplify SEEK_DATA implementation · 4cb03fec
      Matthew Wilcox (Oracle) authored
      Instead of finding the first dirty page and then seeing if it matches
      the index of a block that is NEW_ADDR, delay the lookup of the dirty
      bit until we've actually found a block that's NEW_ADDR.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support age threshold based garbage collection · 093749e2
      Chao Yu authored
      There are several issues in the current background GC algorithm:
      - valid block count is one of the key factors in the cost calculation,
      so if a segment has few valid blocks, the CB algorithm will still choose
      it as a victim even if it is young or sits in a hot segment, which is
      not appropriate.
      - GCed data/node blocks go to existing logs regardless of whether the
      update frequency of the data already there is the same, so hot and
      cold data may be mixed again.
      - the GC allocator mainly uses LFS type segments, so it consumes free
      segments more quickly.
      
      This patch introduces a new algorithm named age threshold based
      garbage collection to solve the above issues. There are three main
      steps:
      
      1. select a source victim:
      - set an age threshold, and select candidates based on the threshold:
      e.g.
       0 means youngest, 100 means oldest; if we set the age threshold to 80,
       then select dirty segments whose age is in the range [80, 100] as
       candidates;
      - set a candidate_ratio threshold, and select candidates based on the
      ratio, so that we can shrink the candidates to the oldest segments;
      - select the target segment with the fewest valid blocks in order to
      migrate blocks with minimum cost;
      
      2. select a target victim:
      - select candidates based on the age threshold;
      - set a candidate_radius threshold, and search for candidates whose age
      is around the source victim's; the search radius should be less than
      the radius threshold.
      - select the target segment with the most valid blocks in order to avoid
      migrating the current target segment.
      
      3. merge valid blocks from the source victim into the target victim with
      the SSR allocator.
      
      Test steps:
      - create 160 dirty segments:
       * half of them have 128 valid blocks per segment
       * the rest have 384 valid blocks per segment
      - run background GC
      
      Benefit: GC count and block movement count both decrease significantly:
      
      - Before:
        - Valid: 86
        - Dirty: 1
        - Prefree: 11
        - Free: 6001 (6001)
      
      GC calls: 162 (BG: 220)
        - data segments : 160 (160)
        - node segments : 2 (2)
      Try to move 41454 blocks (BG: 41454)
        - data blocks : 40960 (40960)
        - node blocks : 494 (494)
      
      IPU: 0 blocks
      SSR: 0 blocks in 0 segments
      LFS: 41364 blocks in 81 segments
      
      - After:
      
        - Valid: 87
        - Dirty: 0
        - Prefree: 4
        - Free: 6008 (6008)
      
      GC calls: 75 (BG: 76)
        - data segments : 74 (74)
        - node segments : 1 (1)
      Try to move 12813 blocks (BG: 12813)
        - data blocks : 12544 (12544)
        - node blocks : 269 (269)
      
      IPU: 0 blocks
      SSR: 12032 blocks in 77 segments
      LFS: 855 blocks in 2 segments
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
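The source and target selection steps described in this commit message can be sketched over a plain array of segments. This is a simplified illustration under stated assumptions, not the kernel implementation: ages are taken as already normalized to 0..100, the candidate_ratio trimming step is omitted, a linear scan replaces whatever index the real code uses, and every name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical view of a dirty segment for this sketch. */
struct sketch_segment {
	unsigned int age;		/* 0 = youngest, 100 = oldest */
	unsigned int valid_blocks;	/* blocks that would need migration */
};

/*
 * Step 1: among segments at least age_threshold old, pick the one with
 * the fewest valid blocks (cheapest to migrate). Returns -1 if none.
 */
int sketch_pick_source(const struct sketch_segment *segs, size_t n,
		       unsigned int age_threshold)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].age < age_threshold)
			continue;
		if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}

/*
 * Step 2: among other old-enough segments whose age differs from the
 * source's by less than radius, pick the one with the most valid blocks
 * (least likely to be migrated again soon). Returns -1 if none.
 */
int sketch_pick_target(const struct sketch_segment *segs, size_t n,
		       int source, unsigned int age_threshold,
		       unsigned int radius)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		unsigned int a = segs[i].age;
		unsigned int b = segs[source].age;
		unsigned int diff = a > b ? a - b : b - a;

		if ((int)i == source || a < age_threshold || diff >= radius)
			continue;
		if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
			best = (int)i;
	}
	return best;
}
```

Step 3 would then copy the source's valid blocks into free slots of the chosen target, which is what keeps blocks of similar age together instead of appending them to the current log.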
  6. 10 Sep, 2020 7 commits