1. 30 Jun, 2023 5 commits
    • f2fs: fix to do sanity check on direct node in truncate_dnode() · a6ec8378
      Chao Yu authored
      syzbot reports the bug below:
      
      BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
      Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000
      
      CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd5 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
       print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
       print_report mm/kasan/report.c:462 [inline]
       kasan_report+0x11c/0x130 mm/kasan/report.c:572
       f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
       truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
       f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
       f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
       f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
       f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
       f2fs_truncate include/linux/fs.h:825 [inline]
       f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
       notify_change+0xb2c/0x1180 fs/attr.c:483
       do_truncate+0x143/0x200 fs/open.c:66
       handle_truncate fs/namei.c:3295 [inline]
       do_open fs/namei.c:3640 [inline]
       path_openat+0x2083/0x2750 fs/namei.c:3791
       do_filp_open+0x1ba/0x410 fs/namei.c:3818
       do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_creat fs/open.c:1448 [inline]
       __se_sys_creat fs/open.c:1442 [inline]
       __x64_sys_creat+0xcd/0x120 fs/open.c:1442
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The root cause is that inodeA references inodeB via inodeB's ino. Once
      inodeA is truncated, it calls truncate_dnode() to truncate data blocks in
      inodeB's node page; it traverses mapping data from node->i.i_addr[0] to
      node->i.i_addr[ADDRS_PER_BLOCK() - 1], resulting in an out-of-bounds access.

      This patch adds a sanity check on the dnode page in truncate_dnode() to
      avoid triggering this issue. When the check fails, it records the newly
      introduced ERROR_INVALID_NODE_REFERENCE error into the superblock, so
      that fsck can later detect the issue and try to repair it.

      It also removes f2fs_truncate_data_blocks() as a cleanup, since the
      function has only one caller, and uses f2fs_truncate_data_blocks_range()
      instead.
      
      Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: only set release for file that has compressed data · 87a91a15
      Sheng Yong authored
      If a file is not compressed yet or does not have compressed data
      (for example, because its data has a very low compression ratio),
      do not set the FI_COMPRESS_RELEASED flag.

      Signed-off-by: Sheng Yong <shengyong@oppo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix compile warning in f2fs_destroy_node_manager() · c31e4961
      Chao Yu authored
      fs/f2fs/node.c: In function ‘f2fs_destroy_node_manager’:
      fs/f2fs/node.c:3390:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
       3390 | }
      
      Merge the pointer arrays below into a common one, and reuse it by casting its type.
      
      struct nat_entry *natvec[NATVEC_SIZE];
      struct nat_entry_set *setvec[SETVEC_SIZE];

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
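
The array merge described in the commit above can be sketched in userspace as follows. This is a minimal illustration, not the kernel change itself: the struct layouts, `VEC_SIZE`, and all function names are hypothetical, and the sketch converts each `void *` element on use rather than casting the whole array, so that it stays within standard C aliasing rules.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the buffer reuse matters. */
struct nat_entry { unsigned int nid; };
struct nat_entry_set { unsigned int entry_cnt; };

#define VEC_SIZE 32 /* assumed size for this sketch */

/* Treat every slot of the shared buffer as a nat_entry pointer. */
static unsigned int sum_nids(void *const *vec, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++) {
		const struct nat_entry *e = vec[i];
		sum += e->nid;
	}
	return sum;
}

/* Treat every slot of the same buffer as a nat_entry_set pointer. */
static unsigned int sum_entry_cnt(void *const *vec, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++) {
		const struct nat_entry_set *s = vec[i];
		sum += s->entry_cnt;
	}
	return sum;
}

/* One pointer array on the stack serves both phases in turn. */
unsigned int demo_shared_vec(void)
{
	static struct nat_entry nats[2] = { { 3 }, { 4 } };
	static struct nat_entry_set sets[2] = { { 10 }, { 20 } };
	void *vec[VEC_SIZE];
	unsigned int total;

	vec[0] = &nats[0];
	vec[1] = &nats[1];
	total = sum_nids(vec, 2);

	vec[0] = &sets[0];
	vec[1] = &sets[1];
	total += sum_entry_cnt(vec, 2);
	return total;
}
```

Calling `demo_shared_vec()` returns 37 (3 + 4 from the first phase, 10 + 20 from the second). The point is that only one pointer array occupies the stack frame, which is what shrinks the frame size.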
    • f2fs: fix error path handling in truncate_dnode() · 0135c482
      Chao Yu authored
      If truncate_node() fails in truncate_dnode(), the error path misses
      the call to f2fs_put_page(); fix it.

      Fixes: 7735730d ("f2fs: fix to propagate error from __get_meta_page()")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix deadlock in i_xattr_sem and inode page lock · 5eda1ad1
      Jaegeuk Kim authored
      Thread #1:
      
      [122554.641906][   T92]  f2fs_getxattr+0xd4/0x5fc
          -> waiting for f2fs_down_read(&F2FS_I(inode)->i_xattr_sem);
      
      [122554.641927][   T92]  __f2fs_get_acl+0x50/0x284
      [122554.641948][   T92]  f2fs_init_acl+0x84/0x54c
      [122554.641969][   T92]  f2fs_init_inode_metadata+0x460/0x5f0
      [122554.641990][   T92]  f2fs_add_inline_entry+0x11c/0x350
          -> Locked dir->inode_page by f2fs_get_node_page()
      
      [122554.642009][   T92]  f2fs_do_add_link+0x100/0x1e4
      [122554.642025][   T92]  f2fs_create+0xf4/0x22c
      [122554.642047][   T92]  vfs_create+0x130/0x1f4
      
      Thread #2:
      
      [123996.386358][   T92]  __get_node_page+0x8c/0x504
          -> waiting for dir->inode_page lock
      
      [123996.386383][   T92]  read_all_xattrs+0x11c/0x1f4
      [123996.386405][   T92]  __f2fs_setxattr+0xcc/0x528
      [123996.386424][   T92]  f2fs_setxattr+0x158/0x1f4
          -> f2fs_down_write(&F2FS_I(inode)->i_xattr_sem);
      
      [123996.386443][   T92]  __f2fs_set_acl+0x328/0x430
      [123996.386618][   T92]  f2fs_set_acl+0x38/0x50
      [123996.386642][   T92]  posix_acl_chmod+0xc8/0x1c8
      [123996.386669][   T92]  f2fs_setattr+0x5e0/0x6bc
      [123996.386689][   T92]  notify_change+0x4d8/0x580
      [123996.386717][   T92]  chmod_common+0xd8/0x184
      [123996.386748][   T92]  do_fchmodat+0x60/0x124
      [123996.386766][   T92]  __arm64_sys_fchmodat+0x28/0x3c
      
      Cc: <stable@vger.kernel.org>
      Fixes: 27161f13 ("f2fs: avoid race in between read xattr & write xattr")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 26 Jun, 2023 18 commits
  3. 12 Jun, 2023 16 commits
    • f2fs: avoid dead loop in f2fs_issue_checkpoint() · 5079e1c0
      Chao Yu authored
      generic/082 reports a bug as below:
      
      __schedule+0x332/0xf60
      schedule+0x6f/0xf0
      schedule_timeout+0x23b/0x2a0
      wait_for_completion+0x8f/0x140
      f2fs_issue_checkpoint+0xfe/0x1b0
      f2fs_sync_fs+0x9d/0xb0
      sync_filesystem+0x87/0xb0
      dquot_load_quota_sb+0x41b/0x460
      dquot_load_quota_inode+0xa5/0x130
      dquot_quota_on+0x4b/0x60
      f2fs_quota_on+0xe3/0x1b0
      do_quotactl+0x483/0x700
      __x64_sys_quotactl+0x15c/0x310
      do_syscall_64+0x3f/0x90
      entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The root cause is the race below:
      
      Thread A			Kworker			IRQ
      - write()
      : write data to quota.user file
      
      				- writepages
      				 - f2fs_submit_page_write
      				  - __is_cp_guaranteed return false
      				  - inc_page_count(F2FS_WB_DATA)
      				 - submit_bio
      - quotactl(Q_QUOTAON)
       - f2fs_quota_on
        - dquot_quota_on
         - dquot_load_quota_inode
          - vfs_setup_quota_inode
          : inode->i_flags |= S_NOQUOTA
      							- f2fs_write_end_io
      							 - __is_cp_guaranteed return true
      							 - dec_page_count(F2FS_WB_CP_DATA)
          - dquot_load_quota_sb
           - f2fs_sync_fs
            - f2fs_issue_checkpoint
             - do_checkpoint
              - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA)
              : loop due to F2FS_WB_CP_DATA count is negative
      
      Fix this by calling filemap_fdatawrite() and filemap_fdatawait() so that
      all data is clean before quota file setup.

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
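
The counter imbalance in the race above can be modeled in a few lines. This is a simplified userspace sketch: `fake_sb`, `classify()`, and `simulate_quota_on_race()` are hypothetical names, and the real `__is_cp_guaranteed()` checks more than the single flag modeled here.

```c
#include <stdbool.h>

enum wb_type { WB_DATA, WB_CP_DATA, WB_NR };

/* Hypothetical per-filesystem writeback counters. */
struct fake_sb { int nr_pages[WB_NR]; };

/* Stand-in for the classification: quota-file pages count as checkpoint data. */
static enum wb_type classify(bool noquota)
{
	return noquota ? WB_CP_DATA : WB_DATA;
}

/* The page is classified once at submit time and again at completion time. */
static void simulate_quota_on_race(struct fake_sb *sb)
{
	bool noquota = false;

	sb->nr_pages[classify(noquota)]++; /* submit: counted as WB_DATA */
	noquota = true;                    /* quota setup marks the inode in between */
	sb->nr_pages[classify(noquota)]--; /* end_io: decrements WB_CP_DATA instead */
}
```

Starting from zeroed counters, one pass leaves `WB_DATA` at 1 and `WB_CP_DATA` at -1, which is the negative count that the checkpoint then waits on. Flushing and waiting for the file's data before the flag changes removes the window between the two classifications.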
    • f2fs: fix args passed to trace_f2fs_lookup_end · cadfc2f9
      Wu Bo authored
      A NULL return from 'd_splice_alias' doesn't mean an error. The
      successful case also returns NULL, which makes the tracepoint always
      print 'err=-ENOENT'.

      The different cases of 'new' & 'err' are listed as follows:
      1) dentry exists: err(0) with new(NULL) --> dentry, err=0
      2) dentry exists: err(0) with new(VALID) --> new, err=0
      3) dentry exists: err(0) with new(ERR) --> dentry, err=ERR
      4) no dentry exists: err(-ENOENT) with new(NULL) --> dentry, err=-ENOENT
      5) no dentry exists: err(-ENOENT) with new(VALID) --> new, err=-ENOENT
      6) no dentry exists: err(-ENOENT) with new(ERR) --> dentry, err=ERR

      Signed-off-by: Wu Bo <bo.wu@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
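
The six cases listed in the commit above reduce to two decisions: which dentry to report, and which error code to report. The sketch below encodes that table in userspace. `enum new_state`, `struct trace_args`, and `pick_trace_args()` are hypothetical names; the kernel encodes errors in the pointer itself rather than in a separate enum.

```c
#include <errno.h>
#include <stdbool.h>

/* What the splice step returned: nothing, a usable dentry, or an error. */
enum new_state { NEW_NULL, NEW_VALID, NEW_ERR };

struct trace_args {
	bool use_new; /* true: report 'new'; false: report the original dentry */
	int err;      /* error code handed to the tracepoint */
};

/*
 * new_err is only meaningful when state == NEW_ERR.
 * err is the lookup result so far (0 or -ENOENT in the table above).
 */
static struct trace_args pick_trace_args(enum new_state state, int new_err, int err)
{
	struct trace_args args;

	args.use_new = (state == NEW_VALID);
	args.err = (state == NEW_ERR) ? new_err : err;
	return args;
}
```

For example, `pick_trace_args(NEW_NULL, 0, 0)` reports the original dentry with `err=0` (case 1), and `pick_trace_args(NEW_ERR, -ENOMEM, -ENOENT)` reports the original dentry with the error carried by `new` (case 6).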
    • f2fs: flag as supporting buffered async reads · 38b57833
      Yangtao Li authored
      f2fs uses generic_file_buffered_read(), which supports buffered async
      reads since commit 1a0a7853 ("mm: support async buffered reads in
      generic_file_buffered_read()").

      Let's enable it to match other filesystems. Read performance under
      io_uring improves by about 40%:

          167M/s -> 234M/s
      
      Test w/:
          ./fio --name=onessd --filename=/data/test/local/io_uring_test
          --size=256M --rw=randread --bs=4k --direct=0 --overwrite=0
          --numjobs=1 --iodepth=1 --time_based=0 --runtime=10
          --ioengine=io_uring --registerfiles --fixedbufs
          --gtod_reduce=1 --group_reporting --sqthread_poll=1

      Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to drop all dirty meta/node pages during umount() · 20872584
      Chao Yu authored
      In the cp error case, dirty meta/node pages remain after
      f2fs_write_checkpoint() in f2fs_put_super(). Drop them explicitly, and
      do a sanity check on the reference count of dirty pages and inflight IOs.

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Detect looped node chain efficiently · 38a4a330
      Chunhai Guo authored
      find_fsync_dnodes() detects a looped node chain by comparing the loop
      counter with the number of free blocks. However, it may take tens of
      seconds to quit when there are many free blocks. We can use Floyd's
      cycle detection algorithm to make the detection more efficient.

      Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
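
Floyd's cycle detection, named in the commit above, walks the chain with two cursors at different speeds; if they ever meet, the chain loops. The sketch below shows the technique on a plain array, where `next[i]` is the index of the block that follows block `i` and -1 ends the chain. The array representation and the name `chain_has_loop()` are assumptions for illustration, not the f2fs on-disk layout or the patch's code.

```c
#include <stdbool.h>

/*
 * Returns true if following next[] from 'start' never reaches -1.
 * 'slow' advances one step per iteration and 'fast' advances two, so
 * 'fast' either falls off the end of the chain or catches up with 'slow'.
 */
static bool chain_has_loop(const int *next, int start)
{
	int slow = start;
	int fast = start;

	while (fast >= 0 && next[fast] >= 0) {
		slow = next[slow];       /* one step */
		fast = next[next[fast]]; /* two steps */
		if (fast >= 0 && slow == fast)
			return true;
	}
	return false;
}
```

With `next = {1, 2, -1}` the chain 0 -> 1 -> 2 terminates and the function returns false; with `next = {1, 2, 1}` block 2 points back to block 1 and the function returns true. The number of iterations is bounded by the length of the chain rather than by an unrelated quantity such as the free block count.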
    • f2fs: add async reset zone command support · 25f90805
      Daejun Park authored
      This patch enables submitting the reset zone command asynchronously. It
      helps decrease the average latency of write IOs in high-utilization
      scenarios through faster checkpointing.

      Signed-off-by: Daejun Park <daejun7.park@samsung.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: flush error flags in workqueue · 901c12d1
      Chao Yu authored
      In IRQ context, wake up a workqueue to record errors into the on-disk
      superblock fields rather than the in-memory fields.

      Fixes: 1aa161e4 ("f2fs: fix scheduling while atomic in decompression path")
      Fixes: 95fa90c9 ("f2fs: support recording errors into superblock")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't reset unchangable mount option in f2fs_remount() · 458c15df
      Chao Yu authored
      syzbot reports a bug as below:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
      RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942
      Call Trace:
       lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691
       __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline]
       _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300
       __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100
       f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116
       f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664
       f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838
       vfs_fallocate+0x54b/0x6b0 fs/open.c:324
       ksys_fallocate fs/open.c:347 [inline]
       __do_sys_fallocate fs/open.c:355 [inline]
       __se_sys_fallocate fs/open.c:353 [inline]
       __x64_sys_fallocate+0xbd/0x100 fs/open.c:353
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The root cause is the race condition below:
      - Since it tries to remount a rw filesystem, do_remount won't call
      sb_prepare_remount_readonly to block fallocate, so there may be a race
      condition between remount and fallocate.
      - In f2fs_remount(), default_options() resets the mount options to their
      defaults and then updates them based on the result of parse_options(),
      which leaves a window in which the race condition can happen.
      
      Thread A			Thread B
      - f2fs_fill_super
       - parse_options
        - clear_opt(READ_EXTENT_CACHE)
      
      - f2fs_remount
       - default_options
        - set_opt(READ_EXTENT_CACHE)
      				- f2fs_fallocate
      				 - f2fs_insert_range
      				  - f2fs_drop_extent_tree
      				   - __drop_extent_tree
      				    - __may_extent_tree
      				     - test_opt(READ_EXTENT_CACHE) return true
      				    - write_lock(&et->lock) access NULL pointer
       - parse_options
        - clear_opt(READ_EXTENT_CACHE)
      
      Cc: <stable@vger.kernel.org>
      Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io() · d8189834
      Chao Yu authored
      butt3rflyh4ck reports a bug as below:
      
      When a thread repeatedly calls F2FS_IOC_RESIZE_FS to resize the fs and
      the resize fails, an f2fs kernel thread invokes a callback function to
      update f2fs io info; it calls f2fs_write_end_io and may trigger a
      null-ptr-deref in NODE_MAPPING.
      
      general protection fault, probably for non-canonical address
      KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
      RIP: 0010:NODE_MAPPING fs/f2fs/f2fs.h:1972 [inline]
      RIP: 0010:f2fs_write_end_io+0x727/0x1050 fs/f2fs/data.c:370
       <TASK>
       bio_endio+0x5af/0x6c0 block/bio.c:1608
       req_bio_endio block/blk-mq.c:761 [inline]
       blk_update_request+0x5cc/0x1690 block/blk-mq.c:906
       blk_mq_end_request+0x59/0x4c0 block/blk-mq.c:1023
       lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370
       blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1101
       __do_softirq+0x1d4/0x8ef kernel/softirq.c:571
       run_ksoftirqd kernel/softirq.c:939 [inline]
       run_ksoftirqd+0x31/0x60 kernel/softirq.c:931
       smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164
       kthread+0x33e/0x440 kernel/kthread.c:379
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      The root cause is that the race below can leave dirty metadata
      in f2fs after the filesystem is remounted as ro:
      
      Thread A				Thread B
      - f2fs_ioc_resize_fs
       - f2fs_readonly   --- return false
       - f2fs_resize_fs
      					- f2fs_remount
      					 - write_checkpoint
      					 - set f2fs as ro
        - free_segment_range
         - update meta_inode's data
      
      Then f2fs_put_super() fails in write_checkpoint due to the readonly
      status, and meta_inode's dirty data is written back after node_inode
      is put; finally, f2fs_write_end_io accesses a NULL pointer on
      sbi->node_inode.
      
      Thread A				IRQ context
      - f2fs_put_super
       - write_checkpoint fails
       - iput(node_inode)
       - node_inode = NULL
       - iput(meta_inode)
        - write_inode_now
         - f2fs_write_meta_page
      					- f2fs_write_end_io
      					 - NODE_MAPPING(sbi)
      					 : access NULL pointer on node_inode
      
      Fixes: b4b10061 ("f2fs: refactor resize_fs to avoid meta updates in progress")
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Closes: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn
      Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: clean up w/ sbi->log_sectors_per_block · bfd47662
      Chao Yu authored
      Use sbi->log_sectors_per_block to replace the locally calculated value below:
      
      unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
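
The arithmetic behind the cleanup above is small enough to show directly. The sketch below computes the value once at setup time and reuses it for a block-to-sector conversion. `struct sb_info`, `init_sb_info()`, and `block_to_sector()` are hypothetical names for illustration; `SECTOR_SHIFT` is 9 because a sector is 512 bytes.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Hypothetical stand-in for the superblock info that caches the shift. */
struct sb_info {
	unsigned int log_blocksize;         /* log2 of the block size in bytes */
	unsigned int log_sectors_per_block; /* log2 of sectors per block */
};

/* Compute the shift once instead of at every call site. */
static void init_sb_info(struct sb_info *sbi, unsigned int log_blocksize)
{
	sbi->log_blocksize = log_blocksize;
	sbi->log_sectors_per_block = log_blocksize - SECTOR_SHIFT;
}

/* Convert a block address to the sector number where that block starts. */
static uint64_t block_to_sector(const struct sb_info *sbi, uint64_t blkaddr)
{
	return blkaddr << sbi->log_sectors_per_block;
}
```

For a 4 KiB block size, `log_blocksize` is 12, so the cached shift is 3 (8 sectors per block) and block 5 starts at sector 40.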
    • f2fs: fix to set noatime and immutable flag for quota file · 90b7c4b7
      Chao Yu authored
      We should set the noatime bit for quota files, since no one cares about
      the atime of a quota file. We should set the immutable bit as well,
      since nobody should write to the file through exported interfaces.

      Meanwhile, this patch uses inode_lock to avoid a race condition during
      updates of inode->i_flags and f2fs_inode->i_flags.

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: renew value of F2FS_FEATURE_* · 77e820ea
      Chao Yu authored
      Define the F2FS_FEATURE_* macros w/ 32-bit values rather than 16-bit values.

      No logic changes.

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: renew value of F2FS_MOUNT_* · 478d7100
      Chao Yu authored
      Then we can define a newly introduced mount option w/ the latest
      free number rather than a random free one.

      Just cleanup, no logic changes.

      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix potential deadlock due to unpaired node_write lock use · f082c6b2
      Chao Yu authored
      If S_NOQUOTA is cleared from the inode during data page writeback of a
      quota file, the node_write lock may never be unlocked, resulting in a
      potential deadlock. Fix this by using the lock in pairs.
      
      Kworker					Thread
      - writepage
       if (IS_NOQUOTA())
         f2fs_down_read(&sbi->node_write);
      					- vfs_cleanup_quota_inode
      					 - inode->i_flags &= ~S_NOQUOTA;
       if (IS_NOQUOTA())
         f2fs_up_read(&sbi->node_write);
      
      Fixes: 79963d96 ("f2fs: shrink node_write lock coverage")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
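
The unpaired lock use in the commit above comes from testing the same flag twice, once before locking and once before unlocking. A common way to pair the two is to sample the flag once into a local variable. The sketch below models this in userspace: the reader count stands in for the semaphore, a callback stands in for the concurrent thread, and every name in it is hypothetical.

```c
#include <stdbool.h>

struct fake_rwsem { int readers; };
struct fake_inode { bool noquota; };

typedef void (*race_hook)(struct fake_inode *);

static void fake_down_read(struct fake_rwsem *sem) { sem->readers++; }
static void fake_up_read(struct fake_rwsem *sem) { sem->readers--; }

/* Plays the role of the other thread clearing the flag mid-writeback. */
static void clear_noquota(struct fake_inode *inode) { inode->noquota = false; }

/* Unpaired shape: the flag is re-read at unlock time. Returns leaked readers. */
static int writepage_unpaired(struct fake_inode *inode, race_hook hook)
{
	struct fake_rwsem sem = { 0 };

	if (inode->noquota)
		fake_down_read(&sem);
	hook(inode); /* the flag may change here */
	if (inode->noquota)
		fake_up_read(&sem);
	return sem.readers;
}

/* Paired shape: the lock decision is sampled once and reused for the unlock. */
static int writepage_paired(struct fake_inode *inode, race_hook hook)
{
	struct fake_rwsem sem = { 0 };
	bool locked = inode->noquota;

	if (locked)
		fake_down_read(&sem);
	hook(inode);
	if (locked)
		fake_up_read(&sem);
	return sem.readers;
}
```

When the flag starts set and the hook is `clear_noquota`, the unpaired version returns 1 (a read lock that is never released) while the paired version returns 0.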
    • f2fs: Fix over-estimating free section during FG GC · 36ded4c1
      Yonggil Song authored
      There was a bug where FG GC finished unconditionally because free
      sections were over-estimated after a checkpoint in FG GC.
      This patch initializes sec_freed at every checkpoint in FG GC.

      Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: close unused open zones while mounting · 04abeb69
      Daeho Jeong authored
      Zoned UFS allows only 6 open zones at the same time, so we need to take
      care of the count of open zones while mounting.

      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. 24 May, 2023 1 commit