1. 19 Jan, 2023 5 commits
    • fs: port privilege checking helpers to mnt_idmap · 9452e93e
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in 256c8aed
      ("fs: introduce dedicated idmap type for mounts"). This is just the
      conversion to struct mnt_idmap.

      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem level with
      namespaces that are relevant on the mount level. Especially for non-vfs
      developers without detailed knowledge in this area, this can be a
      potential source of bugs.

      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      will only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: port inode_owner_or_capable() to mnt_idmap · 01beba79
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in 256c8aed
      ("fs: introduce dedicated idmap type for mounts"). This is just the
      conversion to struct mnt_idmap.

      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem level with
      namespaces that are relevant on the mount level. Especially for non-vfs
      developers without detailed knowledge in this area, this can be a
      potential source of bugs.

      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      will only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: port inode_init_owner() to mnt_idmap · f2d40141
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in 256c8aed
      ("fs: introduce dedicated idmap type for mounts"). This is just the
      conversion to struct mnt_idmap.

      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem level with
      namespaces that are relevant on the mount level. Especially for non-vfs
      developers without detailed knowledge in this area, this can be a
      potential source of bugs.

      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      will only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: port ->permission() to pass mnt_idmap · 4609e1f1
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in 256c8aed
      ("fs: introduce dedicated idmap type for mounts"). This is just the
      conversion to struct mnt_idmap.

      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem level with
      namespaces that are relevant on the mount level. Especially for non-vfs
      developers without detailed knowledge in this area, this can be a
      potential source of bugs.

      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      will only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: port ->fileattr_set() to pass mnt_idmap · 8782a9ae
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in 256c8aed
      ("fs: introduce dedicated idmap type for mounts"). This is just the
      conversion to struct mnt_idmap.

      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem level with
      namespaces that are relevant on the mount level. Especially for non-vfs
      developers without detailed knowledge in this area, this can be a
      potential source of bugs.

      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      will only operate on struct mnt_idmap.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
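
The mnt_idmap commits in this group argue that a dedicated type prevents mixing up the mount-level mapping with the filesystem-level namespace. The following is a minimal userspace sketch of that idea, not kernel code. The structs are toy stand-ins that only carry an offset, and `map_id_old()` and `map_id_new()` are hypothetical helpers invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a namespace: maps IDs by adding a fixed offset. */
struct user_namespace {
    uint32_t offset;
};

/* Dedicated mount-level type. It is a distinct struct, so a pointer to it
 * is not interchangeable with a pointer to struct user_namespace. */
struct mnt_idmap {
    uint32_t offset;
};

/* Old style (hypothetical helper): both arguments share one type, so
 * swapping them by mistake still compiles and silently maps wrongly. */
static uint32_t map_id_old(const struct user_namespace *mnt_userns,
                           const struct user_namespace *fs_userns,
                           uint32_t id)
{
    return id - fs_userns->offset + mnt_userns->offset;
}

/* New style (hypothetical helper): the mount-level mapping has its own
 * type, so passing the filesystem namespace in its place is diagnosed
 * by the compiler instead of going unnoticed. */
static uint32_t map_id_new(const struct mnt_idmap *idmap,
                           const struct user_namespace *fs_userns,
                           uint32_t id)
{
    return id - fs_userns->offset + idmap->offset;
}
```

With the old signature, `map_id_old(&fs, &mnt, id)` and `map_id_old(&mnt, &fs, id)` both compile but give different results. With the new one, passing a `struct user_namespace *` as the first argument is an incompatible pointer type, which compilers report as a warning or an error depending on version and flags.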
  2. 05 Dec, 2022 21 commits
  3. 25 Nov, 2022 1 commit
    • use less confusing names for iov_iter direction initializers · de4eda9d
      Al Viro authored
      
      READ/WRITE proved to be actively confusing - the meanings are
      "data destination, as used with read(2)" and "data source, as
      used with write(2)", but people keep interpreting those as
      "we read data from it" and "we write data to it", i.e. exactly
      the wrong way.
      
      Call them ITER_DEST and ITER_SOURCE - at least that is harder
      to misinterpret...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
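
The naming point in the commit above can be illustrated with a small userspace sketch: the direction names the role of the iterator's buffer, not the action the caller performs on it. `struct toy_iter` and both copy helpers are hypothetical simplifications invented here, not the kernel's iov_iter API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Direction of a toy iterator, named after the role its buffer plays. */
enum iter_dir {
    ITER_DEST,   /* buffer is a data destination, as used with read(2) */
    ITER_SOURCE, /* buffer is a data source, as used with write(2) */
};

/* Hypothetical, much simplified stand-in for an I/O iterator. */
struct toy_iter {
    enum iter_dir dir;
    char *buf;
    size_t len;
    size_t pos;
};

/* Copy data into the iterator's buffer; only valid for ITER_DEST. */
static size_t toy_copy_to_iter(const void *src, size_t n, struct toy_iter *it)
{
    if (it->dir != ITER_DEST)
        return 0;
    if (n > it->len - it->pos)
        n = it->len - it->pos;
    memcpy(it->buf + it->pos, src, n);
    it->pos += n;
    return n;
}

/* Copy data out of the iterator's buffer; only valid for ITER_SOURCE. */
static size_t toy_copy_from_iter(void *dst, size_t n, struct toy_iter *it)
{
    if (it->dir != ITER_SOURCE)
        return 0;
    if (n > it->len - it->pos)
        n = it->len - it->pos;
    memcpy(dst, it->buf + it->pos, n);
    it->pos += n;
    return n;
}
```

A read(2)-style path fills an ITER_DEST iterator, and a write(2)-style path drains an ITER_SOURCE iterator. Under the old READ/WRITE names, the destination iterator was the one labelled READ, which is the reading the commit describes as easy to get backwards.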
  4. 15 Nov, 2022 4 commits
    • btrfs: free btrfs_path before copying subvol info to userspace · 013c1c55
      Anand Jain authored
      
      btrfs_ioctl_get_subvol_info() frees the search path after the userspace
      copy from the temp buffer @subvol_info. This can lead to a lockdep splat
      warning.

      Fix this by freeing the path before we copy @subvol_info to userspace.
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: free btrfs_path before copying fspath to userspace · 8cf96b40
      Anand Jain authored
      
      btrfs_ioctl_ino_to_path() frees the search path after the userspace copy
      from the temp buffer @ipath->fspath, which can potentially lead to a
      lockdep splat warning.

      Fix this by freeing the path before we copy @ipath->fspath to userspace.
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: free btrfs_path before copying inodes to userspace · 418ffb9e
      Anand Jain authored
      
      btrfs_ioctl_logical_to_ino() frees the search path after the userspace
      copy from the temp buffer @inodes, which can potentially lead to a
      lockdep splat.
      
      Fix this by freeing the path before we copy @inodes to userspace.
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: free btrfs_path before copying root refs to userspace · b740d806
      Josef Bacik authored
      
      Syzbot reported the following lockdep splat:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0 Not tainted
      ------------------------------------------------------
      syz-executor307/3029 is trying to acquire lock:
      ffff0000c02525d8 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x54/0xb4 mm/memory.c:5576
      
      but task is already holding lock:
      ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
      ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
      ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (btrfs-root-00){++++}-{3:3}:
             down_read_nested+0x64/0x84 kernel/locking/rwsem.c:1624
             __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
             btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
             btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
             btrfs_search_slot_get_root+0x74/0x338 fs/btrfs/ctree.c:1637
             btrfs_search_slot+0x1b0/0xfd8 fs/btrfs/ctree.c:1944
             btrfs_update_root+0x6c/0x5a0 fs/btrfs/root-tree.c:132
             commit_fs_roots+0x1f0/0x33c fs/btrfs/transaction.c:1459
             btrfs_commit_transaction+0x89c/0x12d8 fs/btrfs/transaction.c:2343
             flush_space+0x66c/0x738 fs/btrfs/space-info.c:786
             btrfs_async_reclaim_metadata_space+0x43c/0x4e0 fs/btrfs/space-info.c:1059
             process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
             worker_thread+0x340/0x610 kernel/workqueue.c:2436
             kthread+0x12c/0x158 kernel/kthread.c:376
             ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
      
      -> #2 (&fs_info->reloc_mutex){+.+.}-{3:3}:
             __mutex_lock_common+0xd4/0xca8 kernel/locking/mutex.c:603
             __mutex_lock kernel/locking/mutex.c:747 [inline]
             mutex_lock_nested+0x38/0x44 kernel/locking/mutex.c:799
             btrfs_record_root_in_trans fs/btrfs/transaction.c:516 [inline]
             start_transaction+0x248/0x944 fs/btrfs/transaction.c:752
             btrfs_start_transaction+0x34/0x44 fs/btrfs/transaction.c:781
             btrfs_create_common+0xf0/0x1b4 fs/btrfs/inode.c:6651
             btrfs_create+0x8c/0xb0 fs/btrfs/inode.c:6697
             lookup_open fs/namei.c:3413 [inline]
             open_last_lookups fs/namei.c:3481 [inline]
             path_openat+0x804/0x11c4 fs/namei.c:3688
             do_filp_open+0xdc/0x1b8 fs/namei.c:3718
             do_sys_openat2+0xb8/0x22c fs/open.c:1313
             do_sys_open fs/open.c:1329 [inline]
             __do_sys_openat fs/open.c:1345 [inline]
             __se_sys_openat fs/open.c:1340 [inline]
             __arm64_sys_openat+0xb0/0xe0 fs/open.c:1340
             __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
             invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
             el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
             do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
             el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
             el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
             el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
      
      -> #1 (sb_internal#2){.+.+}-{0:0}:
             percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
             __sb_start_write include/linux/fs.h:1826 [inline]
             sb_start_intwrite include/linux/fs.h:1948 [inline]
             start_transaction+0x360/0x944 fs/btrfs/transaction.c:683
             btrfs_join_transaction+0x30/0x40 fs/btrfs/transaction.c:795
             btrfs_dirty_inode+0x50/0x140 fs/btrfs/inode.c:6103
             btrfs_update_time+0x1c0/0x1e8 fs/btrfs/inode.c:6145
             inode_update_time fs/inode.c:1872 [inline]
             touch_atime+0x1f0/0x4a8 fs/inode.c:1945
             file_accessed include/linux/fs.h:2516 [inline]
             btrfs_file_mmap+0x50/0x88 fs/btrfs/file.c:2407
             call_mmap include/linux/fs.h:2192 [inline]
             mmap_region+0x7fc/0xc14 mm/mmap.c:1752
             do_mmap+0x644/0x97c mm/mmap.c:1540
             vm_mmap_pgoff+0xe8/0x1d0 mm/util.c:552
             ksys_mmap_pgoff+0x1cc/0x278 mm/mmap.c:1586
             __do_sys_mmap arch/arm64/kernel/sys.c:28 [inline]
             __se_sys_mmap arch/arm64/kernel/sys.c:21 [inline]
             __arm64_sys_mmap+0x58/0x6c arch/arm64/kernel/sys.c:21
             __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
             invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
             el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
             do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
             el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
             el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
             el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
      
      -> #0 (&mm->mmap_lock){++++}-{3:3}:
             check_prev_add kernel/locking/lockdep.c:3095 [inline]
             check_prevs_add kernel/locking/lockdep.c:3214 [inline]
             validate_chain kernel/locking/lockdep.c:3829 [inline]
             __lock_acquire+0x1530/0x30a4 kernel/locking/lockdep.c:5053
             lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5666
             __might_fault+0x7c/0xb4 mm/memory.c:5577
             _copy_to_user include/linux/uaccess.h:134 [inline]
             copy_to_user include/linux/uaccess.h:160 [inline]
             btrfs_ioctl_get_subvol_rootref+0x3a8/0x4bc fs/btrfs/ioctl.c:3203
             btrfs_ioctl+0xa08/0xa64 fs/btrfs/ioctl.c:5556
             vfs_ioctl fs/ioctl.c:51 [inline]
             __do_sys_ioctl fs/ioctl.c:870 [inline]
             __se_sys_ioctl fs/ioctl.c:856 [inline]
             __arm64_sys_ioctl+0xd0/0x140 fs/ioctl.c:856
             __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
             invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
             el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
             do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
             el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
             el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
             el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
      
      other info that might help us debug this:
      
      Chain exists of:
        &mm->mmap_lock --> &fs_info->reloc_mutex --> btrfs-root-00
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(btrfs-root-00);
                                     lock(&fs_info->reloc_mutex);
                                     lock(btrfs-root-00);
        lock(&mm->mmap_lock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor307/3029:
       #0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock fs/btrfs/locking.c:134 [inline]
       #0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_tree_read_lock fs/btrfs/locking.c:140 [inline]
       #0: ffff0000c958a608 (btrfs-root-00){++++}-{3:3}, at: btrfs_read_lock_root_node+0x13c/0x1c0 fs/btrfs/locking.c:279
      
      stack backtrace:
      CPU: 0 PID: 3029 Comm: syz-executor307 Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
      Call trace:
       dump_backtrace+0x1c4/0x1f0 arch/arm64/kernel/stacktrace.c:156
       show_stack+0x2c/0x54 arch/arm64/kernel/stacktrace.c:163
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x104/0x16c lib/dump_stack.c:106
       dump_stack+0x1c/0x58 lib/dump_stack.c:113
       print_circular_bug+0x2c4/0x2c8 kernel/locking/lockdep.c:2053
       check_noncircular+0x14c/0x154 kernel/locking/lockdep.c:2175
       check_prev_add kernel/locking/lockdep.c:3095 [inline]
       check_prevs_add kernel/locking/lockdep.c:3214 [inline]
       validate_chain kernel/locking/lockdep.c:3829 [inline]
       __lock_acquire+0x1530/0x30a4 kernel/locking/lockdep.c:5053
       lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5666
       __might_fault+0x7c/0xb4 mm/memory.c:5577
       _copy_to_user include/linux/uaccess.h:134 [inline]
       copy_to_user include/linux/uaccess.h:160 [inline]
       btrfs_ioctl_get_subvol_rootref+0x3a8/0x4bc fs/btrfs/ioctl.c:3203
       btrfs_ioctl+0xa08/0xa64 fs/btrfs/ioctl.c:5556
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __arm64_sys_ioctl+0xd0/0x140 fs/ioctl.c:856
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
       el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
       do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
       el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:636
       el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:654
       el0t_64_sync+0x18c/0x190 arch/arm64/kernel/entry.S:581
      
      We generally do the right thing here, copying the references into a
      temporary buffer. However, we are still holding the path when we do
      copy_to_user() from the temporary buffer. Fix this by freeing the path
      before we copy to user space.
      
      Reported-by: syzbot+4ef9e52e464c6ff47d9d@syzkaller.appspotmail.com
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
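
The four btrfs fixes in this group share one ordering rule: fill a temporary buffer while the search path holds the tree lock, release the path, and only then copy to user memory. Below is a minimal userspace sketch of that rule under stated assumptions. `struct toy_path`, `toy_copy_to_user()`, and both `toy_ioctl_*()` functions are hypothetical stand-ins, and the lock is modelled as a plain flag rather than a real rwsem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a search path that pins a locked tree node. */
struct toy_path {
    bool holds_tree_lock;
};

static void toy_search(struct toy_path *p)    { p->holds_tree_lock = true; }
static void toy_free_path(struct toy_path *p) { p->holds_tree_lock = false; }

/* Stand-in for a user copy. A real one may fault and take mmap_lock, so
 * this model refuses to run while the tree lock is held. Returns 0 on
 * success and -1 when the ordering rule is violated. */
static int toy_copy_to_user(void *dst, const void *src, size_t n,
                            const struct toy_path *p)
{
    if (p->holds_tree_lock)
        return -1;
    memcpy(dst, src, n);
    return 0;
}

/* Ordering before the fixes: copy while the path is still held. */
static int toy_ioctl_buggy(char *ubuf, size_t n)
{
    struct toy_path path = { false };
    char tmp[16] = "root refs";
    int ret;

    if (n > sizeof(tmp))
        n = sizeof(tmp);
    toy_search(&path);      /* tree lock taken */
    ret = toy_copy_to_user(ubuf, tmp, n, &path);
    toy_free_path(&path);
    return ret;
}

/* Ordering after the fixes: drop the path, then copy. */
static int toy_ioctl_fixed(char *ubuf, size_t n)
{
    struct toy_path path = { false };
    char tmp[16] = "root refs";

    if (n > sizeof(tmp))
        n = sizeof(tmp);
    toy_search(&path);      /* a real handler would fill tmp here */
    toy_free_path(&path);   /* release before touching user memory */
    return toy_copy_to_user(ubuf, tmp, n, &path);
}
```

The temporary buffer is what makes the reordering safe: once the data has been copied out of the tree, nothing in the user copy depends on the path any more.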
  5. 26 Sep, 2022 3 commits
  6. 25 Jul, 2022 4 commits
  7. 17 May, 2022 1 commit
    • btrfs: allow defrag to convert inline extents to regular extents · d8101a0c
      Qu Wenruo authored
      
      Btrfs defaults to max_inline=2K so that small writes are inlined into
      metadata.

      The default value used to always be a win: even though DUP/RAID1/RAID10
      double the metadata usage, an inlined extent should still use less
      physical space than a 4K regular extent.

      But since the introduction of RAID1C3 and RAID1C4 that is no longer the
      case. Users may find inlined extents wasting too much space and want to
      convert those inlined extents back to regular extents.

      Unfortunately defrag unconditionally skips all inline extents, even when
      the user is trying to convert them back to regular extents.

      So this patch adds a small exception to defrag_collect_targets() to allow
      defragging inline extents if and only if the inlined extents are larger
      than max_inline, allowing users to convert them to regular ones.
      
      This also allows us to defrag extents like the following:
      
      	item 6 key (257 EXTENT_DATA 0) itemoff 15794 itemsize 69
      		generation 7 type 0 (inline)
      		inline extent data size 48 ram_bytes 4096 compression 1 (zlib)
      	item 7 key (257 EXTENT_DATA 4096) itemoff 15741 itemsize 53
      		generation 7 type 1 (regular)
      		extent data disk byte 13631488 nr 4096
      		extent data offset 0 nr 16384 ram 16384
      		extent compression 1 (zlib)
      
      Previously we were unable to do any defrag, since the first extent is
      inlined and the second one has no extent to merge with.

      Now we can defrag it into a single extent, saving 48 bytes of metadata
      space.
      
      	item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
      		generation 8 type 1 (regular)
      		extent data disk byte 13635584 nr 4096
      		extent data offset 0 nr 20480 ram 20480
      		extent compression 1 (zlib)
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
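
The exception described in the defrag commit above reduces to one predicate: an inline extent is skipped only while it still fits within max_inline. The sketch below is a rough illustration, not the code in defrag_collect_targets(). `struct toy_extent` and `toy_skip_inline()` are hypothetical, and comparing the uncompressed length against max_inline is an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an extent as seen by defrag. */
struct toy_extent {
    bool is_inline; /* data is stored inline in metadata */
    uint64_t len;   /* length in bytes (assumed: uncompressed length) */
};

/* Returns true when defrag should skip the extent because it is inline.
 * Before the change every inline extent was skipped. With the exception,
 * only inline extents that are not larger than max_inline are skipped. */
static bool toy_skip_inline(const struct toy_extent *em, uint64_t max_inline)
{
    return em->is_inline && em->len <= max_inline;
}
```

Under this model, the inline extent from the commit's first dump (ram_bytes 4096) with max_inline lowered to 2048 is not skipped, so it can be rewritten as a regular extent, while a 1024-byte inline extent is still left alone.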
  8. 16 May, 2022 1 commit