1. 12 Jan, 2024 1 commit
    • Omar Sandoval's avatar
      btrfs: don't abort filesystem when attempting to snapshot deleted subvolume · 7081929a
      Omar Sandoval authored
      
      If the source file descriptor to the snapshot ioctl refers to a deleted
      subvolume, we get the following abort:
      
        BTRFS: Transaction aborted (error -2)
        WARNING: CPU: 0 PID: 833 at fs/btrfs/transaction.c:1875 create_pending_snapshot+0x1040/0x1190 [btrfs]
        Modules linked in: pata_acpi btrfs ata_piix libata scsi_mod virtio_net blake2b_generic xor net_failover virtio_rng failover scsi_common rng_core raid6_pq libcrc32c
        CPU: 0 PID: 833 Comm: t_snapshot_dele Not tainted 6.7.0-rc6 #2
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
        RIP: 0010:create_pending_snapshot+0x1040/0x1190 [btrfs]
        RSP: 0018:ffffa09c01337af8 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff9982053e7c78 RCX: 0000000000000027
        RDX: ffff99827dc20848 RSI: 0000000000000001 RDI: ffff99827dc20840
        RBP: ffffa09c01337c00 R08: 0000000000000000 R09: ffffa09c01337998
        R10: 0000000000000003 R11: ffffffffb96da248 R12: fffffffffffffffe
        R13: ffff99820535bb28 R14: ffff99820b7bd000 R15: ffff99820381ea80
        FS:  00007fe20aadabc0(0000) GS:ffff99827dc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000559a120b502f CR3: 00000000055b6000 CR4: 00000000000006f0
        Call Trace:
         <TASK>
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         ? __warn+0x81/0x130
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         ? report_bug+0x171/0x1a0
         ? handle_bug+0x3a/0x70
         ? exc_invalid_op+0x17/0x70
         ? asm_exc_invalid_op+0x1a/0x20
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         ? create_pending_snapshot+0x1040/0x1190 [btrfs]
         create_pending_snapshots+0x92/0xc0 [btrfs]
         btrfs_commit_transaction+0x66b/0xf40 [btrfs]
         btrfs_mksubvol+0x301/0x4d0 [btrfs]
         btrfs_mksnapshot+0x80/0xb0 [btrfs]
         __btrfs_ioctl_snap_create+0x1c2/0x1d0 [btrfs]
         btrfs_ioctl_snap_create_v2+0xc4/0x150 [btrfs]
         btrfs_ioctl+0x8a6/0x2650 [btrfs]
         ? kmem_cache_free+0x22/0x340
         ? do_sys_openat2+0x97/0xe0
         __x64_sys_ioctl+0x97/0xd0
         do_syscall_64+0x46/0xf0
         entry_SYSCALL_64_after_hwframe+0x6e/0x76
        RIP: 0033:0x7fe20abe83af
        RSP: 002b:00007ffe6eff1360 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe20abe83af
        RDX: 00007ffe6eff23c0 RSI: 0000000050009417 RDI: 0000000000000003
        RBP: 0000000000000003 R08: 0000000000000000 R09: 00007fe20ad16cd0
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        R13: 00007ffe6eff13c0 R14: 00007fe20ad45000 R15: 0000559a120b6d58
         </TASK>
        ---[ end trace 0000000000000000 ]---
        BTRFS: error (device vdc: state A) in create_pending_snapshot:1875: errno=-2 No such entry
        BTRFS info (device vdc: state EA): forced readonly
        BTRFS warning (device vdc: state EA): Skipping commit of aborted transaction.
        BTRFS: error (device vdc: state EA) in cleanup_transaction:2055: errno=-2 No such entry
      
      This happens because create_pending_snapshot() initializes the new root
      item as a copy of the source root item. This includes the refs field,
      which is 0 for a deleted subvolume. The call to btrfs_insert_root()
      therefore inserts a root with refs == 0. btrfs_get_new_fs_root() then
      finds the root and returns -ENOENT if refs == 0, which causes
      create_pending_snapshot() to abort.
      
      Fix it by checking the source root's refs before attempting the
      snapshot, but after locking subvol_sem to avoid racing with deletion.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: default avatarSweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7081929a
  2. 24 Nov, 2023 1 commit
  3. 03 Nov, 2023 1 commit
  4. 28 Oct, 2023 1 commit
  5. 12 Oct, 2023 8 commits
    • Anand Jain's avatar
      btrfs: disable the device add feature for temp-fsid · ac6ea6a9
      Anand Jain authored
      
      The device addition operation will transform the cloned temp-fsid mounted
      device into a multi-device filesystem. Therefore, it is marked as
      unsupported.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ac6ea6a9
    • Filipe Manana's avatar
      btrfs: add and use helpers for reading and writing last_trans_committed · 0124855f
      Filipe Manana authored
      Currently the last_trans_committed field of struct btrfs_fs_info is
      modified and read without any locking or other protection. For example
      early in the fsync path, skip_inode_logging() is called which reads
      fs_info->last_trans_committed, but at the same time we can have a
      transaction commit completing and updating that field.
      
      In the case of an fsync this is harmless and any data race should be
      rare and at most cause an unnecessary logging of an inode.
      
      To avoid data race warnings from tools like KCSAN and other issues such
      as load and store tearing (amongst others, see [1]), create helpers to
      access the last_trans_committed field of struct btrfs_fs_info using
      READ_ONCE() and WRITE_ONCE(), and use these helpers everywhere.
      
      [1] https://lwn.net/Articles/793253/
      
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0124855f
    • Filipe Manana's avatar
      btrfs: add and use helpers for reading and writing fs_info->generation · 4a4f8fe2
      Filipe Manana authored
      Currently the generation field of struct btrfs_fs_info is always modified
      while holding fs_info->trans_lock locked. Most readers will access this
      field without taking that lock but while holding a transaction handle,
      which is safe to do due to the transaction life cycle.
      
      However there are other readers that are neither holding the lock nor
      holding a transaction handle open:
      
      1) When reading an inode from disk, at btrfs_read_locked_inode();
      
      2) When reading the generation to expose it to sysfs, at
         btrfs_generation_show();
      
      3) Early in the fsync path, at skip_inode_logging();
      
      4) When creating a hole at btrfs_cont_expand(), during write paths,
         truncate and reflinking;
      
      5) In the fs_info ioctl (btrfs_ioctl_fs_info());
      
      6) While mounting the filesystem, in the open_ctree() path. In these
         cases it's safe to directly read fs_info->generation as no one
         can concurrently start a transaction and update fs_info->generation.
      
      In case of the fsync path, races here should be harmless, and in the worst
      case they may cause a fsync to log an inode when it's not really needed,
      so nothing bad from a functional perspective. In the other cases it's not
      so clear if functional problems may arise, though in case 1 rare things
      like a load/store tearing [1] may cause the BTRFS_INODE_NEEDS_FULL_SYNC
      flag not being set on an inode and therefore result in incorrect logging
      later on in case a fsync call is made.
      
      To avoid data race warnings from tools like KCSAN and other issues such
      as load and store tearing (amongst others, see [1]), create helpers to
      access the generation field of struct btrfs_fs_info using READ_ONCE() and
      WRITE_ONCE(), and use these helpers where needed.
      
      [1] https://lwn.net/Articles/793253/
      
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      4a4f8fe2
    • Filipe Manana's avatar
      btrfs: remove redundant root argument from btrfs_update_inode() · 8b9d0322
      Filipe Manana authored
      
      The root argument for btrfs_update_inode() always matches the root of the
      given inode, so remove the root argument and get it from the inode
      argument.
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8b9d0322
    • Boris Burkov's avatar
      btrfs: qgroup: track metadata relocation COW with simple quota · 60ea105a
      Boris Burkov authored
      
      Relocation COWs metadata blocks in two cases for the reloc root:
      
      - copying the subvolume root item when creating the reloc root
      - copying a btree node when there is a COW during relocation
      
      In both cases, the resulting btree node hits an abnormal code path with
      respect to the owner field in its btrfs_header. It first creates the
      root item for the new objectid, which populates the reloc root id, and
      it at this point that delayed refs are created.
      
      Later, it fully copies the old node into the new node (including the
      original owner field) which overwrites it. This results in a simple
      quotas mismatch where we run the delayed ref for the reloc root which
      has no simple quota effect (reloc root is not an fstree) but when we
      ultimately delete the node, the owner is the real original fstree and we
      do free the space.
      
      To work around this without tampering with the behavior of relocation,
      add a parameter to btrfs_add_tree_block that lets the relocation code
      path specify a different owning root than the "operating" root (in this
      case, owning root is the real root and the operating root is the reloc
      root). These can naturally be plumbed into delayed refs that have the
      same concept.
      
      Note that this is a double count in some sense, but a relatively natural
      one, as there are really two extents, and the old one will be deleted
      soon. This is consistent with how data relocation extents are accounted
      by simple quotas.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarBoris Burkov <boris@bur.io>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      60ea105a
    • Boris Burkov's avatar
      btrfs: qgroup: simple quota auto hierarchy for nested subvolumes · 5343cd93
      Boris Burkov authored
      
      Consider the following sequence:
      
      - enable quotas
      - create subvol S id 256 at dir outer/
      - create a qgroup 1/100
      - add 0/256 (S's auto qgroup) to 1/100
      - create subvol T id 257 at dir outer/inner/
      
      With full qgroups, there is no relationship between 0/257 and either of
      0/256 or 1/100. There is an inherit feature that the creator of inner/
      can use to specify it ought to be in 1/100.
      
      Simple quotas are targeted at container isolation, where such automatic
      inheritance for not necessarily trusted/controlled nested subvol
      creation would be quite helpful. Therefore, add a new default behavior
      for simple quotas: when you create a nested subvol, automatically
      inherit as parents any parents of the qgroup of the subvol the new inode
      is going in.
      
      In our example, 257/0 would also be under 1/100, allowing easy control
      of a total quota over an arbitrary hierarchy of subvolumes.
      
      I think this _might_ be a generally useful behavior, so it could be
      interesting to put it behind a new inheritance flag that simple quotas
      always use while traditional quotas let the user specify, but this is a
      minimally intrusive change to start.
      Signed-off-by: default avatarBoris Burkov <boris@bur.io>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5343cd93
    • Boris Burkov's avatar
      btrfs: qgroup: add new quota mode for simple quotas · 182940f4
      Boris Burkov authored
      
      Add a new quota mode called "simple quotas". It can be enabled by the
      existing quota enable ioctl via a new command, and sets an incompat
      bit, as the implementation of simple quotas will make backwards
      incompatible changes to the disk format of the extent tree.
      Signed-off-by: default avatarBoris Burkov <boris@bur.io>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      182940f4
    • Filipe Manana's avatar
      btrfs: abort transaction on generation mismatch when marking eb as dirty · 50564b65
      Filipe Manana authored
      
      When marking an extent buffer as dirty, at btrfs_mark_buffer_dirty(),
      we check if its generation matches the running transaction and if not we
      just print a warning. Such mismatch is an indicator that something really
      went wrong and only printing a warning message (and stack trace) is not
      enough to prevent a corruption. Allowing a transaction to commit with such
      an extent buffer will trigger an error if we ever try to read it from disk
      due to a generation mismatch with its parent generation.
      
      So abort the current transaction with -EUCLEAN if we notice a generation
      mismatch. For this we need to pass a transaction handle to
      btrfs_mark_buffer_dirty() which is always available except in test code,
      in which case we can pass NULL since it operates on dummy extent buffers
      and all test roots have a single node/leaf (root node at level 0).
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      50564b65
  6. 03 Oct, 2023 1 commit
    • Josef Bacik's avatar
      btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c · 9147b9de
      Josef Bacik authored
      
      Jens reported the following warnings from -Wmaybe-uninitialized recent
      Linus' branch.
      
        In file included from ./include/asm-generic/rwonce.h:26,
      		   from ./arch/arm64/include/asm/rwonce.h:71,
      		   from ./include/linux/compiler.h:246,
      		   from ./include/linux/export.h:5,
      		   from ./include/linux/linkage.h:7,
      		   from ./include/linux/kernel.h:17,
      		   from fs/btrfs/ioctl.c:6:
        In function ‘instrument_copy_from_user_before’,
            inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
            inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
            inlined from ‘btrfs_ioctl_space_info’ at fs/btrfs/ioctl.c:2999:6,
            inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4616:10:
        ./include/linux/kasan-checks.h:38:27: warning: ‘space_args’ may be used
        uninitialized [-Wmaybe-uninitialized]
           38 | #define kasan_check_write __kasan_check_write
        ./include/linux/instrumented.h:129:9: note: in expansion of macro
        ‘kasan_check_write’
          129 |         kasan_check_write(to, n);
      	|         ^~~~~~~~~~~~~~~~~
        ./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
        ./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
        volatile void *’ to ‘__kasan_check_write’ declared here
           20 | bool __kasan_check_write(const volatile void *p, unsigned int
      	size);
      	|      ^~~~~~~~~~~~~~~~~~~
        fs/btrfs/ioctl.c:2981:39: note: ‘space_args’ declared here
         2981 |         struct btrfs_ioctl_space_args space_args;
      	|                                       ^~~~~~~~~~
        In function ‘instrument_copy_from_user_before’,
            inlined from ‘_copy_from_user’ at ./include/linux/uaccess.h:148:3,
            inlined from ‘copy_from_user’ at ./include/linux/uaccess.h:183:7,
            inlined from ‘_btrfs_ioctl_send’ at fs/btrfs/ioctl.c:4343:9,
            inlined from ‘btrfs_ioctl’ at fs/btrfs/ioctl.c:4658:10:
        ./include/linux/kasan-checks.h:38:27: warning: ‘args32’ may be used
        uninitialized [-Wmaybe-uninitialized]
           38 | #define kasan_check_write __kasan_check_write
        ./include/linux/instrumented.h:129:9: note: in expansion of macro
        ‘kasan_check_write’
          129 |         kasan_check_write(to, n);
      	|         ^~~~~~~~~~~~~~~~~
        ./include/linux/kasan-checks.h: In function ‘btrfs_ioctl’:
        ./include/linux/kasan-checks.h:20:6: note: by argument 1 of type ‘const
        volatile void *’ to ‘__kasan_check_write’ declared here
           20 | bool __kasan_check_write(const volatile void *p, unsigned int
      	size);
      	|      ^~~~~~~~~~~~~~~~~~~
        fs/btrfs/ioctl.c:4341:49: note: ‘args32’ declared here
         4341 |                 struct btrfs_ioctl_send_args_32 args32;
      	|                                                 ^~~~~~
      
      This was due to his config options and having KASAN turned on,
      which adds some extra checks around copy_from_user(), which then
      triggered the -Wmaybe-uninitialized checker for these cases.
      
      Fix the warnings by initializing the different structs we're copying
      into.
      Reported-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9147b9de
  7. 08 Sep, 2023 1 commit
    • Filipe Manana's avatar
      btrfs: release path before inode lookup during the ino lookup ioctl · ee34a82e
      Filipe Manana authored
      During the ino lookup ioctl we can end up calling btrfs_iget() to get an
      inode reference while we are holding on a root's btree. If btrfs_iget()
      needs to lookup the inode from the root's btree, because it's not
      currently loaded in memory, then it will need to lock another or the
      same path in the same root btree. This may result in a deadlock and
      trigger the following lockdep splat:
      
        WARNING: possible circular locking dependency detected
        6.5.0-rc7-syzkaller-00004-gf7757129 #0 Not tainted
        ------------------------------------------------------
        syz-executor277/5012 is trying to acquire lock:
        ffff88802df41710 (btrfs-tree-01){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
      
        but task is already holding lock:
        ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (btrfs-tree-00){++++}-{3:3}:
               down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
               __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
               btrfs_search_slot+0x13a4/0x2f80 fs/btrfs/ctree.c:2302
               btrfs_init_root_free_objectid+0x148/0x320 fs/btrfs/disk-io.c:4955
               btrfs_init_fs_root fs/btrfs/disk-io.c:1128 [inline]
               btrfs_get_root_ref+0x5ae/0xae0 fs/btrfs/disk-io.c:1338
               btrfs_get_fs_root fs/btrfs/disk-io.c:1390 [inline]
               open_ctree+0x29c8/0x3030 fs/btrfs/disk-io.c:3494
               btrfs_fill_super+0x1c7/0x2f0 fs/btrfs/super.c:1154
               btrfs_mount_root+0x7e0/0x910 fs/btrfs/super.c:1519
               legacy_get_tree+0xef/0x190 fs/fs_context.c:611
               vfs_get_tree+0x8c/0x270 fs/super.c:1519
               fc_mount fs/namespace.c:1112 [inline]
               vfs_kern_mount+0xbc/0x150 fs/namespace.c:1142
               btrfs_mount+0x39f/0xb50 fs/btrfs/super.c:1579
               legacy_get_tree+0xef/0x190 fs/fs_context.c:611
               vfs_get_tree+0x8c/0x270 fs/super.c:1519
               do_new_mount+0x28f/0xae0 fs/namespace.c:3335
               do_mount fs/namespace.c:3675 [inline]
               __do_sys_mount fs/namespace.c:3884 [inline]
               __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
               do_syscall_x64 arch/x86/entry/common.c:50 [inline]
               do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
               entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        -> #0 (btrfs-tree-01){++++}-{3:3}:
               check_prev_add kernel/locking/lockdep.c:3142 [inline]
               check_prevs_add kernel/locking/lockdep.c:3261 [inline]
               validate_chain kernel/locking/lockdep.c:3876 [inline]
               __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
               lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
               down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
               __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
               btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
               btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
               btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
               btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
               btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
               btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
               btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
               btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
               btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
               btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
               vfs_ioctl fs/ioctl.c:51 [inline]
               __do_sys_ioctl fs/ioctl.c:870 [inline]
               __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
               do_syscall_x64 arch/x86/entry/common.c:50 [inline]
               do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
               entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          rlock(btrfs-tree-00);
                                       lock(btrfs-tree-01);
                                       lock(btrfs-tree-00);
          rlock(btrfs-tree-01);
      
         *** DEADLOCK ***
      
        1 lock held by syz-executor277/5012:
         #0: ffff88802df418e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
      
        stack backtrace:
        CPU: 1 PID: 5012 Comm: syz-executor277 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129 #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
        Call Trace:
         <TASK>
         __dump_stack lib/dump_stack.c:88 [inline]
         dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
         check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195
         check_prev_add kernel/locking/lockdep.c:3142 [inline]
         check_prevs_add kernel/locking/lockdep.c:3261 [inline]
         validate_chain kernel/locking/lockdep.c:3876 [inline]
         __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144
         lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761
         down_read_nested+0x49/0x2f0 kernel/locking/rwsem.c:1645
         __btrfs_tree_read_lock+0x2f/0x220 fs/btrfs/locking.c:136
         btrfs_tree_read_lock fs/btrfs/locking.c:142 [inline]
         btrfs_read_lock_root_node+0x292/0x3c0 fs/btrfs/locking.c:281
         btrfs_search_slot_get_root fs/btrfs/ctree.c:1832 [inline]
         btrfs_search_slot+0x4ff/0x2f80 fs/btrfs/ctree.c:2154
         btrfs_lookup_inode+0xdc/0x480 fs/btrfs/inode-item.c:412
         btrfs_read_locked_inode fs/btrfs/inode.c:3892 [inline]
         btrfs_iget_path+0x2d9/0x1520 fs/btrfs/inode.c:5716
         btrfs_search_path_in_tree_user fs/btrfs/ioctl.c:1961 [inline]
         btrfs_ioctl_ino_lookup_user+0x77a/0xf50 fs/btrfs/ioctl.c:2105
         btrfs_ioctl+0xb0b/0xd40 fs/btrfs/ioctl.c:4683
         vfs_ioctl fs/ioctl.c:51 [inline]
         __do_sys_ioctl fs/ioctl.c:870 [inline]
         __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
         do_syscall_x64 arch/x86/entry/common.c:50 [inline]
         do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
        RIP: 0033:0x7f0bec94ea39
      
      Fix this simply by releasing the path before calling btrfs_iget() as at
      point we don't need the path anymore.
      
      Reported-by: syzbot+bf66ad948981797d2f1d@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/linux-btrfs/00000000000045fa140603c4a969@google.com/
      Fixes: 23d0b79d
      
       ("btrfs: Add unprivileged version of ino_lookup ioctl")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ee34a82e
  8. 13 Jul, 2023 1 commit
  9. 19 Jun, 2023 3 commits
    • Qu Wenruo's avatar
      btrfs: trigger orphan inode cleanup during START_SYNC ioctl · edc72881
      Qu Wenruo authored
      
      There is an internal error report that scrub found an error in an orphan
      inode's data.
      
      However there are very limited ways to cleanup such orphan inodes:
      
      - btrfs_start_pre_rw_mount()
        This happens at either mount, or RO->RW switch.
        This is not a viable solution for root fs which may not be unmounted
        or RO mounted.
      
        Furthermore this doesn't cover every subvolume, it only covers the
        currently cached subvolumes.
      
      - btrfs_lookup_dentry()
        This happens when we first lookup the subvolume dentry.
        But dentry can be cached thus it's not ensured to be triggered every
        time.
      
      - create_snapshot()
        This only happens for the created snapshot, not the source one.
      
      This means if we didn't trigger orphan items cleanup, there is really no
      other way to manually trigger it. Add this step to the START_SYNC ioctl.
      This is a slight change in the semantics of the ioctl but as sync can be
      potentially slow and is usually paired with WAIT_SYNC ioctl.
      
      The errors are not handled because the main point of the ioctl is the
      async commit, orphan cleanup is a side effect.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      edc72881
    • Tom Rix's avatar
      btrfs: simplify transid initialization in btrfs_ioctl_wait_sync · 12df6a62
      Tom Rix authored
      
      A small code simplification, move the default value of transid to its
      initialization and remove the else-statement.
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      12df6a62
    • Sweet Tea Dorminy's avatar
      btrfs: don't commit transaction for every subvol create · 1b53e51a
      Sweet Tea Dorminy authored
      
      Recently a Meta-internal workload encountered subvolume creation taking
      up to 2s each, significantly slower than directory creation. As they
      were hoping to be able to use subvolumes instead of directories, and
      were looking to create hundreds, this was a significant issue. After
      Josef investigated, it turned out to be due to the transaction commit
      currently performed at the end of subvolume creation.
      
      This change improves the workload by not doing transaction commit for every
      subvolume creation, and merely requiring a transaction commit on fsync.
      In the worst case, of doing a subvolume create and fsync in a loop, this
      should require an equal amount of time to the current scheme; and in the
      best case, the internal workload creating hundreds of subvolumes before
      fsyncing is greatly improved.
      
      While it would be nice to be able to use the log tree and use the normal
      fsync path, log tree replay can't deal with new subvolume inodes
      presently.
      
      It's possible that there's some reason that the transaction commit is
      necessary for correctness during subvolume creation; however,
      git logs indicate that the commit dates back to the beginning of
      subvolume creation, and there are no notes on why it would be necessary.
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarNeal Gompa <neal@gompa.dev>
      Signed-off-by: default avatarSweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      1b53e51a
  10. 12 Jun, 2023 1 commit
  11. 28 Apr, 2023 1 commit
    • xiaoshoukui's avatar
      btrfs: fix assertion of exclop condition when starting balance · ac868bc9
      xiaoshoukui authored
      Balance as exclusive state is compatible with paused balance and device
      add, which makes some things more complicated. The assertion of valid
      states when starting from paused balance needs to take into account two
      more states, the combinations can be hit when there are several threads
      racing to start balance and device add. This won't typically happen when
      the commands are started from command line.
      
      Scenario 1: With exclusive_operation state == BTRFS_EXCLOP_NONE.
      
      Concurrently adding multiple devices to the same mount point and
      btrfs_exclop_finish executed finishes before assertion in
      btrfs_exclop_balance, exclusive_operation will changed to
      BTRFS_EXCLOP_NONE state which lead to assertion failed:
      
        fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE ||
        fs_info->exclusive_operation == BTRFS_EXCLOP_DEV_ADD,
        in fs/btrfs/ioctl.c:456
        Call Trace:
         <TASK>
         btrfs_exclop_balance+0x13c/0x310
         ? memdup_user+0xab/0xc0
         ? PTR_ERR+0x17/0x20
         btrfs_ioctl_add_dev+0x2ee/0x320
         btrfs_ioctl+0x9d5/0x10d0
         ? btrfs_ioctl_encoded_write+0xb80/0xb80
         __x64_sys_ioctl+0x197/0x210
         do_syscall_64+0x3c/0xb0
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Scenario 2: With exclusive_operation state == BTRFS_EXCLOP_BALANCE_PAUSED.
      
      Concurrently adding multiple devices to the same mount point and
      btrfs_exclop_balance executed finish before the latter thread execute
      assertion in btrfs_exclop_balance, exclusive_operation will changed to
      BTRFS_EXCLOP_BALANCE_PAUSED state which lead to assertion failed:
      
        fs_info->exclusive_operation == BTRFS_EXCLOP_BALANCE ||
        fs_info->exclusive_operation == BTRFS_EXCLOP_DEV_ADD ||
        fs_info->exclusive_operation == BTRFS_EXCLOP_NONE,
        fs/btrfs/ioctl.c:458
        Call Trace:
         <TASK>
         btrfs_exclop_balance+0x240/0x410
         ? memdup_user+0xab/0xc0
         ? PTR_ERR+0x17/0x20
         btrfs_ioctl_add_dev+0x2ee/0x320
         btrfs_ioctl+0x9d5/0x10d0
         ? btrfs_ioctl_encoded_write+0xb80/0xb80
         __x64_sys_ioctl+0x197/0x210
         do_syscall_64+0x3c/0xb0
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      An example of the failed assertion is below, which shows that the
      paused balance is also needed to be checked.
      
        root@syzkaller:/home/xsk# ./repro
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        [  416.611428][ T7970] BTRFS info (device loop0): fs_info exclusive_operation: 0
        Failed to add device /dev/vda, errno 14
        [  416.613973][ T7971] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.615456][ T7972] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.617528][ T7973] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.618359][ T7974] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.622589][ T7975] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.624034][ T7976] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.626420][ T7977] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.627643][ T7978] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.629006][ T7979] BTRFS info (device loop0): fs_info exclusive_operation: 3
        [  416.630298][ T7980] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        Failed to add device /dev/vda, errno 14
        [  416.632787][ T7981] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.634282][ T7982] BTRFS info (device loop0): fs_info exclusive_operation: 3
        Failed to add device /dev/vda, errno 14
        [  416.636202][ T7983] BTRFS info (device loop0): fs_info exclusive_operation: 3
        [  416.637012][ T7984] BTRFS info (device loop0): fs_info exclusive_operation: 1
        Failed to add device /dev/vda, errno 14
        [  416.637759][ T7984] assertion failed: fs_info->exclusive_operation ==
        BTRFS_EXCLOP_BALANCE || fs_info->exclusive_operation ==
        BTRFS_EXCLOP_DEV_ADD || fs_info->exclusive_operation ==
        BTRFS_EXCLOP_NONE, in fs/btrfs/ioctl.c:458
        [  416.639845][ T7984] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
        [  416.640485][ T7984] CPU: 0 PID: 7984 Comm: repro Not tainted 6.2.0 #7
        [  416.641172][ T7984] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [  416.642090][ T7984] RIP: 0010:btrfs_assertfail+0x2c/0x2e
        [  416.644423][ T7984] RSP: 0018:ffffc90003ea7e28 EFLAGS: 00010282
        [  416.645018][ T7984] RAX: 00000000000000cc RBX: 0000000000000000 RCX: 0000000000000000
        [  416.645763][ T7984] RDX: ffff88801d030000 RSI: ffffffff81637e7c RDI: fffff520007d4fb7
        [  416.646554][ T7984] RBP: ffffffff8a533de0 R08: 00000000000000cc R09: 0000000000000000
        [  416.647299][ T7984] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8a533da0
        [  416.648041][ T7984] R13: 00000000000001ca R14: 000000005000940a R15: 0000000000000000
        [  416.648785][ T7984] FS:  00007fa2985d4640(0000) GS:ffff88802cc00000(0000) knlGS:0000000000000000
        [  416.649616][ T7984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [  416.650238][ T7984] CR2: 0000000000000000 CR3: 0000000018e5e000 CR4: 0000000000750ef0
        [  416.650980][ T7984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [  416.651725][ T7984] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [  416.652502][ T7984] PKRU: 55555554
        [  416.652888][ T7984] Call Trace:
        [  416.653241][ T7984]  <TASK>
        [  416.653527][ T7984]  btrfs_exclop_balance+0x240/0x410
        [  416.654036][ T7984]  ? memdup_user+0xab/0xc0
        [  416.654465][ T7984]  ? PTR_ERR+0x17/0x20
        [  416.654874][ T7984]  btrfs_ioctl_add_dev+0x2ee/0x320
        [  416.655380][ T7984]  btrfs_ioctl+0x9d5/0x10d0
        [  416.655822][ T7984]  ? btrfs_ioctl_encoded_write+0xb80/0xb80
        [  416.656400][ T7984]  __x64_sys_ioctl+0x197/0x210
        [  416.656874][ T7984]  do_syscall_64+0x3c/0xb0
        [  416.657346][ T7984]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [  416.657922][ T7984] RIP: 0033:0x4546af
        [  416.660170][ T7984] RSP: 002b:00007fa2985d4150 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [  416.660972][ T7984] RAX: ffffffffffffffda RBX: 00007fa2985d4640 RCX: 00000000004546af
        [  416.661714][ T7984] RDX: 0000000000000000 RSI: 000000005000940a RDI: 0000000000000003
        [  416.662449][ T7984] RBP: 00007fa2985d41d0 R08: 0000000000000000 R09: 00007ffee37a4c4f
        [  416.663195][ T7984] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa2985d4640
        [  416.663951][ T7984] R13: 0000000000000009 R14: 000000000041b320 R15: 00007fa297dd4000
        [  416.664703][ T7984]  </TASK>
        [  416.665040][ T7984] Modules linked in:
        [  416.665590][ T7984] ---[ end trace 0000000000000000 ]---
        [  416.666176][ T7984] RIP: 0010:btrfs_assertfail+0x2c/0x2e
        [  416.668775][ T7984] RSP: 0018:ffffc90003ea7e28 EFLAGS: 00010282
        [  416.669425][ T7984] RAX: 00000000000000cc RBX: 0000000000000000 RCX: 0000000000000000
        [  416.670235][ T7984] RDX: ffff88801d030000 RSI: ffffffff81637e7c RDI: fffff520007d4fb7
        [  416.671050][ T7984] RBP: ffffffff8a533de0 R08: 00000000000000cc R09: 0000000000000000
        [  416.671867][ T7984] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8a533da0
        [  416.672685][ T7984] R13: 00000000000001ca R14: 000000005000940a R15: 0000000000000000
        [  416.673501][ T7984] FS:  00007fa2985d4640(0000) GS:ffff88802cc00000(0000) knlGS:0000000000000000
        [  416.674425][ T7984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [  416.675114][ T7984] CR2: 0000000000000000 CR3: 0000000018e5e000 CR4: 0000000000750ef0
        [  416.675933][ T7984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [  416.676760][ T7984] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Link: https://lore.kernel.org/linux-btrfs/20230324031611.98986-1-xiaoshoukui@gmail.com/
      
      
      CC: stable@vger.kernel.org # 6.1+
      Signed-off-by: default avatarxiaoshoukui <xiaoshoukui@ruijie.com.cn>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ac868bc9
  12. 17 Apr, 2023 1 commit
    • Qu Wenruo's avatar
      btrfs: scrub: reject unsupported scrub flags · 604e6681
      Qu Wenruo authored
      
      Since the introduction of scrub interface, the only flag that we support
      is BTRFS_SCRUB_READONLY.  Thus there is no sanity checks, if there are
      some undefined flags passed in, we just ignore them.
      
      This is problematic if we want to introduce new scrub flags, as we have
      no way to determine if such flags are supported.
      
      Address the problem by introducing a check for the flags, and if
      unsupported flags are set, return -EOPNOTSUPP to inform the user space.
      
      This check should be backported for all supported kernels before any new
      scrub flags are introduced.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      604e6681
  13. 27 Mar, 2023 1 commit
    • Filipe Manana's avatar
      btrfs: fix race between quota disable and quota assign ioctls · 2f1a6be1
      Filipe Manana authored
      
      The quota assign ioctl can currently run in parallel with a quota disable
      ioctl call. The assign ioctl uses the quota root, while the disable ioctl
      frees that root, and therefore we can have a use-after-free triggered in
      the assign ioctl, leading to a trace like the following when KASAN is
      enabled:
      
        [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
        [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
        [672.724][T736]
        [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
        [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        [672.727][T736] Call Trace:
        [672.728][T736]  <TASK>
        [672.728][T736]  dump_stack_lvl+0xd9/0x150
        [672.725][T736]  print_report+0xc1/0x5e0
        [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
        [672.727][T736]  ? __phys_addr+0xc9/0x150
        [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.722][T736]  kasan_report+0xc0/0xf0
        [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
        [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
        [672.722][T736]  ? split_leaf+0x13d0/0x13d0
        [672.726][T736]  ? rcu_is_watching+0x12/0xb0
        [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
        [672.722][T736]  update_qgroup_status_item+0xf7/0x320
        [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
        [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
        [672.730][T736]  ? spin_bug+0x1d0/0x1d0
        [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
        [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
        [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
        [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
        [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
        [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
        [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
        [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
        [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
        [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
        [672.732][T736]  ? sigprocmask+0xf2/0x340
        [672.737][T736]  ? __fget_files+0x26a/0x480
        [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
        [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.736][T736]  __x64_sys_ioctl+0x198/0x210
        [672.736][T736]  do_syscall_64+0x39/0xb0
        [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.739][T736] RIP: 0033:0x4556ad
        [672.742][T736]  </TASK>
        [672.743][T736]
        [672.748][T736] Allocated by task 27677:
        [672.743][T736]  kasan_save_stack+0x22/0x40
        [672.741][T736]  kasan_set_track+0x25/0x30
        [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
        [672.749][T736]  btrfs_alloc_root+0x48/0x90
        [672.746][T736]  btrfs_create_tree+0x146/0xa20
        [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
        [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
        [672.747][T736]  __x64_sys_ioctl+0x198/0x210
        [672.749][T736]  do_syscall_64+0x39/0xb0
        [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.756][T736]
        [672.757][T736] Freed by task 27677:
        [672.759][T736]  kasan_save_stack+0x22/0x40
        [672.759][T736]  kasan_set_track+0x25/0x30
        [672.756][T736]  kasan_save_free_info+0x2e/0x50
        [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
        [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
        [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
        [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
        [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
        [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
        [672.756][T736]  __x64_sys_ioctl+0x198/0x210
        [672.753][T736]  do_syscall_64+0x39/0xb0
        [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.769][T736]
        [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
        [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
        [672.769][T736] The buggy address is located 520 bytes inside of
        [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
        [672.760][T736]
        [672.764][T736] The buggy address belongs to the physical page:
        [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
        [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
        [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
        [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
        [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
        [672.771][T736] page dumped because: kasan: bad access detected
        [672.778][T736] page_owner tracks the page as allocated
        [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
        [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
        [672.779][T736]  __alloc_pages+0x1cb/0x4a0
        [672.776][T736]  alloc_pages+0x1aa/0x270
        [672.773][T736]  allocate_slab+0x260/0x390
        [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
        [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
        [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
        [672.789][T736]  __kmalloc+0x4e/0x1a0
        [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
        [672.781][T736]  tomoyo_path_perm+0x22f/0x420
        [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
        [672.780][T736]  security_path_unlink+0xdb/0x150
        [672.788][T736]  do_unlinkat+0x377/0x680
        [672.788][T736]  __x64_sys_unlink+0xca/0x110
        [672.789][T736]  do_syscall_64+0x39/0xb0
        [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.784][T736] page last free stack trace:
        [672.787][T736]  free_pcp_prepare+0x4e5/0x920
        [672.787][T736]  free_unref_page+0x1d/0x4e0
        [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
        [672.797][T736]  qlist_free_all+0x6a/0x180
        [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
        [672.797][T736]  __kasan_slab_alloc+0x64/0x90
        [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
        [672.799][T736]  getname_flags.part.0+0x50/0x4e0
        [672.799][T736]  getname_flags+0x9e/0xe0
        [672.792][T736]  vfs_fstatat+0x77/0xb0
        [672.791][T736]  __do_sys_newlstat+0x84/0x100
        [672.798][T736]  do_syscall_64+0x39/0xb0
        [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.790][T736]
        [672.791][T736] Memory state around the buggy address:
        [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]                       ^
        [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
      before calling btrfs_run_qgroups(), which is what all qgroup ioctls should
      call.
      Reported-by: default avatarbutt3rflyh4ck <butterflyhuangxx@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
      
      
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      2f1a6be1
  14. 06 Mar, 2023 1 commit
    • Qu Wenruo's avatar
      btrfs: ioctl: return device fsid from DEV_INFO ioctl · 2943868a
      Qu Wenruo authored
      Currently user space utilizes dev info ioctl to grab the info of a
      certain devid, this includes its device uuid.  But the returned info is
      not enough to determine if a device is a seed.
      
      Commit a26d60de
      
       ("btrfs: sysfs: add devinfo/fsid to retrieve actual
      fsid from the device") exports the same value in sysfs so this is for
      parity with ioctl.  Add a new member, fsid, into
      btrfs_ioctl_dev_info_args, and populate the member with fsid value.
      
      This should not cause any compatibility problem, following the
      combinations:
      
      - Old user space, old kernel
      - Old user space, new kernel
        User space tool won't even check the new member.
      
      - New user space, old kernel
        The kernel won't touch the new member, and user space tool should
        zero out its argument, thus the new member is all zero.
      
        User space tool can then know the kernel doesn't support this fsid
        reporting, and falls back to whatever they can.
      
      - New user space, new kernel
        Go as planned.
      
        Would find the fsid member is no longer zero, and trust its value.
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      2943868a
  15. 15 Feb, 2023 2 commits
  16. 19 Jan, 2023 5 commits
    • Christian Brauner's avatar
      fs: port privilege checking helpers to mnt_idmap · 9452e93e
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed
      
       ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      9452e93e
    • Christian Brauner's avatar
      fs: port inode_owner_or_capable() to mnt_idmap · 01beba79
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed
      
       ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      01beba79
    • Christian Brauner's avatar
      fs: port inode_init_owner() to mnt_idmap · f2d40141
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed
      
       ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      f2d40141
    • Christian Brauner's avatar
      fs: port ->permission() to pass mnt_idmap · 4609e1f1
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed
      
       ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      4609e1f1
    • Christian Brauner's avatar
      fs: port ->fileattr_set() to pass mnt_idmap · 8782a9ae
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed
      
       ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      8782a9ae
  17. 05 Dec, 2022 10 commits