1. 27 Mar, 2023 2 commits
    • btrfs: scan device in non-exclusive mode · 50d281fc
      Anand Jain authored
      This fixes mkfs/mount/check failures due to race with systemd-udevd
      scan.
      
      During the device scan initiated by systemd-udevd, other user space
      EXCL operations such as mkfs, mount, or check may get blocked and result
      in a "Device or resource busy" error. This is because the device
      scan process opens the device with the EXCL flag in the kernel.
      
      Two reports were received:
      
       - btrfs/179 test case, where the fsck command failed with the -EBUSY
         error
      
       - LTP pwritev03 test case, where mkfs.vfs failed with
         the -EBUSY error, when mkfs.vfs tried to overwrite an old btrfs
         filesystem on the device.
      
      In both cases, fsck and mkfs (respectively) were racing with a
      systemd-udevd device scan, and systemd-udevd won, resulting in the
      -EBUSY error for fsck and mkfs.
      
      Reproducing the problem has been difficult because there is a very
      small window during which these userspace threads can race to
      acquire the exclusive device open. Even on the system where the problem
      was observed, it took anywhere between 10 and 400 iterations to occur,
      and the chances of reproducing it decrease with debug printk()s.
      
      However, an exclusive device open is unnecessary for the scan process,
      as there are no write operations on the device during scan. Furthermore,
      during the mount process, the superblock is re-read in the below
      function call chain:
      
        btrfs_mount_root
         btrfs_open_devices
          open_fs_devices
           btrfs_open_one_device
             btrfs_get_bdev_and_sb
      
      So, to fix this issue, remove the FMODE_EXCL flag from the scan
      operation, and add a comment.
      
      In the case where mkfs is still writing to the device while a scan is
      running, the btrfs signature has not been written yet, so the scan will
      not recognize the device.
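      The effect of dropping the exclusive flag from the scan can be sketched
      with a toy model. This is not kernel code: the BlockDevice class, the
      holder names and mkfs_during_scan() are hypothetical, and the model
      only captures the rule that an exclusive open fails with EBUSY while
      another opener holds the exclusive claim.

```python
import errno


class BlockDevice:
    """Toy model of the exclusive-open rule for a single block device."""

    def __init__(self):
        # Name of the opener currently holding the exclusive claim, if any.
        self.excl_holder = None

    def open(self, who, exclusive):
        if exclusive:
            # An exclusive open fails while someone else holds the claim.
            if self.excl_holder is not None and self.excl_holder != who:
                raise OSError(errno.EBUSY, "Device or resource busy")
            self.excl_holder = who
        # A non-exclusive open never takes the claim in this model.
        return who

    def close(self, who):
        if self.excl_holder == who:
            self.excl_holder = None


def mkfs_during_scan(scan_exclusive):
    """Return True if mkfs can open the device exclusively during a scan."""
    dev = BlockDevice()
    dev.open("udev-scan", exclusive=scan_exclusive)
    try:
        dev.open("mkfs", exclusive=True)
        return True
    except OSError as err:
        if err.errno != errno.EBUSY:
            raise
        return False
    finally:
        dev.close("udev-scan")
```

      In this model mkfs_during_scan(True) reproduces the "Device or resource
      busy" failure, while mkfs_during_scan(False) lets mkfs proceed.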
      Reported-by: Sherry Yang <sherry.yang@oracle.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix race between quota disable and quota assign ioctls · 2f1a6be1
      Filipe Manana authored
      The quota assign ioctl can currently run in parallel with a quota disable
      ioctl call. The assign ioctl uses the quota root, while the disable ioctl
      frees that root, and therefore we can have a use-after-free triggered in
      the assign ioctl, leading to a trace like the following when KASAN is
      enabled:
      
        [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
        [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
        [672.724][T736]
        [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
        [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        [672.727][T736] Call Trace:
        [672.728][T736]  <TASK>
        [672.728][T736]  dump_stack_lvl+0xd9/0x150
        [672.725][T736]  print_report+0xc1/0x5e0
        [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
        [672.727][T736]  ? __phys_addr+0xc9/0x150
        [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.722][T736]  kasan_report+0xc0/0xf0
        [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
        [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
        [672.722][T736]  ? split_leaf+0x13d0/0x13d0
        [672.726][T736]  ? rcu_is_watching+0x12/0xb0
        [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
        [672.722][T736]  update_qgroup_status_item+0xf7/0x320
        [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
        [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
        [672.730][T736]  ? spin_bug+0x1d0/0x1d0
        [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
        [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
        [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
        [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
        [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
        [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
        [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
        [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
        [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
        [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
        [672.732][T736]  ? sigprocmask+0xf2/0x340
        [672.737][T736]  ? __fget_files+0x26a/0x480
        [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
        [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.736][T736]  __x64_sys_ioctl+0x198/0x210
        [672.736][T736]  do_syscall_64+0x39/0xb0
        [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.739][T736] RIP: 0033:0x4556ad
        [672.742][T736]  </TASK>
        [672.743][T736]
        [672.748][T736] Allocated by task 27677:
        [672.743][T736]  kasan_save_stack+0x22/0x40
        [672.741][T736]  kasan_set_track+0x25/0x30
        [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
        [672.749][T736]  btrfs_alloc_root+0x48/0x90
        [672.746][T736]  btrfs_create_tree+0x146/0xa20
        [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
        [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
        [672.747][T736]  __x64_sys_ioctl+0x198/0x210
        [672.749][T736]  do_syscall_64+0x39/0xb0
        [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.756][T736]
        [672.757][T736] Freed by task 27677:
        [672.759][T736]  kasan_save_stack+0x22/0x40
        [672.759][T736]  kasan_set_track+0x25/0x30
        [672.756][T736]  kasan_save_free_info+0x2e/0x50
        [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
        [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
        [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
        [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
        [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
        [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
        [672.756][T736]  __x64_sys_ioctl+0x198/0x210
        [672.753][T736]  do_syscall_64+0x39/0xb0
        [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.769][T736]
        [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
        [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
        [672.769][T736] The buggy address is located 520 bytes inside of
        [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
        [672.760][T736]
        [672.764][T736] The buggy address belongs to the physical page:
        [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
        [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
        [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
        [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
        [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
        [672.771][T736] page dumped because: kasan: bad access detected
        [672.778][T736] page_owner tracks the page as allocated
        [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
        [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
        [672.779][T736]  __alloc_pages+0x1cb/0x4a0
        [672.776][T736]  alloc_pages+0x1aa/0x270
        [672.773][T736]  allocate_slab+0x260/0x390
        [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
        [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
        [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
        [672.789][T736]  __kmalloc+0x4e/0x1a0
        [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
        [672.781][T736]  tomoyo_path_perm+0x22f/0x420
        [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
        [672.780][T736]  security_path_unlink+0xdb/0x150
        [672.788][T736]  do_unlinkat+0x377/0x680
        [672.788][T736]  __x64_sys_unlink+0xca/0x110
        [672.789][T736]  do_syscall_64+0x39/0xb0
        [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.784][T736] page last free stack trace:
        [672.787][T736]  free_pcp_prepare+0x4e5/0x920
        [672.787][T736]  free_unref_page+0x1d/0x4e0
        [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
        [672.797][T736]  qlist_free_all+0x6a/0x180
        [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
        [672.797][T736]  __kasan_slab_alloc+0x64/0x90
        [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
        [672.799][T736]  getname_flags.part.0+0x50/0x4e0
        [672.799][T736]  getname_flags+0x9e/0xe0
        [672.792][T736]  vfs_fstatat+0x77/0xb0
        [672.791][T736]  __do_sys_newlstat+0x84/0x100
        [672.798][T736]  do_syscall_64+0x39/0xb0
        [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.790][T736]
        [672.791][T736] Memory state around the buggy address:
        [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]                       ^
        [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
      before calling btrfs_run_qgroups(), which is what all qgroup ioctls should
      call.
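      The locking rule can be illustrated with a minimal user-space sketch.
      It is an assumption-laden model, not the kernel implementation: FsInfo,
      the dictionary standing in for the quota root and the returned strings
      are all hypothetical, and a threading.Lock stands in for the qgroup
      ioctl mutex.

```python
import threading


class FsInfo:
    """Toy stand-in for the state shared by the quota ioctls."""

    def __init__(self):
        self.qgroup_ioctl_lock = threading.Lock()
        # Stand-in for the quota root; None once quota has been disabled.
        self.quota_root = {"name": "quota tree"}


def quota_disable(fs):
    # Disable drops the quota root while holding the ioctl mutex.
    with fs.qgroup_ioctl_lock:
        fs.quota_root = None


def run_qgroups(fs):
    # Models the step that uses the quota root. In this sketch it simply
    # does nothing when the root is already gone.
    if fs.quota_root is None:
        return "quota disabled, nothing to do"
    return "updated status item in " + fs.quota_root["name"]


def qgroup_assign(fs):
    # The idea of the fix: hold the same mutex across the call that uses
    # the quota root, so a concurrent disable cannot drop it midway.
    with fs.qgroup_ioctl_lock:
        return run_qgroups(fs)
```

      Because both paths serialize on the same lock, the assign path either
      sees a live quota root for the whole call or sees that quota is
      already disabled.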
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 15 Mar, 2023 7 commits
  3. 08 Mar, 2023 1 commit
    • btrfs: fix block group item corruption after inserting new block group · 675dfe12
      Filipe Manana authored
      We can often end up inserting a block group item, for a new block group,
      with a wrong value for the used bytes field.
      
      This happens if, for the newly allocated block group, in the same
      transaction that created the block group, we have tasks allocating
      extents from it as well as tasks removing extents from it.
      
      For example:
      
      1) Task A creates a metadata block group X;
      
      2) Two extents are allocated from block group X, so its "used" field is
         updated to 32K, and its "commit_used" field remains as 0;
      
      3) Transaction commit starts, by some task B, and it enters
         btrfs_start_dirty_block_groups(). There it tries to update the block
         group item for block group X, which currently has its "used" field with
         a value of 32K. But that fails since the block group item was not yet
         inserted, and so on failure update_block_group_item() sets the
         "commit_used" field of the block group back to 0;
      
      4) The block group item is inserted by task A, when for example
         btrfs_create_pending_block_groups() is called when releasing its
         transaction handle. This results in insert_block_group_item() inserting
         the block group item in the extent tree (or block group tree), with a
         "used" field having a value of 32K, but without updating the
         "commit_used" field in the block group, which remains with value of 0;
      
      5) The two extents are freed from block group X, so its "used" field
         changes from 32K to 0;
      
      6) The transaction commit by task B continues, it enters
         btrfs_write_dirty_block_groups() which calls update_block_group_item()
         for block group X, and there it decides to skip the block group item
         update, because "used" has a value of 0 and "commit_used" has a value
         of 0 too.
      
         As a result, we end up with a block group item having a 32K "used"
         field but no extents allocated from it.
      
      When this issue happens, a btrfs check reports an error like this:
      
         [1/7] checking root items
         [2/7] checking extents
         block group [1104150528 1073741824] used 39796736 but extent items used 0
         ERROR: errors found in extent allocation tree or chunk allocation
         (...)
      
      Fix this by making insert_block_group_item() update the block group's
      "commit_used" field.
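      The six steps above can be replayed with a small model. This is a
      sketch under stated assumptions rather than the kernel code: the
      BlockGroup class, its item_used attribute (standing in for the on-disk
      item) and the "fixed" switch are hypothetical, and only the
      "used"/"commit_used" bookkeeping described in this message is modelled.

```python
class BlockGroup:
    """Toy model of the counters involved in the corruption."""

    def __init__(self):
        self.used = 0          # in-memory bytes used
        self.commit_used = 0   # value believed to be stored in the item
        self.item_used = None  # "used" field of the item; None = not inserted


def insert_block_group_item(bg, fixed):
    bg.item_used = bg.used
    if fixed:
        # The fix: remember what was just written into the item.
        bg.commit_used = bg.used


def update_block_group_item(bg):
    # Skip the update when the item is believed to be up to date.
    if bg.commit_used == bg.used:
        return
    old_commit_used = bg.commit_used
    bg.commit_used = bg.used
    if bg.item_used is None:
        # Item not inserted yet: the update fails, restore commit_used.
        bg.commit_used = old_commit_used
        return
    bg.item_used = bg.used


def replay(fixed):
    """Replay steps 1 to 6 and return the "used" value left in the item."""
    bg = BlockGroup()                   # 1) block group X is created
    bg.used += 32 * 1024                # 2) two extents are allocated
    update_block_group_item(bg)         # 3) update fails, commit_used is 0
    insert_block_group_item(bg, fixed)  # 4) item inserted with used = 32K
    bg.used -= 32 * 1024                # 5) both extents are freed
    update_block_group_item(bg)         # 6) skipped when 0 == 0
    return bg.item_used
```

      In this model replay(fixed=False) leaves 32K in the item although no
      extents remain, and replay(fixed=True) ends with an item value of 0.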
      
      Fixes: 7248e0ce ("btrfs: skip update of block group item if used bytes are the same")
      CC: stable@vger.kernel.org # 6.2+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 06 Mar, 2023 6 commits
    • btrfs: fix extent map logging bit not cleared for split maps after dropping range · e4cc1483
      Filipe Manana authored
      At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING
      bit on a 'flags' variable that was not initialized. This makes static
      checkers complain about it, so initialize the 'flags' variable before
      clearing the bit.
      
      In practice this has no consequences, because EXTENT_FLAG_LOGGING should
      not be set when btrfs_drop_extent_map_range() is called, as an fsync locks
      the inode in exclusive mode, locks the inode's mmap semaphore in exclusive
      mode too and it always flushes all delalloc.
      
      Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the
      flags of the split extent map.
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/
      Fixes: db21370b ("btrfs: drop extent map range more efficiently")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix percent calculation for bg reclaim message · 95cd356c
      Johannes Thumshirn authored
      We have a report that the info message for block-group reclaim is
      crossing the 100% used mark.
      
      This is happening as we were truncating the divisor for the division
      (the block_group->length) to a 32-bit value.
      
      Fix this by using div64_u64() to not truncate the divisor.
      
      In the worst case, it can lead to a division by zero error, which
      should be possible to trigger on a 4-disk RAID0 setup where each device
      is large enough:
      
        $ mkfs.btrfs  -f /dev/test/scratch[1234] -m raid1 -d raid0
        btrfs-progs v6.1
        [...]
        Filesystem size:    40.00GiB
        Block group profiles:
          Data:             RAID0             4.00GiB <<<
          Metadata:         RAID1           256.00MiB
          System:           RAID1             8.00MiB
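      The arithmetic behind both symptoms can be checked with a short
      sketch. The function names are hypothetical and Python integers stand
      in for the kernel's u64 values; masking with 0xFFFFFFFF models the
      truncation of the divisor to 32 bits. A 4.00GiB block group is exactly
      2^32 bytes long, so its truncated length is 0.

```python
U32_MASK = 0xFFFFFFFF
GiB = 1 << 30


def percent_truncated(used, length):
    """Percentage with the divisor truncated to 32 bits (buggy behaviour)."""
    return (used * 100) // (length & U32_MASK)


def percent_full(used, length):
    """Percentage with the full 64-bit divisor (fixed behaviour)."""
    return (used * 100) // length
```

      With 4 GiB used in a 5 GiB block group, percent_full() gives 80 while
      percent_truncated() gives 400, because the divisor shrinks to 1 GiB.
      With a 4 GiB block group, percent_truncated() raises ZeroDivisionError,
      which corresponds to the division by zero mentioned above.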
      Reported-by: Forza <forza@tnonline.net>
      Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
      Fixes: 5f93e776 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add Qu's note ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix unnecessary increment of read error stat on write error · 98e8d36a
      Naohiro Aota authored
      The current btrfs_log_dev_io_error() increases the read error count
      even if the erroneous IO is a WRITE request. This is because it forgets
      to use "else if", so every failed WRITE request also counts as a READ
      error, as there is (of course) no REQ_RAHEAD bit set.
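      The control-flow mistake can be modelled in a few lines. This is a
      hypothetical sketch of the branching described above, not the kernel
      function: count_errors() and its boolean parameters are made up for
      illustration.

```python
def count_errors(is_write, is_readahead, use_else_if):
    """Return (write_errs, read_errs) recorded for one failed IO."""
    write_errs = 0
    read_errs = 0
    if is_write:
        write_errs += 1
    if use_else_if:
        # Fixed: the read branch is only considered for non-write IO.
        if not is_write and not is_readahead:
            read_errs += 1
    else:
        # Buggy: a separate "if", so a failed write (which never has the
        # readahead bit set) is also counted as a read error.
        if not is_readahead:
            read_errs += 1
    return write_errs, read_errs
```

      For a failed write, the buggy variant returns (1, 1) and the fixed
      variant returns (1, 0). Failed non-readahead reads return (0, 1) in
      both variants.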
      
      Fixes: c3a62baf ("btrfs: use chained bios when cloning")
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode · c06016a0
      void0red authored
      Even if the slot is already read out, we may still need to re-balance
      the tree, so the btrfs_del_item() call can fail and we need to handle
      the error properly.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: void0red <void0red@gmail.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: ioctl: return device fsid from DEV_INFO ioctl · 2943868a
      Qu Wenruo authored
      Currently user space uses the dev info ioctl to grab the info of a
      certain devid, including its device uuid.  But the returned info is
      not enough to determine if a device is a seed.
      
      Commit a26d60de ("btrfs: sysfs: add devinfo/fsid to retrieve actual
      fsid from the device") exports the same value in sysfs so this is for
      parity with ioctl.  Add a new member, fsid, into
      btrfs_ioctl_dev_info_args, and populate the member with fsid value.
      
      This should not cause any compatibility problems in any of the
      following combinations:
      
      - Old user space, old kernel
      - Old user space, new kernel
        User space tool won't even check the new member.
      
      - New user space, old kernel
        The kernel won't touch the new member, and user space tool should
        zero out its argument, thus the new member is all zero.
      
        User space tool can then know the kernel doesn't support this fsid
        reporting, and fall back to whatever it can.
      
      - New user space, new kernel
        Go as planned.
      
        User space finds the fsid member is no longer zero, and trusts its
        value.
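      The compatibility check described above can be sketched as follows.
      This is a model only: no real ioctl is issued, a dictionary stands in
      for btrfs_ioctl_dev_info_args, and old_kernel_dev_info(),
      new_kernel_dev_info() and query_fsid() are hypothetical names with
      made-up uuid and fsid bytes.

```python
BTRFS_FSID_SIZE = 16  # btrfs fsids are 16-byte UUIDs


def old_kernel_dev_info(args):
    # An old kernel fills the device uuid but never touches "fsid".
    args["uuid"] = bytes(range(16))


def new_kernel_dev_info(args):
    # A new kernel additionally populates the fsid member.
    args["uuid"] = bytes(range(16))
    args["fsid"] = bytes(range(16, 32))


def query_fsid(dev_info):
    """Return the fsid reported by the kernel, or None if unsupported."""
    # New user space zeroes its argument before the call.
    args = {"uuid": bytes(16), "fsid": bytes(BTRFS_FSID_SIZE)}
    dev_info(args)
    if args["fsid"] == bytes(BTRFS_FSID_SIZE):
        # Still all zero: the kernel does not report the fsid, fall back.
        return None
    return args["fsid"]
```

      Here query_fsid(old_kernel_dev_info) returns None, so the caller knows
      to fall back, while query_fsid(new_kernel_dev_info) returns the 16
      bytes the kernel filled in.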
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix potential dead lock in size class loading logic · 12148367
      Boris Burkov authored
      As reported by Filipe, there's a potential deadlock caused by
      using btrfs_search_forward on commit_root. The locking there is
      unconditional, even if ->skip_locking and ->search_commit_root is set.
      It's not meant to be used for commit roots, so it always needs to do
      locking.
      
      So if another task is COWing a child node of the same root node and
      then needs to wait for block group caching to complete when trying to
      allocate a metadata extent, it deadlocks.
      
      For example:
      
      [539604.239315] sysrq: Show Blocked State
      [539604.240133] task:kworker/u16:6   state:D stack:0     pid:2119594 ppid:2      flags:0x00004000
      [539604.241613] Workqueue: btrfs-cache btrfs_work_helper [btrfs]
      [539604.242673] Call Trace:
      [539604.243129]  <TASK>
      [539604.243925]  __schedule+0x41d/0xee0
      [539604.244797]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.245399]  ? rwsem_down_read_slowpath+0x185/0x490
      [539604.246111]  schedule+0x5d/0xf0
      [539604.246593]  rwsem_down_read_slowpath+0x2da/0x490
      [539604.247290]  ? rcu_barrier_tasks_trace+0x10/0x20
      [539604.248090]  __down_read_common+0x3d/0x150
      [539604.248702]  down_read_nested+0xc3/0x140
      [539604.249280]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
      [539604.250097]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
      [539604.250915]  btrfs_search_forward+0x59/0x460 [btrfs]
      [539604.251781]  ? btrfs_global_root+0x50/0x70 [btrfs]
      [539604.252476]  caching_thread+0x1be/0x920 [btrfs]
      [539604.253167]  btrfs_work_helper+0xf6/0x400 [btrfs]
      [539604.253848]  process_one_work+0x24f/0x5a0
      [539604.254476]  worker_thread+0x52/0x3b0
      [539604.255166]  ? __pfx_worker_thread+0x10/0x10
      [539604.256047]  kthread+0xf0/0x120
      [539604.256591]  ? __pfx_kthread+0x10/0x10
      [539604.257212]  ret_from_fork+0x29/0x50
      [539604.257822]  </TASK>
      [539604.258233] task:btrfs-transacti state:D stack:0     pid:2236474 ppid:2      flags:0x00004000
      [539604.259802] Call Trace:
      [539604.260243]  <TASK>
      [539604.260615]  __schedule+0x41d/0xee0
      [539604.261205]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.262000]  ? rwsem_down_read_slowpath+0x185/0x490
      [539604.262822]  schedule+0x5d/0xf0
      [539604.263374]  rwsem_down_read_slowpath+0x2da/0x490
      [539604.266228]  ? lock_acquire+0x160/0x310
      [539604.266917]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.267996]  ? lock_contended+0x19e/0x500
      [539604.268720]  __down_read_common+0x3d/0x150
      [539604.269400]  down_read_nested+0xc3/0x140
      [539604.270057]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
      [539604.271129]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
      [539604.272372]  btrfs_search_slot+0x143/0xf70 [btrfs]
      [539604.273295]  update_block_group_item+0x9e/0x190 [btrfs]
      [539604.274282]  btrfs_start_dirty_block_groups+0x1c4/0x4f0 [btrfs]
      [539604.275381]  ? __mutex_unlock_slowpath+0x45/0x280
      [539604.276390]  btrfs_commit_transaction+0xee/0xed0 [btrfs]
      [539604.277391]  ? lock_acquire+0x1a4/0x310
      [539604.278080]  ? start_transaction+0xcb/0x6c0 [btrfs]
      [539604.279099]  transaction_kthread+0x142/0x1c0 [btrfs]
      [539604.279996]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
      [539604.280673]  kthread+0xf0/0x120
      [539604.281050]  ? __pfx_kthread+0x10/0x10
      [539604.281496]  ret_from_fork+0x29/0x50
      [539604.281966]  </TASK>
      [539604.282255] task:fsstress        state:D stack:0     pid:2236483 ppid:1      flags:0x00004006
      [539604.283897] Call Trace:
      [539604.284700]  <TASK>
      [539604.285088]  __schedule+0x41d/0xee0
      [539604.285660]  schedule+0x5d/0xf0
      [539604.286175]  btrfs_wait_block_group_cache_progress+0xf2/0x170 [btrfs]
      [539604.287342]  ? __pfx_autoremove_wake_function+0x10/0x10
      [539604.288450]  find_free_extent+0xd93/0x1750 [btrfs]
      [539604.289256]  ? _raw_spin_unlock+0x29/0x50
      [539604.289911]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
      [539604.290843]  btrfs_reserve_extent+0x147/0x290 [btrfs]
      [539604.291943]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
      [539604.292903]  __btrfs_cow_block+0x138/0x580 [btrfs]
      [539604.293773]  btrfs_cow_block+0x10e/0x240 [btrfs]
      [539604.294595]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
      [539604.295585]  btrfs_update_device+0x71/0x1b0 [btrfs]
      [539604.296459]  btrfs_chunk_alloc_add_chunk_item+0xe0/0x340 [btrfs]
      [539604.297489]  btrfs_chunk_alloc+0x1bf/0x490 [btrfs]
      [539604.298335]  find_free_extent+0x6fa/0x1750 [btrfs]
      [539604.299174]  ? _raw_spin_unlock+0x29/0x50
      [539604.299950]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
      [539604.300918]  btrfs_reserve_extent+0x147/0x290 [btrfs]
      [539604.301797]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
      [539604.303017]  ? lock_release+0x224/0x4a0
      [539604.303855]  __btrfs_cow_block+0x138/0x580 [btrfs]
      [539604.304789]  btrfs_cow_block+0x10e/0x240 [btrfs]
      [539604.305611]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
      [539604.306682]  ? btrfs_global_root+0x50/0x70 [btrfs]
      [539604.308198]  lookup_inline_extent_backref+0x17b/0x7a0 [btrfs]
      [539604.309254]  lookup_extent_backref+0x43/0xd0 [btrfs]
      [539604.310122]  __btrfs_free_extent+0xf8/0x810 [btrfs]
      [539604.310874]  ? lock_release+0x224/0x4a0
      [539604.311724]  ? btrfs_merge_delayed_refs+0x17b/0x1d0 [btrfs]
      [539604.313023]  __btrfs_run_delayed_refs+0x2ba/0x1260 [btrfs]
      [539604.314271]  btrfs_run_delayed_refs+0x8f/0x1c0 [btrfs]
      [539604.315445]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.316706]  btrfs_commit_transaction+0xa2/0xed0 [btrfs]
      [539604.317855]  ? do_raw_spin_unlock+0x4b/0xa0
      [539604.318544]  ? _raw_spin_unlock+0x29/0x50
      [539604.319240]  create_subvol+0x53d/0x6e0 [btrfs]
      [539604.320283]  btrfs_mksubvol+0x4f5/0x590 [btrfs]
      [539604.321220]  __btrfs_ioctl_snap_create+0x11b/0x180 [btrfs]
      [539604.322307]  btrfs_ioctl_snap_create_v2+0xc6/0x150 [btrfs]
      [539604.323295]  btrfs_ioctl+0x9f7/0x33e0 [btrfs]
      [539604.324331]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.325137]  ? lock_release+0x224/0x4a0
      [539604.325808]  ? __x64_sys_ioctl+0x87/0xc0
      [539604.326467]  __x64_sys_ioctl+0x87/0xc0
      [539604.327109]  do_syscall_64+0x38/0x90
      [539604.327875]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [539604.328792] RIP: 0033:0x7f05a7babaeb
      
      This needs to use regular btrfs_search_slot() with some skip and stop
      logic.
      
      Since we only consider five samples (five search slots), don't bother
      with the complexity of looking for commit_root_sem contention. If
      necessary, it can be added to the load function in between samples.
      Reported-by: Filipe Manana <fdmanana@kernel.org>
      Link: https://lore.kernel.org/linux-btrfs/CAL3q7H7eKMD44Z1+=Kb-1RFMMeZpAm2fwyO59yeBwCcSOU80Pg@mail.gmail.com/
      Fixes: c7eec3d9 ("btrfs: load block group size class when caching")
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 01 Mar, 2023 1 commit
  6. 15 Feb, 2023 23 commits