1. 20 Jul, 2023 4 commits
    • Linus Torvalds's avatar
      Merge tag 'iomap-6.5-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e599e16c
      Linus Torvalds authored
      Pull iomap fix from Darrick Wong:
       "Fix partial write regression.
      
        It turns out that fstests doesn't have any test coverage for short
        writes, but LTP does. Fortunately, this was caught right after -rc1
        was tagged.
      
        Summary:
      
         - Fix a bug wherein a failed write could clobber short write status"
      
      * tag 'iomap-6.5-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: micro optimize the ki_pos assignment in iomap_file_buffered_write
        iomap: fix a regression for partial write errors
      e599e16c
    • Linus Torvalds's avatar
      Merge tag 'xfs-6.5-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 69435880
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Flexarray declaration conversions.
      
        This probably should've been done with the merge window open, but I
        was not aware that the UBSAN knob would be getting turned up for 6.5,
        and the fstests failures due to the kernel warnings are getting in the
        way of testing.
      
        Summary:
      
         - Convert all the array[1] declarations into the accepted flex
           array[] declarations so that UBSAN and friends will not get
           confused"
      
      * tag 'xfs-6.5-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: convert flex-array declarations in xfs attr shortform objects
        xfs: convert flex-array declarations in xfs attr leaf blocks
        xfs: convert flex-array declarations in struct xfs_attrlist*
      69435880
    • Linus Torvalds's avatar
      Merge tag 'for-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 46670259
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Stable fixes:
      
         - fix race between balance and cancel/pause
      
         - various iput() fixes
      
         - fix use-after-free of new block group that became unused
      
         - fix warning when putting transaction with qgroups enabled after
           abort
      
         - fix crash in subpage mode when page could be released between map
           and map read
      
         - when scrubbing raid56 verify the P/Q stripes unconditionally
      
         - fix minor memory leak in zoned mode when a block group with an
           unexpected superblock is found
      
        Regression fixes:
      
         - fix ordered extent split error handling when submitting direct IO
      
         - user irq-safe locking when adding delayed iputs"
      
      * tag 'for-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix warning when putting transaction with qgroups enabled after abort
        btrfs: fix ordered extent split error handling in btrfs_dio_submit_io
        btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand
        btrfs: raid56: always verify the P/Q contents for scrub
        btrfs: use irq safe locking when running and adding delayed iputs
        btrfs: fix iput() on error pointer after error during orphan cleanup
        btrfs: fix double iput() on inode after an error during orphan cleanup
        btrfs: zoned: fix memory leak after finding block group with super blocks
        btrfs: fix use-after-free of new block group that became unused
        btrfs: be a bit more careful when setting mirror_num_ret in btrfs_map_block
        btrfs: fix race between balance and cancel/pause
      46670259
    • Linus Torvalds's avatar
      Merge tag 'regulator-fix-v6.5-rc1' of... · 2922800a
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fix from Mark Brown:
       "One fix for an issue with parsing partially specified DTs"
      
      * tag 'regulator-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: da9063: fix null pointer deref with partial DT config
      2922800a
  2. 19 Jul, 2023 1 commit
  3. 18 Jul, 2023 12 commits
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of... · ccff6d11
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Don't group events when computing metrics that require more than the
         maximum number of simultaneously enabled events on AMD systems.
      
       - Fix multi CU handling in 'perf probe', add a 'perf test' entry to
         regress test it.
      
       - Make the 'perf test task_exit' stop generating samples by using the
         'dummy' event, all it is testing is if a PERF_RECORD_EXIT is
         generated at the end of a perf session. This makes this perf test to
         stop sometimes failing on some systems due to a full ring buffer.
      
       - Avoid SEGV if PMU lookup fails for legacy cache terms.
      
       - Fix libsubcmd SEGV/use-after-free when commands aren't excluded.
      
       - Fix OpenCSD (ARM64's CoreSight hardware tracing) library path
         resolution when specifying CSLIBS= in the make command line.
      
       - Fix broken feature check for libtracefs due to external lib changes,
         use the provided pkgconfig file instead future proof it.
      
       - Sync drm, fcntl, kvm, mount, prctl, socket, vhost, asound, arm64's
         cputype headers with the kernel sources, in some cases this made the
         tools become aware of new kernel APIs such as ioctls and the
         cachestat sysctl.
      
      * tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
        perf test task_exit: No need for a cycles event to check if we get an PERF_RECORD_EXIT
        tools headers arm64: Sync arm64's cputype.h with the kernel sources
        tools include UAPI: Sync the sound/asound.h copy with the kernel sources
        tools include UAPI: Sync linux/vhost.h with the kernel sources
        perf beauty: Update copy of linux/socket.h with the kernel sources
        perf parse-events: Avoid SEGV if PMU lookup fails for legacy cache terms
        libsubcmd: Avoid SEGV/use-after-free when commands aren't excluded
        tools headers UAPI: Sync linux/prctl.h with the kernel sources
        perf build: Fix broken feature check for libtracefs due to external lib changes
        tools include UAPI: Sync linux/mount.h copy with the kernel sources
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        tools headers uapi: Sync linux/fcntl.h with the kernel sources
        perf vendor events amd: Fix large metrics
        perf build: Fix library not found error when using CSLIBS
        tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        perf probe: Read DWARF files from the correct CU
        perf probe: Add test for regression introduced by switch to die_get_decl_file()
      ccff6d11
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2023-07-18-12-28' of... · 4806364a
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-07-18-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull hotfixes from Andrew Morton:
       "Seven hotfixes, six of which are cc:stable and one of which addresses
        a post-6.5 issue"
      
      * tag 'mm-hotfixes-stable-2023-07-18-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        maple_tree: fix node allocation testing on 32 bit
        maple_tree: fix 32 bit mas_next testing
        selftests/mm: mkdirty: fix incorrect position of #endif
        maple_tree: set the node limit when creating a new root node
        mm/mlock: fix vma iterator conversion of apply_vma_lock_flags()
        prctl: move PR_GET_AUXV out of PR_MCE_KILL
        selftests/mm: give scripts execute permission
      4806364a
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-fixes-6.5-rc3' of... · 74f1456c
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fixes from Shuah Khan:
       "Fixes to bugs that are interfering with arm64 and risc workflows. Also
        two fixes to timer and mincore tests that are causing test failures"
      
      * tag 'linux-kselftest-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/arm64: fix build failure during the "emit_tests" step
        selftests/riscv: fix potential build failure during the "emit_tests" step
        tools: timers: fix freq average calculation
        selftests/mincore: fix skip condition for check_huge_pages test
      74f1456c
    • Linus Torvalds's avatar
      Merge tag 'tpmdd-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · f2f393c3
      Linus Torvalds authored
      Pull tpm fixes from Jarkko Sakkinen.
      
      Mostly interrupt storm fixes, with some other minor changes.
      
      * tag 'tpmdd-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs
        tpm/tpm_tis: Disable interrupts for Lenovo L590 devices
        tpm: Do not remap from ACPI resources again for Pluton TPM
        tpm/tpm_tis: Disable interrupts for Framework Laptop Intel 13th gen
        tpm/tpm_tis: Disable interrupts for Framework Laptop Intel 12th gen
        security: keys: Modify mismatched function name
        tpm: return false from tpm_amd_is_rng_defective on non-x86 platforms
        keys: Fix linking a duplicate key to a keyring's assoc_array
        tpm: tis_i2c: Limit write bursts to I2C_SMBUS_BLOCK_MAX (32) bytes
        tpm: tis_i2c: Limit read bursts to I2C_SMBUS_BLOCK_MAX (32) bytes
        tpm_tis_spi: Release chip select when flow control fails
        tpm: tpm_tis: Disable interrupts *only* for AEON UPX-i11
        tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
      f2f393c3
    • Filipe Manana's avatar
      btrfs: fix warning when putting transaction with qgroups enabled after abort · aa84ce8a
      Filipe Manana authored
      If we have a transaction abort with qgroups enabled we get a warning
      triggered when doing the final put on the transaction, like this:
      
        [552.6789] ------------[ cut here ]------------
        [552.6815] WARNING: CPU: 4 PID: 81745 at fs/btrfs/transaction.c:144 btrfs_put_transaction+0x123/0x130 [btrfs]
        [552.6817] Modules linked in: btrfs blake2b_generic xor (...)
        [552.6819] CPU: 4 PID: 81745 Comm: btrfs-transacti Tainted: G        W          6.4.0-rc6-btrfs-next-134+ #1
        [552.6819] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
        [552.6819] RIP: 0010:btrfs_put_transaction+0x123/0x130 [btrfs]
        [552.6821] Code: bd a0 01 00 (...)
        [552.6821] RSP: 0018:ffffa168c0527e28 EFLAGS: 00010286
        [552.6821] RAX: ffff936042caed00 RBX: ffff93604a3eb448 RCX: 0000000000000000
        [552.6821] RDX: ffff93606421b028 RSI: ffffffff92ff0878 RDI: ffff93606421b010
        [552.6821] RBP: ffff93606421b000 R08: 0000000000000000 R09: ffffa168c0d07c20
        [552.6821] R10: 0000000000000000 R11: ffff93608dc52950 R12: ffffa168c0527e70
        [552.6821] R13: ffff93606421b000 R14: ffff93604a3eb420 R15: ffff93606421b028
        [552.6821] FS:  0000000000000000(0000) GS:ffff93675fb00000(0000) knlGS:0000000000000000
        [552.6821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [552.6821] CR2: 0000558ad262b000 CR3: 000000014feda005 CR4: 0000000000370ee0
        [552.6822] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [552.6822] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [552.6822] Call Trace:
        [552.6822]  <TASK>
        [552.6822]  ? __warn+0x80/0x130
        [552.6822]  ? btrfs_put_transaction+0x123/0x130 [btrfs]
        [552.6824]  ? report_bug+0x1f4/0x200
        [552.6824]  ? handle_bug+0x42/0x70
        [552.6824]  ? exc_invalid_op+0x14/0x70
        [552.6824]  ? asm_exc_invalid_op+0x16/0x20
        [552.6824]  ? btrfs_put_transaction+0x123/0x130 [btrfs]
        [552.6826]  btrfs_cleanup_transaction+0xe7/0x5e0 [btrfs]
        [552.6828]  ? _raw_spin_unlock_irqrestore+0x23/0x40
        [552.6828]  ? try_to_wake_up+0x94/0x5e0
        [552.6828]  ? __pfx_process_timeout+0x10/0x10
        [552.6828]  transaction_kthread+0x103/0x1d0 [btrfs]
        [552.6830]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
        [552.6832]  kthread+0xee/0x120
        [552.6832]  ? __pfx_kthread+0x10/0x10
        [552.6832]  ret_from_fork+0x29/0x50
        [552.6832]  </TASK>
        [552.6832] ---[ end trace 0000000000000000 ]---
      
      This corresponds to this line of code:
      
        void btrfs_put_transaction(struct btrfs_transaction *transaction)
        {
            (...)
                WARN_ON(!RB_EMPTY_ROOT(
                                &transaction->delayed_refs.dirty_extent_root));
            (...)
        }
      
      The warning happens because btrfs_qgroup_destroy_extent_records(), called
      in the transaction abort path, we free all entries from the rbtree
      "dirty_extent_root" with rbtree_postorder_for_each_entry_safe(), but we
      don't actually empty the rbtree - it's still pointing to nodes that were
      freed.
      
      So set the rbtree's root node to NULL to avoid this warning (assign
      RB_ROOT).
      
      Fixes: 81f7eb00 ("btrfs: destroy qgroup extent records on transaction abort")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      aa84ce8a
    • Christoph Hellwig's avatar
      btrfs: fix ordered extent split error handling in btrfs_dio_submit_io · 7cad645e
      Christoph Hellwig authored
      When the call to btrfs_extract_ordered_extent in btrfs_dio_submit_io
      fails to allocate memory for a new ordered_extent, it calls into the
      btrfs_dio_end_io for error handling.  btrfs_dio_end_io then assumes that
      bbio->ordered is set because it is supposed to be at this point, except
      for this error handling corner case.  Try to not overload the
      btrfs_dio_end_io with error handling of a bio in a non-canonical state,
      and instead call btrfs_finish_ordered_extent and iomap_dio_bio_end_io
      directly for this error case.
      Reported-by: default avatarsyzbot <syzbot+5b82f0e951f8c2bcdb8f@syzkaller.appspotmail.com>
      Fixes: b41b6f69 ("btrfs: use btrfs_finish_ordered_extent to complete direct writes")
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Tested-by: default avatarsyzbot <syzbot+5b82f0e951f8c2bcdb8f@syzkaller.appspotmail.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7cad645e
    • Josef Bacik's avatar
      btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand · 17b17fcd
      Josef Bacik authored
      While trying to get the subpage blocksize tests running, I hit the
      following panic on generic/476
      
        assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229
        kernel BUG at fs/btrfs/subpage.c:229!
        Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
        CPU: 1 PID: 1453 Comm: fsstress Not tainted 6.4.0-rc7+ #12
        Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20230301gitf80f052277c8-26.fc38 03/01/2023
        pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
        pc : btrfs_subpage_assert+0xbc/0xf0
        lr : btrfs_subpage_assert+0xbc/0xf0
        Call trace:
         btrfs_subpage_assert+0xbc/0xf0
         btrfs_subpage_clear_checked+0x38/0xc0
         btrfs_page_clear_checked+0x48/0x98
         btrfs_truncate_block+0x5d0/0x6a8
         btrfs_cont_expand+0x5c/0x528
         btrfs_write_check.isra.0+0xf8/0x150
         btrfs_buffered_write+0xb4/0x760
         btrfs_do_write_iter+0x2f8/0x4b0
         btrfs_file_write_iter+0x1c/0x30
         do_iter_readv_writev+0xc8/0x158
         do_iter_write+0x9c/0x210
         vfs_iter_write+0x24/0x40
         iter_file_splice_write+0x224/0x390
         direct_splice_actor+0x38/0x68
         splice_direct_to_actor+0x12c/0x260
         do_splice_direct+0x90/0xe8
         generic_copy_file_range+0x50/0x90
         vfs_copy_file_range+0x29c/0x470
         __arm64_sys_copy_file_range+0xcc/0x498
         invoke_syscall.constprop.0+0x80/0xd8
         do_el0_svc+0x6c/0x168
         el0_svc+0x50/0x1b0
         el0t_64_sync_handler+0x114/0x120
         el0t_64_sync+0x194/0x198
      
      This happens because during btrfs_cont_expand we'll get a page, set it
      as mapped, and if it's not Uptodate we'll read it.  However between the
      read and re-locking the page we could have called release_folio() on the
      page, but left the page in the file mapping.  release_folio() can clear
      the page private, and thus further down we blow up when we go to modify
      the subpage bits.
      
      Fix this by putting the set_page_extent_mapped() after the read.  This
      is safe because read_folio() will call set_page_extent_mapped() before
      it does the read, and then if we clear page private but leave it on the
      mapping we're completely safe re-setting set_page_extent_mapped().  With
      this patch I can now run generic/476 without panicing.
      
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      17b17fcd
    • Qu Wenruo's avatar
      btrfs: raid56: always verify the P/Q contents for scrub · 486c737f
      Qu Wenruo authored
      [REGRESSION]
      Commit 75b47033 ("btrfs: raid56: migrate recovery and scrub recovery
      path to use error_bitmap") changed the behavior of scrub_rbio().
      
      Initially if we have no error reading the raid bio, we will assign
      @need_check to true, then finish_parity_scrub() would later verify the
      content of P/Q stripes before writeback.
      
      But after that commit we never verify the content of P/Q stripes and
      just writeback them.
      
      This can lead to unrepaired P/Q stripes during scrub, or already
      corrupted P/Q copied to the dev-replace target.
      
      [FIX]
      The situation is more complex than the regression, in fact the initial
      behavior is not 100% correct either.
      
      If we have the following rare case, it can still lead to the same
      problem using the old behavior:
      
      		0	16K	32K	48K	64K
      	Data 1:	|IIIIIII|                       |
      	Data 2:	|				|
      	Parity:	|	|CCCCCCC|		|
      
      Where "I" means IO error, "C" means corruption.
      
      In the above case, we're scrubbing the parity stripe, then read out all
      the contents of Data 1, Data 2, Parity stripes.
      
      But found IO error in Data 1, which leads to rebuild using Data 2 and
      Parity and got the correct data.
      
      In that case, we would not verify if the Parity is correct for range
      [16K, 32K).
      
      So here we have to always verify the content of Parity no matter if we
      did recovery or not.
      
      This patch would remove the @need_check parameter of
      finish_parity_scrub() completely, and would always do the P/Q
      verification before writeback.
      
      Fixes: 75b47033 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap")
      CC: stable@vger.kernel.org # 6.2+
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      486c737f
    • Filipe Manana's avatar
      btrfs: use irq safe locking when running and adding delayed iputs · 866e98a4
      Filipe Manana authored
      Running delayed iputs, which never happens in an irq context, needs to
      lock the spinlock fs_info->delayed_iput_lock. When finishing bios for
      data writes (irq context, bio.c) we call btrfs_put_ordered_extent() which
      needs to add a delayed iput and for that it needs to acquire the spinlock
      fs_info->delayed_iput_lock. Without disabling irqs when running delayed
      iputs we can therefore deadlock on that spinlock. The same deadlock can
      also happen when adding an inode to the delayed iputs list, since this
      can be done outside an irq context as well.
      
      Syzbot recently reported this, which results in the following trace:
      
        ================================
        WARNING: inconsistent lock state
        6.4.0-syzkaller-09904-ga507db1d #0 Not tainted
        --------------------------------
        inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
        btrfs-cleaner/16079 [HC0[0]:SC0[0]:HE1:SE1] takes:
        ffff888107804d20 (&fs_info->delayed_iput_lock){+.?.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline]
        ffff888107804d20 (&fs_info->delayed_iput_lock){+.?.}-{2:2}, at: btrfs_run_delayed_iputs+0x28/0xe0 fs/btrfs/inode.c:3523
        {IN-SOFTIRQ-W} state was registered at:
          lock_acquire kernel/locking/lockdep.c:5761 [inline]
          lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5726
          __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
          _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
          spin_lock include/linux/spinlock.h:350 [inline]
          btrfs_add_delayed_iput+0x128/0x390 fs/btrfs/inode.c:3490
          btrfs_put_ordered_extent fs/btrfs/ordered-data.c:559 [inline]
          btrfs_put_ordered_extent+0x2f6/0x610 fs/btrfs/ordered-data.c:547
          __btrfs_bio_end_io fs/btrfs/bio.c:118 [inline]
          __btrfs_bio_end_io+0x136/0x180 fs/btrfs/bio.c:112
          btrfs_orig_bbio_end_io+0x86/0x2b0 fs/btrfs/bio.c:163
          btrfs_simple_end_io+0x105/0x380 fs/btrfs/bio.c:378
          bio_endio+0x589/0x690 block/bio.c:1617
          req_bio_endio block/blk-mq.c:766 [inline]
          blk_update_request+0x5c5/0x1620 block/blk-mq.c:911
          blk_mq_end_request+0x59/0x680 block/blk-mq.c:1032
          lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370
          blk_complete_reqs+0xb3/0xf0 block/blk-mq.c:1110
          __do_softirq+0x1d4/0x905 kernel/softirq.c:553
          run_ksoftirqd kernel/softirq.c:921 [inline]
          run_ksoftirqd+0x31/0x60 kernel/softirq.c:913
          smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164
          kthread+0x344/0x440 kernel/kthread.c:389
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
        irq event stamp: 39
        hardirqs last  enabled at (39): [<ffffffff81d5ebc4>] __do_kmem_cache_free mm/slab.c:3558 [inline]
        hardirqs last  enabled at (39): [<ffffffff81d5ebc4>] kmem_cache_free mm/slab.c:3582 [inline]
        hardirqs last  enabled at (39): [<ffffffff81d5ebc4>] kmem_cache_free+0x244/0x370 mm/slab.c:3575
        hardirqs last disabled at (38): [<ffffffff81d5eb5e>] __do_kmem_cache_free mm/slab.c:3553 [inline]
        hardirqs last disabled at (38): [<ffffffff81d5eb5e>] kmem_cache_free mm/slab.c:3582 [inline]
        hardirqs last disabled at (38): [<ffffffff81d5eb5e>] kmem_cache_free+0x1de/0x370 mm/slab.c:3575
        softirqs last  enabled at (0): [<ffffffff814ac99f>] copy_process+0x227f/0x75c0 kernel/fork.c:2448
        softirqs last disabled at (0): [<0000000000000000>] 0x0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&fs_info->delayed_iput_lock);
          <Interrupt>
            lock(&fs_info->delayed_iput_lock);
      
         *** DEADLOCK ***
      
        1 lock held by btrfs-cleaner/16079:
         #0: ffff888107804860 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x103/0x4b0 fs/btrfs/disk-io.c:1463
      
        stack backtrace:
        CPU: 3 PID: 16079 Comm: btrfs-cleaner Not tainted 6.4.0-syzkaller-09904-ga507db1d #0
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
        Call Trace:
         <TASK>
         __dump_stack lib/dump_stack.c:88 [inline]
         dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
         print_usage_bug kernel/locking/lockdep.c:3978 [inline]
         valid_state kernel/locking/lockdep.c:4020 [inline]
         mark_lock_irq kernel/locking/lockdep.c:4223 [inline]
         mark_lock.part.0+0x1102/0x1960 kernel/locking/lockdep.c:4685
         mark_lock kernel/locking/lockdep.c:4649 [inline]
         mark_usage kernel/locking/lockdep.c:4598 [inline]
         __lock_acquire+0x8e4/0x5e20 kernel/locking/lockdep.c:5098
         lock_acquire kernel/locking/lockdep.c:5761 [inline]
         lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5726
         __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
         _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
         spin_lock include/linux/spinlock.h:350 [inline]
         btrfs_run_delayed_iputs+0x28/0xe0 fs/btrfs/inode.c:3523
         cleaner_kthread+0x2e5/0x4b0 fs/btrfs/disk-io.c:1478
         kthread+0x344/0x440 kernel/kthread.c:389
         ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
         </TASK>
      
      So fix this by using spin_lock_irq() and spin_unlock_irq() when running
      delayed iputs, and using spin_lock_irqsave() and spin_unlock_irqrestore()
      when adding a delayed iput().
      
      Reported-by: syzbot+da501a04be5ff533b102@syzkaller.appspotmail.com
      Fixes: ec63b84d ("btrfs: add an ordered_extent pointer to struct btrfs_bio")
      Link: https://lore.kernel.org/linux-btrfs/000000000000d5c89a05ffbd39dd@google.com/Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      866e98a4
    • Filipe Manana's avatar
      btrfs: fix iput() on error pointer after error during orphan cleanup · cbaee87f
      Filipe Manana authored
      At btrfs_orphan_cleanup(), if we can't find an inode (btrfs_iget() returns
      an -ENOENT error pointer), we proceed with 'ret' set to -ENOENT and the
      inode pointer set to ERR_PTR(-ENOENT). Later when we proceed to the body
      of the following if statement:
      
          if (ret == -ENOENT || inode->i_nlink) {
              (...)
              trans = btrfs_start_transaction(root, 1);
              if (IS_ERR(trans)) {
                  ret = PTR_ERR(trans);
                  iput(inode);
                  goto out;
              }
              (...)
              ret = btrfs_del_orphan_item(trans, root,
                                          found_key.objectid);
              btrfs_end_transaction(trans);
              if (ret) {
                  iput(inode);
                  goto out;
              }
              continue;
          }
      
      If we get an error from btrfs_start_transaction() or from the call to
      btrfs_del_orphan_item() we end calling iput() against an inode pointer
      that has a value of ERR_PTR(-ENOENT), resulting in a crash with the
      following trace:
      
        [876.667] BUG: kernel NULL pointer dereference, address: 0000000000000096
        [876.667] #PF: supervisor read access in kernel mode
        [876.667] #PF: error_code(0x0000) - not-present page
        [876.667] PGD 0 P4D 0
        [876.668] Oops: 0000 [#1] PREEMPT SMP PTI
        [876.668] CPU: 0 PID: 2356187 Comm: mount Tainted: G        W          6.4.0-rc6-btrfs-next-134+ #1
        [876.668] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
        [876.668] RIP: 0010:iput+0xa/0x20
        [876.668] Code: ff ff ff 66 (...)
        [876.669] RSP: 0018:ffffafa9c0c9f9d0 EFLAGS: 00010282
        [876.669] RAX: ffffffffffffffe4 RBX: 000000000009453b RCX: 0000000000000000
        [876.669] RDX: 0000000000000001 RSI: ffffafa9c0c9f930 RDI: fffffffffffffffe
        [876.669] RBP: ffff95c612f3b800 R08: 0000000000000001 R09: ffffffffffffffe4
        [876.670] R10: 00018f2a71010000 R11: 000000000ead96e3 R12: ffff95cb7d6909a0
        [876.670] R13: fffffffffffffffe R14: ffff95c60f477000 R15: 00000000ffffffe4
        [876.670] FS:  00007f5fbe30a840(0000) GS:ffff95ccdfa00000(0000) knlGS:0000000000000000
        [876.670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [876.671] CR2: 0000000000000096 CR3: 000000055e9f6004 CR4: 0000000000370ef0
        [876.671] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [876.671] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [876.672] Call Trace:
        [876.744]  <TASK>
        [876.744]  ? __die_body+0x1b/0x60
        [876.744]  ? page_fault_oops+0x15d/0x450
        [876.745]  ? __kmem_cache_alloc_node+0x47/0x410
        [876.745]  ? do_user_addr_fault+0x65/0x8a0
        [876.745]  ? exc_page_fault+0x74/0x170
        [876.746]  ? asm_exc_page_fault+0x22/0x30
        [876.746]  ? iput+0xa/0x20
        [876.746]  btrfs_orphan_cleanup+0x221/0x330 [btrfs]
        [876.746]  btrfs_lookup_dentry+0x58f/0x5f0 [btrfs]
        [876.747]  btrfs_lookup+0xe/0x30 [btrfs]
        [876.747]  __lookup_slow+0x82/0x130
        [876.785]  walk_component+0xe5/0x160
        [876.786]  path_lookupat.isra.0+0x6e/0x150
        [876.786]  filename_lookup+0xcf/0x1a0
        [876.786]  ? mod_objcg_state+0xd2/0x360
        [876.786]  ? obj_cgroup_charge+0xf5/0x110
        [876.787]  ? should_failslab+0xa/0x20
        [876.787]  ? kmem_cache_alloc+0x47/0x450
        [876.787]  vfs_path_lookup+0x51/0x90
        [876.788]  mount_subtree+0x8d/0x130
        [876.788]  btrfs_mount+0x149/0x410 [btrfs]
        [876.788]  ? __kmem_cache_alloc_node+0x47/0x410
        [876.788]  ? vfs_parse_fs_param+0xc0/0x110
        [876.789]  legacy_get_tree+0x24/0x50
        [876.834]  vfs_get_tree+0x22/0xd0
        [876.852]  path_mount+0x2d8/0x9c0
        [876.852]  do_mount+0x79/0x90
        [876.852]  __x64_sys_mount+0x8e/0xd0
        [876.853]  do_syscall_64+0x38/0x90
        [876.899]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [876.958] RIP: 0033:0x7f5fbe50b76a
        [876.959] Code: 48 8b 0d a9 (...)
        [876.959] RSP: 002b:00007fff01925798 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        [876.959] RAX: ffffffffffffffda RBX: 00007f5fbe694264 RCX: 00007f5fbe50b76a
        [876.960] RDX: 0000561bde6c8720 RSI: 0000561bde6bdec0 RDI: 0000561bde6c31a0
        [876.960] RBP: 0000561bde6bdc70 R08: 0000000000000000 R09: 0000000000000001
        [876.960] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
        [876.960] R13: 0000561bde6c31a0 R14: 0000561bde6c8720 R15: 0000561bde6bdc70
        [876.960]  </TASK>
      
      So fix this by setting 'inode' to NULL whenever we get an error from
      btrfs_iget(), and to make the code simpler, stop testing for 'ret' being
      -ENOENT to check if we have an inode - instead test for 'inode' being NULL
      or not. Having a NULL 'inode' prevents any iput() call from crashing, as
      iput() ignores NULL inode pointers. Also, stop testing for a NULL return
      value from btrfs_iget() with PTR_ERR_OR_ZERO(), because btrfs_iget() never
      returns NULL - in case an inode is not found, it returns ERR_PTR(-ENOENT),
      and in case of memory allocation failure, it returns ERR_PTR(-ENOMEM).
      We also don't need the extra iput() calls on the error branches for the
      btrfs_start_transaction() and btrfs_del_orphan_item() calls, as we have
      already called iput() before, so remove them.
      
      Fixes: a13bb2c0 ("btrfs: add missing iputs on orphan cleanup failure")
      CC: stable@vger.kernel.org # 6.4
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      cbaee87f
    • Filipe Manana's avatar
      btrfs: fix double iput() on inode after an error during orphan cleanup · b777d279
      Filipe Manana authored
      At btrfs_orphan_cleanup(), if we were able to find the inode, we do an
      iput() on the inode, then if btrfs_drop_verity_items() succeeds and then
      either btrfs_start_transaction() or btrfs_del_orphan_item() fail, we do
      another iput() in the respective error paths, resulting in an extra iput()
      on the inode.
      
      Fix this by setting inode to NULL after the first iput(), as iput()
      ignores a NULL inode pointer argument.
      
      Fixes: a13bb2c0 ("btrfs: add missing iputs on orphan cleanup failure")
      CC: stable@vger.kernel.org # 6.4
      Reviewed-by: default avatarBoris Burkov <boris@bur.io>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      b777d279
    • Filipe Manana's avatar
      btrfs: zoned: fix memory leak after finding block group with super blocks · f1a07c2b
      Filipe Manana authored
      At exclude_super_stripes(), if we happen to find a block group that has
      super blocks mapped to it and we are on a zoned filesystem, we error out
      as this is not supposed to happen, indicating either a bug or maybe some
      memory corruption for example. However we are exiting the function without
      freeing the memory allocated for the logical address of the super blocks.
      Fix this by freeing the logical address.
      
      Fixes: 12659251 ("btrfs: implement log-structured superblock for ZONED mode")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f1a07c2b
  4. 17 Jul, 2023 23 commits