1. 21 Nov, 2022 1 commit
    • xfs: fix incorrect i_nlink caused by inode racing · 28b4b059
      Long Li authored
      The following assertion failure occurred during an fsstress test:
      
      XFS: Assertion failed: VFS_I(ip)->i_nlink >= 2, file: fs/xfs/xfs_inode.c, line: 2452
      
      The problem is that an inode race condition causes an incorrect i_nlink
      to be written to disk, from which it is later read back into memory.
      Consider the call graph below: for an inode marked as both
      XFS_IFLUSHING and XFS_IRECLAIMABLE, xfs_reinit_inode() resets i_nlink
      to 1 and then restores it to its original value. If xfs_iflush()
      copies the inode to its on-disk buffer inside that window, the i_nlink
      of a directory on disk may be set to 1.
      
        xfsaild
            xfs_inode_item_push
                xfs_iflush_cluster
                    xfs_iflush
                        xfs_inode_to_disk
      
        xfs_iget
            xfs_iget_cache_hit
                xfs_iget_recycle
                    xfs_reinit_inode
                        inode_init_always
      
      xfs_reinit_inode() needs to hold the ILOCK_EXCL as it is changing internal
      inode state and can race with other RCU-protected inode lookups. On the
      read side, xfs_iflush_cluster() grabs the ILOCK_SHARED while under RCU +
      ip->i_flags_lock, and so xfs_iflush()/xfs_inode_to_disk() are protected
      from racing inode updates (during transactions) by that lock.
      
      Fixes: ff7bebeb ("xfs: refactor the inode recycling code") # goes further back than this
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  2. 17 Nov, 2022 12 commits
    • xfs: Print XFS UUID on mount and umount events. · 64c80dfd
      Lukas Herbolt authored
      Currently, __xfs_printk() prints only device names. Device names are
      not persistent across reboots, so when searching for the origin of a
      corruption, correctly identifying the affected devices becomes an
      extra task. This patch prints the XFS UUID on every mount and unmount
      event, which makes the identification much easier.
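      For illustration, a small C helper that renders a 16-byte UUID in the
      canonical 8-4-4-4-12 form. In the kernel, printk's %pU format specifier
      performs this formatting; format_uuid() is a hypothetical stand-in, not
      the code this patch adds.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: formats a 16-byte UUID as lowercase hex in the
 * canonical 8-4-4-4-12 layout. out must hold at least 37 bytes
 * (36 characters plus the terminating NUL). */
static void format_uuid(const unsigned char u[16], char out[37])
{
	size_t pos = 0;

	for (int i = 0; i < 16; i++) {
		/* hyphens go before bytes 4, 6, 8 and 10 */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			out[pos++] = '-';
		pos += (size_t)snprintf(out + pos, 37 - pos, "%02x", u[i]);
	}
	out[pos] = '\0';
}
```

      For example, the sb_uuid bytes at offset 0x20 of the superblock dump
      quoted in the lazysbcount commit in this log (69 fb 7c cd 5f dc ...)
      render as 69fb7ccd-5fdc-44af-8574-e0ccd4e3345a.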
      Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
      [sandeen: rebase onto current upstream kernel]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix sb write verify for lazysbcount · 59f6ab40
      Long Li authored
      When lazysbcount is enabled, fsstress and loop mount/unmount tests report
      the following problems:
      
      XFS (loop0): SB summary counter sanity check failed
      XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460,
      	xfs_sb block 0x0
      XFS (loop0): Unmount and run xfs_repair
      XFS (loop0): First 128 bytes of corrupted metadata buffer:
      00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00  XFSB.........(..
      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a  i.|._.D..t....4Z
      00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80  ..... ..........
      00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
      00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00  ................
      00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00  ................
      00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19  ................
      XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply
      	+0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580).  Shutting down filesystem.
      XFS (loop0): Please unmount the filesystem and rectify the problem(s)
      XFS (loop0): log mount/recovery failed: error -117
      XFS (loop0): log mount failed
      
      This corruption will shut down the filesystem, and the filesystem will
      no longer be mountable. The following script can reproduce the problem,
      but it may take a long time.
      
       #!/bin/bash
      
       device=/dev/sda
       testdir=/mnt/test
       round=0
      
       function fail()
       {
      	 echo "$*"
      	 exit 1
       }
      
       mkdir -p $testdir
       while [ $round -lt 10000 ]
       do
      	 echo "******* round $round ********"
      	 mkfs.xfs -f $device
      	 mount $device $testdir || fail "mount failed!"
      	 fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null &
      	 sleep 4
      	 killall -w fsstress
      	 umount $testdir
      	 xfs_repair -e $device > /dev/null
      	 if [ $? -eq 2 ];then
      		 echo "ERR CODE 2: Dirty log exception during repair."
      		 exit 1
      	 fi
      	 round=$(($round+1))
       done
      
      With lazysbcount enabled, there is no additional lock protecting the
      reads of m_ifree and m_icount in xfs_log_sb(). If another CPU modifies
      the counters between the two reads, the sampled m_ifree can end up
      greater than the sampled m_icount. For example, consider the following
      sequence, where ifreedelta is positive:
      
       CPU0				 CPU1
       xfs_log_sb			 xfs_trans_unreserve_and_mod_sb
       ----------			 ------------------------------
       percpu_counter_sum(&mp->m_icount)
      				 percpu_counter_add_batch(&mp->m_icount,
      						idelta, XFS_ICOUNT_BATCH)
      				 percpu_counter_add(&mp->m_ifree, ifreedelta);
       percpu_counter_sum(&mp->m_ifree)
      
      After this, an incorrect inode count (sb_ifree > sb_icount) will be
      written to the log. When the sb is subsequently written, the incorrect
      inode count (sb_ifree > sb_icount) fails the boundary check in
      xfs_validate_sb_write(), which shuts down the filesystem.
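      The interleaving above can be replayed deterministically in a short C
      sketch. Plain integers stand in for the m_icount and m_ifree percpu
      counters, and all struct and function names are hypothetical. The
      example assumes CPU1 is allocating a new 64-inode chunk, so idelta and
      ifreedelta are both +64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain integers stand in for the m_icount and m_ifree percpu counters. */
struct model_counters {
	int64_t icount;
	int64_t ifree;
};

/* The two superblock fields that xfs_log_sb() fills in. */
struct model_sb {
	uint64_t sb_icount;
	uint64_t sb_ifree;
};

/* Replays the CPU0/CPU1 sequence: CPU0 samples icount, CPU1 then applies
 * both deltas, and only afterwards does CPU0 sample ifree. */
static struct model_sb replay_race(struct model_counters *c,
				   int64_t idelta, int64_t ifreedelta)
{
	struct model_sb sb;

	sb.sb_icount = (uint64_t)c->icount;	/* CPU0 reads m_icount */
	c->icount += idelta;			/* CPU1 adds idelta */
	c->ifree += ifreedelta;			/* CPU1 adds ifreedelta */
	sb.sb_ifree = (uint64_t)c->ifree;	/* CPU0 reads m_ifree */
	return sb;
}

/* The inode-count part of the write-time bounds check: free inodes can
 * never exceed allocated inodes. */
static bool model_sb_counts_ok(const struct model_sb *sb)
{
	return sb->sb_ifree <= sb->sb_icount;
}
```

      Starting from icount = 64 and ifree = 60, CPU0 logs sb_icount = 64 but
      sb_ifree = 124, which the sb_ifree <= sb_icount check rejects.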
      
      When lazysbcount is enabled, we don't need to guarantee that the lazy sb
      counters are completely correct, but we do need to guarantee that
      sb_ifree <= sb_icount. On the other hand, the constraint m_ifree <=
      m_icount must be satisfied any time that there /cannot/ be other threads
      allocating or freeing inode chunks. If the constraint is violated under
      these circumstances, sb_i{count,free} (the ondisk superblock inode
      counters) may be incorrect and need to be marked sick at unmount; the
      counts will then be rebuilt on the next mount.
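      One way to honor both constraints, sketched here as an assumption
      rather than as the exact upstream change: clamp the sampled ifree to
      the sampled icount when logging the superblock, and treat a violation
      at unmount (when no allocator can race) as a sign that the counters
      must be rebuilt. model_sample_counts() and
      model_unmount_counts_need_rebuild() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sampling helper for the logging path. The lazy counters may
 * still be slightly stale, but the logged values always satisfy
 * sb_ifree <= sb_icount, even if another CPU changed the counters between
 * the two reads. */
static void model_sample_counts(int64_t icount, int64_t ifree,
				uint64_t *sb_icount, uint64_t *sb_ifree)
{
	/* negative sums are treated as zero, in the spirit of
	 * percpu_counter_sum_positive() */
	uint64_t ic = icount > 0 ? (uint64_t)icount : 0;
	uint64_t ifr = ifree > 0 ? (uint64_t)ifree : 0;

	*sb_icount = ic;
	*sb_ifree = ifr < ic ? ifr : ic;	/* enforce the invariant */
}

/* At unmount no other thread can allocate or free inode chunks, so a
 * violation means the in-core counters themselves are wrong; the caller
 * would then mark them sick so they are recomputed on the next mount. */
static bool model_unmount_counts_need_rebuild(int64_t icount, int64_t ifree)
{
	return ifree > icount;
}
```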
      
      Fixes: 8756a5af ("libxfs: add more bounds checking to sb sanity checks")
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: fix incorrect error-out in xfs_remove · 2653d533
      Darrick J. Wong authored
      Clean up resources if resetting the dotdot entry doesn't succeed.
      Observed through code inspection.
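      Fixes of this kind usually follow the kernel's goto-based unwind idiom:
      every failure after resources are acquired jumps to the label that
      releases them, instead of returning directly. The generic sketch below
      uses hypothetical helpers (take_resource(), reset_dotdot()) that only
      count what is held; the actual xfs_remove() code is not reproduced.

```c
#include <errno.h>

/* Hypothetical counter standing in for a transaction and inode locks. */
static int held_resources;

static int take_resource(void)     { held_resources++; return 0; }
static void release_resource(void) { held_resources--; }

/* Hypothetical stand-in for resetting the child's '..' entry; fails with
 * an arbitrary error when asked to. */
static int reset_dotdot(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Every failure after the resources are taken jumps to an unwind label,
 * so nothing leaks. Returning directly on the '..' failure would skip
 * both releases. */
static int model_remove(int dotdot_fails)
{
	int error;

	error = take_resource();	/* e.g. allocate the transaction */
	if (error)
		return error;
	error = take_resource();	/* e.g. lock the inodes */
	if (error)
		goto out_release_first;

	error = reset_dotdot(dotdot_fails);
	if (error)
		goto out_release_all;	/* clean up instead of returning */

	/* ... the rest of the removal would go here ... */

out_release_all:
	release_resource();
out_release_first:
	release_resource();
	return error;
}
```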
      
      Fixes: 5838d035 ("xfs: reset child dir '..' entry when unlinking child")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
    • Merge tag 'scrub-check-metadata-inode-records-6.2_2022-11-16' of... · 7b082b5e
      Darrick J. Wong authored
      Merge tag 'scrub-check-metadata-inode-records-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: scrub inode core when checking metadata files
      
      Running the online fsck QA fuzz tests, I noticed that we were
      consistently missing fuzzed records in the inode cores of the realtime
      freespace files and the quota files.  This patch adds the ability to
      check inode cores in xchk_metadata_inode_forks.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-check-metadata-inode-records-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: check inode core when scrubbing metadata files
        xfs: don't warn about files that are exactly s_maxbytes long
    • Merge tag 'scrub-bmap-enhancements-6.2_2022-11-16' of... · cc5f38fa
      Darrick J. Wong authored
      Merge tag 'scrub-bmap-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: strengthen file mapping scrub
      
      This series strengthens the file extent mapping scrubber in various
      ways: confirming that there are enough bmap records to match up with
      the rmap records for this file, checking delalloc reservations,
      checking that metadata files have no unwritten extents, flagging
      invalid CoW fork formats, and catching oddities such as shared CoW
      fork extents.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-bmap-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: teach scrub to flag non-extents format cow forks
        xfs: check that CoW fork extents are not shared
        xfs: check quota files for unwritten extents
        xfs: block map scrub should handle incore delalloc reservations
        xfs: teach scrub to check for adjacent bmaps when rmap larger than bmap
        xfs: fix perag loop in xchk_bmap_check_rmaps
    • Merge tag 'scrub-fscounters-enhancements-6.2_2022-11-16' of... · 7aab8a05
      Darrick J. Wong authored
      Merge tag 'scrub-fscounters-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: enhance fs summary counter scrubber
      
      This series makes two changes to the fs summary counter scrubber.
      First, it marks the scrub incomplete when the AG headers cannot be
      read. Second, it fixes a functionality gap: the free rt extent count
      was never actually checked.
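      A hedged sketch of the rule "skip comparisons when the scan is
      incomplete": if an AG header could not be read, the summed totals are
      partial, and comparing them against the in-core counter would report
      false corruption. The types and names are hypothetical, and the sketch
      compares the free rt extent count exactly for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_verdict {
	MODEL_OK,		/* counters agree */
	MODEL_CORRUPT,		/* counters disagree */
	MODEL_INCOMPLETE,	/* scan could not finish; no verdict */
};

struct model_scan {
	bool incomplete;		/* set when an AG header read failed */
	uint64_t counted_frextents;	/* free rt extents found by the scan */
};

static enum model_verdict model_check_frextents(const struct model_scan *scan,
						uint64_t incore_frextents)
{
	/* Partial totals cannot be trusted, so report "incomplete" instead
	 * of comparing them. */
	if (scan->incomplete)
		return MODEL_INCOMPLETE;
	return scan->counted_frextents == incore_frextents ?
		MODEL_OK : MODEL_CORRUPT;
}
```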
      
      v23.2: fix pointless inline
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-fscounters-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: online checking of the free rt extent count
        xfs: skip fscounters comparisons when the scan is incomplete
    • Merge tag 'scrub-fix-rtmeta-ilocking-6.2_2022-11-16' of... · b76f593b
      Darrick J. Wong authored
      Merge tag 'scrub-fix-rtmeta-ilocking-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: improve rt metadata use for scrub
      
      This short series makes some small changes to the way we handle the
      realtime metadata inodes.  First, we now make sure that the bitmap and
      summary file forks are always loaded at mount time so that individual
      scrubbers don't have to call xfs_iread_extents, which would not be easy
      if we were, say, cross-referencing realtime space allocations.  The
      second change makes the ILOCK annotations more consistent with the rest
      of XFS.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-fix-rtmeta-ilocking-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file
        xfs: load rtbitmap and rtsummary extent mapping btrees at mount time
    • Merge tag 'scrub-fix-return-value-6.2_2022-11-16' of... · 3d8426b1
      Darrick J. Wong authored
      Merge tag 'scrub-fix-return-value-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: fix incorrect return values in online fsck
      
      Here we fix a couple of problems with the errno values that we return to
      userspace.
      
      v23.2: fix vague wording of comment
      v23.3: fix the commit message to discuss what's really going on in this
      patch
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-fix-return-value-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: don't return -EFSCORRUPTED from repair when resources cannot be grabbed
        xfs: don't retry repairs harder when EAGAIN is returned
        xfs: fix return code when fatal signal encountered during dquot scrub
        xfs: return EINTR when a fatal signal terminates scrub
    • Merge tag 'scrub-cleanup-malloc-6.2_2022-11-16' of... · af1077fa
      Darrick J. Wong authored
      Merge tag 'scrub-cleanup-malloc-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: clean up memory allocations in online fsck
      
      This series standardizes the GFP_ flags that we use for memory
      allocation in online scrub, and converts the callers away from the old
      kmem_alloc code that was ported from Irix.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-cleanup-malloc-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: pivot online scrub away from kmem.[ch]
        xfs: initialize the check_owner object fully
        xfs: standardize GFP flags usage in online scrub
    • Merge tag 'scrub-fix-ag-header-handling-6.2_2022-11-16' of... · 823ca26a
      Darrick J. Wong authored
      Merge tag 'scrub-fix-ag-header-handling-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
      
      xfs: fix handling of AG[IF] header buffers during scrub
      
      While reading through the online fsck code, I noticed that the setup
      code for AG metadata scrubs will attach the AGI, the AGF, and the AGFL
      buffers to the transaction.  It isn't necessary to hold the AGFL buffer,
      since any code that wants to do anything with the AGFL will need to hold
      the AGF to know which parts of the AGFL are active.  Therefore, we only
      need to hold the AGFL when scrubbing the AGFL itself.
      
      The second bug fixed by this patchset is one that I observed while
      testing online repair.  When a buffer is held across a transaction roll,
      its buffer log item will be detached if the bli was clean before the
      roll.  If we are holding the AG headers to maintain a lock on an AG, we
      then need to set the buffer type on the new bli to avoid confusing the
      logging code later.
      
      There's also a bug fix for uninitialized memory in the directory scanner
      that didn't fit anywhere else.
      
      This patchset finishes off by teaching the AGFL repair function to look
      for and discard crosslinked blocks instead of putting them back on the
      AGFL.
      
      v23.2: Log the buffers before rolling the transaction to keep them
      moving forward in the log and avoid the bli falling off.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      
      * tag 'scrub-fix-ag-header-handling-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: make AGFL repair function avoid crosslinked blocks
        xfs: log the AGI/AGF buffers when rolling transactions during an AG repair
        xfs: don't track the AGFL buffer in the scrub AG context
        xfs: fully initialize xfs_da_args in xchk_directory_blocks
    • xfs: check inode core when scrubbing metadata files · f36b954a
      Darrick J. Wong authored
      Metadata files (e.g. realtime bitmaps and quota files) do not show up in
      the bulkstat output, which means that scrub-by-handle does not work;
      they can only be checked through a specific scrub type.  Therefore, each
      scrub type calls xchk_metadata_inode_forks to check the metadata for
      whatever's in the file.
      
      Unfortunately, that function doesn't actually check the inode record
      itself.  Refactor the function a bit to make that happen.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: don't warn about files that are exactly s_maxbytes long · bd5ab5f9
      Darrick J. Wong authored
      We can handle files that are exactly s_maxbytes bytes long; we just
      can't handle anything larger than that.
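      In other words, the limit is inclusive. A hypothetical check like the
      one below flags only sizes strictly greater than s_maxbytes; presumably
      the old warning fired at equality, which is the behavior of a >=
      comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: a file of exactly maxbytes bytes is valid; only
 * sizes strictly larger than the limit are out of range. */
static bool size_exceeds_maxbytes(int64_t isize, int64_t maxbytes)
{
	return isize > maxbytes;
}
```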
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  3. 16 Nov, 2022 21 commits
  4. 06 Nov, 2022 6 commits
    • Linux 6.1-rc4 · f0c4d9fc
      Linus Torvalds authored
    • Merge tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 16c7a368
      Linus Torvalds authored
      Pull cxl fixes from Dan Williams:
       "Several fixes for CXL region creation crashes, leaks and failures.
      
        This is mainly fallout from the original implementation of dynamic CXL
        region creation (instantiate new physical memory pools) that arrived
        in v6.0-rc1.
      
        Given the theme of "failures in the presence of pass-through decoders"
        this also includes new regression test infrastructure for that case.
      
        Summary:
      
         - Fix region creation crash with pass-through decoders
      
         - Fix region creation crash when decoder allocation fails
      
         - Fix region creation crash when scanning regions to enforce the
           increasing physical address order constraint that CXL mandates
      
         - Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
           1:1 memory-device-to-region associations.
      
         - Fix a memory leak for cxl_region objects when regions with active
           targets are deleted
      
         - Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
           emulated proximity domains.
      
         - Fix region creation failure for switch attached devices downstream
           of a single-port host-bridge
      
         - Fix false positive memory leak of cxl_region objects by recycling
           recently used region ids rather than freeing them
      
         - Add regression test infrastructure for a pass-through decoder
           configuration
      
         - Fix some mailbox payload handling corner cases"
      
      * tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/region: Recycle region ids
        cxl/region: Fix 'distance' calculation with passthrough ports
        tools/testing/cxl: Add a single-port host-bridge regression config
        tools/testing/cxl: Fix some error exits
        cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
        cxl/region: Fix cxl_region leak, cleanup targets at region delete
        cxl/region: Fix region HPA ordering validation
        cxl/pmem: Use size_add() against integer overflow
        cxl/region: Fix decoder allocation crash
        ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
        cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
        cxl/region: Fix null pointer dereference due to pass through decoder commit
        cxl/mbox: Add a check on input payload size
    • Merge tag 'hwmon-for-v6.1-rc4' of... · aa529949
      Linus Torvalds authored
      Merge tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Fix two regressions:
      
         - Commit 54cc3dbf ("hwmon: (pmbus) Add regulator supply into
           macro") resulted in regulator undercount when disabling regulators.
           Revert it.
      
         - The thermal subsystem rework caused the scmi driver to no longer
           register with the thermal subsystem because index values no longer
           match. To fix the problem, the scmi driver now directly registers
           with the thermal subsystem, no longer through the hwmon core"
      
      * tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        Revert "hwmon: (pmbus) Add regulator supply into macro"
        hwmon: (scmi) Register explicitly with Thermal Framework
    • Merge tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 727ea09e
      Linus Torvalds authored
      Pull perf fixes from Borislav Petkov:
      
       - Add Cooper Lake's stepping to the PEBS guest/host events isolation
         fixed microcode revisions checking quirk
      
       - Update Icelake and Sapphire Rapids events constraints
      
       - Use the standard energy unit for Sapphire Rapids in RAPL
      
       - Fix the hw_breakpoint test to fail more gracefully on !SMP configs
      
      * tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
        perf/x86/intel: Fix pebs event constraints for SPR
        perf/x86/intel: Fix pebs event constraints for ICL
        perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
        perf/hw_breakpoint: test: Skip the test if dependencies unmet
    • Merge tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f6f52047
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Add new Intel CPU models
      
       - Enforce that TDX guests are successfully loaded only on TDX hardware
         where virtualization exception (#VE) delivery on kernel memory is
         disabled because handling those in all possible cases is "essentially
         impossible"
      
       - Add the proper include to the syscall wrappers so that BTF can see
         the real pt_regs definition and not only the forward declaration
      
      * tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add several Intel server CPU model numbers
        x86/tdx: Panic on bad configs that #VE on "private" memory access
        x86/tdx: Prepare for using "INFO" call for a second purpose
        x86/syscall: Include asm/ptrace.h in syscall_wrapper header
    • Merge tag 'kbuild-fixes-v6.1-2' of... · 35697d81
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Use POSIX-compatible grep options
      
       - Document git-related tips for reproducible builds
      
       - Fix a typo in the modpost rule
      
       - Suppress SIGPIPE error message from gcc-ar and llvm-ar
      
       - Fix segmentation fault in the menuconfig search
      
      * tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: fix segmentation fault in menuconfig search
        kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar
        kbuild: fix typo in modpost
        Documentation: kbuild: Add description of git for reproducible builds
        kbuild: use POSIX-compatible grep option