1. 02 Jul, 2023 7 commits
    • xfs: fix xfs_btree_query_range callers to initialize btree rec fully · 75dc0345
      Darrick J. Wong authored
      Use struct initializers to ensure that the xfs_btree_irecs passed into
      the query_range function are completely initialized.  No functional
      changes, just closing some sloppy hygiene.
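      A minimal C sketch of the idea, assuming a simplified record type
      (struct sketch_irec is hypothetical, not the kernel's struct
      xfs_rmap_irec). With a designated initializer, C zero-fills every
      member the initializer does not name, so no field of the query key
      is left holding stack garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified btree record; not the real xfs_rmap_irec layout. */
struct sketch_irec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
	uint64_t	offset;
	unsigned int	flags;
};

/*
 * Sloppy pattern (for contrast): "struct sketch_irec key; key.startblock = agbno;"
 * leaves blockcount, owner, offset and flags indeterminate.
 *
 * Initializer pattern: members not named in the designated initializer are
 * zero-initialized by the C standard, so the whole key is well defined.
 */
static struct sketch_irec sketch_low_key(uint32_t agbno)
{
	struct sketch_irec key = {
		.startblock	= agbno,
	};

	return key;
}
```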
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: validate fsmap offsets specified in the query keys · 3ee9351e
      Darrick J. Wong authored
      Improve the validation of the fsmap offset fields in the query keys and
      move the validation to the top of the function now that we have pushed
      the low key adjustment code downwards.
      
      Also fix some indenting issues that aren't worth a separate patch.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix logdev fsmap query result filtering · a949a1c2
      Darrick J. Wong authored
      The external log device fsmap backend doesn't have an rmapbt to query,
      so it's wasteful to spend time initializing the rmap_irec objects.
      Worse yet, the log could (someday) be longer than 2^32 fsblocks, so
      using the rmap irec structure will result in integer overflows.
      
      Fix this mess by computing the start address that we want from keys[0]
      directly, and use the daddr-based record filtering algorithm that we
      also use for rtbitmap queries.
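      A rough sketch of daddr-based filtering for a resumed query. It
      assumes, for illustration, that the resume point is the end of the
      last mapping userspace copied into keys[0], already converted to
      daddr units. The types and function names are hypothetical; the
      point is that the arithmetic stays in 64-bit daddrs, with no 32-bit
      rmap fields in the way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified query key in 64-bit daddr units. */
struct sketch_key {
	uint64_t	physical;	/* start of the last mapping already returned */
	uint64_t	length;		/* its length; 0 on the first call */
};

/* Where a resumed query should start: just past the last returned mapping. */
static uint64_t sketch_low_daddr(const struct sketch_key *key0)
{
	return key0->physical + key0->length;
}

/* Records starting before the resume point were already reported: skip them. */
static bool sketch_rec_before_start(uint64_t rec_daddr, uint64_t low_daddr)
{
	return rec_daddr < low_daddr;
}
```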
      
      Fixes: e89c0413 ("xfs: implement the GETFSMAP ioctl")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: clean up the rtbitmap fsmap backend · f045dd00
      Darrick J. Wong authored
      The rtbitmap fsmap backend doesn't query the rmapbt, so it's wasteful to
      spend time initializing the rmap_irec objects.  Worse yet, the logic to
      query the rtbitmap is spread across three separate functions, which is
      unnecessarily difficult to follow.
      
      Compute the start rtextent that we want from keys[0] directly, and
      consolidate all the logic into a single function so that we no longer
      pass parameters around everywhere.  At one point many
      years ago I intended to use __xfs_getfsmap_rtdev as the launching point
      for realtime rmapbt queries, but this hasn't been the case for a long
      time.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix getfsmap reporting past the last rt extent · d898137d
      Darrick J. Wong authored
      The realtime section ends at the last rt extent.  If the user configures
      the rt geometry with an extent size that is not an integer factor of the
      number of rt blocks, it's possible for there to be rt blocks past the
      end of the last rt extent.  These tail blocks cannot ever be allocated
      and will cause corruption reports if the last extent coincides with the
      end of an rt bitmap block, so do not consider them for the
      GETFSMAP output.
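      The tail arithmetic as a small sketch: with rblocks realtime blocks
      and an extent size of rextsize blocks, only rblocks / rextsize whole
      extents exist, and the remainder blocks sit past the last extent.
      The function names are hypothetical; the parameters loosely mirror
      the superblock's rt geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole rt extents that fit in the rt section. */
static uint64_t sketch_rextents(uint64_t rblocks, uint32_t rextsize)
{
	assert(rextsize > 0);
	return rblocks / rextsize;
}

/* First rt block past the last whole extent; blocks from here on are the tail. */
static uint64_t sketch_rt_end_block(uint64_t rblocks, uint32_t rextsize)
{
	return sketch_rextents(rblocks, rextsize) * rextsize;
}
```

      For example, 1000 rt blocks with a 3-block extent size give 333
      extents ending at block 999, leaving one unallocatable tail block
      that GETFSMAP should not report.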
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix integer overflows in the fsmap rtbitmap and logdev backends · 7975aba1
      Darrick J. Wong authored
      It's not correct to use the rmap irec structure to hold query key
      information to query the rtbitmap because the realtime volume can be
      longer than 2^32 fsblocks in length.  Because the rt volume doesn't have
      allocation groups, introduce a daddr-based record filtering algorithm
      and compute the rtextent values using 64-bit variables.  The same
      problem exists in the external log device fsmap implementation, so use
      the same solution to fix it too.
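      A sketch of the overflow class being fixed, assuming (as the text
      says) a 32-bit field in the rmap irec: a realtime block number past
      2^32 silently loses its high bits when stored there, while a 64-bit
      computation keeps it intact. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Storing a 64-bit rt block number in a 32-bit field drops the high bits. */
static uint32_t sketch_store_in_32bit(uint64_t rtblock)
{
	return (uint32_t)rtblock;
}

/* Compute the rt extent number entirely in 64-bit arithmetic. */
static uint64_t sketch_rtextent(uint64_t rtblock, uint32_t rextsize)
{
	assert(rextsize > 0);
	return rtblock / rextsize;
}
```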
      
      After this patch, all the code that touches info->low and info->high
      under xfs_getfsmap_logdev and __xfs_getfsmap_rtdev is unnecessary.
      Cleaning this up will be done in subsequent patches.
      
      Fixes: 4c934c7d ("xfs: report realtime space information via the rtbitmap")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix interval filtering in multi-step fsmap queries · 63ef7a35
      Darrick J. Wong authored
      I noticed a bug in ranged GETFSMAP queries:
      
      # xfs_io -c 'fsmap -vvvv' /opt
       EXT: DEV  BLOCK-RANGE           OWNER              FILE-OFFSET      AG AG-OFFSET           TOTAL
         0: 8:80 [0..7]:               static fs metadata                  0  (0..7)                  8
      <snip>
         9: 8:80 [192..223]:           137                0..31            0  (192..223)             32
      # xfs_io -c 'fsmap -vvvv -d 208 208' /opt
      #
      
      That's not right -- we asked what maps block 208, and we should've
      received a mapping for inode 137 offset 16.  Instead, we get nothing.
      
      The root cause of this problem is a mis-interaction between the fsmap
      code and how btree ranged queries work.  xfs_btree_query_range returns
      any btree record that overlaps with the query interval, even if the
      record starts before or ends after the interval.  Similarly, GETFSMAP is
      supposed to return a recordset containing all records that overlap the
      range queried.
      
      However, it's possible that the recordset is larger than the buffer that
      the caller provided to convey mappings to userspace.  In /that/ case,
      userspace is supposed to copy the last record returned to fmh_keys[0]
      and call GETFSMAP again.  In this case, we do not want to return
      mappings that we have already supplied to the caller.  The call to
      xfs_btree_query_range is the same, but now we ignore any records that
      start before fmh_keys[0].
      
      Unfortunately, we didn't implement the filtering predicate correctly.
      The predicate should only be called when we're calling back for more
      records.  Accomplish this by setting info->low.rm_blockcount to a
      nonzero value and ensuring that it is cleared as necessary.  As a
      result, we no longer want to adjust dkeys[0] in the main setup function
      because that's confusing.
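      A simplified model of the corrected predicate, assuming that a zero
      length in the low key marks a fresh query (mirroring the role the
      commit gives info->low.rm_blockcount). Names are hypothetical and
      units are the same as in the xfs_io output above. On a fresh query
      for block 208, the record for inode 137 at [192..223] is reported
      even though it starts before 208; only a continuation call skips it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mapping record. */
struct sketch_rec {
	uint64_t	start;
	uint64_t	len;
};

/*
 * low->len == 0: fresh query, report every record that overlaps the range.
 * low->len != 0: userspace copied the last mapping into keys[0] and wants
 * more, so skip records that start before the end of that mapping.
 */
static bool sketch_skip_rec(const struct sketch_rec *rec, const struct sketch_rec *low)
{
	if (low->len == 0)
		return false;
	return rec->start < low->start + low->len;
}
```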
      
      This patch doesn't touch the logdev/rtbitmap backends because they have
      bigger problems that will be addressed by subsequent patches.
      
      Found via xfs/556 with parent pointers enabled.
      
      Fixes: e89c0413 ("xfs: implement the GETFSMAP ioctl")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  2. 29 Jun, 2023 9 commits
    • xfs: fix bounds check in xfs_defer_agfl_block() · 2bed0d82
      Dave Chinner authored
      The bounds check needs to happen before we allocate the xefi;
      otherwise the error path leaks it. Found by Coverity via an xfsprogs
      libxfs scan.
      
      [djwong: This also fixes the type of the @agbno argument.]
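      The validate-then-allocate ordering, as a hedged user-space sketch.
      struct sketch_xefi and sketch_defer_free() are hypothetical stand-ins,
      and the sketch returns -EINVAL where kernel code would typically
      report corruption with its own error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent-free intent item. */
struct sketch_xefi {
	uint32_t	agbno;	/* AG-relative block number (32-bit, like an AG block) */
	uint32_t	len;
};

/*
 * Validate first, allocate second: if the check came after malloc(),
 * the early error return would leak the item.
 */
static int sketch_defer_free(uint32_t agbno, uint32_t len, uint32_t agblocks,
			     struct sketch_xefi **out)
{
	struct sketch_xefi *xefi;

	*out = NULL;
	if (agbno >= agblocks || len == 0 || len > agblocks - agbno)
		return -EINVAL;		/* nothing allocated yet, nothing leaks */

	xefi = malloc(sizeof(*xefi));
	if (!xefi)
		return -ENOMEM;
	xefi->agbno = agbno;
	xefi->len = len;
	*out = xefi;
	return 0;
}
```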
      
      Fixes: 7dfee17b ("xfs: validate block number being freed before adding to xefi")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: AGF length has never been bounds checked · edd8276d
      Dave Chinner authored
      The AGF verifier does not check that the AGF length field is within
      known good bounds. This has never been checked by runtime kernel
      code (i.e. the lack of verification goes back to 1993) yet we assume
      in many places that it is correct and verify other metadata against
      it.
      
      Add length verification to the AGF verifier. The length of the AGF
      must be equal to the size of the AG specified in the superblock,
      unless it is the last AG in the filesystem. In that case, it must be
      less than or equal to sb->sb_agblocks and greater than
      XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs operation will
      allow to exist.
      
      This requires a bit of rework of the verifier function. We want to
      verify metadata before we use it to verify other metadata. Hence
      we need to verify the AGF sequence numbers before using them to
      verify the length of the AGF. Then we can verify the AGF length
      before we verify AGFL fields. Then we can verify other fields that
      are bounds limited by the AGF length.
      
      And, finally, by calculating agf_length only once into a local
      variable, we can collapse repeated "if (xfs_has_foo() &&"
      conditional checks into single checks. This makes the code much
      easier to follow as all the checks for a given feature are obviously
      in the same place.
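      The length rules above, as a standalone sketch. The function is
      hypothetical; SKETCH_MIN_AG_BLOCKS stands in for XFS_MIN_AG_BLOCKS
      with an assumed value, and whether a last AG of exactly the minimum
      size is accepted is also an assumption of this sketch. Note that the
      sequence number is checked before it is used to pick a length rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum AG size for this sketch; XFS defines its own constant. */
#define SKETCH_MIN_AG_BLOCKS 64u

/* Returns true if agf_length is plausible for AG number agf_seqno. */
static bool sketch_agf_length_ok(uint32_t agf_seqno, uint32_t agf_length,
				 uint32_t sb_agcount, uint32_t sb_agblocks)
{
	/* Verify the sequence number first; the length rule depends on it. */
	if (agf_seqno >= sb_agcount)
		return false;

	/* Every AG except the last must be exactly sb_agblocks long. */
	if (agf_seqno != sb_agcount - 1)
		return agf_length == sb_agblocks;

	/* The last AG may be short, but not smaller than growfs allows. */
	return agf_length <= sb_agblocks && agf_length >= SKETCH_MIN_AG_BLOCKS;
}
```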
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: journal geometry is not properly bounds checked · f1e1765a
      Dave Chinner authored
      If the journal geometry results in a sector or log stripe unit
      validation problem, it indicates that we cannot set the log up to
      safely write to the journal. In these cases, we must abort the
      mount because the corruption needs external intervention to resolve.
      Similarly, a journal that is too large cannot be written to safely
      either, so we shouldn't allow those geometries to mount.
      
      If the log is too small, we risk having transaction reservations
      overrunning the available log space and the system hanging waiting
      for space it can never provide. This is purely a runtime hang issue,
      not a corruption issue as per the first cases listed above. We abort
      mounts if the log is too small for V5 filesystems, but we must allow
      v4 filesystems to mount because, historically, there was no log size
      validity checking and so some systems may still be out there with
      undersized logs.
      
      The problem is that on V4 filesystems, when we discover a log
      geometry problem, we skip all the remaining checks and then allow
      the log to continue mounting. This means that if one of the log size
      checks fails, we skip the log stripe unit check. i.e. we allow the
      mount because a "non-fatal" geometry is violated, and then fail to
      check the hard fail geometries that should fail the mount.
      
      Move all these fatal checks to the superblock verifier, and add a
      new check for the two log sector size geometry variables having the
      same values. This will prevent any attempt to mount a log that has
      invalid or inconsistent geometries long before we attempt to mount
      the log.
      
      However, for the minimum log size checks, we can only do that once
      we've set up the log and calculated all the iclog sizes and
      roundoffs. Hence this needs to remain in the log mount code after
      the log has been initialised. It is also the only case where we
      should allow a v4 filesystem to continue running, so leave that
      handling in place, too.
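      The intended check ordering, sketched under stated assumptions: the
      struct below is a hypothetical summary of log geometry (its fields
      are illustrative, not real superblock fields), and -EINVAL stands in
      for whatever error code the kernel actually returns. Fatal checks
      always run in full; only the minimum-size check may be downgraded to
      a warning, and only on v4.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of log geometry; field names are illustrative only. */
struct sketch_log_geom {
	uint32_t	sectsize_a;	/* log sector size as recorded in one field */
	uint32_t	sectsize_b;	/* the same quantity recorded in another field */
	bool		sunit_ok;	/* log stripe unit passes validation */
	bool		too_large;	/* log exceeds the maximum writable size */
	bool		too_small;	/* fails the minimum-size check */
	bool		is_v5;
};

/* Fatal geometry checks: always run all of them, on every version. */
static int sketch_verify_fatal(const struct sketch_log_geom *g)
{
	if (g->sectsize_a != g->sectsize_b)
		return -EINVAL;
	if (!g->sunit_ok || g->too_large)
		return -EINVAL;
	return 0;
}

/* Minimum-size check: fatal on V5, only a warning on v4. */
static int sketch_check_min_size(const struct sketch_log_geom *g, bool *warned)
{
	*warned = false;
	if (!g->too_small)
		return 0;
	if (g->is_v5)
		return -EINVAL;
	*warned = true;
	return 0;
}

/* Fatal checks first (superblock verifier stage), then the size check. */
static int sketch_mount_log(const struct sketch_log_geom *g, bool *warned)
{
	int error = sketch_verify_fatal(g);

	*warned = false;
	if (error)
		return error;
	return sketch_check_min_size(g, warned);
}
```

      With this ordering, an undersized v4 log can no longer mask a bad
      stripe unit, which is the skipped-check bug described above.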
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: don't block in busy flushing when freeing extents · 8ebbf262
      Dave Chinner authored
      If the current transaction holds a busy extent and we are trying to
      allocate a new extent to fix up the free list, we can deadlock if
      the AG is entirely empty except for the busy extent held by the
      transaction.
      
      This can occur at runtime when processing an XEFI with multiple
      extents, via this path:
      
      __schedule+0x22f at ffffffff81f75e8f
      schedule+0x46 at ffffffff81f76366
      xfs_extent_busy_flush+0x69 at ffffffff81477d99
      xfs_alloc_ag_vextent_size+0x16a at ffffffff8141711a
      xfs_alloc_ag_vextent+0x19b at ffffffff81417edb
      xfs_alloc_fix_freelist+0x22f at ffffffff8141896f
      xfs_free_extent_fix_freelist+0x6a at ffffffff8141939a
      __xfs_free_extent+0x99 at ffffffff81419499
      xfs_trans_free_extent+0x3e at ffffffff814a6fee
      xfs_extent_free_finish_item+0x24 at ffffffff814a70d4
      xfs_defer_finish_noroll+0x1f7 at ffffffff81441407
      xfs_defer_finish+0x11 at ffffffff814417e1
      xfs_itruncate_extents_flags+0x13d at ffffffff8148b7dd
      xfs_inactive_truncate+0xb9 at ffffffff8148bb89
      xfs_inactive+0x227 at ffffffff8148c4f7
      xfs_fs_destroy_inode+0xb8 at ffffffff81496898
      destroy_inode+0x3b at ffffffff8127d2ab
      do_unlinkat+0x1d1 at ffffffff81270df1
      do_syscall_64+0x40 at ffffffff81f6b5f0
      entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8200007c
      
      This can also happen in log recovery when processing an EFI
      with multiple extents through this path:
      
      context_switch() kernel/sched/core.c:3881
      __schedule() kernel/sched/core.c:5111
      schedule() kernel/sched/core.c:5186
      xfs_extent_busy_flush() fs/xfs/xfs_extent_busy.c:598
      xfs_alloc_ag_vextent_size() fs/xfs/libxfs/xfs_alloc.c:1641
      xfs_alloc_ag_vextent() fs/xfs/libxfs/xfs_alloc.c:828
      xfs_alloc_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:2362
      xfs_free_extent_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:3029
      __xfs_free_extent() fs/xfs/libxfs/xfs_alloc.c:3067
      xfs_trans_free_extent() fs/xfs/xfs_extfree_item.c:370
      xfs_efi_recover() fs/xfs/xfs_extfree_item.c:626
      xlog_recover_process_efi() fs/xfs/xfs_log_recover.c:4605
      xlog_recover_process_intents() fs/xfs/xfs_log_recover.c:4893
      xlog_recover_finish() fs/xfs/xfs_log_recover.c:5824
      xfs_log_mount_finish() fs/xfs/xfs_log.c:764
      xfs_mountfs() fs/xfs/xfs_mount.c:978
      xfs_fs_fill_super() fs/xfs/xfs_super.c:1908
      mount_bdev() fs/super.c:1417
      xfs_fs_mount() fs/xfs/xfs_super.c:1985
      legacy_get_tree() fs/fs_context.c:647
      vfs_get_tree() fs/super.c:1547
      do_new_mount() fs/namespace.c:2843
      do_mount() fs/namespace.c:3163
      ksys_mount() fs/namespace.c:3372
      __do_sys_mount() fs/namespace.c:3386
      __se_sys_mount() fs/namespace.c:3383
      __x64_sys_mount() fs/namespace.c:3383
      do_syscall_64() arch/x86/entry/common.c:296
      entry_SYSCALL_64() arch/x86/entry/entry_64.S:180
      
      To avoid this deadlock, we should not block in
      xfs_extent_busy_flush() if we hold a busy extent in the current
      transaction.
      
      Now that the EFI processing code can handle requeuing a partially
      completed EFI, we can detect this situation in
      xfs_extent_busy_flush() and return -EAGAIN rather than going to
      sleep forever. The -EAGAIN gets propagated back out to the
      xfs_trans_free_extent() context, where the EFD is populated and the
      transaction is rolled, thereby moving the busy extents into the CIL.
      
      At this point, we can retry the extent free operation again with a
      clean transaction. If we hit the same "all free extents are busy"
      situation when trying to fix up the free list, we can safely call
      xfs_extent_busy_flush() and wait for the busy extents to resolve
      and wake us. At this point, the allocation search can make progress
      again and we can fix up the free list.
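      The decision described above, reduced to a sketch. The flag value and
      struct are hypothetical stand-ins (XFS has its own
      XFS_ALLOC_FLAG_FREEING and transaction structures), and the sketch
      only models when to return -EAGAIN instead of sleeping; it does not
      force the log or sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value standing in for XFS_ALLOC_FLAG_FREEING. */
#define SKETCH_ALLOC_FLAG_FREEING 0x1u

/* Hypothetical transaction state relevant to the decision. */
struct sketch_trans {
	bool	holds_busy_extents;	/* busy extents attached to this transaction */
};

/*
 * When freeing and our own transaction holds busy extents, waiting would
 * deadlock: those extents cannot clear until this transaction commits.
 * Return -EAGAIN so the caller can roll the transaction and retry.
 * Returning 0 means "safe to wait" (a real implementation would sleep).
 */
static int sketch_busy_flush(const struct sketch_trans *tp, unsigned int alloc_flags)
{
	if ((alloc_flags & SKETCH_ALLOC_FLAG_FREEING) && tp->holds_busy_extents)
		return -EAGAIN;
	return 0;
}
```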
      
      This deadlock was first reported by Chandan in mid-2021, but I
      couldn't make myself understood during review, and didn't have time
      to fix it myself.
      
      It was reported again in March 2023, and again I have found myself
      unable to explain the complexities of the solution needed during
      review.
      
      As such, I don't have hours more time to waste trying to get the
      fix written the way it needs to be written, so I'm just doing it
      myself. This patchset is largely based on Wengang Wang's last patch,
      but with all the unnecessary stuff removed, split up into multiple
      patches and cleaned up somewhat.
      Reported-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reported-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: allow extent free intents to be retried · 0853b5de
      Dave Chinner authored
      Extent freeing needs to be able to avoid a busy extent deadlock
      when the transaction itself holds the only busy extents in the
      allocation group. This may occur if we have an EFI that contains
      multiple extents to be freed, and freeing the second extent
      requires the space the first extent free released to expand the
      AGFL. If we block on the busy extent at this point, we deadlock.
      
      We hold a dirty transaction that contains an entire atomic extent
      free operation within it, so if we can abort the extent free
      operation and commit the progress that we've made, the busy extent
      can be resolved by a log force. Hence we can restart the aborted
      extent free with a new transaction and continue to make
      progress without risking deadlocks.
      
      To enable this, we need the EFI processing code to be able to handle
      an -EAGAIN error to tell it to commit the current transaction and
      retry again. This mechanism is already built into the defer ops
      processing (used by the refcount btree modification intents), so
      there's relatively little handling we need to add to the EFI code to
      enable this.
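      A minimal sketch of retrying a partially completed intent, assuming a
      callback that frees one extent and may return -EAGAIN. All names are
      hypothetical; in XFS the existing defer ops machinery handles the
      roll-and-retry, which this sketch only imitates with a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: free one extent; returns -EAGAIN to request a transaction roll. */
typedef int (*sketch_free_fn)(size_t idx, void *ctx);

/* Free extents from *next onward; on error, *next says where to resume. */
static int sketch_process_efi(size_t nextents, size_t *next, sketch_free_fn fn, void *ctx)
{
	while (*next < nextents) {
		int error = fn(*next, ctx);

		if (error)
			return error;	/* -EAGAIN: caller rolls and retries from *next */
		(*next)++;
	}
	return 0;
}

/* Caller-side loop; *rolls counts simulated transaction rolls. */
static int sketch_finish_efi(size_t nextents, sketch_free_fn fn, void *ctx, int *rolls)
{
	size_t next = 0;
	int error;

	*rolls = 0;
	while ((error = sketch_process_efi(nextents, &next, fn, ctx)) == -EAGAIN)
		(*rolls)++;	/* commit the completed frees, retry with a clean transaction */
	return error;
}

/* Demo callback: extent 1 reports -EAGAIN the first time it is attempted. */
static int sketch_demo_free(size_t idx, void *ctx)
{
	int *attempts_on_1 = ctx;

	if (idx == 1 && (*attempts_on_1)++ == 0)
		return -EAGAIN;
	return 0;
}
```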
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    • xfs: pass alloc flags through to xfs_extent_busy_flush() · 6a2a9d77
      Dave Chinner authored
      To avoid blocking in xfs_extent_busy_flush() when freeing extents
      and the only busy extents are held by the current transaction, we
      need to pass the XFS_ALLOC_FLAG_FREEING flag context all the way
      into xfs_extent_busy_flush().
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    • xfs: use deferred frees for btree block freeing · b742d7b4
      Dave Chinner authored
      Btrees that aren't freespace management trees use the normal extent
      allocation and freeing routines for their blocks. Hence when a btree
      block is freed, a direct call to xfs_free_extent() is made and the
      extent is immediately freed. This puts the entire free space
      management btrees under this path, so we are stacking btrees on
      btrees in the call stack. The inobt, finobt and refcount btrees
      all do this.
      
      However, the bmap btree does not do this - it calls
      xfs_free_extent_later() to defer the extent free operation via an
      XEFI and hence it gets processed in deferred operation processing
      during the commit of the primary transaction (i.e. via intent
      chaining).
      
      We need to change xfs_free_extent() to behave in a non-blocking
      manner so that we can avoid deadlocks with busy extents near ENOSPC
      in transactions that free multiple extents. Inserting or removing a
      record from a btree can cause a multi-level tree merge operation and
      that will free multiple blocks from the btree in a single
      transaction. i.e. we can call xfs_free_extent() multiple times, and
      hence the btree manipulation transaction is vulnerable to this busy
      extent deadlock vector.
      
      To fix this, convert all the remaining callers of xfs_free_extent()
      to use xfs_free_extent_later() to queue XEFIs and hence defer
      processing of the extent frees to a context that can be safely
      restarted if a deadlock condition is detected.
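      A toy model of "queue now, free later", assuming a fixed-capacity
      FIFO per transaction. The names are hypothetical and stand in for
      xfs_free_extent_later() and the deferred-op processing at commit;
      the point is that a multi-level btree merge only records blocks,
      and each free later runs as its own restartable step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEFERRED 16	/* arbitrary capacity for the sketch */

/* Hypothetical per-transaction FIFO of extents whose freeing is deferred. */
struct sketch_defer_queue {
	uint64_t	fsbno[SKETCH_MAX_DEFERRED];
	size_t		head;	/* next entry to process */
	size_t		tail;	/* next free slot */
};

/* Btree code queues the block here instead of freeing it on the spot. */
static bool sketch_free_extent_later(struct sketch_defer_queue *q, uint64_t fsbno)
{
	if (q->tail == SKETCH_MAX_DEFERRED)
		return false;	/* queue full */
	q->fsbno[q->tail++] = fsbno;
	return true;
}

/* At commit time, pop one queued free at a time, in queueing order. */
static bool sketch_next_deferred(struct sketch_defer_queue *q, uint64_t *fsbno)
{
	if (q->head == q->tail)
		return false;
	*fsbno = q->fsbno[q->head++];
	return true;
}
```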
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    • xfs: don't reverse order of items in bulk AIL insertion · 939bd50d
      Dave Chinner authored
      XFS has strict metadata ordering requirements. One of the things it
      does is maintain the commit order of items from transaction commit
      through the CIL and into the AIL. That is, if a transaction logs
      item A before item B in a modification, then they will be inserted
      into the CIL in the order {A, B}. These items are then written into
      the iclog during checkpointing in the order {A, B}. When the
      checkpoint commits, they are supposed to be inserted into the AIL in
      the order {A, B}, and when they are pushed from the AIL, they are
      pushed in the order {A, B}.
      
      If we crash, log recovery then replays the two items from the
      checkpoint in the order {A, B}, resulting in the objects the items
      apply to being queued for writeback at the end of the checkpoint
      in the order {A, B}. This means recovery behaves the same way as the
      runtime code.
      
      In places, we have subtle dependencies on this ordering being
      maintained. One of these places is performing intent recovery from the
      log. It assumes that recovering an intent will result in a
      non-intent object being the first thing that is modified in the
      recovery transaction, and so when the transaction commits and the
      journal flushes, the first object inserted into the AIL beyond the
      intent recovery range will be a non-intent item.  It uses the
      transition from intent items to non-intent items to stop the
      recovery pass.
      
      A recent log recovery issue indicated that an intent was appearing
      as the first item in the AIL beyond the recovery range, hence
      breaking the end of recovery detection that exists.
      
      Tracing indicated insertion of the items into the AIL was apparently
      occurring in the right order (the intent was last in the commit item
      list), but the intent was appearing first in the AIL. IOWs, the
      order of items in the AIL was {D,C,B,A}, not {A,B,C,D}, and bulk
      insertion was reversing the order of the items in the batch of items
      being inserted.
      
      Lucky for us, all the items fed to bulk insertion have the same LSN,
      so the reversal of order does not affect the log head/tail tracking
      that is based on the contents of the AIL. It only impacts code
      that has implicit, subtle dependencies on object order, and AFAICT
      only the intent recovery loop is impacted by it.
      
      Make sure bulk AIL insertion does not reorder items incorrectly.
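      A sketch of how a batch gets reversed, using a minimal circular list
      modelled loosely on the kernel's list_head (the helpers here are
      hypothetical re-implementations, not the kernel API). Adding each
      item at the head of a temporary list yields {D,C,B,A}; adding at the
      tail preserves commit order {A,B,C,D}. This illustrates the class of
      bug, not the exact AIL code.

```c
#include <assert.h>
#include <string.h>

/* Minimal doubly linked circular list. */
struct sketch_list {
	struct sketch_list *prev, *next;
};

static void sketch_list_init(struct sketch_list *h)
{
	h->prev = h->next = h;
}

static void sketch_insert_between(struct sketch_list *n, struct sketch_list *prev,
				  struct sketch_list *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: a batch built this way comes out reversed. */
static void sketch_add_head(struct sketch_list *n, struct sketch_list *h)
{
	sketch_insert_between(n, h, h->next);
}

/* Insert right before the head (at the tail): preserves insertion order. */
static void sketch_add_tail(struct sketch_list *n, struct sketch_list *h)
{
	sketch_insert_between(n, h->prev, h);
}

struct sketch_item {
	struct sketch_list	list;	/* first member, so a list pointer casts back */
	char			name;
};

/* Build a batch of items A..D with one strategy and record the resulting order. */
static void sketch_batch_order(int use_add_head, char out[5])
{
	struct sketch_item items[4] = { {.name = 'A'}, {.name = 'B'}, {.name = 'C'}, {.name = 'D'} };
	struct sketch_list batch;
	struct sketch_list *pos;
	size_t i = 0;

	sketch_list_init(&batch);
	for (size_t k = 0; k < 4; k++) {
		if (use_add_head)
			sketch_add_head(&items[k].list, &batch);
		else
			sketch_add_tail(&items[k].list, &batch);
	}
	for (pos = batch.next; pos != &batch; pos = pos->next)
		out[i++] = ((struct sketch_item *)pos)->name;
	out[i] = '\0';
}
```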
      
      Fixes: 0e57f6a3 ("xfs: bulk AIL insertion during transaction commit")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
    • xfs: remove redundant initializations of pointers drop_leaf and save_leaf · 347eb95b
      Colin Ian King authored
      Pointers drop_leaf and save_leaf are initialized with values that are never
      read; they are re-assigned later on, just before they are used. Remove
      the redundant early initializations and keep the later assignments at the
      point where they are used. Cleans up two clang scan build warnings:
      
      fs/xfs/libxfs/xfs_attr_leaf.c:2288:29: warning: Value stored to 'drop_leaf'
      during its initialization is never read [deadcode.DeadStores]
      fs/xfs/libxfs/xfs_attr_leaf.c:2289:29: warning: Value stored to 'save_leaf'
      during its initialization is never read [deadcode.DeadStores]
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  3. 13 Jun, 2023 4 commits
    • xfs: fix ag count overflow during growfs · c3b880ac
      Long Li authored
      I found a corruption during growfs:
      
       XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of
         file fs/xfs/libxfs/xfs_alloc.c.  Caller __xfs_free_extent+0x28e/0x3c0
       CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
       Call Trace:
        <TASK>
        dump_stack_lvl+0x50/0x70
        xfs_corruption_error+0x134/0x150
        __xfs_free_extent+0x2c1/0x3c0
        xfs_ag_extend_space+0x291/0x3e0
        xfs_growfs_data+0xd72/0xe90
        xfs_file_ioctl+0x5f9/0x14a0
        __x64_sys_ioctl+0x13e/0x1c0
        do_syscall_64+0x39/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
       XFS (loop0): Corruption detected. Unmount and run xfs_repair
       XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file
         fs/xfs/xfs_trans.c.  Caller xfs_growfs_data+0x691/0xe90
       CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257
       Call Trace:
        <TASK>
        dump_stack_lvl+0x50/0x70
        xfs_error_report+0x93/0xc0
        xfs_trans_cancel+0x2c0/0x350
        xfs_growfs_data+0x691/0xe90
        xfs_file_ioctl+0x5f9/0x14a0
        __x64_sys_ioctl+0x13e/0x1c0
        do_syscall_64+0x39/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
       RIP: 0033:0x7f2d86706577
      
      The bug can be reproduced with the following sequence:
      
       # truncate -s  1073741824 xfs_test.img
       # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img
       # truncate -s 2305843009213693952  xfs_test.img
       # mount -o loop xfs_test.img /mnt/test
       # xfs_growfs -D  1125899907891200  /mnt/test
      
      The root cause is that during growfs, userspace passed a large value
      of newblocks to xfs_growfs_data_private(). Because the current
      sb_agblocks is too small, the new AG count exceeds UINT_MAX. The AG
      number type is unsigned int, so the calculation overflows and
      nagcount ends up much smaller than the actual value. When extending
      the AG space, the delta blocks computed in xfs_resizefs_init_new_ags()
      are then much larger than the actual value because of the incorrect
      nagcount, and can even exceed UINT_MAX. This causes corruption that is
      detected in __xfs_free_extent. Fix it by growing the filesystem up to
      the maximum allowed number of AGs instead of returning EINVAL when the
      new AG count overflows.
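      The overflow and the clamping fix, as a hedged arithmetic sketch. It
      assumes the reproducer's 1 GiB image with 1 KiB blocks and four AGs
      gives sb_agblocks = 262144, and it uses UINT32_MAX as the AG count
      cap; XFS has its own limit constant, which may differ. With
      newblocks = 1125899907891200 (2^50 + 2^20), the true AG count is
      2^32 + 4, which truncates to just 4 in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap on the AG count for this sketch; XFS has its own limit constant. */
#define SKETCH_MAX_AGCOUNT UINT32_MAX

/* Buggy pattern: the 64-bit AG count is squeezed into a 32-bit AG number. */
static uint32_t sketch_nagcount_truncated(uint64_t nblocks, uint32_t agblocks)
{
	return (uint32_t)((nblocks + agblocks - 1) / agblocks);
}

/* Fixed pattern: compute in 64 bits, clamp, and shrink nblocks to match. */
static uint32_t sketch_nagcount_clamped(uint64_t *nblocks, uint32_t agblocks)
{
	uint64_t nagcount = (*nblocks + agblocks - 1) / agblocks;

	if (nagcount > SKETCH_MAX_AGCOUNT) {
		nagcount = SKETCH_MAX_AGCOUNT;
		*nblocks = nagcount * agblocks;	/* grow only as far as the cap allows */
	}
	return (uint32_t)nagcount;
}
```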
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method · b2943499
      Christoph Hellwig authored
      Since commit a2ad63da ("VFS: add FMODE_CAN_ODIRECT file flag") file
      systems can just set the FMODE_CAN_ODIRECT flag at open time instead of
      wiring up a dummy direct_IO method to indicate support for direct I/O.
      Do that for xfs so that noop_direct_IO can eventually be removed.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: drop EXPERIMENTAL tag for large extent counts · 61d7e827
      Darrick J. Wong authored
      This feature has been baking in upstream for ~10mo with no bug reports.
      It seems to work fine here, let's get rid of the scary warnings?
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: don't deplete the reserve pool when trying to shrink the fs · 06f3ef6e
      Darrick J. Wong authored
      Every now and then, xfs/168 fails with this logged in dmesg:
      
      Reserve blocks depleted! Consider increasing reserve pool size.
      EXPERIMENTAL online shrink feature in use. Use at your own risk!
      Per-AG reservation for AG 1 failed.  Filesystem may run out of space.
      Per-AG reservation for AG 1 failed.  Filesystem may run out of space.
      Error -28 reserving per-AG metadata reserve pool.
      Corruption of in-memory data (0x8) detected at xfs_ag_shrink_space+0x23c/0x3b0 [xfs] (fs/xfs/libxfs/xfs_ag.c:1007).  Shutting down filesystem.
      
      It's silly to deplete the reserved blocks pool just to shrink the
      filesystem, particularly since the fs goes down after that.
      
      Fixes: fb2fc172 ("xfs: support shrinking unused space in the last AG")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  4. 11 Jun, 2023 3 commits
    • Linux 6.4-rc6 · 858fd168
      Linus Torvalds authored
    • Merge tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4c605260
      Linus Torvalds authored
      Pull x86 fix from Borislav Petkov:
      
       - Set up the kernel CS earlier in the boot process in case EFI boots
         the kernel after bypassing the decompressor and the CS descriptor
         used ends up being the EFI one which is not mapped in the identity
         page table, leading to early SEV/SNP guest communication exceptions
         resulting in the guest crashing
      
      * tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
    • Merge tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd · 65d7ca59
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Five smb3 server fixes, all also for stable:
      
         - Fix four slab out of bounds warnings: improve checks for protocol
           id, and for small packet length, and for create context parsing,
           and for negotiate context parsing
      
         - Fix for incorrect dereferencing of POSIX ACLs"
      
      * tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: validate smb request protocol id
        ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop
        ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
        ksmbd: fix out-of-bound read in parse_lease_state()
        ksmbd: fix out-of-bound read in deassemble_neg_contexts()
  5. 10 Jun, 2023 3 commits
    • Merge tag 'i2c-for-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 022ce886
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Biggest news is that Andi Shyti steps in for maintaining the
        controller drivers. Thank you very much!
      
        Other than that, there is one new driver maintainer, and the rest
        are the usual driver bugfixes. at24 has a Kconfig dependency fix"
      
      * tag 'i2c-for-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        MAINTAINERS: Add entries for Renesas RZ/V2M I2C driver
        eeprom: at24: also select REGMAP
        i2c: sprd: Delete i2c adapter in .remove's error path
        i2c: mv64xxx: Fix reading invalid status value in atomic mode
        i2c: designware: fix idx_write_cnt in read loop
        i2c: mchp-pci1xxxx: Avoid cast to incompatible function type
        i2c: img-scb: Fix spelling mistake "innacurate" -> "inaccurate"
        MAINTAINERS: Add myself as I2C host drivers maintainer
    • Merge tag 'soundwire-6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire · 6be5e47b
      Linus Torvalds authored
      Pull soundwire fixes from Vinod Koul:
       "Core fix for missing flag clear, error path handling in qcom driver
        and BIOS quirk for HP Spectre x360:
      
         - HP Spectre x360 soundwire DMI quirk
      
         - Error path handling for qcom driver
      
         - Core fix for missing clear of alloc_slave_rt"
      
      * tag 'soundwire-6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
        soundwire: stream: Add missing clear of alloc_slave_rt
        soundwire: qcom: add proper error paths in qcom_swrm_startup()
        soundwire: dmi-quirks: add new mapping for HP Spectre x360
    • Merge tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 859c7459
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Most of the changes this time are for the Qualcomm Snapdragon
        platforms.
      
        There are bug fixes for error handling in Qualcomm icc-bwmon,
        rpmh-rsc, ramp_controller and rmtfs driver as well as the AMD tee
        firmware driver and a missing initialization in the Arm ff-a firmware
        driver. The Qualcomm RPMh and EDAC drivers need some rework to work
        correctly on all supported chips.
      
        The DT fixes include:
      
         - i.MX8 fixes for gpio, pinmux and clock settings
      
         - ADS touchscreen gpio polarity settings in several machines
      
         - Address dtb warnings for caches, panel and input-enable properties
           on Qualcomm platforms
      
         - Incorrect data on Qualcomm platforms for SA8155P power domains,
           SM8550 LLCC, SC7180-lite SDRAM frequencies and SM8550 soundwire
      
         - Remoteproc firmware paths are corrected for Sony Xperia 10 IV"
      
      * tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits)
        firmware: arm_ffa: Set handle field to zero in memory descriptor
        ARM: dts: Fix erroneous ADS touchscreen polarities
        arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
        arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
        arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
        EDAC/qcom: Get rid of hardcoded register offsets
        EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
        arm64: dts: qcom: sm8550: Use the correct LLCC register scheme
        dt-bindings: cache: qcom,llcc: Fix SM8550 description
        arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
        arm64: dts: qcom: sm8550: use uint16 for Soundwire interval
        soc: qcom: rpmhpd: Add SA8155P power domains
        arm64: dts: qcom: Split out SA8155P and use correct RPMh power domains
        dt-bindings: power: qcom,rpmpd: Add SA8155P
        soc: qcom: Rename ice to qcom_ice to avoid module name conflict
        soc: qcom: rmtfs: Fix error code in probe()
        soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
        ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
        ARM: at91: pm: fix imbalanced reference counter for ethernet devices
        arm64: dts: qcom: sm6375-pdx225: Fix remoteproc firmware paths
        ...
  6. 09 Jun, 2023 14 commits