1. 05 Jun, 2023 2 commits
    • xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag() · 1e473279
      Dave Chinner authored
      xfs_bmap_longest_free_extent() can return an error when accessing
      the AGF fails. In this case, the behaviour of
      xfs_filestream_pick_ag() is conditional on the error. We may
      continue the loop, or break out of it. The error handling after the
      loop cleans up the perag reference held when the break occurs. If we
      continue, the next loop iteration handles cleaning up the perag
      reference.
      
      Either way, we don't need to release the active perag reference when
      xfs_bmap_longest_free_extent() fails. Doing so means we do a double
      decrement on the active reference count, and this causes the active
      reference count to fall to zero. At this point, new active
      references will fail.
      
      This leads to unmount hanging because it tries to grab active
      references to that perag, only for it to fail. This happens inside a
      loop that retries until an inode tree radix tree tag is cleared,
      which cannot happen because we can't get an active reference to the
      perag.
      
      The unmount livelocks in this path:
      
        xfs_reclaim_inodes+0x80/0xc0
        xfs_unmount_flush_inodes+0x5b/0x70
        xfs_unmountfs+0x5b/0x1a0
        xfs_fs_put_super+0x49/0x110
        generic_shutdown_super+0x7c/0x1a0
        kill_block_super+0x27/0x50
        deactivate_locked_super+0x30/0x90
        deactivate_super+0x3c/0x50
        cleanup_mnt+0xc2/0x160
        __cleanup_mnt+0x12/0x20
        task_work_run+0x5e/0xa0
        exit_to_user_mode_prepare+0x1bc/0x1c0
        syscall_exit_to_user_mode+0x16/0x40
        do_syscall_64+0x40/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd

      Reported-by: Pengfei Xu <pengfei.xu@intel.com>
      Fixes: eb70aa2d ("xfs: use for_each_perag_wrap in xfs_filestream_pick_ag")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
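The effect of the double release described in this commit can be illustrated with a minimal user-space sketch. The names `struct pag`, `pag_grab()` and `pag_rele()` are hypothetical stand-ins chosen for this example, not the XFS API, and the counter is a plain `int` rather than an atomic. The sketch only assumes what the message states: once the active count reaches zero, new active references fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-AG structure with an active refcount. */
struct pag {
	int active_ref;
};

/* Take an active reference; this fails once the count has hit zero. */
static bool pag_grab(struct pag *p)
{
	if (p->active_ref == 0)
		return false;
	p->active_ref++;
	return true;
}

/* Drop an active reference. */
static void pag_rele(struct pag *p)
{
	assert(p->active_ref > 0);
	p->active_ref--;
}

/*
 * Simulate one loop iteration that hits an error. When buggy is true the
 * error path releases the reference and the common cleanup releases it
 * again. Returns whether a later pag_grab() still succeeds.
 */
static bool grab_works_after_error(bool buggy)
{
	struct pag p = { .active_ref = 1 };	/* baseline reference */

	if (!pag_grab(&p))
		return false;
	if (buggy)
		pag_rele(&p);	/* extra release in the error path */
	pag_rele(&p);		/* cleanup that runs in either case */

	return pag_grab(&p);
}
```

With `buggy` set, the count falls to zero and the final `pag_grab()` fails, which corresponds to the situation that leaves the unmount loop retrying forever.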
    • xfs: fix broken logic when detecting mergeable bmap records · 6be73cec
      Darrick J. Wong authored
      Commit 6bc6c99a944c was a well-intentioned effort to initiate
      consolidation of adjacent bmbt mapping records by setting the PREEN
      flag.  Consolidation can only happen if the length of the combined
      record doesn't overflow the 21-bit blockcount field of the bmbt
      recordset.  Unfortunately, the length test is inverted, leading to it
      triggering on data forks like these:
      
       EXT: FILE-OFFSET           BLOCK-RANGE        AG AG-OFFSET               TOTAL
         0: [0..16777207]:        76110848..92888055  0 (76110848..92888055) 16777208
         1: [16777208..20639743]: 92888056..96750591  0 (92888056..96750591)  3862536
      
      Note that record 0 has a length of 16777208 512b blocks.  This
      corresponds to 2097151 4k fsblocks, which is the maximum.  Hence the two
      records cannot be merged.
      
      However, the logic is still wrong even if we change the in-loop
      comparison, because the scope of our examination isn't broad enough
      inside the loop to detect mappings like this:
      
         0: [0..9]:               76110838..76110847  0 (76110838..76110847)       10
         1: [10..16777217]:       76110848..92888055  0 (76110848..92888055) 16777208
         2: [16777218..20639753]: 92888056..96750591  0 (92888056..96750591)  3862536
      
      These three records could be merged into two, but one cannot determine
      this purely from looking at records 0-1 or 1-2 in isolation.
      
      Hoist the mergeability detection outside the loop, and base its decision
      making on whether or not a merged mapping could be expressed in fewer
      bmbt records.  While we're at it, fix the incorrect return type of the
      iter function.
      
      Fixes: 336642f7 ("xfs: alert the user about data/attr fork mappings that could be merged")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
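The "fewer records" test described in this commit can be sketched as follows. This is a simplified user-space illustration, not the scrub code: `struct mapping`, `could_merge()` and the other names are hypothetical, lengths are in filesystem blocks, and extent state (for example written versus unwritten) is ignored. It walks each run of contiguous mappings and asks whether the run's total length fits in fewer records than the run currently uses, given the 21-bit length limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest length a single record can hold: 2^21 - 1 = 2097151 blocks. */
#define MAX_EXTLEN ((uint64_t)((1u << 21) - 1))

/* Hypothetical simplified mapping record, all fields in fs blocks. */
struct mapping {
	uint64_t off;	/* file offset */
	uint64_t start;	/* starting disk block */
	uint64_t len;	/* length */
};

/* True if b directly follows a both in the file and on disk. */
static bool contiguous(const struct mapping *a, const struct mapping *b)
{
	return a->off + a->len == b->off && a->start + a->len == b->start;
}

/* Minimum number of records needed to describe len contiguous blocks. */
static uint64_t min_records(uint64_t len)
{
	return (len + MAX_EXTLEN - 1) / MAX_EXTLEN;
}

/* True if some contiguous run could be expressed in fewer records. */
static bool could_merge(const struct mapping *recs, size_t n)
{
	size_t i = 0;

	assert(recs != NULL || n == 0);
	while (i < n) {
		uint64_t run_len = recs[i].len;
		size_t run_recs = 1;

		/* Extend the run while the next record is contiguous. */
		while (i + run_recs < n &&
		       contiguous(&recs[i + run_recs - 1], &recs[i + run_recs])) {
			run_len += recs[i + run_recs].len;
			run_recs++;
		}
		if (min_records(run_len) < run_recs)
			return true;
		i += run_recs;
	}
	return false;
}
```

A maximum-length record followed by a contiguous shorter one still needs two records, so it is not flagged. Prepending a one-block contiguous record gives three records for a run that fits in two, so that case is flagged even though no adjacent pair can be merged on its own.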
  2. 04 Jun, 2023 14 commits
    • xfs: Fix undefined behavior of shift into sign bit · 4320f346
      Geert Uytterhoeven authored
      With gcc-5:
      
          In file included from ./include/trace/define_trace.h:102:0,
      		     from ./fs/xfs/scrub/trace.h:988,
      		     from fs/xfs/scrub/trace.c:40:
          ./fs/xfs/./scrub/trace.h: In function ‘trace_raw_output_xchk_fsgate_class’:
          ./fs/xfs/scrub/scrub.h:111:28: error: initializer element is not constant
           #define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */
      				^
      
      Shifting the (signed) value 1 into the sign bit is undefined behavior.
      
      Fix this for all definitions in the file by shifting "1U" instead of
      "1".
      
      This was exposed by the first user added in commit 466c525d
      ("xfs: minimize overhead of drain wakeups by using jump labels").
      
      Fixes: 160b5a78 ("xfs: hoist the already_fixed variable to the scrub context")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
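A small stand-alone illustration of the "1U" form follows. The macro and function names are invented for this example and are not the definitions in scrub.h; the sketch assumes a 32-bit `unsigned int`. Because the shifted operand is unsigned, bit 31 is a well-defined constant expression and can be used in static initializers and static assertions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag definitions using an unsigned operand. */
#define FLAG_LOW	(1U << 0)
#define FLAG_HIGH	(1U << 31)	/* well defined, unlike (1 << 31) */

/* The unsigned form is a valid integer constant expression. */
_Static_assert(FLAG_HIGH == 0x80000000U, "bit 31 must be set");

/* It can also appear in a static initializer. */
static const uint32_t flag_table[] = { FLAG_LOW, FLAG_HIGH };

static uint32_t set_flag(uint32_t flags, uint32_t flag)
{
	return flags | flag;
}

static int has_flag(uint32_t flags, uint32_t flag)
{
	return (flags & flag) != 0;
}
```

Writing `(1 << 31)` instead shifts a signed `int` into its sign bit, which is the undefined behavior that gcc-5 rejected in a constant initializer.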
    • xfs: fix AGF vs inode cluster buffer deadlock · 82842fee
      Dave Chinner authored
      Lock order in XFS is AGI -> AGF, hence for operations involving
      inode unlinked list operations we always lock the AGI first. Inode
      unlinked list operations operate on the inode cluster buffer,
      so the lock order there is AGI -> inode cluster buffer.
      
      For O_TMPFILE operations, this now means the lock order set down in
      xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the
      unlinked ops are done before the directory modifications that may
      allocate space and lock the AGF.
      
      Unfortunately, we also now lock the inode cluster buffer when
      logging an inode so that we can attach the inode to the cluster
      buffer and pin it in memory. This creates a lock order of AGF ->
      inode cluster buffer in directory operations as we have to log the
      inode after we've allocated new space for it.
      
      This creates a lock inversion between the AGF and the inode cluster
      buffer. Because the inode cluster buffer is shared across multiple
      inodes, the inversion is not specific to individual inodes but can
      occur when inodes in the same cluster buffer are accessed in
      different orders.
      
      To fix this we need to move all the inode log item cluster buffer
      interactions to the end of the current transaction. Unfortunately,
      xfs_trans_log_inode() calls are littered throughout the transactions
      with no thought to ordering against other items or locking. This
      makes it difficult to do anything that involves changing the call
      sites of xfs_trans_log_inode() to change locking orders.
      
      However, we do now have a mechanism that allows us to postpone dirty
      item processing to just before we commit the transaction: the
      ->iop_precommit method. This will be called after all the
      modifications are done and high level objects like AGI and AGF
      buffers have been locked and modified, thereby providing a mechanism
      that guarantees we don't lock the inode cluster buffer before those
      high level objects are locked.
      
      This change is largely moving the guts of xfs_trans_log_inode() to
      xfs_inode_item_precommit() and providing an extra flag context in
      the inode log item to track the dirty state of the inode in the
      current transaction. This also means we do a lot less repeated work
      in xfs_trans_log_inode() by only doing it once per transaction when
      all the work is done.
      
      Fixes: 298f7bec ("xfs: pin inode backing buffer to the inode log item")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
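The general shape of "record dirty state now, do the expensive work once at precommit" can be sketched in user space as below. All names here (`struct trans`, `struct log_item`, `trans_log_item()`, `item_precommit()`, `trans_commit()`) are hypothetical and only mirror the idea in the message; the real transaction and log item structures are considerably more involved, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* Hypothetical log item: accumulates dirty flags during a transaction. */
struct log_item {
	unsigned int dirty_flags;
	int precommit_calls;	/* counts how often the heavy work ran */
};

/* Hypothetical transaction: tracks each dirty item once. */
struct trans {
	struct log_item *items[MAX_ITEMS];
	size_t nr_items;
};

/* Cheap: only record what was dirtied; touch no shared buffers. */
static void trans_log_item(struct trans *tp, struct log_item *lip,
			   unsigned int flags)
{
	assert(flags != 0);
	if (lip->dirty_flags == 0) {
		assert(tp->nr_items < MAX_ITEMS);
		tp->items[tp->nr_items++] = lip;
	}
	lip->dirty_flags |= flags;
}

/* Expensive step, e.g. locking and attaching a backing buffer. */
static void item_precommit(struct log_item *lip)
{
	lip->precommit_calls++;
	lip->dirty_flags = 0;
}

/* Run each item's precommit exactly once, after all modifications. */
static void trans_commit(struct trans *tp)
{
	for (size_t i = 0; i < tp->nr_items; i++)
		item_precommit(tp->items[i]);
	tp->nr_items = 0;
}
```

Logging the same item several times only ORs flags together, so the deferred step runs once per transaction and always after the higher-level objects have been modified.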
    • xfs: defered work could create precommits · cb042117
      Dave Chinner authored
      To fix an AGI-AGF-inode cluster buffer deadlock, we need to move
      inode cluster buffer operations to the ->iop_precommit() method.
      However, this means that deferred operations can require precommits
      to be run on the final transaction that the deferred ops pass back
      to xfs_trans_commit() context. This will be exposed by attribute
      handling, in that the last changes to the inode in the attr set
      state machine "disappear" because the precommit operation is not run.

      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: restore allocation trylock iteration · 00dcd17c
      Dave Chinner authored
      The trylock pass was accidentally dropped when refactoring the
      allocation code, so the AG iteration always ran in blocking mode.
      This results in a small performance regression for a specific fsmark
      test that runs more user data writer threads than there are AGs.

      Reported-by: kernel test robot <oliver.sang@intel.com>
      Fixes: 2edf06a5 ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
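A generic two-pass "trylock first, block only as a fallback" iteration looks roughly like the sketch below. It uses C11 `atomic_flag` as a toy lock, and `struct ag`, `ag_trylock()`, `ag_lock()`, `ag_unlock()` and `lock_any_ag()` are names made up for this illustration; it does not reproduce the XFS allocator.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy allocation group guarded by a spin flag. */
struct ag {
	atomic_flag busy;
};

/* Non-blocking attempt: true if the lock was taken. */
static bool ag_trylock(struct ag *ag)
{
	return !atomic_flag_test_and_set(&ag->busy);
}

/* Blocking acquire (spins until the flag is free). */
static void ag_lock(struct ag *ag)
{
	while (atomic_flag_test_and_set(&ag->busy))
		;
}

static void ag_unlock(struct ag *ag)
{
	atomic_flag_clear(&ag->busy);
}

/*
 * Pass 1: walk all AGs from start, skipping any that are busy.
 * Pass 2: only if every AG was busy, block on the starting AG.
 * Returns the index of the AG that is now locked.
 */
static size_t lock_any_ag(struct ag *ags, size_t n, size_t start)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		size_t agno = (start + i) % n;

		if (ag_trylock(&ags[agno]))
			return agno;
	}
	ag_lock(&ags[start % n]);
	return start % n;
}
```

When there are more writers than AGs, the first pass lets a thread move on to a free AG instead of queueing behind a busy one, which is the behaviour whose loss the message blames for the regression.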
    • xfs: buffer pins need to hold a buffer reference · 89a4bf0d
      Dave Chinner authored
      When a buffer is unpinned by xfs_buf_item_unpin(), we need to access
      the buffer after we've dropped the buffer log item reference count.
      This opens a window where we can have two racing unpins for the
      buffer item (e.g. shutdown checkpoint context callback processing
      racing with journal IO iclog completion processing) and both attempt
      to access the buffer after dropping the BLI reference count.  If we
      are unlucky, the "BLI freed" context wins the race and frees the
      buffer before the "BLI still active" case checks the buffer pin
      count.
      
      This results in a use after free that can only be triggered
      in active filesystem shutdown situations.
      
      To fix this, we need to ensure that buffer existence extends beyond
      the BLI reference count checks and until the unpin processing is
      complete. This implies that a buffer pin operation must also take a
      buffer reference to ensure that the buffer cannot be freed until the
      buffer unpin processing is complete.

      Reported-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
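The rule "a pin also holds a reference" can be shown with a single-threaded toy model. `struct buf`, `buf_hold()`, `buf_rele()`, `buf_pin()` and `buf_unpin()` are hypothetical names for this sketch, the counters are not atomic, and freeing is modelled with a flag so that the example can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer: hold is the lifetime refcount, pin_count tracks pins. */
struct buf {
	int hold;
	int pin_count;
	bool freed;	/* stands in for actually freeing the memory */
};

static void buf_hold(struct buf *b)
{
	assert(!b->freed);
	b->hold++;
}

/* Drop a reference; the last one "frees" the buffer. */
static void buf_rele(struct buf *b)
{
	assert(b->hold > 0);
	if (--b->hold == 0)
		b->freed = true;
}

/* Pinning takes its own reference so the buffer outlives the pin. */
static void buf_pin(struct buf *b)
{
	buf_hold(b);
	b->pin_count++;
}

/*
 * Unpin: the buffer may still be examined here because the pin's
 * reference is dropped last, after all unpin processing is done.
 */
static void buf_unpin(struct buf *b)
{
	assert(!b->freed);
	assert(b->pin_count > 0);
	b->pin_count--;
	buf_rele(b);
}
```

Even if another context drops its own reference first, the buffer stays alive until `buf_unpin()` has finished looking at it.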
    • Linux 6.4-rc5 · 9561de3a
      Linus Torvalds authored
    • Merge tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6f64a5eb
      Linus Torvalds authored
      Pull irq fix from Borislav Petkov:
      
       - Fix open firmware quirks validation so that they don't get applied
         wrongly
      
      * tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic: Correctly validate OF quirk descriptors
    • Merge tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 5e89d62e
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Some driver fixes:
         - a regression fix for the verisilicon driver
         - uvcvideo: don't expose unsupported video formats to userspace
         - camss-video: don't zero subdev format after init
         - mediatek: some fixes for 4K decoder formats
         - fix a Sphinx build warning (missing doc for client_caps)
         - some fixes for imx and atomisp staging drivers
      
        And two CEC core fixes:
         - don't set last_initiator if TX in progress
         - disable adapter in cec_devnode_unregister"
      
      * tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: uvcvideo: Don't expose unsupported formats to userspace
        media: v4l2-subdev: Fix missing kerneldoc for client_caps
        media: staging: media: imx: initialize hs_settle to avoid warning
        media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad()
        media: staging: media: atomisp: init high & low vars
        media: cec: core: don't set last_initiator if tx in progress
        media: cec: core: disable adapter in cec_devnode_unregister
        media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats
        media: camss: camss-video: Don't zero subdev format again after initialization
        media: verisilicon: Additional fix for the crash when opening the driver
    • Merge tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 209835e8
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that
        resolve a number of reported issues. Included in here are:
      
         - iio driver fixes
      
         - fpga driver fixes
      
         - test_firmware bugfixes
      
         - fastrpc driver tiny bugfixes
      
         - MAINTAINERS file updates for some subsystems
      
        All of these have been in linux-next this past week with no reported
        issues"
      
      * tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits)
        test_firmware: fix the memory leak of the allocated firmware buffer
        test_firmware: fix a memory leak with reqs buffer
        test_firmware: prevent race conditions by a correct implementation of locking
        firmware_loader: Fix a NULL vs IS_ERR() check
        MAINTAINERS: Vaibhav Gupta is the new ipack maintainer
        dt-bindings: fpga: replace Ivan Bornyakov maintainership
        MAINTAINERS: update Microchip MPF FPGA reviewers
        misc: fastrpc: reject new invocations during device removal
        misc: fastrpc: return -EPIPE to invocations on device removal
        misc: fastrpc: Reassign memory ownership only for remote heap
        misc: fastrpc: Pass proper scm arguments for secure map request
        iio: imu: inv_icm42600: fix timestamp reset
        iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
        dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
        iio: dac: mcp4725: Fix i2c_master_send() return value handling
        iio: accel: kx022a fix irq getting
        iio: bu27034: Ensure reset is written
        iio: dac: build ad5758 driver when AD5758 is selected
        iio: addac: ad74413: fix resistance input processing
        iio: light: vcnl4035: fixed chip ID check
        ...
    • Merge tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 41f3ab2d
      Linus Torvalds authored
      Pull driver core fixes from Greg KH:
       "Here are two small driver core cacheinfo fixes for 6.4-rc5 that
        resolve a number of reported issues with that file. These changes have
        been in linux-next this past week with no reported problems"
      
      * tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug
        drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug
    • Merge tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 12c2f77b
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some small tty/serial driver fixes for 6.4-rc5 that have all
        been in linux-next this past week with no reported problems. Included
        in here are:
      
         - 8250_tegra driver bugfix
      
         - fsl uart driver bugfixes
      
         - Kconfig fix for dependency issue
      
         - dt-bindings fix for the 8250_omap driver"
      
      * tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        dt-bindings: serial: 8250_omap: add rs485-rts-active-high
        serial: cpm_uart: Fix a COMPILE_TEST dependency
        soc: fsl: cpm1: Fix TSA and QMC dependencies in case of COMPILE_TEST
        tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK
        serial: 8250_tegra: Fix an error handling path in tegra_uart_probe()
    • Merge tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 8b435e40
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some USB driver and core fixes for 6.4-rc5. Most of these are
        tiny driver fixes, including:
      
         - udc driver bugfix
      
         - f_fs gadget driver bugfix
      
         - cdns3 driver bugfix
      
         - typec bugfixes
      
        But the "big" thing in here is a fix yet-again for how the USB buffers
        are handled from userspace when dealing with DMA issues. The changes
        were discussed a lot, and tested a lot, on the list, and acked by the
        relevant mm maintainers and have been in linux-next all this past week
        with no reported problems"
      
      * tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: tps6598x: Fix broken polling mode after system suspend/resume
        mm: page_table_check: Ensure user pages are not slab pages
        mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
        usb: usbfs: Use consistent mmap functions
        usb: usbfs: Enforce page requirements for mmap
        dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type
        usb: gadget: udc: fix NULL dereference in remove()
        usb: gadget: f_fs: Add unbind event before functionfs_unbind
        usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b066935b
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM:
      
         - Address some fallout of the locking rework, this time affecting the
           way the vgic is configured
      
         - Fix an issue where the page table walker frees a subtree and then
           proceeds with walking what it has just freed...
      
         - Check that a given PA donated to the guest is actually memory (only
           affecting pKVM)
      
         - Correctly handle MTE CMOs by Set/Way
      
         - Fix the reported address of a watchpoint forwarded to userspace
      
         - Fix the freeing of the root of stage-2 page tables
      
         - Stop creating spurious PMU events to perform detection of the
           default PMU and use the existing PMU list instead
      
        x86:
      
         - Fix a memslot lookup bug in the NX recovery thread that could
           theoretically let userspace bypass the NX hugepage mitigation
      
         - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
      
         - Account exit stats for fastpath VM-Exits that never leave the super
           tight run-loop
      
         - Fix an out-of-bounds bug in the optimized APIC map code, and add a
           regression test for the race"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: selftests: Add test for race in kvm_recalculate_apic_map()
        KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
        KVM: x86: Account fastpath-only VM-Exits in vCPU stats
        KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
        KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
        KVM: arm64: Document default vPMU behavior on heterogeneous systems
        KVM: arm64: Iterate arm_pmus list to probe for default PMU
        KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()
        KVM: arm64: Populate fault info for watchpoint
        KVM: arm64: Reload PTE after invoking walker callback on preorder traversal
        KVM: arm64: Handle trap of tagged Set/Way CMOs
        arm64: Add missing Set/Way CMO encodings
        KVM: arm64: Prevent unconditional donation of unmapped regions from the host
        KVM: arm64: vgic: Fix a comment
        KVM: arm64: vgic: Fix locking comment
        KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
        KVM: arm64: vgic: Fix a circular locking issue
    • Merge tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 9455b4b6
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix link errors in new aes-gcm-p10 code when built-in with other
         drivers
      
       - Limit number of TCEs passed to H_STUFF_TCE hcall as per spec
      
       - Use KSYM_NAME_LEN in xmon array size to avoid possible OOB write
      
      Thanks to Gaurav Batra, Maninder Singh, and Vishal Chourasia.
      
      * tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/xmon: Use KSYM_NAME_LEN in array size
        powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcall
        powerpc/crypto: Fix aes-gcm-p10 link errors
  3. 03 Jun, 2023 10 commits
  4. 02 Jun, 2023 14 commits