1. 14 Jun, 2024 3 commits
  2. 12 Jun, 2024 3 commits
    • Damien Le Moal's avatar
      scsi: mpi3mr: Fix ATA NCQ priority support · 90e6f089
      Damien Le Moal authored
      The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to
      the HBA if a read or write command directed at an ATA device should be
      translated to an NCQ read/write command with the high prioiryt bit set
      when the request uses the RT priority class and the user has enabled NCQ
      priority through sysfs.
      
      However, unlike the mpt3sas driver, the mpi3mr driver does not define
      the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so
      the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never
      actually set and NCQ Priority cannot ever be used.
      
      Fix this by defining these missing atributes to allow a user to check if
      an ATA device supports NCQ priority and to enable/disable the use of NCQ
      priority. To do this, lift the function scsih_ncq_prio_supp() out of the
      mpt3sas driver and make it the generic SCSI SAS transport function
      sas_ata_ncq_prio_supported(). Nothing in that function is hardware
      specific, so this function can be used in both the mpt3sas driver and
      the mpi3mr driver.
      Reported-by: default avatarScott McCoy <scott.mccoy@wdc.com>
      Fixes: 023ab2a9 ("scsi: mpi3mr: Add support for queue command processing")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDamien Le Moal <dlemoal@kernel.org>
      Link: https://lore.kernel.org/r/20240611083435.92961-1-dlemoal@kernel.orgReviewed-by: default avatarNiklas Cassel <cassel@kernel.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      90e6f089
    • Ziqi Chen's avatar
      scsi: ufs: core: Quiesce request queues before checking pending cmds · 77691af4
      Ziqi Chen authored
      In ufshcd_clock_scaling_prepare(), after SCSI layer is blocked,
      ufshcd_pending_cmds() is called to check whether there are pending
      transactions or not. And only if there are no pending transactions can we
      proceed to kickstart the clock scaling sequence.
      
      ufshcd_pending_cmds() traverses over all SCSI devices and calls
      sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down
      to three steps:
      
       1. Calculate the nr outstanding bits set in the 'word' bitmap.
      
       2. Calculate the nr outstanding bits set in the 'cleared' bitmap.
      
       3. Subtract the result from step 1 by the result from step 2.
      
      This can lead to a race condition as outlined below:
      
      Assume there is one pending transaction in the request queue of one SCSI
      device, say sda, and the budget token of this request is 0, the 'word' is
      0x1 and the 'cleared' is 0x0.
      
       1. When step 1 executes, it gets the result as 1.
      
       2. Before step 2 executes, block layer tries to dispatch a new request to
          sda. Since the SCSI layer is blocked, the request cannot pass through
          SCSI but the block layer would do budget_get() and budget_put() to
          sda's budget map regardless, so the 'word' has become 0x3 and 'cleared'
          has become 0x2 (assume the new request got budget token 1).
      
       3. When step 2 executes, it gets the result as 1.
      
       4. When step 3 executes, it gets the result as 0, meaning there is no
          pending transactions, which is wrong.
      
          Thread A                        Thread B
          ufshcd_pending_cmds()           __blk_mq_sched_dispatch_requests()
          |                               |
          sbitmap_weight(word)            |
          |                               scsi_mq_get_budget()
          |                               |
          |                               scsi_mq_put_budget()
          |                               |
          sbitmap_weight(cleared)
          ...
      
      When this race condition happens, the clock scaling sequence is started
      with transactions still in flight, leading to subsequent hibernate enter
      failure, broken link, task abort and back to back error recovery.
      
      Fix this race condition by quiescing the request queues before calling
      ufshcd_pending_cmds() so that block layer won't touch the budget map when
      ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer
      blocking/unblocking to reduce redundancies and latencies.
      
      Fixes: 8d077ede ("scsi: ufs: Optimize the command queueing code")
      Co-developed-by: default avatarCan Guo <quic_cang@quicinc.com>
      Signed-off-by: default avatarCan Guo <quic_cang@quicinc.com>
      Signed-off-by: default avatarZiqi Chen <quic_ziqichen@quicinc.com>
      Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.comReviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      77691af4
    • Damien Le Moal's avatar
      scsi: core: Disable CDL by default · 52912ca8
      Damien Le Moal authored
      For SCSI devices supporting the Command Duration Limits feature set, the
      user can enable/disable this feature use through the sysfs device attribute
      "cdl_enable". This attribute modification triggers a call to
      scsi_cdl_enable() to enable and disable the feature for ATA devices and set
      the scsi device cdl_enable field to the user provided bool value.  For SCSI
      devices supporting CDL, the feature set is always enabled and
      scsi_cdl_enable() is reduced to setting the cdl_enable field.
      
      However, for ATA devices, a drive may spin-up with the CDL feature enabled
      by default. But the SCSI device cdl_enable field is always initialized to
      false (CDL disabled), regardless of the actual device CDL feature
      state. For ATA devices managed by libata (or libsas), libata-core always
      disables the CDL feature set when the device is attached, thus syncing the
      state of the CDL feature on the device and of the SCSI device cdl_enable
      field. However, for ATA devices connected to a SAS HBA, the CDL feature is
      not disabled on scan for ATA devices that have this feature enabled by
      default, leading to an inconsistent state of the feature on the device with
      the SCSI device cdl_enable field.
      
      Avoid this inconsistency by adding a call to scsi_cdl_enable() in
      scsi_cdl_check() to make sure that the device-side state of the CDL feature
      set always matches the scsi device cdl_enable field state.  This implies
      that CDL will always be disabled for ATA devices connected to SAS HBAs,
      which is consistent with libata/libsas initialization of the device.
      Reported-by: default avatarScott McCoy <scott.mccoy@wdc.com>
      Fixes: 1b22cfb1 ("scsi: core: Allow enabling and disabling command duration limits")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDamien Le Moal <dlemoal@kernel.org>
      Link: https://lore.kernel.org/r/20240607012507.111488-1-dlemoal@kernel.orgReviewed-by: default avatarNiklas Cassel <cassel@kernel.org>
      Reviewed-by: default avatarIgor Pylypiv <ipylypiv@google.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      52912ca8
  3. 06 Jun, 2024 2 commits
  4. 31 May, 2024 1 commit
  5. 29 May, 2024 1 commit
  6. 26 May, 2024 5 commits
  7. 25 May, 2024 12 commits
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of... · 9b62e02e
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "16 hotfixes, 11 of which are cc:stable.
      
        A few nilfs2 fixes, the remainder are for MM: a couple of selftests
        fixes, various singletons fixing various issues in various parts"
      
      * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/ksm: fix possible UAF of stable_node
        mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
        mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
        nilfs2: fix potential hang in nilfs_detach_log_writer()
        nilfs2: fix unexpected freezing of nilfs_segctor_sync()
        nilfs2: fix use-after-free of timer for log writer thread
        selftests/mm: fix build warnings on ppc64
        arm64: patching: fix handling of execmem addresses
        selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
        selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
        selftests/mm: compaction_test: fix bogus test success on Aarch64
        mailmap: update email address for Satya Priya
        mm/huge_memory: don't unpoison huge_zero_folio
        kasan, fortify: properly rename memintrinsics
        lib: add version into /proc/allocinfo output
        mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
      9b62e02e
    • Linus Torvalds's avatar
      Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0db36ed
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
      
       - Fix x86 IRQ vector leak caused by a CPU offlining race
      
       - Fix build failure in the riscv-imsic irqchip driver
         caused by an API-change semantic conflict
      
       - Fix use-after-free in irq_find_at_or_after()
      
      * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
        genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
        irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
      a0db36ed
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3a390f24
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - Fix regressions of the new x86 CPU VFM (vendor/family/model)
         enumeration/matching code
      
       - Fix crash kernel detection on buggy firmware with
         non-compliant ACPI MADT tables
      
       - Address Kconfig warning
      
      * tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
        crypto: x86/aes-xts - switch to new Intel CPU model defines
        x86/topology: Handle bogus ACPI tables correctly
        x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
      3a390f24
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi · 56676c4c
      Linus Torvalds authored
      Pull ipmi updates from Corey Minyard:
       "Mostly updates for deprecated interfaces, platform.remove and
        converting from a tasklet to a BH workqueue.
      
        Also use HAS_IOPORT for disabling inb()/outb()"
      
      * tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
        ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
        ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
        ipmi: ipmi_ssif: Convert to platform remove callback returning void
        ipmi: ipmi_si_platform: Convert to platform remove callback returning void
        ipmi: ipmi_powernv: Convert to platform remove callback returning void
        ipmi: bt-bmc: Convert to platform remove callback returning void
        char: ipmi: handle HAS_IOPORT dependencies
        ipmi: Convert from tasklet to BH workqueue
      56676c4c
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client · 74eca356
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "A series from Xiubo that adds support for additional access checks
        based on MDS auth caps which were recently made available to clients.
      
        This is needed to prevent scenarios where the MDS quietly discards
        updates that a UID-restricted client previously (wrongfully) acked to
        the user.
      
        Other than that, just a documentation fixup"
      
      * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
        doc: ceph: update userspace command to get CephFS metadata
        ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
        ceph: check the cephx mds auth access for async dirop
        ceph: check the cephx mds auth access for open
        ceph: check the cephx mds auth access for setattr
        ceph: add ceph_mds_check_access() helper
        ceph: save cap_auths in MDS client when session is opened
      74eca356
    • Linus Torvalds's avatar
      Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3 · 89b61ca4
      Linus Torvalds authored
      Pull ntfs3 updates from Konstantin Komarov:
       "Fixes:
         - reusing of the file index (could cause the file to be trimmed)
         - infinite dir enumeration
         - taking DOS names into account during link counting
         - le32_to_cpu conversion, 32 bit overflow, NULL check
         - some code was refactored
      
        Changes:
         - removed max link count info display during driver init
      
        Remove:
         - atomic_open has been removed for lack of use"
      
      * tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
        fs/ntfs3: Break dir enumeration if directory contents error
        fs/ntfs3: Fix case when index is reused during tree transformation
        fs/ntfs3: Mark volume as dirty if xattr is broken
        fs/ntfs3: Always make file nonresident on fallocate call
        fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
        fs/ntfs3: Use variable length array instead of fixed size
        fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
        fs/ntfs3: Check 'folio' pointer for NULL
        fs/ntfs3: Missed le32_to_cpu conversion
        fs/ntfs3: Remove max link count info display during driver init
        fs/ntfs3: Taking DOS names into account during link counting
        fs/ntfs3: remove atomic_open
        fs/ntfs3: use kcalloc() instead of kzalloc()
      89b61ca4
    • Linus Torvalds's avatar
      Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 6c8b1a2d
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two ksmbd server fixes, both for stable"
      
      * tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: ignore trailing slashes in share paths
        ksmbd: avoid to send duplicate oplock break notifications
      6c8b1a2d
    • Linus Torvalds's avatar
      Merge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 54f71b03
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "There is one new driver and then most of the changes are the device
        tree bindings conversions to yaml.
      
        New driver:
         - Epson RX8111
      
        Drivers:
         - Many Device Tree bindings conversions to dtschema
         - pcf8563: wakeup-source support"
      
      * tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        pcf8563: add wakeup-source support
        rtc: rx8111: handle VLOW flag
        rtc: rx8111: demote warnings to debug level
        rtc: rx6110: Constify struct regmap_config
        dt-bindings: rtc: convert trivial devices into dtschema
        dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
        dt-bindings: rtc: pxa-rtc: convert to dtschema
        rtc: Add driver for Epson RX8111
        dt-bindings: rtc: Add Epson RX8111
        rtc: mcp795: drop unneeded MODULE_ALIAS
        rtc: nuvoton: Modify part number value
        rtc: test: Split rtc unit test into slow and normal speed test
        dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
        dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
        dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
        dt-bindings: rtc: armada-380-rtc: convert to dtschema
        rtc: cros-ec: provide ID table for avoiding fallback match
      54f71b03
    • Linus Torvalds's avatar
      Merge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · 4286e1fc
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
       "Runtime PM (power management) is improved and hot-join support has
        been added to the dw controller driver.
      
        Core:
         - Allow device driver to trigger controller runtime PM
      
        Drivers:
         - dw: hot-join support
         - svc: better IBI handling"
      
      * tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        i3c: dw: Add hot-join support.
        i3c: master: Enable runtime PM for master controller
        i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
        i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
        i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
      4286e1fc
    • Linus Torvalds's avatar
      Merge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs · 6951abe8
      Linus Torvalds authored
      Pull jffs2 updates from Richard Weinberger:
      
       - Fix illegal memory access in jffs2_free_inode()
      
       - Kernel-doc fixes
      
       - print symbolic error names
      
      * tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
        jffs2: Fix potential illegal address access in jffs2_free_inode
        jffs2: Simplify the allocation of slab caches
        jffs2: nodemgmt: fix kernel-doc comments
        jffs2: print symbolic error name instead of error code
      6951abe8
    • Linus Torvalds's avatar
      Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · 2313022e
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Fixes for -Wmissing-prototypes warnings and further cleanup
      
       - Remove callback returning void from rtc and virtio drivers
      
       - Fix bash location
      
      * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
        um: virtio_uml: Convert to platform remove callback returning void
        um: rtc: Convert to platform remove callback returning void
        um: Remove unused do_get_thread_area function
        um: Fix -Wmissing-prototypes warnings for __vdso_*
        um: Add an internal header shared among the user code
        um: Fix the declaration of kasan_map_memory
        um: Fix the -Wmissing-prototypes warning for get_thread_reg
        um: Fix the -Wmissing-prototypes warning for __switch_mm
        um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
        um: Stop tracking host PID in cpu_tasks
        um: process: remove unused 'n' variable
        um: vector: remove unused len variable/calculation
        um: vector: fix bpfflash parameter evaluation
        um: slirp: remove set but unused variable 'pid'
        um: signal: move pid variable where needed
        um: Makefile: use bash from the environment
        um: Add winch to winch_handlers before registering winch IRQ
        um: Fix -Wmissing-prototypes warnings for __warp_* and foo
        um: Fix -Wmissing-prototypes warnings for text_poke*
        um: Move declarations to proper headers
        ...
      2313022e
    • Linus Torvalds's avatar
      Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel · 56fb6f92
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Some fixes for the end of the merge window, mostly amdgpu and panthor,
        with one nouveau uAPI change that fixes a bad decision we made a few
        months back.
      
        nouveau:
         - fix bo metadata uAPI for vm bind
      
        panthor:
         - Fixes for panthor's heap logical block.
         - Reset on unrecoverable fault
         - Fix VM references.
         - Reset fix.
      
        xlnx:
         - xlnx compile and doc fixes.
      
        amdgpu:
         - Handle vbios table integrated info v2.3
      
        amdkfd:
         - Handle duplicate BOs in reserve_bo_and_cond_vms
         - Handle memory limitations on small APUs
      
        dp/mst:
         - MST null deref fix.
      
        bridge:
         - Don't let next bridge create connector in adv7511 to make probe
           work"
      
      * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
        drm/amdgpu/atomfirmware: add intergrated info v2.3 table
        drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
        drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
        drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
        drm/bridge: adv7511: Attach next bridge without creating connector
        drm/buddy: Fix the warn on's during force merge
        drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
        drm/panthor: Call panthor_sched_post_reset() even if the reset failed
        drm/panthor: Reset the FW VM to NULL on unplug
        drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
        drm/panthor: Force an immediate reset on unrecoverable faults
        drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
        drm/panthor: Fix an off-by-one in the heap context retrieval logic
        drm/panthor: Relax the constraints on the tiler chunk size
        drm/panthor: Make sure the tiler initial/max chunks are consistent
        drm/panthor: Fix tiler OOM handling to allow incremental rendering
        drm: xlnx: zynqmp_dpsub: Fix compilation error
        drm: xlnx: zynqmp_dpsub: Fix few function comments
      56fb6f92
  8. 24 May, 2024 13 commits
    • David Howells's avatar
      cifs: Fix missing set of remote_i_size · 93a43155
      David Howells authored
      Occasionally, the generic/001 xfstest will fail indicating corruption in
      one of the copy chains when run on cifs against a server that supports
      FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs).  The
      problem is that the remote_i_size value isn't updated by cifs_setsize()
      when called by smb2_duplicate_extents(), but i_size *is*.
      
      This may cause cifs_remap_file_range() to then skip the bit after calling
      ->duplicate_extents() that sets sizes.
      
      Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
      calling cifs_setsize() to set i_size.
      
      This means we don't then need to call netfs_resize_file() upon return from
      ->duplicate_extents(), but we also fix the test to compare against the pre-dup
      inode size.
      
      [Note that this goes back before the addition of remote_i_size with the
      netfs_inode struct.  It should probably have been setting cifsi->server_eof
      previously.]
      
      Fixes: cfc63fc8 ("smb3: fix cached file size problems in duplicate extents (reflink)")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Steve French <sfrench@samba.org>
      cc: Paulo Alcantara <pc@manguebit.com>
      cc: Shyam Prasad N <nspmangalore@gmail.com>
      cc: Rohith Surabattula <rohiths.msft@gmail.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      93a43155
    • David Howells's avatar
      cifs: Fix smb3_insert_range() to move the zero_point · 8a160723
      David Howells authored
      Fix smb3_insert_range() to move the zero_point over to the new EOF.
      Without this, generic/147 fails as reads of data beyond the old EOF point
      return zeroes.
      
      Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Shyam Prasad N <nspmangalore@gmail.com>
      cc: Rohith Surabattula <rohiths.msft@gmail.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      8a160723
    • Linus Torvalds's avatar
      Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 0b32d436
      Linus Torvalds authored
      Pull more mm updates from Andrew Morton:
       "Jeff Xu's implementation of the mseal() syscall"
      
      * tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftest mm/mseal read-only elf memory segment
        mseal: add documentation
        selftest mm/mseal memory sealing
        mseal: add mseal syscall
        mseal: wire up mseal syscall
      0b32d436
    • Chengming Zhou's avatar
      mm/ksm: fix possible UAF of stable_node · 90e82349
      Chengming Zhou authored
      The commit 2c653d0e ("ksm: introduce ksm_max_page_sharing per page
      deduplication limit") introduced a possible failure case in the
      stable_tree_insert(), where we may free the new allocated stable_node_dup
      if we fail to prepare the missing chain node.
      
      Then that kfolio return and unlock with a freed stable_node set...  And
      any MM activities can come in to access kfolio->mapping, so UAF.
      
      Fix it by moving folio_set_stable_node() to the end after stable_node
      is inserted successfully.
      
      Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
      Fixes: 2c653d0e ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
      Signed-off-by: default avatarChengming Zhou <chengming.zhou@linux.dev>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      90e82349
    • Miaohe Lin's avatar
      mm/memory-failure: fix handling of dissolved but not taken off from buddy pages · 8cf360b9
      Miaohe Lin authored
      When I did memory failure tests recently, below panic occurs:
      
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
      flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
      raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
      page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
      ------------[ cut here ]------------
      kernel BUG at include/linux/page-flags.h:1009!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:__del_page_from_free_list+0x151/0x180
      RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
      RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
      RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
      RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
      R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
      R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
      FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
      Call Trace:
       <TASK>
       __rmqueue_pcplist+0x23b/0x520
       get_page_from_freelist+0x26b/0xe40
       __alloc_pages_noprof+0x113/0x1120
       __folio_alloc_noprof+0x11/0xb0
       alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
       __alloc_fresh_hugetlb_folio+0xe7/0x140
       alloc_pool_huge_folio+0x68/0x100
       set_max_huge_pages+0x13d/0x340
       hugetlb_sysctl_handler_common+0xe8/0x110
       proc_sys_call_handler+0x194/0x280
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xc2/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7ff916114887
      RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
      RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
      RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
      R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
      R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
       </TASK>
      Modules linked in: mce_inject hwpoison_inject
      ---[ end trace 0000000000000000 ]---
      
      And before the panic, there had an warning about bad page state:
      
      BUG: Bad page state in process page-types  pfn:8cee00
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
      flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
      page_type: 0xffffff7f(buddy)
      raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
      raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
      page dumped because: nonzero mapcount
      Modules linked in: mce_inject hwpoison_inject
      CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
      Call Trace:
       <TASK>
       dump_stack_lvl+0x83/0xa0
       bad_page+0x63/0xf0
       free_unref_page+0x36e/0x5c0
       unpoison_memory+0x50b/0x630
       simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
       debugfs_attr_write+0x42/0x60
       full_proxy_write+0x5b/0x80
       vfs_write+0xcd/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xc2/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f189a514887
      RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
      RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
      RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
      R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
       </TASK>
      
      The root cause should be the below race:
      
       memory_failure
        try_memory_failure_hugetlb
         me_huge_page
          __page_handle_poison
           dissolve_free_hugetlb_folio
           drain_all_pages -- Buddy page can be isolated e.g. for compaction.
           take_page_off_buddy -- Failed as page is not in the buddy list.
      	     -- Page can be putback into buddy after compaction.
          page_ref_inc -- Leads to buddy page with refcnt = 1.
      
      Then unpoison_memory() can unpoison the page and send the buddy page back
      into buddy list again leading to the above bad page state warning.  And
      bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
      page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
      allocate this page.
      
      Fix this issue by only treating __page_handle_poison() as successful when
      it returns 1.
      
      Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
      Fixes: ceaf8fbe ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
      Signed-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8cf360b9
    • Yuanyuan Zhong's avatar
      mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again · 6d065f50
      Yuanyuan Zhong authored
      After switching smaps_rollup to use VMA iterator, searching for next entry
      is part of the condition expression of the do-while loop.  So the current
      VMA needs to be addressed before the continue statement.
      
      Otherwise, with some VMAs skipped, userspace observed memory
      consumption from /proc/pid/smaps_rollup will be smaller than the sum of
      the corresponding fields from /proc/pid/smaps.
      
      Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
      Fixes: c4c84f06 ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
      Signed-off-by: default avatarYuanyuan Zhong <yzhong@purestorage.com>
      Reviewed-by: default avatarMohamed Khalfella <mkhalfella@purestorage.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      6d065f50
    • Ryusuke Konishi's avatar
      nilfs2: fix potential hang in nilfs_detach_log_writer() · eb85dace
      Ryusuke Konishi authored
      Syzbot has reported a potential hang in nilfs_detach_log_writer() called
      during nilfs2 unmount.
      
      Analysis revealed that this is because nilfs_segctor_sync(), which
      synchronizes with the log writer thread, can be called after
      nilfs_segctor_destroy() terminates that thread, as shown in the call trace
      below:
      
      nilfs_detach_log_writer
        nilfs_segctor_destroy
          nilfs_segctor_kill_thread  --> Shut down log writer thread
          flush_work
            nilfs_iput_work_func
              nilfs_dispose_list
                iput
                  nilfs_evict_inode
                    nilfs_transaction_commit
                      nilfs_construct_segment (if inode needs sync)
                        nilfs_segctor_sync  --> Attempt to synchronize with
                                                log writer thread
                                 *** DEADLOCK ***
      
      Fix this issue by changing nilfs_segctor_sync() so that the log writer
      thread returns normally without synchronizing after it terminates, and by
      forcing tasks that are already waiting to complete once after the thread
      terminates.
      
      The skipped inode metadata flushout will then be processed together in the
      subsequent cleanup work in nilfs_segctor_destroy().
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.comSigned-off-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      eb85dace
    • Ryusuke Konishi's avatar
      nilfs2: fix unexpected freezing of nilfs_segctor_sync() · 936184ea
      Ryusuke Konishi authored
      A potential and reproducible race issue has been identified where
      nilfs_segctor_sync() would block even after the log writer thread writes a
      checkpoint, unless there is an interrupt or other trigger to resume log
      writing.
      
      This turned out to be because, depending on the execution timing of the
      log writer thread running in parallel, the log writer thread may skip
      responding to nilfs_segctor_sync(), which causes a call to schedule()
      waiting for completion within nilfs_segctor_sync() to lose the opportunity
      to wake up.
      
      The reason why waking up the task waiting in nilfs_segctor_sync() may be
      skipped is that updating the request generation issued using a shared
      sequence counter and adding an wait queue entry to the request wait queue
      to the log writer, are not done atomically.  There is a possibility that
      log writing and request completion notification by nilfs_segctor_wakeup()
      may occur between the two operations, and in that case, the wait queue
      entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
      nilfs_segctor_sync() will be carried over until the next request occurs.
      
      Fix this issue by performing these two operations simultaneously within
      the lock section of sc_state_lock.  Also, following the memory barrier
      guidelines for event waiting loops, move the call to set_current_state()
      in the same location into the event waiting loop to ensure that a memory
      barrier is inserted just before the event condition determination.
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
      Fixes: 9ff05123 ("nilfs2: segment constructor")
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      936184ea
    • Ryusuke Konishi's avatar
      nilfs2: fix use-after-free of timer for log writer thread · f5d4e046
      Ryusuke Konishi authored
      Patch series "nilfs2: fix log writer related issues".
      
      This bug fix series covers three nilfs2 log writer-related issues,
      including a timer use-after-free issue and potential deadlock issue on
      unmount, and a potential freeze issue in event synchronization found
      during their analysis.  Details are described in each commit log.
      
      
      This patch (of 3):
      
      A use-after-free issue has been reported regarding the timer sc_timer on
      the nilfs_sc_info structure.
      
      The problem is that even though it is used to wake up a sleeping log
      writer thread, sc_timer is not shut down until the nilfs_sc_info structure
      is about to be freed, and is used regardless of the thread's lifetime.
      
      Fix this issue by limiting the use of sc_timer only while the log writer
      thread is alive.
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
      Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
      Fixes: fdce895e ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: default avatar"Bai, Shuangpeng" <sjb7183@psu.edu>
      Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJTested-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f5d4e046
    • Michael Ellerman's avatar
      selftests/mm: fix build warnings on ppc64 · 1901472f
      Michael Ellerman authored
      Fix warnings like:
      
        In file included from uffd-unit-tests.c:8:
        uffd-unit-tests.c: In function `uffd_poison_handle_fault':
        uffd-common.h:45:33: warning: format `%llu' expects argument of type
        `long long unsigned int', but argument 3 has type `__u64' {aka `long
        unsigned int'} [-Wformat=]
      
      By switching to unsigned long long for u64 for ppc64 builds.
      
      Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.auSigned-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      1901472f
    • Will Deacon's avatar
      arm64: patching: fix handling of execmem addresses · b1480ed2
      Will Deacon authored
      Klara Modin reported warnings for a kernel configured with BPF_JIT but
      without MODULES:
      
      [   44.131296] Trying to vfree() bad address (000000004a17c299)
      [   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a #25
      [   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
      [   44.164433] Workqueue: events bpf_prog_free_deferred
      [   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.188772] sp : ffff800082a13c70
      [   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
      [   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
      [   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
      [   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
      [   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
      [   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
      [   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
      [   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
      [   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
      [   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
      [   44.275703] Call trace:
      [   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.283858] vfree (mm/vmalloc.c:3322)
      [   44.287835] execmem_free (mm/execmem.c:70)
      [   44.292347] bpf_jit_free_exec+0x10/0x1c
      [   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
      [   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
      [   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
      [   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
      [   44.317785] process_one_work (kernel/workqueue.c:3273)
      [   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
      [   44.327292] kthread (kernel/kthread.c:388)
      [   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)
      
      The problem is because bpf_arch_text_copy() silently fails to write to the
      read-only area as a result of patch_map() faulting and the resulting
      -EFAULT being chucked away.
      
      Update patch_map() to use CONFIG_EXECMEM instead of
      CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
      
      Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
      Fixes: 2c9e5d4a ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarMike Rapoport (IBM) <rppt@kernel.org>
      Reported-by: default avatarKlara Modin <klarasmodin@gmail.com>
      Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.comTested-by: default avatarKlara Modin <klarasmodin@gmail.com>
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b1480ed2
    • Dev Jain's avatar
      selftests/mm: compaction_test: fix bogus test success and reduce probability... · fb9293b6
      Dev Jain authored
      selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
      
      Reset nr_hugepages to zero before the start of the test.
      
      If a non-zero number of hugepages is already set before the start of the
      test, the following problems arise:
      
       - The probability of the test getting OOM-killed increases.  Proof:
         The test wants to run on 80% of available memory to prevent OOM-killing
         (see original code comments).  Let the value of mem_free at the start
         of the test, when nr_hugepages = 0, be x.  In the other case, when
         nr_hugepages > 0, let the memory consumed by hugepages be y.  In the
         former case, the test operates on 0.8 * x of memory.  In the latter,
         the test operates on 0.8 * (x - y) of memory, with y already filled,
         hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
         x.  Q.E.D
      
       - The probability of a bogus test success increases.  Proof: Let the
         memory consumed by hugepages be greater than 25% of x, with x and y
         defined as above.  The definition of compaction_index is c_index = (x -
         y)/z where z is the memory consumed by hugepages after trying to
         increase them again.  In check_compaction(), we set the number of
         hugepages to zero, and then increase them back; the probability that
         they will be set back to consume at least y amount of memory again is
         very high (since there is not much delay between the two attempts of
         changing nr_hugepages).  Hence, z >= y > (x/4) (by the 25% assumption).
         Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
         hence, c_index can always be forced to be less than 3, thereby the test
         succeeding always.  Q.E.D
      
      Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
      Fixes: bd67d5c1 ("Test compaction of mlocked memory")
      Signed-off-by: default avatarDev Jain <dev.jain@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sri Jayaramappa <sjayaram@akamai.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      fb9293b6
    • Dev Jain's avatar
      selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages · 9ad665ef
      Dev Jain authored
      Currently, the test tries to set nr_hugepages to zero, but that is not
      actually done because the file offset is not reset after read().  Fix that
      using lseek().
      
      Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
      Fixes: bd67d5c1 ("Test compaction of mlocked memory")
      Signed-off-by: default avatarDev Jain <dev.jain@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sri Jayaramappa <sjayaram@akamai.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      9ad665ef