1. 17 Jun, 2024 5 commits
    • Revert "mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default" · 14d7c92f
      Linus Torvalds authored
      This reverts commit 3afb76a6.
      
      This was a wrongheaded workaround for an issue that had already been
      fixed much better by commit 4ef9ad19 ("mm: huge_memory: don't force
      huge page alignment on 32 bit").
      
      Asking users questions at kernel compile time that they can't make sense
      of is not a viable strategy.  And the fact that even the kernel VM
      maintainers apparently didn't catch that this "fix" is not a fix any
      more pretty much proves the point that people can't be expected to
      understand the implications of the question.
      
      It may well be the case that we could improve things further, and that
      __thp_get_unmapped_area() should take the mapping randomization into
      account even for 64-bit kernels.  Maybe we should not be so eager to use
      THP mappings.
      
      But in no case should this be a kernel config option.
      
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      14d7c92f
    • Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of... · e6b324fb
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"
      
      * tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        kcov: don't lose track of remote references during softirqs
        mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
        mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
        mm: fix possible OOB in numa_rebuild_large_mapping()
        mm/migrate: fix kernel BUG at mm/compaction.c:2761!
        selftests: mm: make map_fixed_noreplace test names stable
        mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
        mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
        gcov: add support for GCC 14
        zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
        mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
        lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
        lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
        MAINTAINERS: remove Lorenzo as vmalloc reviewer
        Revert "mm: init_mlocked_on_free_v3"
        mm/page_table_check: fix crash on ZONE_DEVICE
        gcc: disable '-Warray-bounds' for gcc-9
        ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
        ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
      e6b324fb
    • Merge tag 'hardening-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 5cf81d7b
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - yama: document function parameter (Christian Göttsche)
      
       - mm/util: Swap kmemdup_array() arguments (Jean-Philippe Brucker)
      
       - kunit/overflow: Adjust for __counted_by with DEFINE_RAW_FLEX()
      
       - MAINTAINERS: Update entries for Kees Cook
      
      * tag 'hardening-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        MAINTAINERS: Update entries for Kees Cook
        kunit/overflow: Adjust for __counted_by with DEFINE_RAW_FLEX()
        yama: document function parameter
        mm/util: Swap kmemdup_array() arguments
      5cf81d7b
    • MAINTAINERS: Update entries for Kees Cook · 1ab1a422
      Kees Cook authored
      Update current email address for Kees Cook in the MAINTAINER file to
      match the change from commit 4e173c82 ("mailmap: update entry for
      Kees Cook").
      
      Link: https://lore.kernel.org/r/20240617181257.work.206-kees@kernel.org
      Signed-off-by: Kees Cook <kees@kernel.org>
      1ab1a422
    • Merge tag 'hyperv-fixes-signed-20240616' of... · 6226e749
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20240616' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull Hyper-V fixes from Wei Liu:
      
       - Some cosmetic changes for hv.c and balloon.c (Aditya Nagesh)
      
       - Two documentation updates (Michael Kelley)
      
       - Suppress the invalid warning for packed member alignment (Saurabh
         Sengar)
      
       - Two hv_balloon fixes (Michael Kelley)
      
      * tag 'hyperv-fixes-signed-20240616' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        Drivers: hv: Cosmetic changes for hv.c and balloon.c
        Documentation: hyperv: Improve synic and interrupt handling description
        Documentation: hyperv: Update spelling and fix typo
        tools: hv: suppress the invalid warning for packed member alignment
        hv_balloon: Enable hot-add for memblock sizes > 128 MiB
        hv_balloon: Use kernel macros to simplify open coded sequences
      6226e749
  2. 16 Jun, 2024 14 commits
    • Linux 6.10-rc4 · 6ba59ff4
      Linus Torvalds authored
      6ba59ff4
    • Merge tag 'parisc-for-6.10-rc4' of... · 6456c425
      Linus Torvalds authored
      Merge tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
      
      Pull parisc fix from Helge Deller:
       "On parisc we have suffered for years from random segfaults which
        seem to have been triggered due to cache inconsistencies. Those
        segfaults happened more often on machines with PA8800 and PA8900 CPUs,
        which have much bigger caches than the earlier machines.
      
        Dave Anglin has worked over the last few weeks to fix this bug. His
        patch has been successfully tested by various people on various
        machines and with various kernels (6.6, 6.8 and 6.9), and the debian
        buildd servers haven't shown a single random segfault with this patch.
      
        Since the cache handling has been reworked, the patch is slightly
        bigger than I would like in this stage, but the greatly improved
        stability IMHO justifies the inclusion now"
      
      * tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Try to fix random segmentation faults in package builds
      6456c425
    • Merge tag 'i2c-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 4301487e
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Two fixes to correctly report i2c functionality, ensuring that
        I2C_FUNC_SLAVE is reported when a device operates solely as a slave
        interface"
      
      * tag 'i2c-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: designware: Fix the functionality flags of the slave-only interface
        i2c: at91: Fix the functionality flags of the slave-only interface
      4301487e
    • Merge tag 'usb-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · b5beaa44
      Linus Torvalds authored
      Pull USB / Thunderbolt fixes from Greg KH:
       "Here are some small USB and Thunderbolt driver fixes for 6.10-rc4.
        Included in here are:
      
         - thunderbolt debugfs bugfix
      
         - USB typec bugfixes
      
         - kcov usb bugfix
      
         - xhci bugfixes
      
         - usb-storage bugfix
      
         - dt-bindings bugfix
      
         - cdc-wdm log message spam bugfix
      
        All of these, except for the last cdc-wdm log level change, have been
        in linux-next for a while with no reported problems. The cdc-wdm
        bugfix has been tested by syzbot and proved to fix the reported cpu
        lockup issues when the log is constantly spammed by a broken device"
      
      * tag 'usb-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
        xhci: Handle TD clearing for multiple streams case
        xhci: Apply broken streams quirk to Etron EJ188 xHCI host
        xhci: Apply reset resume quirk to Etron EJ188 xHCI host
        xhci: Set correct transferred length for cancelled bulk transfers
        usb-storage: alauda: Check whether the media is initialized
        usb: typec: ucsi: Ack also failed Get Error commands
        kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
        dt-bindings: usb: realtek,rts5411: Add missing "additionalProperties" on child nodes
        usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
        usb: typec: tcpm: fix use-after-free case in tcpm_register_source_caps
        USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected
        usb: typec: ucsi: glink: increase max ports for x1e80100
        Revert "usb: chipidea: move ci_ulpi_init after the phy initialization"
        thunderbolt: debugfs: Fix margin debugfs node creation condition
      b5beaa44
    • Merge tag 'tty-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 6efc63a8
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some small tty and serial driver fixes that resolve some
        reported problems. Included in here are:
      
         - n_tty lookahead buffer bugfix
      
         - WARN_ON() removal where it was not needed
      
         - 8250_dw driver bugfixes
      
         - 8250_pxa bugfix
      
         - sc16is7xx Kconfig fixes for reported build issues
      
        All of these have been in linux-next for over a week with no reported
        problems"
      
      * tag 'tty-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: drop debugging WARN_ON_ONCE() from uart_write()
        serial: sc16is7xx: re-add Kconfig SPI or I2C dependency
        serial: sc16is7xx: rename Kconfig CONFIG_SERIAL_SC16IS7XX_CORE
        serial: port: Don't block system suspend even if bytes are left to xmit
        serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level
        serial: 8250_dw: Revert "Move definitions to the shared header"
        serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw
        tty: n_tty: Fix buffer offsets when lookahead is used
      6efc63a8
    • Merge tag 'staging-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · d3e6dc4f
      Linus Torvalds authored
      Pull staging driver fix from Greg KH:
       "Here is a single staging driver fix, for the vc04 driver. It resolves
        a reported problem that showed up in the merge window set of changes.
      
        It's been in linux-next for over a week with no reported problems"
      
      * tag 'staging-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: vchiq_debugfs: Fix NPD in vchiq_dump_state
      d3e6dc4f
    • Merge tag 'driver-core-6.10-rc4' of... · e12fa4dd
      Linus Torvalds authored
      Merge tag 'driver-core-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core and sysfs fixes from Greg KH:
       "Here are three small changes for 6.10-rc4 that resolve reported
        problems, and finally drop an unused api call. These are:
      
         - removal of devm_device_add_groups(), all the callers of this are
           finally gone after the 6.10-rc1 merge (changes came in through
           different trees), so it's safe to remove.
      
         - much reported sysfs build error fixed up for systems that did not
           have sysfs enabled
      
         - driver core sync fix for an issue reported many times over the years
           that no one really paid much attention to, until Dirk finally
           tracked down the real issue and made the "obviously correct and
           simple" fix for it.
      
        All of these have been in linux-next for over a week with no reported
        problems"
      
      * tag 'driver-core-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        drivers: core: synchronize really_probe() and dev_uevent()
        sysfs: Unbreak the build around sysfs_bin_attr_simple_read()
        driver core: remove devm_device_add_groups()
      e12fa4dd
    • Merge tag 'char-misc-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 33f855cb
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a number of small char/misc and iio driver fixes for
        6.10-rc4. Included in here are the following:
      
         - iio driver fixes for a bunch of reported problems.
      
         - mei driver fixes for a number of reported issues.
      
         - amiga parport driver build fix.
      
         - .editorconfig fix that was causing lots of unintended whitespace
           changes to happen to files when they were being edited. Unless we
           want to sweep the whole tree and remove all trailing whitespace at
           once, this is needed for the .editorconfig file to be able to be
           used at all. This change is required because the original
           submitters never touched older files in the tree.
      
         - jfs bugfix for a buffer overflow
      
        The jfs bugfix is in here as I didn't know where else to put it, and
        it's been ignored for a while as the filesystem seems to be abandoned
        and I'm tired of seeing the same issue reported in multiple places.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (25 commits)
        .editorconfig: remove trim_trailing_whitespace option
        jfs: xattr: fix buffer overflow for invalid xattr
        misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe()
        misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe()
        parport: amiga: Mark driver struct with __refdata to prevent section mismatch
        mei: vsc: Fix wrong invocation of ACPI SID method
        mei: vsc: Don't stop/restart mei device during system suspend/resume
        mei: me: release irq in mei_me_pci_resume error path
        mei: demote client disconnect warning on suspend to debug
        iio: inkern: fix channel read regression
        iio: imu: inv_mpu6050: stabilized timestamping in interrupt
        iio: adc: ad7173: Fix sampling frequency setting
        iio: adc: ad7173: Clear append status bit
        iio: imu: inv_icm42600: delete unneeded update watermark call
        iio: imu: inv_icm42600: stabilized timestamp in interrupt
        iio: invensense: fix odr switching to same value
        iio: adc: ad7173: Remove index from temp channel
        iio: adc: ad7173: Add ad7173_device_info names
        iio: adc: ad7173: fix buffers enablement for ad7176-2
        iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
        ...
      33f855cb
    • Merge tag 'ata-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · e8b0264d
      Linus Torvalds authored
      Pull ata fix from Niklas Cassel:
       "Fix a bug where the SCSI Removable Media Bit (RMB) was incorrectly set
        for hot-plug capable (and eSATA) ports.
      
        The RMB bit means that the media is removable (e.g. floppy or CD-ROM),
        not that the device server is removable. If the RMB bit is set, SCSI
        will set the removable media sysfs attribute.
      
        If the removable media sysfs attribute is set on a device,
        GNOME/udisks will automatically mount the device on boot.
      
        We only want to set the SCSI RMB bit (and thus the removable media
        sysfs attribute) for devices where the ATA removable media device bit
        is set"
      
      * tag 'ata-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        ata: libata-scsi: Set the RMB bit only for removable media devices
      e8b0264d
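
The rule described in the ata pull message above can be sketched in a few lines of C. This is an illustrative sketch, not the libata code: the helper names `sketch_ata_media_is_removable()` and `sketch_fill_inquiry_rmb()` and the `port_is_hotpluggable` parameter are hypothetical. It assumes the usual bit positions, namely bit 7 of ATA IDENTIFY DEVICE word 0 for "removable media device" and bit 7 of byte 1 of the standard SCSI INQUIRY data for RMB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bit 7 of IDENTIFY DEVICE word 0 = removable media device. */
#define SKETCH_ATA_ID_REMOVABLE (1u << 7)
/* Assumed layout: bit 7 of standard INQUIRY byte 1 = RMB. */
#define SKETCH_SCSI_INQ_RMB 0x80u

/* Hypothetical helper: does the ATA device itself report removable media? */
static bool sketch_ata_media_is_removable(const uint16_t *identify)
{
    assert(identify != NULL);
    return (identify[0] & SKETCH_ATA_ID_REMOVABLE) != 0;
}

/*
 * Hypothetical helper: set or clear RMB in an INQUIRY response buffer.
 * The hot-plug capability of the port is deliberately ignored, because
 * RMB describes the media, not whether the device can be unplugged.
 */
static void sketch_fill_inquiry_rmb(uint8_t *inquiry,
                                    const uint16_t *identify,
                                    bool port_is_hotpluggable)
{
    assert(inquiry != NULL);
    (void)port_is_hotpluggable;

    inquiry[1] = (uint8_t)(inquiry[1] & ~SKETCH_SCSI_INQ_RMB);
    if (sketch_ata_media_is_removable(identify))
        inquiry[1] = (uint8_t)(inquiry[1] | SKETCH_SCSI_INQ_RMB);
}
```

With this shape, a fixed disk behind a hot-plug capable or eSATA port reports RMB = 0, so user space has no reason to treat it as removable media.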
    • Merge tag 'edac_urgent_for_v6.10_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · e39388e4
      Linus Torvalds authored
      Pull EDAC fixes from Borislav Petkov:
      
       - Fix two issues with MI300 address translation logic
      
      * tag 'edac_urgent_for_v6.10_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation
        RAS/AMD/ATL: Fix MI300 bank hash
      e39388e4
    • Merge tag 'firewire-fixes-6.10-rc4' of... · be2fa886
      Linus Torvalds authored
      Merge tag 'firewire-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
      
      Pull firewire fixes from Takashi Sakamoto:
      
       - Update tracepoints events introduced in v6.10-rc1 so that it includes
         the numeric identifier of host card in which the event happens
      
       - replace wiki URL with the current website URL in Kconfig
      
      * tag 'firewire-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: core: record card index in bus_reset_handle tracepoints event
        firewire: core: record card index in tracepoinrts events derived from bus_reset_arrange_template
        firewire: core: record card index in async_phy_inbound tracepoints event
        firewire: core: record card index in async_phy_outbound_complete tracepoints event
        firewire: core: record card index in async_phy_outbound_initiate tracepoints event
        firewire: core: record card index in tracepoinrts events derived from async_inbound_template
        firewire: core: record card index in tracepoinrts events derived from async_outbound_initiate_template
        firewire: core: record card index in tracepoinrts events derived from async_outbound_complete_template
        firewire: fix website URL in Kconfig
      be2fa886
    • leds: class: Revert: "If no default trigger is given, make hw_control trigger the default trigger" · fcf2a997
      Hans de Goede authored
      Commit 66601a29 ("leds: class: If no default trigger is given, make
      hw_control trigger the default trigger") causes ledtrig-netdev to get
      set as default trigger on various network LEDs.
      
      This causes users to hit a pre-existing AB-BA deadlock issue in
      ledtrig-netdev between the LED-trigger locks and the rtnl mutex,
      resulting in hung tasks in kernels >= 6.9.
      
      Solving the deadlock is non trivial, so for now revert the change to
      set the hw_control trigger as default trigger, so that ledtrig-netdev
      no longer gets activated automatically for various network LEDs.
      
      The netdev trigger is not needed because the network LEDs are usually under
      hw-control and the netdev trigger tries to leave things that way so setting
      it as the active trigger for the LED class device is a no-op.
      
      Fixes: 66601a29 ("leds: class: If no default trigger is given, make hw_control trigger the default trigger")
      Reported-by: Genes Lists <lists@sapience.com>
      Closes: https://lore.kernel.org/all/9d189ec329cfe68ed68699f314e191a10d4b5eda.camel@sapience.com/
      Reported-by: Johannes Wüller <johanneswueller@gmail.com>
      Closes: https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/
      Cc: stable@vger.kernel.org
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Lee Jones <lee@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fcf2a997
    • Merge tag 'i2c-host-fixes-6.10-rc4' of... · 7e9bb0cb
      Wolfram Sang authored
      Merge tag 'i2c-host-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
      
      Two fixes from Jean aim to correctly report i2c functionality,
      specifically ensuring that I2C_FUNC_SLAVE is reported when a
      device operates solely as a slave interface.
      7e9bb0cb
    • RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation · ba437905
      Yazen Ghannam authored
      The currently used normalized address format is not applicable to all
      MI300 systems. This leads to incorrect results during address
      translation.
      
      Drop the fixed layout and construct the normalized address from system
      settings.
      
      Fixes: 87a61237 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@kernel.org>
      Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-2-2f11547a178c@amd.com
      ba437905
  3. 15 Jun, 2024 21 commits
    • Merge tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · a3e18a54
      Linus Torvalds authored
      Pull xfs fix from Chandan Babu:
       "Ensure xfs incore superblock's allocated inode counter, free inode
        counter, and free data block counter are all zero or positive when
        they are copied over from xfs_mount->m_[icount,ifree,fdblocks]
        respectively"
      
      * tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: make sure sb_fdblocks is non-negative
      a3e18a54
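
As a rough illustration of the xfs fix summarized above, the copy can clamp each counter at zero. This is only a sketch under assumptions: the struct and function names are hypothetical stand-ins, not the XFS data structures, and it simply assumes that an in-memory counter may read as negative at the moment it is sampled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the in-memory (mount) counters. */
struct sketch_mount_counters {
    int64_t icount;    /* allocated inodes */
    int64_t ifree;     /* free inodes */
    int64_t fdblocks;  /* free data blocks */
};

/* Hypothetical stand-in for the incore superblock counters. */
struct sketch_sb_counters {
    uint64_t icount;
    uint64_t ifree;
    uint64_t fdblocks;
};

/* Clamp a possibly negative sample to zero before storing it unsigned. */
static uint64_t sketch_clamp_non_negative(int64_t value)
{
    return value < 0 ? 0 : (uint64_t)value;
}

/* Copy all three counters, making sure none of them ends up negative. */
static void sketch_copy_counters(struct sketch_sb_counters *sb,
                                 const struct sketch_mount_counters *mp)
{
    assert(sb != NULL && mp != NULL);
    sb->icount = sketch_clamp_non_negative(mp->icount);
    sb->ifree = sketch_clamp_non_negative(mp->ifree);
    sb->fdblocks = sketch_clamp_non_negative(mp->fdblocks);
}
```

Without the clamp, storing a negative sample into an unsigned field would turn it into a very large positive value.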
    • Merge tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd · 62e1f3b3
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two small smb3 server fixes:
      
         - set xattr fix
      
         - pathname parsing check fix"
      
      * tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: fix missing use of get_write in in smb2_set_ea()
        ksmbd: move leading slash check to smb2_get_name()
      62e1f3b3
    • Merge tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 08a6b55a
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - Fix the 8 bytes get_user() logic on x86-32
      
       - Fix build bug that creates weird & mistaken target directory under
         arch/x86/
      
      * tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot: Don't add the EFI stub to targets, again
        x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
      08a6b55a
    • Merge tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 41d70722
      Linus Torvalds authored
      Pull timer fix from Ingo Molnar:
       "Fix boot-time warning in tick_setup_device()"
      
      * tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
      41d70722
    • kcov: don't lose track of remote references during softirqs · 01c8f980
      Aleksandr Nogikh authored
      In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV
      metadata of the current task into a per-CPU variable.  However, the
      kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV
      coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote
      KCOV objects.
      
      If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens
      to get interrupted and kcov_remote_start() is called, it ultimately leads
      to kcov_remote_stop() NOT restoring the original KCOV reference.  So when
      the task exits, all registered remote KCOV handles remain active forever.
      
      The most uncomfortable effect (at least for syzkaller) is that the bug
      prevents the reuse of the same /sys/kernel/debug/kcov descriptor.  If
      we obtain it in the parent process and then e.g.  drop some
      capabilities and continuously fork to execute individual programs, at
      some point current->kcov of the forked process is lost,
      kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls
      calls from subsequent forks fail.
      
      And, yes, the efficiency is also affected if we keep on losing remote
      kcov objects.
      a) kcov_remote_map keeps on growing forever.
      b) (If I'm not mistaken), we're also not freeing the memory referenced
      by kcov->area.
      
      Fix it by introducing a special kcov_mode that is assigned to the task
      that owns a KCOV remote object.  It makes kcov_mode_enabled() return true
      and yet does not trigger coverage collection in __sanitizer_cov_trace_pc()
      and write_comp_data().
      
      [nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment]
        Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com
      Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com
      Fixes: 5ff3b30a ("kcov: collect coverage from interrupts")
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Marco Elver <elver@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      01c8f980
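
The idea behind the kcov fix above, a mode that counts as "enabled" but never matches what the coverage hooks look for, can be sketched as follows. All identifiers here are hypothetical and simplified; this is not the kernel's kcov code, and the real mode values and checks may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified set of modes. */
enum sketch_kcov_mode {
    SKETCH_MODE_DISABLED = 0,
    SKETCH_MODE_TRACE_PC,
    SKETCH_MODE_TRACE_CMP,
    /* Special mode for the task that owns a remote KCOV object. */
    SKETCH_MODE_REMOTE,
};

/* "Enabled" means anything other than disabled, including the owner mode. */
static bool sketch_mode_enabled(enum sketch_kcov_mode mode)
{
    return mode != SKETCH_MODE_DISABLED;
}

/* The PC-tracing hook only collects when the mode matches exactly. */
static bool sketch_should_trace_pc(enum sketch_kcov_mode mode)
{
    return mode == SKETCH_MODE_TRACE_PC;
}

/* The comparison-tracing hook likewise requires an exact match. */
static bool sketch_should_trace_cmp(enum sketch_kcov_mode mode)
{
    return mode == SKETCH_MODE_TRACE_CMP;
}
```

Because the owner mode passes the "enabled" check, save and restore logic keyed on that check sees the owning task, while the exact-match tests in the hooks keep it from collecting coverage.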
    • mm: shmem: fix getting incorrect lruvec when replacing a shmem folio · 9094b4a1
      Baolin Wang authored
      When testing shmem swapin, I encountered the warning below on my machine. 
      The reason is that replacing an old shmem folio with a new one causes
      mem_cgroup_migrate() to clear the old folio's memcg data.  As a result,
      the old folio cannot get the correct memcg's lruvec needed to remove
      itself from the LRU list when it is being freed.  This could lead to
      possible serious problems, such as LRU list crashes due to holding the
      wrong LRU lock, and incorrect LRU statistics.
      
      To fix this issue, we can fall back to using mem_cgroup_replace_folio()
      to replace the old shmem folio.
      
      [ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960
      [ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      [ 5241.100319] flags: 0x17fffe0000040068(uptodate|lru|head|swapbacked|node=0|zone=2|lastcpupid=0x3ffff)
      [ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
      [ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
      [ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000
      [ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000
      [ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
      [ 5241.100338] ------------[ cut here ]------------
      [ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 folio_lruvec_lock_irqsave+0x140/0x150
      [...]
      [ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150
      [ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150
      [ 5241.100376] sp : ffff80008b38b930
      [...]
      [ 5241.100398] Call trace:
      [ 5241.100399]  folio_lruvec_lock_irqsave+0x140/0x150
      [ 5241.100401]  __page_cache_release+0x90/0x300
      [ 5241.100404]  __folio_put+0x50/0x108
      [ 5241.100406]  shmem_replace_folio+0x1b4/0x240
      [ 5241.100409]  shmem_swapin_folio+0x314/0x528
      [ 5241.100411]  shmem_get_folio_gfp+0x3b4/0x930
      [ 5241.100412]  shmem_fault+0x74/0x160
      [ 5241.100414]  __do_fault+0x40/0x218
      [ 5241.100417]  do_shared_fault+0x34/0x1b0
      [ 5241.100419]  do_fault+0x40/0x168
      [ 5241.100420]  handle_pte_fault+0x80/0x228
      [ 5241.100422]  __handle_mm_fault+0x1c4/0x440
      [ 5241.100424]  handle_mm_fault+0x60/0x1f0
      [ 5241.100426]  do_page_fault+0x120/0x488
      [ 5241.100429]  do_translation_fault+0x4c/0x68
      [ 5241.100431]  do_mem_abort+0x48/0xa0
      [ 5241.100434]  el0_da+0x38/0xc0
      [ 5241.100436]  el0t_64_sync_handler+0x68/0xc0
      [ 5241.100437]  el0t_64_sync+0x14c/0x150
      [ 5241.100439] ---[ end trace 0000000000000000 ]---
      
      [baolin.wang@linux.alibaba.com: remove less helpful comments, per Matthew]
        Link: https://lkml.kernel.org/r/ccad3fe1375b468ebca3227b6b729f3eaf9d8046.1718423197.git.baolin.wang@linux.alibaba.com
      Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.1718266112.git.baolin.wang@linux.alibaba.com
      Fixes: 85ce2c51 ("memcontrol: only transfer the memcg data for migration")
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9094b4a1
    • mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick · 0b1ef4fd
      Peter Xu authored
      Macro RANDOM_ORVALUE was used to make sure the pgtable entry will be
      populated with !none data in clear tests.
      
      The RANDOM_ORVALUE tried to cover mostly all the bits in a pgtable entry,
      even if there's no discussion on whether all the bits will be valid.  Both
      S390 and PPC64 have their own masks to avoid touching some bits.  Now it's
      the turn for x86_64.
      
      The issue is there's a recent report from Mikhail Gavrilov showing that
      this can cause a warning with the newly added pte set check in commit
      8430557f on writable v.s.  userfaultfd-wp bit, even though the check
      itself was valid, the random pte is not.  We can choose to mask more bits
      out.
      
      However, the need to set up such random bits is questionable, as the
      entries are already guaranteed to be !none in the cases below:
      
        - For pte level, the pgtable entry will be installed with value from
          pfn_pte(), where pfn points to a valid page.  Hence the pte will be
          !none already if populated with pfn_pte().
      
        - For upper-than-pte level, the pgtable entry should contain a directory
          entry always, which is also !none.
      
      All of these cases look good enough to test a pxx_clear() helper.  Instead
      of extending the bitmask, drop the "set random bits" trick completely.  Add
      some warning guards to make sure the entries will be !none before clear().
      
      Link: https://lkml.kernel.org/r/20240523132139.289719-1-peterx@redhat.com
      Fixes: 8430557f ("mm/page_table_check: support userfault wr-protect entries")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Link: https://lore.kernel.org/r/CABXGCsMB9A8-X+Np_Q+fWLURYL_0t3Y-MdoNabDM-Lzk58-DGA@mail.gmail.com
      Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0b1ef4fd
    • Kefeng Wang's avatar
      mm: fix possible OOB in numa_rebuild_large_mapping() · cfdd12b4
      Kefeng Wang authored
      During a page fault, a large folio is mapped at a virtual address aligned
      to the folio size (which is not greater than PMD_SIZE), i.e., 'addr =
      ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE)' in do_anonymous_page().
      After mremap(), however, the virtual address only requires PAGE_SIZE
      alignment, and the ptes are moved to the new location in
      move_page_tables().  Traversing the new ptes in
      numa_rebuild_large_mapping() could then hit the following issue:
      
         Unable to handle kernel paging request at virtual address 00000a80c021a788
         Mem abort info:
           ESR = 0x0000000096000004
           EC = 0x25: DABT (current EL), IL = 32 bits
           SET = 0, FnV = 0
           EA = 0, S1PTW = 0
           FSC = 0x04: level 0 translation fault
         Data abort info:
           ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
           CM = 0, WnR = 0, TnD = 0, TagAccess = 0
           GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
         user pgtable: 4k pages, 48-bit VAs, pgdp=00002040341a6000
         [00000a80c021a788] pgd=0000000000000000, p4d=0000000000000000
         Internal error: Oops: 0000000096000004 [#1] SMP
         ...
         CPU: 76 PID: 15187 Comm: git Kdump: loaded Tainted: G        W          6.10.0-rc2+ #209
         Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.79 08/21/2021
         pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
         pc : numa_rebuild_large_mapping+0x338/0x638
         lr : numa_rebuild_large_mapping+0x320/0x638
         sp : ffff8000b41c3b00
         x29: ffff8000b41c3b30 x28: ffff8000812a0000 x27: 00000000000a8000
         x26: 00000000000000a8 x25: 0010000000000001 x24: ffff20401c7170f0
         x23: 0000ffff33a1e000 x22: 0000ffff33a76000 x21: ffff20400869eca0
         x20: 0000ffff33976000 x19: 00000000000000a8 x18: ffffffffffffffff
         x17: 0000000000000000 x16: 0000000000000020 x15: ffff8000b41c36a8
         x14: 0000000000000000 x13: 205d373831353154 x12: 5b5d333331363732
         x11: 000000000011ff78 x10: 000000000011ff10 x9 : ffff800080273f30
         x8 : 000000320400869e x7 : c0000000ffffd87f x6 : 00000000001e6ba8
         x5 : ffff206f3fb5af88 x4 : 0000000000000000 x3 : 0000000000000000
         x2 : 0000000000000000 x1 : fffffdffc0000000 x0 : 00000a80c021a780
         Call trace:
          numa_rebuild_large_mapping+0x338/0x638
          do_numa_page+0x3e4/0x4e0
          handle_pte_fault+0x1bc/0x238
          __handle_mm_fault+0x20c/0x400
          handle_mm_fault+0xa8/0x288
          do_page_fault+0x124/0x498
          do_translation_fault+0x54/0x80
          do_mem_abort+0x4c/0xa8
          el0_da+0x40/0x110
          el0t_64_sync_handler+0xe4/0x158
          el0t_64_sync+0x188/0x190
      
      Fix it by making the start and end not only within the vma range, but also
      within the page table range.
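
      The clamping described above can be sketched in user space as follows.
      This is only an illustration of the arithmetic, not the kernel patch:
      PAGE_SIZE, PMD_SIZE (4K pages assumed), struct range and
      clamp_rebuild_range() are all hypothetical names and values chosen for
      the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL              /* assumption: 4K pages */
#define PMD_SIZE  (512UL * PAGE_SIZE) /* assumption: one PTE table covers 2 MiB */

struct range {
	uint64_t start;
	uint64_t end;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Clamp the folio's virtual range to both the VMA and the PTE table that
 * covers fault_addr, so a walk over [start, end) never leaves that table.
 */
static struct range clamp_rebuild_range(uint64_t fault_addr,
					uint64_t folio_start,
					unsigned long nr_pages,
					uint64_t vma_start, uint64_t vma_end)
{
	uint64_t pt_start = fault_addr & ~(PMD_SIZE - 1);
	uint64_t pt_end = pt_start + PMD_SIZE;
	uint64_t folio_end = folio_start + nr_pages * PAGE_SIZE;
	struct range r;

	assert(vma_start <= fault_addr && fault_addr < vma_end);
	r.start = max_u64(folio_start, max_u64(vma_start, pt_start));
	r.end = min_u64(folio_end, min_u64(vma_end, pt_end));
	return r;
}
```

      With a folio that straddles a PMD boundary after mremap(), the
      resulting range starts at the boundary instead of reaching back into
      the previous page table.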
      
      Link: https://lkml.kernel.org/r/20240612122822.4033433-1-wangkefeng.wang@huawei.com
      Fixes: d2136d74 ("mm: support multi-size THP numa balancing")
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Liu Shixin <liushixin2@huawei.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cfdd12b4
    • Hugh Dickins's avatar
      mm/migrate: fix kernel BUG at mm/compaction.c:2761! · 8e279f97
      Hugh Dickins authored
      I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); and
      if DEBUG_VM were off, then pages would be lost on a local list.
      
      Our convention is that if migrate_pages() reports complete success (0),
      then the migratepages list will be empty; but if it reports an error or
      some pages remaining, then its caller must putback_movable_pages().
      
      There's a new case in which migrate_pages() has been reporting complete
      success, but returning with pages left on the migratepages list: when
      migrate_pages_batch() successfully split a folio on the deferred list, but
      then the "Failure isn't counted" call does not dispose of them all.
      
      Since that block is expecting the large folio to have been counted as 1
      failure already, and since the return code is later adjusted to success
      whenever the returned list is found empty, the simple way to fix this
      safely is to count splitting the deferred folio as "a failure".
      
      Link: https://lkml.kernel.org/r/46c948b4-4dd8-6e03-4c7b-ce4e81cfa536@google.com
      Fixes: 7262f208 ("mm/migrate: split source folio if it is on deferred split list")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8e279f97
    • Mark Brown's avatar
      selftests: mm: make map_fixed_noreplace test names stable · e7d2a28b
      Mark Brown authored
      KTAP parsers interpret the output of ksft_test_result_*() as being the
      name of the test.  The map_fixed_noreplace test uses a dynamically
      allocated base address for the mmap()s that it tests and currently
      includes this in the test names that it logs so the test names that are
      logged are not stable between runs.  It also uses multiples of PAGE_SIZE,
      which means that runs for kernels with different PAGE_SIZE configurations
      can't be directly compared.  Both these factors cause issues for CI
      systems when interpreting and displaying results.
      
      Fix this by replacing the current test names with fixed strings describing
      the intent of the mappings that are logged.  The existing messages with
      the actual addresses and sizes are retained as diagnostic prints to aid in
      debugging.
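
      A minimal sketch of the split between a stable result name and a
      variable diagnostic line is shown below.  It does not use the
      kselftest helpers; format_result(), format_diag() and the test name
      are hypothetical and exist only to illustrate the idea that addresses
      belong on '#' diagnostic lines rather than in the name that parsers
      compare across runs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a KTAP-style result line: "ok <num> <name>" or "not ok <num> <name>". */
static int format_result(char *buf, size_t len, int num, int passed,
			 const char *name)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, len, "%s %d %s", passed ? "ok" : "not ok", num, name);
}

/*
 * Format a diagnostic line.  Lines starting with '#' are treated as
 * comments, so run-specific addresses and sizes can safely go here.
 */
static int format_diag(char *buf, size_t len, unsigned long addr,
		       unsigned long size)
{
	assert(buf != NULL);
	return snprintf(buf, len, "# mmap() @ 0x%lx-0x%lx", addr, addr + size);
}
```

      Because the name passed to format_result() is a fixed string, two runs
      with different base addresses or page sizes produce identical result
      lines and differ only in their diagnostics.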
      
      Link: https://lkml.kernel.org/r/20240605-kselftest-mm-fixed-noreplace-v1-1-a235db8b9be9@kernel.org
      Fixes: 4838cf70 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e7d2a28b
    • Jeff Xu's avatar
      mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC · 653c5c75
      Jeff Xu authored
      When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it didn't
      have proper documentation.  This led to a lot of confusion, especially
      about whether or not memfd created with the MFD_NOEXEC_SEAL flag is
      sealable.  Before MFD_NOEXEC_SEAL, memfd had to explicitly set
      MFD_ALLOW_SEALING to be sealable, so it's a fair question.
      
      As one might have noticed, unlike other flags in memfd_create,
      MFD_NOEXEC_SEAL is actually a combination of multiple flags.  The idea is
      to make it easier to use memfd in the most common way, which is NOEXEC +
      F_SEAL_EXEC + MFD_ALLOW_SEALING.  This works with sysctl vm.memfd_noexec
      to help existing applications move to a more secure way of using memfd.
      
      Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
      MFD_ALLOW_SEALING is set, to be consistent with other flags [1].  Those
      are based on the viewpoint that each flag is an atomic unit, which is a
      reasonable assumption.  However, MFD_NOEXEC_SEAL was designed with the
      intent of promoting the most secure method of using memfd, and therefore
      combines multiple functionalities into one bit.
      
      Furthermore, the MFD_NOEXEC_SEAL has been added for more than one year,
      and multiple applications and distributions have backported and utilized
      it.  Altering ABI now presents a degree of risk and may lead to
      disruption.
      
      MFD_NOEXEC_SEAL is a new flag, and applications must change their code to
      use it.  There is no backward compatibility problem.
      
      When sysctl vm.memfd_noexec == 1 or 2, applications that don't set
      MFD_NOEXEC_SEAL or MFD_EXEC will get an MFD_NOEXEC_SEAL memfd.  An old
      application might break; that is by design, and on such a system
      vm.memfd_noexec = 0 shall be used.  So there is no backward
      compatibility problem here either.
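
      The sealability question can be probed from user space with a sketch
      like the one below.  The helper name noexec_seal_memfd_seals() is
      hypothetical, and the fallback values for MFD_NOEXEC_SEAL and
      F_SEAL_EXEC are only there for older libc headers.  On kernels that
      lack the flag, memfd_create() is expected to fail and the helper
      returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U /* fallback for older headers */
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020 /* fallback for older headers */
#endif

/*
 * Create a memfd with MFD_NOEXEC_SEAL (and without MFD_ALLOW_SEALING) and
 * return its current seal mask, or -1 if creation or the query fails.
 */
static int noexec_seal_memfd_seals(const char *name)
{
	int fd, seals;

	assert(name != NULL);
	fd = memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
	if (fd < 0)
		return -1;
	seals = fcntl(fd, F_GET_SEALS);
	close(fd);
	return seals;
}
```

      If the semantics described above hold, the returned mask contains
      F_SEAL_EXEC and does not contain F_SEAL_SEAL, meaning further seals
      can still be added even though MFD_ALLOW_SEALING was not passed.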
      
      I propose to include this documentation patch to assist in clarifying the
      semantics of MFD_NOEXEC_SEAL, thereby preventing any potential future
      confusion.
      
      Finally, I would like to express my gratitude to David Rheinsberg and
      Barnabás Pőcze for initiating the discussion on the topic of sealability.
      
      [1]
      https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
      
      [jeffxu@chromium.org: updates per Randy]
        Link: https://lkml.kernel.org/r/20240611034903.3456796-2-jeffxu@chromium.org
      [jeffxu@chromium.org: v3]
        Link: https://lkml.kernel.org/r/20240611231409.3899809-2-jeffxu@chromium.org
      Link: https://lkml.kernel.org/r/20240607203543.2151433-2-jeffxu@google.com
      Signed-off-by: Jeff Xu <jeffxu@chromium.org>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Barnabás Pőcze <pobrn@protonmail.com>
      Cc: Daniel Verkamp <dverkamp@chromium.org>
      Cc: David Rheinsberg <david@readahead.eu>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      653c5c75
    • Rafael Aquini's avatar
      mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default · 3afb76a6
      Rafael Aquini authored
      An ASLR regression was noticed [1] and tracked down to file-mapped areas
      being backed by THP in recent kernels.  The 21-bit alignment constraint
      for such mappings reduces the entropy for randomizing the placement of
      64-bit library mappings and breaks ASLR completely for 32-bit libraries.
      
      The reported issue is easily addressed by increasing vm.mmap_rnd_bits and
      vm.mmap_rnd_compat_bits.  This patch just provides a simple way to set
      ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values
      allowed by the architecture at build time.
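
      The entropy loss can be illustrated with a small calculation.  This is
      a sketch under stated assumptions: the randomization is applied at page
      granularity, a 21-bit alignment discards the low (21 - page_shift)
      random bits, and the sample values 28 and 8 are assumed defaults for
      64-bit and compat mappings on one architecture.  effective_rnd_bits()
      is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Estimate how many random bits survive when a mapping randomized at
 * page granularity is then aligned to (1 << align_shift) bytes.
 */
static int effective_rnd_bits(int rnd_bits, int page_shift, int align_shift)
{
	int lost = align_shift - page_shift;

	assert(rnd_bits >= 0 && page_shift > 0 && align_shift > 0);
	if (lost < 0)
		lost = 0; /* alignment no stricter than a page loses nothing */
	return rnd_bits > lost ? rnd_bits - lost : 0;
}
```

      Under these assumptions, 28 bits with 4K pages drop to 19, while 8
      bits drop to 0, which is consistent with the report that ASLR is
      reduced for 64-bit libraries and broken for 32-bit ones.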
      
      [1] https://zolutal.github.io/aslrnt/
      
      [akpm@linux-foundation.org: default to `y' if 32-bit, per Rafael]
      Link: https://lkml.kernel.org/r/20240606180622.102099-1-aquini@redhat.com
      Fixes: 1854bc6e ("mm/readahead: Align file mappings for non-DAX")
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Samuel Holland <samuel.holland@sifive.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3afb76a6
    • Peter Oberparleiter's avatar
      gcov: add support for GCC 14 · c1558bc5
      Peter Oberparleiter authored
      Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
      long .gcda files with no usable data.  To fix this, update GCOV_COUNTERS
      to match the value defined by GCC 14.
      
      Tested with GCC versions 14.1.0 and 13.2.0.
      
      Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com
      Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
      Reported-by: Allison Henderson <allison.henderson@oracle.com>
      Reported-by: Chuck Lever III <chuck.lever@oracle.com>
      Tested-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c1558bc5
    • Oleg Nesterov's avatar
      zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING · 7fea700e
      Oleg Nesterov authored
      kernel_wait4() doesn't sleep and returns -EINTR if there is no
      eligible child and signal_pending() is true.
      
      That is why zap_pid_ns_processes() clears TIF_SIGPENDING, but this is not
      enough: it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
      return false and avoid a busy-wait loop.
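
      The interaction between the two flags can be modelled in a few lines.
      This is a toy user-space model, not kernel code: the MODEL_* bit
      values and model_signal_pending() are hypothetical, and the only
      property carried over is that signal_pending() is true when either
      flag is set.

```c
#include <assert.h>

#define MODEL_TIF_SIGPENDING    (1u << 0) /* hypothetical bit position */
#define MODEL_TIF_NOTIFY_SIGNAL (1u << 1) /* hypothetical bit position */

/* Pending if either flag is set, mirroring the behaviour described above. */
static int model_signal_pending(unsigned int flags)
{
	return (flags & (MODEL_TIF_SIGPENDING | MODEL_TIF_NOTIFY_SIGNAL)) != 0;
}

/* Old behaviour: clear only the SIGPENDING flag. */
static unsigned int clear_sigpending_only(unsigned int flags)
{
	return flags & ~MODEL_TIF_SIGPENDING;
}

/* Fixed behaviour: clear both flags so the wait can actually sleep. */
static unsigned int clear_both(unsigned int flags)
{
	return flags & ~(MODEL_TIF_SIGPENDING | MODEL_TIF_NOTIFY_SIGNAL);
}
```

      In this model a task with both flags set still reports a pending
      signal after clear_sigpending_only(), so a wait that bails out on
      signal_pending() would return immediately on every iteration.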
      
      Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com
      Fixes: 12db8b69 ("entry: Add support for TIF_NOTIFY_SIGNAL")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com>
      Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Tested-by: Wei Fu <fuweid89@gmail.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Allen Pais <apais@linux.microsoft.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Joel Granados <j.granados@samsung.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Mike Christie <michael.christie@oracle.com>
      Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: Zqiang <qiang.zhang1211@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7fea700e
    • Ran Xiaokai's avatar
      mm: huge_memory: fix misused mapping_large_folio_support() for anon folios · 6a50c9b5
      Ran Xiaokai authored
      When I ran a large folio split test, the WARNING "[ 5059.122759][ T166]
      Cannot split file folio to non-0 order" was triggered.  But the test cases
      only cover anonymous folios, while mapping_large_folio_support() is only
      reasonable for page cache folios.
      
      In split_huge_page_to_list_to_order(), the folio passed to
      mapping_large_folio_support() may be an anonymous folio.  The
      folio_test_anon() check is missing, so the split of the anonymous THP
      fails.  The same applies to shmem_mapping().  We'd better add a check for
      both.  But the shmem_mapping() in __split_huge_page() is not involved:
      for anonymous folios the end parameter is set to -1, so (head[i].index >=
      end) is always false and shmem_mapping() is not called.
      
      Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() for anon
      mappings, so we can detect the wrong use more easily.
      
      THP folios may exist in the pagecache even if the file system doesn't
      support large folios.  This is because when CONFIG_TRANSPARENT_HUGEPAGE is
      enabled, khugepaged will try to collapse read-only file-backed pages to
      THP.  But the mapping does not actually support multi order large folios
      properly.
      
      Verified using /sys/kernel/debug/split_huge_pages: with this patch, a
      large anon THP is successfully split and the warning no longer appears.
      
      Link: https://lkml.kernel.org/r/202406071740485174hcFl7jRxncsHDtI-Pz-o@zte.com.cn
      Fixes: c010d47f ("mm: thp: split huge page to any lower order pages")
      Reviewed-by: Barry Song <baohua@kernel.org>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: xu xin <xu.xin16@zte.com.cn>
      Cc: Yang Yang <yang.yang29@zte.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6a50c9b5
    • Suren Baghdasaryan's avatar
      lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get() · a273559e
      Suren Baghdasaryan authored
      put_page_tag_ref() should be called only when get_page_tag_ref() returns a
      valid reference, because only in that case does get_page_tag_ref() enter
      an RCU read section, while put_page_tag_ref() will call rcu_read_unlock()
      even if the provided reference is NULL.  Fix pgalloc_tag_get(), which does
      not follow this rule and so causes an RCU imbalance.  Add a warning in
      put_page_tag_ref() to catch any future mistakes.
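
      The imbalance can be shown with a toy model that counts read-section
      nesting.  Everything here is hypothetical (struct rcu_model and the
      model_* helpers are not kernel APIs); the sketch only mirrors the rule
      stated above: get enters the section only for a valid reference, put
      always leaves it.

```c
#include <assert.h>
#include <stddef.h>

struct rcu_model {
	int depth; /* stands in for RCU read-side nesting */
};

/* Enters the read section only when a valid reference is returned. */
static int *model_get_ref(struct rcu_model *m, int *ref)
{
	assert(m != NULL);
	if (!ref)
		return NULL;
	m->depth++;
	return ref;
}

/* Always leaves the read section, even for a NULL reference. */
static void model_put_ref(struct rcu_model *m, int *ref)
{
	(void)ref;
	m->depth--;
}

/* Buggy caller: puts unconditionally. */
static void buggy_user(struct rcu_model *m, int *ref)
{
	model_put_ref(m, model_get_ref(m, ref));
}

/* Fixed caller: puts only when get returned a valid reference. */
static void fixed_user(struct rcu_model *m, int *ref)
{
	int *r = model_get_ref(m, ref);

	if (r)
		model_put_ref(m, r);
}
```

      With a NULL reference the buggy caller leaves the counter at -1, while
      the fixed caller leaves it balanced at 0.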
      
      Link: https://lkml.kernel.org/r/20240601233840.617458-1-surenb@google.com
      Fixes: cc92eba1 ("mm: fix non-compound multi-order memory accounting in __free_pages")
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202405271029.6d2f9c4c-lkp@intel.com
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a273559e
    • Suren Baghdasaryan's avatar
      lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n · c944bf60
      Suren Baghdasaryan authored
      Memory allocation profiling is trying to register sysctl interface even
      when CONFIG_SYSCTL=n, resulting in proc_do_static_key() being undefined. 
      Prevent that by skipping sysctl registration for such configurations.
      
      Link: https://lkml.kernel.org/r/20240601233831.617124-1-surenb@google.com
      Fixes: 22d407b1 ("lib: add allocation tagging support for memory allocation profiling")
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202405280616.wcOGWJEj-lkp@intel.com/
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c944bf60
    • Lorenzo Stoakes's avatar
      MAINTAINERS: remove Lorenzo as vmalloc reviewer · 3ab85f40
      Lorenzo Stoakes authored
      I haven't had the bandwidth to review vmalloc patches recently and I
      suspect I won't be able to do so consistently moving forwards, so I think
      it's best if I remove myself as reviewer for the time being.
      
      Link: https://lkml.kernel.org/r/20240602205510.108807-1-lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3ab85f40
    • David Hildenbrand's avatar
      Revert "mm: init_mlocked_on_free_v3" · 384a746b
      David Hildenbrand authored
      There was insufficient review and no agreement that this is the right
      approach.
      
      There are serious flaws with the implementation that make processes using
      mlock() not even work with simple fork() [1] and we get reliable crashes
      when rebooting.
      
      Further, simply because we might be unmapping a single PTE of a large
      mlocked folio, we shouldn't zero out the whole folio.
      
      ... especially because the code can also *corrupt* unrelated memory because
      	kernel_init_pages(page, folio_nr_pages(folio));
      
      Could end up writing outside of the actual folio if we work with a tail
      page.
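
      The overrun can be quantified with simple index arithmetic.  The
      sketch below is an illustration only: pages_past_folio_end() is a
      hypothetical helper that models a folio as page indices [0, nr) and
      assumes the init call starts at index idx and covers nr pages.

```c
#include <assert.h>

/*
 * If initialization starts at page index idx inside a folio of nr pages
 * and covers nr pages, return how many of those pages lie past the end
 * of the folio.
 */
static unsigned long pages_past_folio_end(unsigned long idx, unsigned long nr)
{
	unsigned long end;

	assert(nr > 0 && idx < nr);
	end = idx + nr; /* one past the last page touched */
	return end > nr ? end - nr : 0;
}
```

      Starting at the head page (idx 0) stays within the folio, whereas
      starting at any tail page touches exactly idx pages that belong to
      something else.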
      
      Let's revert it.  Once there is agreement that this is the right approach,
      the issues are fixed, and there has been reasonable review and proper
      testing, we can consider it again.
      
      [1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com
      
      Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com
      Fixes: ba42b524 ("mm: init_mlocked_on_free_v3")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: David Wang <00107082@163.com>
      Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/
      Reported-by: Lance Yang <ioworker0@gmail.com>
      Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com
      Acked-by: Lance Yang <ioworker0@gmail.com>
      Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      384a746b
    • Peter Xu's avatar
      mm/page_table_check: fix crash on ZONE_DEVICE · 8bb592c2
      Peter Xu authored
      Not all pages are subject to the pgtable check.  One example is
      ZONE_DEVICE pages: they map PFNs directly, and they don't allocate
      page_ext at all even if there's a struct page around.  See
      devm_memremap_pages() for reference.
      
      When both ZONE_DEVICE and page-table-check are enabled, mapping some dax
      memory reliably triggers a kernel BUG when the kernel tries to inject
      pfn maps on the dax device:
      
       kernel BUG at mm/page_table_check.c:55!
      
      While it's perfectly legal to use set_pxx_at() for ZONE_DEVICE pages for
      page fault resolution, skip all the checks in the pgtable checker if
      page_ext doesn't even exist, which applies to ZONE_DEVICE but maybe more.
      
      Link: https://lkml.kernel.org/r/20240605212146.994486-1-peterx@redhat.com
      Fixes: df4e817b ("mm: page table check")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8bb592c2
    • Yury Norov's avatar
      gcc: disable '-Warray-bounds' for gcc-9 · 8e5bd4ea
      Yury Norov authored
      '-Warray-bounds' is already disabled for gcc-10+.  Now that we've merged
      bitmap_{read,write}, I see the following error when building the kernel
      with gcc-9.4 (Ubuntu 20.04.4 LTS) for x86_64 allmodconfig:
      
      drivers/pinctrl/pinctrl-cy8c95x0.c: In function `cy8c95x0_read_regs_mask.isra.0':
      include/linux/bitmap.h:756:18: error: array subscript [1, 288230376151711744] is outside array bounds of `long unsigned int[1]' [-Werror=array-bounds]
        756 |  value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
            |               ~~~^~~~~~~~~~~
      
      The immediate reason is that the commit b4475970 ("bitmap: make
      bitmap_{get,set}_value8() use bitmap_{read,write}()") switched the
      bitmap_get_value8() to an alias of bitmap_read(); the same for 'set'.
      
      Now, the code that triggers -Warray-bounds calls the function like this:
      
        #define MAX_BANK 8
        #define BANK_SZ 8
        #define MAX_LINE        (MAX_BANK * BANK_SZ)
        DECLARE_BITMAP(tval, MAX_LINE); // 64-bit map: unsigned long tval[1]
      
        read_val |= bitmap_get_value8(tval, i * BANK_SZ) & ~bits;
      
      bitmap_read() is implemented such that it may conditionally dereference a
      pointer beyond the boundary like this:
      
              unsigned long offset = start % BITS_PER_LONG;
              unsigned long space = BITS_PER_LONG - offset;
      
              if (space >= nbits)
                      return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);
      
              value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
              value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
              return (value_low >> offset) | (value_high << space);
      
      In case of bitmap_get_value8(), it's impossible to violate the boundary
      because 'space >= nbits' is always true for byte-aligned 8-bit access,
      so the early return is taken and map[index + 1] is never read.
      So, this is clearly a false-positive.
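
      The claim can be checked with a user-space sketch of the same read
      logic.  This is not the kernel's bitmap_read(): it uses fixed 64-bit
      words and a plain mask instead of the BITMAP_*_WORD_MASK() macros, and
      read_needs_next_word() and sketch_bitmap_read() are hypothetical
      names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64u

/* Returns 1 if reading nbits starting at bit `start` spills into the next word. */
static int read_needs_next_word(size_t start, unsigned int nbits)
{
	unsigned int offset = start % WORD_BITS;
	unsigned int space = WORD_BITS - offset;

	return space < nbits;
}

/* Read nbits (1..64) starting at bit `start` from an array of 64-bit words. */
static uint64_t sketch_bitmap_read(const uint64_t *map, size_t start,
				   unsigned int nbits)
{
	size_t index = start / WORD_BITS;
	unsigned int offset = start % WORD_BITS;
	unsigned int space = WORD_BITS - offset;
	uint64_t mask;

	assert(map != NULL && nbits >= 1 && nbits <= WORD_BITS);
	mask = nbits == WORD_BITS ? ~UINT64_C(0) : (UINT64_C(1) << nbits) - 1;

	/* Everything fits in the current word: map[index + 1] is not touched. */
	if (space >= nbits)
		return (map[index] >> offset) & mask;

	/* Here space < nbits <= 64, so 1 <= space <= 63 and offset >= 1. */
	return ((map[index] >> offset) | (map[index + 1] << space)) & mask;
}
```

      For every byte-aligned start in a single word (0, 8, ..., 56), space is
      at least 8, so an 8-bit read always takes the early return; only an
      unaligned read such as start 60 needs the second word.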
      
      The same type of false-positive breaks my allmodconfig build in many
      places.  gcc-8 is clear, however.
      
      Link: https://lkml.kernel.org/r/20240522225830.1201778-1-yury.norov@gmail.com
      Fixes: b4475970 ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()")
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Yoann Congal <yoann.congal@smile.fr>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8e5bd4ea