1. 01 Apr, 2022 4 commits
    • dm: fix bio polling to handle possibile BLK_STS_AGAIN · 52919840
      Ming Lei authored
      Expanded testing of DM's bio polling support (using more fio threads
      to dm-linear on top of null_blk) exposed the possibility for polled
      bios to hang (repeatedly polling in io_uring) when null_blk responds
      with BLK_STS_AGAIN (due to lack of resources):
      
      1) io_complete_rw_iopoll() is called from blkdev_bio_end_io_async() to
         notify kiocb is done, that is the completion interface between block
         layer and io_uring
      
      2) io_complete_rw_iopoll() is called from io_do_iopoll()
      
      3) dm returns BLK_STS_AGAIN for one bio (on behalf of underlying
         driver), then io_complete_rw_iopoll is called, but io_do_iopoll()
         doesn't handle -EAGAIN at all (due to logic in io_rw_should_reissue)
      
      4) the reason for dm's BLK_STS_AGAIN is that the underlying null_blk
         driver ran out of requests (easier to reproduce by setting a low
         hw_queue_depth).
      
      5) dm should handle BLK_STS_AGAIN for POLLED underlying IO, and may
         retry in dm layer.
      
      This fix adds REQ_POLLED specific BLK_STS_AGAIN handling to
      dm_io_complete() that clears REQ_POLLED and requeues the bio to DM
      using queue_io().
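
      The handling described above can be sketched in userspace C. This is a
      simplified model, not the kernel code: every sketch_* name, the status
      enum, and the flag bit are hypothetical stand-ins for the real bio,
      BLK_STS_AGAIN, REQ_POLLED and queue_io(). It only illustrates the
      decision "polled bio came back with AGAIN: clear the polled flag and
      requeue instead of completing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's status codes and bio flags. */
enum sketch_status { SKETCH_STS_OK, SKETCH_STS_AGAIN, SKETCH_STS_IOERR };
#define SKETCH_REQ_POLLED (1u << 0)

struct sketch_bio {
	unsigned int flags;
	enum sketch_status status;
	struct sketch_bio *next;	/* link on the deferred (requeue) list */
};

struct sketch_dev {
	struct sketch_bio *deferred;	/* bios waiting to be resubmitted */
};

/* Plays the role of queue_io(): park the bio for later resubmission. */
static void sketch_queue_io(struct sketch_dev *dev, struct sketch_bio *bio)
{
	bio->next = dev->deferred;
	dev->deferred = bio;
}

/*
 * Completion path. Returns true if the bio is completed to the caller,
 * false if it was requeued inside the device instead.
 */
static bool sketch_io_complete(struct sketch_dev *dev, struct sketch_bio *bio)
{
	assert(dev != NULL && bio != NULL);

	if (bio->status == SKETCH_STS_AGAIN &&
	    (bio->flags & SKETCH_REQ_POLLED)) {
		/* Retry without polling so the poller never sees the AGAIN. */
		bio->flags &= ~SKETCH_REQ_POLLED;
		sketch_queue_io(dev, bio);
		return false;
	}
	return true;
}
```

      A caller would treat a false return as "not finished yet": the bio sits
      on the deferred list with the polled flag cleared, so the retry
      completes through the normal, non-polled path.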
      
      Fixes: b99fdcdc ("dm: support bio polling")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      [snitzer: revised header, reused dm_io_complete's REQ_POLLED case]
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm: fix dm_io and dm_target_io flags race condition on Alpha · aad5b23e
      Mikulas Patocka authored
      Early Alpha processors cannot write a single byte or short; they read 8
      bytes, modify the value in registers and write back 8 bytes.
      
      This could cause a race condition in struct dm_io if the fields
      flags and io_count are modified simultaneously.
      
      Fix this bug by using 32-bit flags if we are on Alpha and if we are
      compiling for a processor that doesn't have the byte-word-extension.
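
      The lost update can be reproduced deterministically in portable C by
      emulating the read-modify-write sequence by hand. This is an
      illustrative sketch only: struct sketch_io and the sketch_* helpers are
      hypothetical, and the layout merely assumes that two 16-bit fields
      share one 8-byte word, as flags and io_count are described to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two narrow fields sharing one 8-byte word (hypothetical layout). */
struct sketch_io {
	uint16_t flags;
	uint16_t io_count;
	uint32_t pad;
};

/* Step 1 of an emulated narrow store: load the whole 8-byte word. */
static uint64_t sketch_load_word(const struct sketch_io *io)
{
	uint64_t word;

	memcpy(&word, io, sizeof(word));
	return word;
}

/* Step 2: patch one 16-bit field inside the register copy. */
static uint64_t sketch_patch_u16(uint64_t word, size_t offset, uint16_t value)
{
	memcpy((unsigned char *)&word + offset, &value, sizeof(value));
	return word;
}

/* Step 3: write back all 8 bytes, including bytes that were not changed. */
static void sketch_store_word(struct sketch_io *io, uint64_t word)
{
	memcpy(io, &word, sizeof(word));
}

/* Interleave two emulated stores; returns 1 if the flags update was lost. */
static int sketch_lost_update(void)
{
	struct sketch_io io = { 0, 1, 0 };

	assert(sizeof(struct sketch_io) == sizeof(uint64_t));

	uint64_t a = sketch_load_word(&io);	/* CPU A starts updating flags */
	uint64_t b = sketch_load_word(&io);	/* CPU B starts updating io_count */

	a = sketch_patch_u16(a, offsetof(struct sketch_io, flags), 0x4);
	b = sketch_patch_u16(b, offsetof(struct sketch_io, io_count), 2);

	sketch_store_word(&io, a);	/* flags = 0x4, io_count = 1 */
	sketch_store_word(&io, b);	/* stale flags (0) written back */

	return io.flags == 0 && io.io_count == 2;
}
```

      Widening flags to 32 bits avoids the emulated sequence, on the
      assumption that the processor can store a 32-bit value directly
      without touching its neighbours.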
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: bd4a6dd2 ("dm: reduce size of dm_io and dm_target_io structs")
      [snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!]
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm integrity: set journal entry unused when shrinking device · cc09e8a9
      Mikulas Patocka authored
      Commit f6f72f32 ("dm integrity: don't replay journal data past the
      end of the device") skips journal replay if the target sector points
      beyond the end of the device. Unfortunately, it doesn't set the
      journal entry unused, which resulted in this BUG being triggered:
      BUG_ON(!journal_entry_is_unused(je))
      
      Fix this by calling journal_entry_set_unused() for this case.
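
      A minimal userspace model of the replay loop shows why the skipped
      entry must still be retired. All sketch_* names and the "unused"
      marker value are hypothetical; only the shape of the logic follows the
      description above. The closing assert stands in for the
      BUG_ON(!journal_entry_is_unused(je)) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNUSED_SECTOR UINT64_MAX	/* hypothetical "unused" marker */

struct sketch_journal_entry {
	uint64_t sector;	/* target sector on the data device */
};

static bool sketch_entry_is_unused(const struct sketch_journal_entry *je)
{
	return je->sector == SKETCH_UNUSED_SECTOR;
}

static void sketch_entry_set_unused(struct sketch_journal_entry *je)
{
	je->sector = SKETCH_UNUSED_SECTOR;
}

/* Replays the journal and returns how many entries were written out. */
static size_t sketch_replay(struct sketch_journal_entry *entries, size_t n,
			    uint64_t device_sectors)
{
	size_t replayed = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_journal_entry *je = &entries[i];

		if (sketch_entry_is_unused(je))
			continue;

		if (je->sector >= device_sectors) {
			/* Past the end of the shrunk device: skip the
			 * write, but still mark the entry unused. */
			sketch_entry_set_unused(je);
			continue;
		}

		/* A real implementation would copy the data here. */
		replayed++;
		sketch_entry_set_unused(je);
	}

	/* Later stages expect every entry to have been retired. */
	for (size_t i = 0; i < n; i++)
		assert(sketch_entry_is_unused(&entries[i]));

	return replayed;
}
```

      Dropping the sketch_entry_set_unused() call in the out-of-range branch
      makes the final assert fire for any entry beyond the device size,
      which mirrors the reported BUG.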
      
      Fixes: f6f72f32 ("dm integrity: don't replay journal data past the end of the device")
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Tested-by: Milan Broz <gmazyland@gmail.com>
      [snitzer: revised header]
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm ioctl: log an error if the ioctl structure is corrupted · dbdcc906
      Mikulas Patocka authored
      This will help triage bugs when userspace is passing invalid ioctl
      structure to the kernel.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      [snitzer: log errors using DMERR instead of DMWARN]
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  2. 26 Mar, 2022 7 commits
    • Merge tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block · 3f728213
      Linus Torvalds authored
      Pull block layer 64-bit data integrity support from Jens Axboe:
       "This adds support for 64-bit data integrity in the block layer and in
        NVMe"
      
      * tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block:
        crypto: fix crc64 testmgr digest byte order
        nvme: add support for enhanced metadata
        block: add pi for extended integrity
        crypto: add rocksoft 64b crc guard tag framework
        lib: add rocksoft model crc64
        linux/kernel: introduce lower_48_bits function
        asm-generic: introduce be48 unaligned accessors
        nvme: allow integrity on extended metadata formats
        block: support pi with extended metadata
    • Merge tag 'for-5.18/alloc-cleanups-2022-03-25' of git://git.kernel.dk/linux-block · 752d422e
      Linus Torvalds authored
      Pull bio allocation fix from Jens Axboe:
       "We got some reports of users seeing:
      
      	Unexpected gfp: 0x2 (__GFP_HIGHMEM). Fixing up to gfp: 0x1192888
      
        which is a regression caused by the bio allocation cleanups"
      
      * tag 'for-5.18/alloc-cleanups-2022-03-25' of git://git.kernel.dk/linux-block:
        fs: do not pass __GFP_HIGHMEM to bio_alloc in do_mpage_readpage
    • Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block · 561593a0
      Linus Torvalds authored
      Pull NVMe write streams removal from Jens Axboe:
       "This removes the write streams support in NVMe. No vendor ever really
        shipped working support for this, and they are not interested in
        supporting it.
      
        With the NVMe support gone, we have nothing in the tree that supports
        this. Remove passing around of the hints.
      
        The only discussion point in this patchset imho is the fact that the
        file specific write hint setting/getting fcntl helpers will now return
        -1/EINVAL like they did before we supported write hints. No known
        applications use these functions, I only know of one prototype that I
        helped do for RocksDB, and that's not used. That said, with a change
        like this, it's always a bit controversial. Alternatively, we could
        just make them return 0 and pretend it worked. It's placement based
        hints after all"
      
      * tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
        fs: remove fs.f_write_hint
        fs: remove kiocb.ki_hint
        block: remove the per-bio/request write hint
        nvme: remove support or stream based temperature hint
    • Merge tag 'devicetree-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 9bf3fc50
      Linus Torvalds authored
      Pull devicetree updates from Rob Herring:
      
       - Add Krzysztof Kozlowski as co-maintainer for DT bindings providing
         much needed help.
      
       - DT schema validation now takes DTB files as input rather than
         intermediate YAML files. This decouples the validation from the
         source level syntax information. There's a bunch of schema fixes as a
         result of switching to DTB based validation which exposed some errors
         and incomplete schemas and examples.
      
       - Kbuild improvements to explicitly warn users running 'make
         dt_binding_check' on missing yamllint
      
       - Expand DT_SCHEMA_FILES kbuild variable to take just a partial
         filename or path instead of the full path to 1 file.
      
       - Convert various bindings to schema format: mscc,vsc7514-switch,
         multiple GNSS bindings, ahci-platform, i2c-at91, multiple UFS
         bindings, cortina,gemini-sata-bridge, cortina,gemini-ethernet, Atmel
         SHA, Atmel TDES, Atmel AES, armv7m-systick, Samsung Exynos display
         subsystem, nuvoton,npcm7xx-timer, samsung,s3c2410-i2c, zynqmp_dma,
         msm/mdp4, rda,8810pl-uart
      
       - New schemas for u-boot environment variable partition, TI clksel
      
       - New compatible strings for Renesas RZ/V2L SoC
      
       - Vendor prefixes for Xen, HPE, deprecated Synopsys, deprecated
         HiSilicon
      
       - Add/fix schemas for QEMU Arm 'virt' machine
      
       - Drop unused of_alias_get_alias_list() function
      
       - Add a script to check DT unittest EXPECT message output. Pass
         messages also now print by default at PR_INFO level to help test
         automation.
      
      * tag 'devicetree-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (96 commits)
        dt-bindings: kbuild: Make DT_SCHEMA_LINT a recursive variable
        dt-bindings: nvmem: add U-Boot environment variables binding
        dt-bindings: ufs: qcom: Add SM6350 compatible string
        dt-bindings: dmaengine: sifive,fu540-c000: include generic schema
        dt-bindings: gpio: pca95xx: drop useless consumer example
        Revert "of: base: Introduce of_alias_get_alias_list() to check alias IDs"
        dt-bindings: virtio,mmio: Allow setting devices 'dma-coherent'
        dt-bindings: gnss: Add two more chips
        dt-bindings: gnss: Rewrite sirfstar binding in YAML
        dt-bindings: gnss: Modify u-blox to use common bindings
        dt-bindings: gnss: Rewrite common bindings in YAML
        dt-bindings: ata: ahci-platform: Add rk3568-dwc-ahci compatible
        dt-bindings: ata: ahci-platform: Add power-domains property
        dt-bindings: ata: ahci-platform: Convert DT bindings to yaml
        dt-bindings: kbuild: Use DTB files for validation
        dt-bindings: kbuild: Pass DT_SCHEMA_FILES to dt-validate
        dt-bindings: Add QEMU virt machine compatible
        dt-bindings: arm: Convert QEMU fw-cfg to DT schema
        dt-bindings: i2c: at91: Add SAMA7G5 compatible strings list
        dt-bindings: i2c: convert i2c-at91 to json-schema
        ...
    • Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" · bddac7c1
      Linus Torvalds authored
      This reverts commit aa6f8dcb.
      
      It turns out this breaks at least the ath9k wireless driver, and
      possibly others.
      
      What the ath9k driver does on packet receive is to set up the DMA
      transfer with:
      
        int ath_rx_init(..)
        ..
                      bf->bf_buf_addr = dma_map_single(sc->dev, skb->data,
                                                       common->rx_bufsize,
                                                       DMA_FROM_DEVICE);
      
      and then the receive logic (through ath_rx_tasklet()) will fetch
      incoming packets
      
        static bool ath_edma_get_buffers(..)
        ..
              dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
                                      common->rx_bufsize, DMA_FROM_DEVICE);
      
              ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data);
              if (ret == -EINPROGRESS) {
                      /*let device gain the buffer again*/
                      dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
                                      common->rx_bufsize, DMA_FROM_DEVICE);
                      return false;
              }
      
      and it's worth noting how that first DMA sync:
      
          dma_sync_single_for_cpu(..DMA_FROM_DEVICE);
      
      is there to make sure the CPU can read the DMA buffer (possibly by
      copying it from the bounce buffer area, or by doing some cache flush).
      The iommu correctly turns that into a "copy from bounce buffer" so that
      the driver can look at the state of the packets.
      
      In the meantime, the device may continue to write to the DMA buffer, but
      we at least have a snapshot of the state due to that first DMA sync.
      
      But that _second_ DMA sync:
      
          dma_sync_single_for_device(..DMA_FROM_DEVICE);
      
      is telling the DMA mapping that the CPU wasn't interested in the area
      because the packet wasn't there.  In the case of a DMA bounce buffer,
      that is a no-op.
      
      Note how it's not a sync for the CPU (the "for_device()" part), and it's
      not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part).
      
      Or rather, it _should_ be a no-op.  That's what commit aa6f8dcb
      broke: it made the code bounce the buffer unconditionally, and changed
      the DMA_FROM_DEVICE to just unconditionally and illogically be
      DMA_TO_DEVICE.
      
      [ Side note: purely within the confines of the swiotlb driver it wasn't
        entirely illogical: The reason it did that odd DMA_FROM_DEVICE ->
        DMA_TO_DEVICE conversion thing is because inside the swiotlb driver,
        it uses just a swiotlb_bounce() helper that doesn't care about the
        whole distinction of who the sync is for - only which direction to
        bounce.
      
        So it took the "sync for device" to mean that the CPU must have been
        the one writing, and thought it meant DMA_TO_DEVICE. ]
      
      Also note how the commentary in that commit was wrong, probably due to
      that whole confusion, claiming that the commit makes the swiotlb code
      
                                        "bounce unconditionally (that is, also
          when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
          data from the swiotlb buffer"
      
      which is nonsensical for two reasons:
      
       - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was
         exactly when it always did - and should do - the bounce.
      
       - since this is a sync for the device (not for the CPU), we're clearly
         fundamentally not copying back stale data from the bounce buffers at
         all, because we'd be copying *to* the bounce buffers.
      
      So that commit was just very confused.  It confused the direction of the
      synchronization (to the device, not the cpu) with the direction of the
      DMA (from the device).
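
      The two independent axes discussed above (who the sync is for, and
      which way the DMA flows) can be written down as a small decision
      function. This is an illustrative model of the behaviour the revert
      restores, not the swiotlb source: the enums and sketch_sync_copy() are
      hypothetical names, and treating a bidirectional mapping as copying in
      both cases is an assumption of this sketch.

```c
#include <assert.h>

/* Direction of the DMA transfer the buffer was mapped for. */
enum sketch_dma_dir {
	SKETCH_BIDIRECTIONAL,
	SKETCH_TO_DEVICE,
	SKETCH_FROM_DEVICE
};

/* Who is about to access the buffer after the sync. */
enum sketch_sync_target {
	SKETCH_SYNC_FOR_CPU,
	SKETCH_SYNC_FOR_DEVICE
};

/* What a bounce-buffer implementation has to copy, if anything. */
enum sketch_copy {
	SKETCH_NO_COPY,
	SKETCH_BOUNCE_TO_ORIGINAL,	/* device data made visible to the CPU */
	SKETCH_ORIGINAL_TO_BOUNCE	/* CPU data made visible to the device */
};

static enum sketch_copy sketch_sync_copy(enum sketch_sync_target target,
					 enum sketch_dma_dir dir)
{
	assert(dir == SKETCH_BIDIRECTIONAL || dir == SKETCH_TO_DEVICE ||
	       dir == SKETCH_FROM_DEVICE);

	if (target == SKETCH_SYNC_FOR_CPU) {
		/* CPU will read: only device-written data needs copying. */
		if (dir == SKETCH_FROM_DEVICE || dir == SKETCH_BIDIRECTIONAL)
			return SKETCH_BOUNCE_TO_ORIGINAL;
		return SKETCH_NO_COPY;
	}

	/* Device takes over: only CPU-written data needs copying. */
	if (dir == SKETCH_TO_DEVICE || dir == SKETCH_BIDIRECTIONAL)
		return SKETCH_ORIGINAL_TO_BOUNCE;

	/* sync for device + DMA_FROM_DEVICE: nothing to copy. */
	return SKETCH_NO_COPY;
}
```

      In this model the ath9k sequence maps to a copy out of the bounce
      buffer for the first sync and to no copy for the second, so a packet
      the device writes in between is not overwritten.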
      Reported-and-bisected-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Reported-by: Olha Cherevyk <olha.cherevyk@gmail.com>
      Cc: Halil Pasic <pasic@linux.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Kalle Valo <kvalo@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Toke Høiland-Jørgensen <toke@toke.dk>
      Cc: Maxime Bizon <mbizon@freebox.fr>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi · 52d543b5
      Linus Torvalds authored
      Pull IPMI updates from Corey Minyard:
      
       - Little fixes for various things people have noticed.
      
       - One enhancement: IPMI over IPMB (I2C) is modified to allow it to
         take a separate sender and receiver device. The Raspberry Pi has an
         I2C slave device that cannot send.
      
      * tag 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi:
        ipmi: initialize len variable
        ipmi: kcs: aspeed: Remove old bindings support
        ipmi:ipmb: Add the ability to have a separate slave and master device
        ipmi:ipmi_ipmb: Unregister the SMI on remove
        ipmi: kcs: aspeed: Add AST2600 compatible string
        ipmi: ssif: replace strlcpy with strscpy
        ipmi/watchdog: Constify ident
        ipmi: Add the git repository to the MAINTAINERS file
    • Merge tag 'fs_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · a452c4eb
      Linus Torvalds authored
      Pull reiserfs updates from Jan Kara:
       "The biggest change in this pull is the addition of a deprecation
        message about reiserfs with the outlook that we'd eventually be able
        to remove it from the kernel. Because it is practically unmaintained
        and untested and odd enough that people don't want to bother with it
        anymore...
      
        Otherwise there are small udf and ext2 fixes"
      
      * tag 'fs_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: remove redundant assignment of variable etype
        reiserfs: Deprecate reiserfs
        ext2: correct max file size computing
        reiserfs: get rid of AOP_FLAG_CONT_EXPAND flag
  3. 25 Mar, 2022 29 commits
    • Merge tag 'fsnotify_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · a8988507
      Linus Torvalds authored
      Pull fsnotify updates from Jan Kara:
       "A few fsnotify improvements and cleanups"
      
      * tag 'fsnotify_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: remove redundant parameter judgment
        fsnotify: optimize FS_MODIFY events with no ignored masks
        fsnotify: fix merge with parent's ignored mask
    • Merge tag 'drm-next-2022-03-25' of git://anongit.freedesktop.org/drm/drm · cb7cbaae
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Some fixes were queued up, and in light of the fbdev regressions,
        I've pulled those in as well.
      
        core:
         - Make audio and color plane support checking only happen when a CEA
           extension block is found.
         - Small selftest fix.
      
        fbdev:
         - Two regression fixes from speedup patches.
      
        ttm:
         - Fix a small regression from ttm_resource_fini()
      
        i915:
         - Reject unsupported TMDS rates on ICL+
         - Treat SAGV block time 0 as SAGV disabled
         - Fix PSF GV point mask when SAGV is not possible
         - Fix renamed INTEL_INFO->media.arch/ver field"
      
      * tag 'drm-next-2022-03-25' of git://anongit.freedesktop.org/drm/drm:
        fbdev: Fix cfb_imageblit() for arbitrary image widths
        fbdev: Fix sys_imageblit() for arbitrary image widths
        drm/edid: fix CEA extension byte #3 parsing
        drm/edid: check basic audio support on CEA extension block
        drm/i915: Fix renamed struct field
        drm/i915: Fix PSF GV point mask when SAGV is not possible
        drm/i915: Treat SAGV block time 0 as SAGV disabled
        drm/i915: Reject unsupported TMDS rates on ICL+
        drm/selftest: plane_helper: Put test structures in static storage
        drm/ttm: Fix a kernel oops due to an invalid read
    • Merge tag 'backlight-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · 46f538bf
      Linus Torvalds authored
      Pull backlight updates from Lee Jones:
       "New Device Support:
         - Add support for PM6150L to Qualcomm WLED
      
        Fix-ups:
         - Use kcalloc() to avoid open-coding; pwm_bl
         - Device Tree changes; qcom-wled
         - Cleanup or simplify code; backlight"
      
      * tag 'backlight-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
        backlight: backlight: Slighly simplify devm_of_find_backlight()
        backlight: qcom-wled: Add PM6150L compatible
        dt-bindings: backlight: qcom-wled: Add PM6150L compatible
        backlight: pwm_bl: Avoid open coded arithmetic in memory allocation
    • Merge tag 'mfd-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 8350e833
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New Drivers:
         - Add support for Maxim MAX77714 PMIC
      
        Removed Drivers:
         - Remove support for ST-Ericsson AB8500 DebugFS
      
        New Device Support:
         - Add support for Silergy SY7636A to Simple MFD I2C
         - Add support for MediaTek MT6366 PMIC to MT6358 IRQ
         - Add support for Charger to Intel PMIC CRC
         - Add support for Raptor Lake to Intel LPSS PCI
      
        New Functionality:
         - Add support for Reboot to Rockchip RK808
      
        Fix-ups:
         - Device Tree changes (including YAML conversion) for
           silergy,sy7636a, maxim,max77843, google,cros-ec, maxim,max14577,
           maxim,max77802, maxim,max77714, qcom,tcsr, qcom,spmi-pmic,
           stericsson,ab8500, stericsson,db8500-prcmu,
           samsung,exynos5433-lpass, mt6397, syscon, brcm,cru
         - Visible to menuconfig; simple-mfd-i2c
         - Clean-up or clarify code; max77686, intel_soc_pmic_crc
         - Improve error handling; mc13xxx-core, stmfx, asic3
         - Pass device information to child devices; iqs62x, intel-lpss-acpi
         - Individually identify IRQ domains; intel_soc_pmic_core
         - Remove superfluous code; dbx500-prcmu, exynos-lpass
         - Staticify and constify; arizona-i2c
         - Mark sometimes used data as __maybe_unused; atmel-flexcom
         - Account for different ACPI tables on AOSP/Windows platforms; arizona-spi
         - Use provided (platform) APIs; ab8500-core
         - Trivial (whitespace, spelling); rohm-bd9576"
      
      * tag 'mfd-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (50 commits)
        dt-bindings: mfd: syscon: Add microchip,lan966x-cpu-syscon compatible
        mfd: bd9576: fix typos in comments
        mfd: Use platform_get_irq() to get the interrupt
        mfd: db8500-prcmu: Remove unused inline function
        mfd: arizona-spi: Add Android board ACPI table handling
        mfd: arizona-spi: Split Windows ACPI init code into its own function
        mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
        MAINTAINERS: Rectify entry for ROHM MULTIFUNCTION BD9571MWV-M PMIC DEVICE DRIVERS
        mfd: intel-lpss: Provide an SSP type to the driver
        dt-bindings: mfd: brcm,cru: Rename pinctrl node
        dt-bindings: Add compatibles for undocumented trivial syscons
        mfd: atmel-flexcom: Fix compilation warning
        dt-bindings: mfd: Add compatible for the MediaTek MT6366 PMIC
        dt-bindings: mfd: samsung,exynos5433-lpass: Convert to dtschema
        mfd: exynos-lpass: Drop unneeded syscon.h include
        mfd: intel-lpss: Add Intel Raptor Lake PCH-S PCI IDs
        mfd: ab8500: Drop debugfs module
        mfd: sta2x11: Use GFP_KERNEL instead of GFP_ATOMIC
        mfd: ab8500: Rewrite bindings in YAML
        mfd: qcom-spmi-pmic: Add pm8953 compatible
        ...
    • Merge tag 'mtd/changes-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · e35a4a4e
      Linus Torvalds authored
      Pull MTD updates from Miquel Raynal:
       "There has been a lot of activity in the MTD subsystem recently, with a
        number of SPI-NOR cleanups as well as the introduction of ECC engines
        that can be used by SPI controllers (hence a few SPI patches in here).
      
        Core MTD changes:
         - Replace the expert mode symbols with a single helper
         - Fix misuses of of_match_ptr()
         - Remove partid and partname debugfs files
         - tests: Fix eraseblock read speed miscalculation for lower partition
           sizes
         - TRX parser: Allow to use on MediaTek MIPS SoCs
      
        MTD driver changes:
         - spear_smi: use GFP_KERNEL
         - mchp48l640: Add SPI ID table
         - mchp23k256: Add SPI ID table
         - blkdevs: Avoid soft lockups with some mtd/spi devices
         - aspeed-smc: Improve probe resilience
      
        Hyperbus changes:
         - HBMC_AM654 should depend on ARCH_K3
      
        NAND core changes:
         - ECC:
            - Add infrastructure to support hardware engines
            - Add a new helper to retrieve the ECC context
            - Provide a helper to retrieve a pipelined engine device
      
        NAND-ECC changes:
         - Macronix ECC engine:
            - Add Macronix external ECC engine support
            - Support SPI pipelined mode
            - Make two read-only arrays static const
            - Fix compile test issue
      
        Raw NAND core changes:
         - Fix misuses of of_match_node()
         - Rework of_get_nand_bus_width()
         - Remove of_get_nand_on_flash_bbt() wrapper
         - Protect access to rawnand devices while in suspend
         - bindings: Document the wp-gpios property
      
        Raw NAND controller driver changes:
         - atmel: Fix refcount issue in atmel_nand_controller_init
         - nandsim:
            - Add NS_PAGE_BYTE_SHIFT macro to replace the repeat pattern
            - Merge repeat codes in ns_switch_state
            - Replace overflow check with kzalloc to single kcalloc
         - rockchip: Fix platform_get_irq.cocci warning
         - stm32_fmc2: Add NAND Write Protect support
         - pl353: Set the nand chip node as the flash node
         - brcmnand: Fix sparse warnings in bcma_nand
         - omap_elm: Remove redundant variable 'errors'
         - gpmi:
            - Support fast edo timings for mx28
            - Validate controller clock rate
            - Fix controller timings setting
         - brcmnand:
            - Add BCMA shim
            - BCMA controller uses command shift of 0
            - Allow platform data instantiation
            - Add platform data structure for BCMA
            - Allow working without interrupts
            - Move OF operations out of brcmnand_init_cs()
            - Avoid pdev in brcmnand_init_cs()
            - Allow SoC to provide I/O operations
            - Assign soc as early as possible
      
        Onenand changes:
         - Check for error irq
      
        SPI-NAND core changes:
         - Delay a little bit the dirmap creation
         - Create direct mapping descriptors for ECC operations
      
        SPI-NAND driver changes:
         - macronix: Use random program load
      
        SPI NOR core changes:
         - Move vendor specific code out of the core into vendor drivers.
         - Unify all function and object names in the vendor modules.
         - Make setup() callback optional to improve readability.
         - Skip erase logic when the SPI_NOR_NO_ERASE flag is set at flash
           declaration.
      
        SPI changes:
         - Macronix SPI controller:
            - Fix the transmit path
            - Create a helper to configure the controller before an operation
            - Create a helper to ease the start of an operation
            - Add support for direct mapping
            - Add support for pipelined ECC operations
         - spi-mem:
            - Introduce a capability structure
            - Check the controller extra capabilities
            - cadence-quadspi/mxic: Provide capability structures
            - Kill the spi_mem_dtr_supports_op() helper
            - Add an ecc parameter to the spi_mem_op structure
      
        Binding changes:
         - Dropped mtd/cortina,gemini-flash.txt
         - Convert BCM47xx partitions to json-schema
         - Vendor prefixes: Clarify Macronix prefix
         - SPI NAND: Convert spi-nand description file to yaml
         - Raw NAND chip: Create a NAND chip description
         - Raw NAND controller:
            - Harmonize the property types
            - Fix a comment in the examples
            - Fix the reg property description
         - Describe Macronix NAND ECC engine
         - Macronix SPI controller:
            - Document the nand-ecc-engine property
            - Convert to yaml
            - The interrupt property is not mandatory"
      
      * tag 'mtd/changes-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (104 commits)
        mtd: nand: ecc: mxic: Fix compile test issue
        mtd: nand: mxic-ecc: make two read-only arrays static const
        mtd: hyperbus: HBMC_AM654 should depend on ARCH_K3
        mtd: core: Remove partid and partname debugfs files
        dt-bindings: mtd: partitions: convert BCM47xx to the json-schema
        mtd: tests: Fix eraseblock read speed miscalculation for lower partition sizes
        mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
        mtd: rawnand: rockchip: fix platform_get_irq.cocci warning
        mtd: spi-nor: Skip erase logic when SPI_NOR_NO_ERASE is set
        mtd: spi-nor: renumber flags
        mtd: spi-nor: slightly change code style in spi_nor_sr_ready()
        mtd: spi-nor: spansion: rename vendor specific functions and defines
        mtd: spi-nor: spansion: convert USE_CLSR to a manufacturer flag
        mtd: spi-nor: move all spansion specifics into spansion.c
        mtd: spi-nor: spansion: slightly rework control flow in late_init()
        mtd: spi-nor: micron-st: rename vendor specific functions and defines
        mtd: spi-nor: micron-st: convert USE_FSR to a manufacturer flag
        mtd: spi-nor: move all micron-st specifics into micron-st.c
        mtd: spi-nor: xilinx: correct the debug message
        mtd: spi-nor: xilinx: rename vendor specific functions and defines
        ...
    • Merge tag 'for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 8eb48fc7
      Linus Torvalds authored
      Pull power supply and reset updates from Sebastian Reichel:
       "Power-supply core:
      
         - Introduce "Bypass" charging type used by USB PPS standard
      
         - Refactor power_supply_set_input_current_limit_from_supplier()
      
         - Add fwnode support to power_supply_get_battery_info()
      
        Drivers:
      
         - ab8500: continue migrating towards using standard core APIs
      
         - axp288 fuel-gauge: refactor driver to be fully resource managed
      
         - battery-samsung-sdi: new in-kernel provider for (constant) Samsung
           battery info
      
         - bq24190: disable boost regulator on shutdown
      
         - bq24190: add support for battery-info on ACPI based systems
      
         - bq25890: prepare driver for usage on ACPI based systems
      
         - bq25890: add boost regulator support
      
         - cpcap-battery: add NVMEM based battery detection support
      
         - injoinic ip5xxx: new driver for power bank IC
      
         - upi ug3105: new battery driver
      
         - misc small improvements and fixes"
      
      * tag 'for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (94 commits)
        power: ab8500_chargalg: Use CLOCK_MONOTONIC
        power: supply: Add a driver for Injoinic power bank ICs
        dt-bindings: trivial-devices: Add Injoinic power bank ICs
        dt-bindings: vendor-prefixes: Add Injoinic
        power: supply: ab8500: Remove unused variable
        power: supply: da9150-fg: Remove unnecessary print function dev_err()
        power: supply: ab8500: fix a handful of spelling mistakes
        power: supply: ab8500_fg: Account for line impedance
        dt-bindings: power: supply: ab8500_fg: Add line impedance
        power: supply: axp20x_usb_power: fix platform_get_irq.cocci warnings
        power: supply: axp20x_ac_power: fix platform_get_irq.cocci warning
        power: supply: wm8350-power: Add missing free in free_charger_irq
        power: supply: wm8350-power: Handle error for wm8350_register_irq
        power: supply: Static data for Samsung batteries
        power: supply: ab8500_fg: Use VBAT-to-Ri if possible
        power: supply: Support VBAT-to-Ri lookup tables
        power: supply: ab8500: Standardize BTI resistance
        power: supply: ab8500: Standardize alert mode charging
        power: supply: ab8500: Standardize maintenance charging
        power: supply: bq24190_charger: Delay applying charge_type changes when OTG 5V Vbus boost is on
        ...
      8eb48fc7
    • Merge tag 'pci-v5.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 148a6504
      Linus Torvalds authored
      Pull pci updates from Bjorn Helgaas:
       "Enumeration:
         - Move the VGA arbiter from drivers/gpu to drivers/pci because it's
           PCI-specific, not GPU-specific (Bjorn Helgaas)
         - Select the default VGA device consistently whether it's enumerated
           before or after VGA arbiter init, which fixes arches that enumerate
           PCI devices late (Huacai Chen)
      
        Resource management:
         - Support BAR sizes up to 8TB (Dongdong Liu)
      
        PCIe native device hotplug:
         - Fix "Command Completed" tracking to avoid spurious timeouts when
           powering off empty slots (Liguang Zhang)
         - Quirk Qualcomm devices that don't implement Command Completed
           correctly, again to avoid spurious timeouts (Manivannan Sadhasivam)
      
        Peer-to-peer DMA:
         - Add 3rd Gen Intel Xeon Scalable Processors to whitelist
           (Michael J. Ruhl)
      
        APM X-Gene PCIe controller driver:
         - Revert generic DT parsing changes that broke some machines in the
           field (Marc Zyngier)
      
        Freescale i.MX6 PCIe controller driver:
         - Allow controller probe to succeed even when no devices are currently
           present, to allow hot-add later (Fabio Estevam)
         - Enable power management on i.MX6QP (Richard Zhu)
         - Assert CLKREQ# on i.MX8MM so enumeration doesn't hang when no
           device is connected (Richard Zhu)
      
        Marvell Aardvark PCIe controller driver:
         - Fix MSI and MSI-X support (Marek Behún, Pali Rohár)
         - Add support for ERR and PME interrupts (Pali Rohár)
      
        Marvell MVEBU PCIe controller driver:
         - Add DT binding and support for "num-lanes" (Pali Rohár)
         - Add support for INTx interrupts (Pali Rohár)
      
        Microsoft Hyper-V host bridge driver:
         - Avoid unnecessary hypercalls when unmasking IRQs on ARM64 (Boqun
           Feng)
      
        Qualcomm PCIe controller driver:
         - Add SM8450 DT binding and driver support (Dmitry Baryshkov)
      
        Renesas R-Car PCIe controller driver:
         - Help the controller get to the L1 state since the hardware can't do
           it on its own (Marek Vasut)
         - Return PCI_ERROR_RESPONSE (~0) for reads that fail on PCIe (Marek
           Vasut)
      
        SiFive FU740 PCIe controller driver:
         - Drop redundant '-gpios' from DT GPIO lookup (Ben Dooks)
         - Force 2.5GT/s for initial device probe (Ben Dooks)
      
        Socionext UniPhier Pro5 controller driver:
         - Add NX1 DT binding and driver support (Kunihiko Hayashi)
      
        Synopsys DesignWare PCIe controller driver:
         - Restore MSI configuration so MSI works after resume (Jisheng
           Zhang)"
      
      * tag 'pci-v5.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
        x86/PCI: Add #includes to asm/pci_x86.h
        PCI: ibmphp: Remove unused assignments
        PCI: cpqphp: Remove unused assignments
        PCI: fu740: Remove unused assignments
        PCI: kirin: Remove unused assignments
        PCI: Remove unused assignments
        PCI: Declare pci_filp_private only when HAVE_PCI_MMAP
        PCI: Avoid broken MSI on SB600 USB devices
        PCI: fu740: Force 2.5GT/s for initial device probe
        PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
        PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
        PCI: imx6: Assert i.MX8MM CLKREQ# even if no device present
        PCI: imx6: Invoke the PHY exit function after PHY power off
        PCI: rcar: Use PCI_SET_ERROR_RESPONSE after read which triggered an exception
        PCI: rcar: Finish transition to L1 state in rcar_pcie_config_access()
        PCI: dwc: Restore MSI Receiver mask during resume
        PCI: fu740: Drop redundant '-gpios' from DT GPIO lookup
        PCI/VGA: Replace full MIT license text with SPDX identifier
        PCI/VGA: Use unsigned format string to print lock counts
        PCI/VGA: Log bridge control messages when adding devices
        ...
      148a6504
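
      The R-Car item in the PCI pull above says failed reads return
      PCI_ERROR_RESPONSE (~0). The following is a minimal userspace sketch
      of that convention, not the driver's code: every DEMO_/demo_ name is
      hypothetical and only mirrors the all-ones idea from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the all-ones error value (~0). */
#define DEMO_ERROR_RESPONSE (~0ULL)

/* Hypothetical result of a config read; ok == false models a failed read. */
struct demo_read {
    bool ok;
    uint32_t data;
};

/* Store the read result in *val, or all ones if the read failed. */
void demo_config_read32(struct demo_read r, uint32_t *val)
{
    if (r.ok)
        *val = r.data;
    else
        *val = (uint32_t)DEMO_ERROR_RESPONSE;
}

/*
 * All ones only suggests an error: a register may legitimately hold
 * that value, so callers treat this as a hint, not as proof.
 */
bool demo_possible_error(uint32_t val)
{
    return val == (uint32_t)DEMO_ERROR_RESPONSE;
}
```

      A caller would read into a variable, then check it with
      demo_possible_error() before trusting the data.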
    • Merge tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 636f64db
      Linus Torvalds authored
      Pull RAS updates from Borislav Petkov:
      
       - More noinstr fixes
      
       - Add an erratum workaround for Intel CPUs which, in certain
         circumstances, end up consuming an unrelated uncorrectable memory
         error when using fast string copy insns
      
       - Remove the MCE tolerance level control as it is not really needed or
         used anymore
      
      * tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Remove the tolerance level control
        x86/mce: Work around an erratum on fast string copy instructions
        x86/mce: Use arch atomic and bit helpers
      636f64db
    • Merge tag 'gpio-updates-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · ebcb577a
      Linus Torvalds authored
      Pull gpio updates from Bartosz Golaszewski:
       "Relatively few updates for this release cycle. We have a single new
        driver and some minor changes in drivers, more work on limiting the
        usage of of_node in drivers and DT updates:
      
         - new driver: gpio-en7523
      
         - dt-bindings: conversion of faraday,ftgpio010 to YAML, new
           compatible string in gpio-vf610 and a bugfix in an example
      
         - gpiolib core: several improvements and some code shrink
      
         - documentation: convert all public docs into kerneldoc format
      
         - set IRQ bus token in gpio-crystalcove (addresses a debugfs issue)
      
         - add a missing return value check for kstrdup() in gpio-merrifield
      
         - allow gpio-tps68470 to be built as module
      
         - more work on limiting usage of of_node in GPIO drivers
      
         - several sysfs interface improvements
      
         - use SPDX in gpio-ts4900"
      
      * tag 'gpio-updates-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: ts4900: Use SPDX header
        gpiolib: Use list_first_entry()/list_last_entry()
        gpiolib: sysfs: Simplify edge handling in the code
        gpiolib: sysfs: Move kstrtox() calls outside of the mutex lock
        gpiolib: sysfs: Move sysfs_emit() calls outside of the mutex lock
        gpiolib: make struct comments into real kernel docs
        dt-bindings: gpio: convert faraday,ftgpio01 to yaml
        dt-bindings: gpio: gpio-vf610: Add imx93 compatible string
        gpiolib: Simplify error path in gpiod_get_index() when requesting GPIO
        gpiolib: Use short form of ternary operator in gpiod_get_index()
        gpiolib: Introduce for_each_gpio_desc_with_flag() macro
        gpio: Add support for Airoha EN7523 GPIO controller
        dt-bindings: arm: airoha: Add binding for Airoha GPIO controller
        dt-bindings: gpio: fix gpio-hog example
        gpio: tps68470: Allow building as module
        gpio: tegra: Get rid of duplicate of_node assignment
        gpio: altera-a10sr: Switch to use fwnode instead of of_node
        gpio: merrifield: check the return value of devm_kstrdup()
        gpio: crystalcove: Set IRQ domain bus token to DOMAIN_BUS_WIRED
      ebcb577a
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 5e206459
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
      
       - rework of generic input handling which ultimately makes the
         processing of tablet events more generic and reliable (Benjamin
         Tissoires)
      
       - fixes for handling unnumbered reports fully correctly in i2c-hid
         (Angela Czubak, Dmitry Torokhov)
      
       - untangling of intermingled code for sending and handling output
         reports in i2c-hid (Dmitry Torokhov)
      
       - Apple magic keyboard support improvements for newer models (José
         Expósito)
      
       - Apple T2 Macs support improvements (Aun-Ali Zaidi, Paul Pawlowski)
      
       - driver for Razer Blackwidow keyboards (Jelle van der Waa)
      
       - driver for SiGma Micro keyboards (Desmond Lim)
      
       - integration of first part of DIGImend patches in order to ultimately
         vastly improve Linux support of tablets (Nikolai Kondrashov, José
         Expósito)
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (55 commits)
        HID: intel-ish-hid: Use dma_alloc_coherent for firmware update
        Input: docs: add more details on the use of BTN_TOOL
        HID: input: accommodate priorities for slotted devices
        HID: input: remove the need for HID_QUIRK_INVERT
        HID: input: enforce Invert usage to be processed before InRange
        HID: core: for input reports, process the usages by priority list
        HID: compute an ordered list of input fields to process
        HID: input: move up out-of-range processing of input values
        HID: input: rework spaghetti code with switch statements
        HID: input: tag touchscreens as such if the physical is not there
        HID: core: split data fetching from processing in hid_input_field()
        HID: core: de-duplicate some code in hid_input_field()
        HID: core: statically allocate read buffers
        HID: uclogic: Support multiple frame input devices
        HID: uclogic: Define report IDs before their descriptors
        HID: uclogic: Put version first in rdesc namespace
        HID: uclogic: Use "frame" instead of "buttonpad"
        HID: uclogic: Use different constants for frame report IDs
        HID: uclogic: Specify total report size to buttonpad macro
        HID: uclogic: Switch to matching subreport bytes
        ...
      5e206459
    • Merge tag 'platform-drivers-x86-v5.18-1' of... · 14646776
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver updates from Hans de Goede:
        "New drivers:
          - AMD Host System Management Port (HSMP)
          - Intel Software Defined Silicon
      
        Removed drivers (functionality folded into other drivers):
          - intel_cht_int33fe_microb
          - surface3_button
      
        amd-pmc:
          - s2idle bug-fixes
          - Support for AMD Spill to DRAM STB feature
      
        hp-wmi:
          - Fix SW_TABLET_MODE detection method (and other fixes)
          - Support omen thermal profile policy v1
      
        serial-multi-instantiate:
          - Add SPI device support
          - Add support for CS35L41 amplifiers used in new laptops
      
        think-lmi:
          - sysfs-class-firmware-attributes Certificate authentication support
      
        thinkpad_acpi:
          - Fixes + quirks
          - Add platform_profile support on AMD based ThinkPads
      
        x86-android-tablets:
          - Improve Asus ME176C / TF103C support
          - Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets
      
        Lots of various other small fixes and hardware-id additions"
      
      * tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (60 commits)
        platform/x86: think-lmi: Certificate authentication support
        Documentation: syfs-class-firmware-attributes: Lenovo Certificate support
        platform/x86: amd-pmc: Only report STB errors when STB enabled
        platform/x86: amd-pmc: Drop CPU QoS workaround
        platform/x86: amd-pmc: Output error codes in messages
        platform/x86: amd-pmc: Move to later in the suspend process
        ACPI / x86: Add support for LPS0 callback handler
        platform/x86: thinkpad_acpi: consistently check fan_get_status return.
        platform/x86: hp-wmi: support omen thermal profile policy v1
        platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated
        platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls
        platform/x86: hp-wmi: Fix SW_TABLET_MODE detection method
        platform/x86: hp-wmi: Fix hp_wmi_read_int() reporting error (0x05)
        platform/x86: amd-pmc: Validate entry into the deepest state on resume
        platform/x86: thinkpad_acpi: Don't use test_bit on an integer
        platform/x86: thinkpad_acpi: Fix compiler warning about uninitialized err variable
        platform/x86: thinkpad_acpi: clean up dytc profile convert
        platform/x86: x86-android-tablets: Depend on EFI and SPI
        platform/x86: amd-pmc: uninitialized variable in amd_pmc_s2d_init()
        platform/x86: intel-uncore-freq: fix uncore_freq_common_init() error codes
        ...
      14646776
    • Merge tag 'kbuild-gnu11-v5.18' of... · 50560ce6
      Linus Torvalds authored
      Merge tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild update for C11 language base from Masahiro Yamada:
       "Kbuild -std=gnu11 updates for v5.18
      
        Linus pointed out the benefits of C99 some years ago, especially
        variable declarations in loops [1]. At that time, we were not ready
        for the migration due to old compilers.
      
        Recently, Jakob Koschel reported a bug in list_for_each_entry(), which
        leaks the invalid pointer out of the loop [2]. In the discussion, we
        agreed that the time had come. Now that GCC 5.1 is the minimum
        compiler version, there is nothing to prevent us from going to
        -std=gnu99, or even straight to -std=gnu11.
      
        Discussions for a better list iterator implementation are ongoing, but
        this patch set must land first"
      
      [1] https://lore.kernel.org/all/CAHk-=wgr12JkKmRd21qh-se-_Gs69kbPgR9x4C+Es-yJV2GLkA@mail.gmail.com/
      [2] https://lore.kernel.org/lkml/86C4CE7D-6D93-456B-AA82-F8ADEACA40B7@gmail.com/
      
      * tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Kbuild: use -std=gnu11 for KBUILD_USERCFLAGS
        Kbuild: move to -std=gnu11
        Kbuild: use -Wdeclaration-after-statement
        Kbuild: add -Wno-shift-negative-value where -Wextra is used
      50560ce6
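
      The Kbuild message above cites variable declarations in loops as a
      benefit of moving past the old C standard. The following is a minimal
      sketch of that language feature, assuming a compiler invoked with
      -std=gnu11 (or any C99-or-later mode). The node type and find_value()
      are hypothetical and are not the kernel's list_for_each_entry().

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only (not struct list_head). */
struct node {
    int value;
    struct node *next;
};

/*
 * Return the first node whose value equals key, or NULL if none does.
 * The iterator 'pos' is declared inside the for statement, which C99
 * and later allow, so it cannot be referenced after the loop.
 */
struct node *find_value(struct node *head, int key)
{
    for (struct node *pos = head; pos != NULL; pos = pos->next) {
        if (pos->value == key)
            return pos;
    }
    /* 'pos' is out of scope here, so no stale iterator can be used. */
    return NULL;
}
```

      With the iterator scoped to the loop, code after the loop has to use
      an explicit result (here the return value) instead of the iterator.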
    • Merge branch 'akpm' (patches from Andrew) · 29c8c183
      Linus Torvalds authored
      Merge yet more updates from Andrew Morton:
       "This is the material which was staged after willystuff in linux-next.
      
        Subsystems affected by this patch series: mm (debug, selftests,
        pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise),
        and selftests"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits)
        selftests: kselftest framework: provide "finished" helper
        mm: madvise: MADV_DONTNEED_LOCKED
        mm: fix race between MADV_FREE reclaim and blkdev direct IO read
        mm: generalize ARCH_HAS_FILTER_PGPROT
        mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
        mm: warn on deleting redirtied only if accounted
        mm/huge_memory: remove stale locking logic from __split_huge_pmd()
        mm/huge_memory: remove stale page_trans_huge_mapcount()
        mm/swapfile: remove stale reuse_swap_page()
        mm/khugepaged: remove reuse_swap_page() usage
        mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
        mm: streamline COW logic in do_swap_page()
        mm: slightly clarify KSM logic in do_swap_page()
        mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
        mm: optimize do_wp_page() for exclusive pages in the swapcache
        mm/huge_memory: make is_transparent_hugepage() static
        userfaultfd/selftests: enable hugetlb remap and remove event testing
        selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
        mm: enable MADV_DONTNEED for hugetlb mappings
        kasan: disable LOCKDEP when printing reports
        ...
      29c8c183
    • Merge tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · aa5b537b
      Linus Torvalds authored
      Pull RISC-V updates from Palmer Dabbelt:
      
       - Support for Sv57-based virtual memory.
      
       - Various improvements for the Microchip PolarFire SoC and the
         associated Icicle dev board, which should allow upstream kernels to
         boot without any additional modifications.
      
       - An improved memmove() implementation.
      
       - Support for the new Sscofpmf and SBI PMU extensions, which allows
         for a much more useful perf implementation on RISC-V systems.
      
       - Support for restartable sequences.
      
      * tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (36 commits)
        rseq/selftests: Add support for RISC-V
        RISC-V: Add support for restartable sequence
        MAINTAINERS: Add entry for RISC-V PMU drivers
        Documentation: riscv: Remove the old documentation
        RISC-V: Add sscofpmf extension support
        RISC-V: Add perf platform driver based on SBI PMU extension
        RISC-V: Add RISC-V SBI PMU extension definitions
        RISC-V: Add a simple platform driver for RISC-V legacy perf
        RISC-V: Add a perf core library for pmu drivers
        RISC-V: Add CSR encodings for all HPMCOUNTERS
        RISC-V: Remove the current perf implementation
        RISC-V: Improve /proc/cpuinfo output for ISA extensions
        RISC-V: Do no continue isa string parsing without correct XLEN
        RISC-V: Implement multi-letter ISA extension probing framework
        RISC-V: Extract multi-letter extension names from "riscv, isa"
        RISC-V: Minimal parser for "riscv, isa" strings
        RISC-V: Correctly print supported extensions
        riscv: Fixed misaligned memory access. Fixed pointer comparison.
        MAINTAINERS: update riscv/microchip entry
        riscv: dts: microchip: add new peripherals to icicle kit device tree
        ...
      aa5b537b
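
      The RISC-V pull above adds Sv57-based virtual memory. As a rough
      illustration of the address layout (a 12-bit page offset plus five
      9-bit page-table indices, 57 bits in total), the following sketch
      splits a virtual address into its per-level indices. The demo_ names
      are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>

#define DEMO_PAGE_OFFSET_BITS 12u /* 4 KiB pages */
#define DEMO_VPN_BITS          9u /* 512 entries per page-table level */
#define DEMO_SV57_LEVELS       5u /* five levels of page tables */

/* Total number of translated virtual-address bits: 12 + 5 * 9. */
unsigned demo_sv57_va_bits(void)
{
    return DEMO_PAGE_OFFSET_BITS + DEMO_VPN_BITS * DEMO_SV57_LEVELS;
}

/* Page-table index for 'level' (0 = leaf level, 4 = root level). */
unsigned demo_sv57_vpn(uint64_t va, unsigned level)
{
    unsigned shift = DEMO_PAGE_OFFSET_BITS + DEMO_VPN_BITS * level;

    return (unsigned)((va >> shift) & ((1u << DEMO_VPN_BITS) - 1u));
}
```

      The same arithmetic with four levels gives 48 bits, and with three
      levels gives 39 bits.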
    • Merge tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · d710d370
      Linus Torvalds authored
      Pull s390 updates from Vasily Gorbik:
      
       - Raise minimum supported machine generation to z10, which comes with
         various cleanups and code simplifications (usercopy/spectre
         mitigation/etc).
      
       - Rework extables and get rid of anonymous out-of-line fixups.
      
       - Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
         Convert pte_val()/pXd_val() macros to functions.
      
       - Optimize kretprobe handling by avoiding extra kprobe on
         __kretprobe_trampoline.
      
       - Add support for CEX8 crypto cards.
      
       - Allow triggering an AP bus rescan by writing to /sys/bus/ap/scans.
      
       - Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
         group sections which simplifies kpatch support.
      
       - Always use the packed stack layout and extend kernel unwinder tests.
      
       - Add sanity checks for ftrace code patching.
      
       - Add s390dbf debug log for the vfio_ap device driver.
      
       - Various virtual vs physical address confusion fixes.
      
       - Various small fixes and improvements all over the code.
      
      * tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
        s390/test_unwind: add kretprobe tests
        s390/kprobes: Avoid additional kprobe in kretprobe handling
        s390: convert ".insn" encoding to instruction names
        s390: assume stckf is always present
        s390/nospec: move to single register thunks
        s390: raise minimum supported machine generation to z10
        s390/uaccess: Add copy_from/to_user_key functions
        s390/nospec: align and size extern thunks
        s390/nospec: add an option to use thunk-extern
        s390/nospec: generate single register thunks if possible
        s390/pci: make zpci_set_irq()/zpci_clear_irq() static
        s390: remove unused expoline to BC instructions
        s390/irq: use assignment instead of cast
        s390/traps: get rid of magic cast for per code
        s390/traps: get rid of magic cast for program interruption code
        s390/signal: fix typo in comments
        s390/asm-offsets: remove unused defines
        s390/test_unwind: avoid build warning with W=1
        s390: remove .fixup section
        s390/bpf: encode register within extable entry
        ...
      d710d370
    • Merge tag 'xtensa-20220325' of https://github.com/jcmvbkbc/linux-xtensa · 744465da
      Linus Torvalds authored
      Pull Xtensa updates from Max Filippov:
      
       - remove dependency on the compiler's libgcc
      
       - allow selection of internal kernel ABI via Kconfig
      
       - enable compiler plugins support for gcc-12 or newer
      
       - various minor cleanups and fixes
      
      * tag 'xtensa-20220325' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: define update_mmu_tlb function
        xtensa: fix xtensa_wsr always writing 0
        xtensa: enable plugin support
        xtensa: clean up kernel exit assembly code
        xtensa: rearrange NMI exit path
        xtensa: merge stack alignment definitions
        xtensa: fix DTC warning unit_address_format
        xtensa: fix stop_machine_cpuslocked call in patch_text
        xtensa: make secondary reset vector support conditional
        xtensa: add kernel ABI selection to Kconfig
        xtensa: don't link with libgcc
        xtensa: add helpers for division, remainder and shifts
        xtensa: add missing XCHAL_HAVE_WINDOWED check
        xtensa: use XCHAL_NUM_AREGS as pt_regs::areg size
        xtensa: rename PT_SIZE to PT_KERNEL_SIZE
        xtensa: Remove unused early_read_config_byte() et al declarations
        xtensa: use strscpy to copy strings
        net: xtensa: use strscpy to copy strings
      744465da
    • Merge tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 1f1c153e
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
       "Livepatch support for 32-bit is probably the standout new feature,
        otherwise mostly just lots of bits and pieces all over the board.
      
        There's a series of commits cleaning up function descriptor handling,
        which touches a few other arches as well as LKDTM. It has acks from
        Arnd, Kees and Helge.
      
        Summary:
      
         - Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
      
         - Add support for livepatch to 32-bit.
      
         - Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
      
         - Merge vdso64 and vdso32 into a single directory.
      
         - Fix build errors with newer binutils.
      
         - Add support for UADDR64 relocations, which are emitted by some
           toolchains. This allows powerpc to build with the latest lld.
      
         - Fix (another) potential userspace r13 corruption in transactional
           memory handling.
      
         - Cleanups of function descriptor handling & related fixes to LKDTM.
      
        Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh
        Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar
        Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET,
        Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique
        Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu
        Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason
        Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol
        Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar,
        Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek,
        Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
        Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy
        Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant,
        Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding,
        Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean,
        Wedson Almeida Filho, and YueHaibing"
      
      * tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
        powerpc/pseries: Fix use after free in remove_phb_dynamic()
        powerpc/time: improve decrementer clockevent processing
        powerpc/time: Fix KVM host re-arming a timer beyond decrementer range
        powerpc/tm: Fix more userspace r13 corruption
        powerpc/xive: fix return value of __setup handler
        powerpc/64: Add UADDR64 relocation support
        powerpc: 8xx: fix a return value error in mpc8xx_pic_init
        powerpc/ps3: remove unneeded semicolons
        powerpc/64: Force inlining of prevent_user_access() and set_kuap()
        powerpc/bitops: Force inlining of fls()
        powerpc: declare unmodified attribute_group usages const
        powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n
        powerpc/secvar: fix refcount leak in format_show()
        powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
        powerpc: Move C prototypes out of asm-prototypes.h
        powerpc/kexec: Declare kexec_paca static
        powerpc/smp: Declare current_set static
        powerpc: Cleanup asm-prototypes.c
        powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S
        powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S
        ...
      1f1c153e
    • Merge tag 'mips_5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 9a8b3d5f
      Linus Torvalds authored
      Pull MIPS updates from Thomas Bogendoerfer:
      
       - added support for QCN550x (ath79)
      
       - enabled KCSAN
      
       - removed TX39XX support
      
       - various cleanups and fixes
      
      * tag 'mips_5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (31 commits)
        MIPS: Fix build error for loongson64 and sgi-ip27
        MIPS: ingenic: correct unit node address
        MIPS: Fix wrong comments in asm/prom.h
        MIPS: Remove redundant definitions of device_tree_init()
        MIPS: Remove redundant check in device_tree_init()
        MIPS: pgalloc: fix memory leak caused by pgd_free()
        MIPS: RB532: fix return value of __setup handler
        MIPS: Only use current_stack_pointer on GCC
        MIPS: boot/compressed: Use array reference for image bounds
        mips: cdmm: Fix refcount leak in mips_cdmm_phys_base
        mips: remove reference to "newer Loongson-3"
        mips: Always permit to build u-boot images
        MIPS: Sanitise Cavium switch cases in TLB handler synthesizers
        DEC: Limit PMAX memory probing to R3k systems
        mips: DEC: honor CONFIG_MIPS_FP_SUPPORT=n
        MIPS: fix fortify panic when copying asm exception handlers
        mips: ralink: fix a refcount leak in ill_acc_of_setup()
        mips: Implement "current_stack_pointer"
        MIPS: Remove TX39XX support
        MIPS: Modernize READ_IMPLIES_EXEC
        ...
      9a8b3d5f
    • Merge tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 34af78c4
      Linus Torvalds authored
      Pull iommu updates from Joerg Roedel:
      
       - IOMMU Core changes:
            - Removal of aux domain related code as it is basically dead and
              will be replaced by iommu-fd framework
            - Split of iommu_ops to carry domain-specific callbacks separately
            - Cleanup to remove useless ops->capable implementations
            - Improve 32-bit free space estimate in iova allocator
      
       - Intel VT-d updates:
            - Various cleanups of the driver
            - Support for ATS of SoC-integrated devices listed in ACPI/SATC
              table
      
       - ARM SMMU updates:
            - Fix SMMUv3 soft lockup during continuous stream of events
            - Fix error path for Qualcomm SMMU probe()
            - Rework SMMU IRQ setup to prepare the ground for PMU support
            - Minor cleanups and refactoring
      
       - AMD IOMMU driver:
            - Some minor cleanups and error-handling fixes
      
       - Rockchip IOMMU driver:
            - Use standard driver registration
      
       - MSM IOMMU driver:
            - Minor cleanup and change to standard driver registration
      
       - Mediatek IOMMU driver:
            - Fixes for IOTLB flushing logic
      
      * tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (47 commits)
        iommu/amd: Improve amd_iommu_v2_exit()
        iommu/amd: Remove unused struct fault.devid
        iommu/amd: Clean up function declarations
        iommu/amd: Call memunmap in error path
        iommu/arm-smmu: Account for PMU interrupts
        iommu/vt-d: Enable ATS for the devices in SATC table
        iommu/vt-d: Remove unused function intel_svm_capable()
        iommu/vt-d: Add missing "__init" for rmrr_sanity_check()
        iommu/vt-d: Move intel_iommu_ops to header file
        iommu/vt-d: Fix indentation of goto labels
        iommu/vt-d: Remove unnecessary prototypes
        iommu/vt-d: Remove unnecessary includes
        iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO
        iommu/vt-d: Remove domain and devinfo mempool
        iommu/vt-d: Remove iova_cache_get/put()
        iommu/vt-d: Remove finding domain in dmar_insert_one_dev_info()
        iommu/vt-d: Remove intel_iommu::domains
        iommu/mediatek: Always tlb_flush_all when each PM resume
        iommu/mediatek: Add tlb_lock in tlb_flush_all
        iommu/mediatek: Remove the power status checking in tlb flush all
        ...
      34af78c4
    • Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 6f2689a7
      Linus Torvalds authored
      Pull SCSI updates from James Bottomley:
       "This series consists of the usual driver updates (qla2xxx, pm8001,
        libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
        and bug fixes.
      
        The high blast radius core update is the removal of write same, which
        affects block and several non-SCSI devices. The other big change,
        which is more local, is the removal of the SCSI pointer"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
        scsi: scsi_ioctl: Drop needless assignment in sg_io()
        scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
        scsi: lpfc: Copyright updates for 14.2.0.0 patches
        scsi: lpfc: Update lpfc version to 14.2.0.0
        scsi: lpfc: SLI path split: Refactor BSG paths
        scsi: lpfc: SLI path split: Refactor Abort paths
        scsi: lpfc: SLI path split: Refactor SCSI paths
        scsi: lpfc: SLI path split: Refactor CT paths
        scsi: lpfc: SLI path split: Refactor misc ELS paths
        scsi: lpfc: SLI path split: Refactor VMID paths
        scsi: lpfc: SLI path split: Refactor FDISC paths
        scsi: lpfc: SLI path split: Refactor LS_RJT paths
        scsi: lpfc: SLI path split: Refactor LS_ACC paths
        scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
        scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
        scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
        scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
        scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
        scsi: lpfc: SLI path split: Refactor lpfc_iocbq
        scsi: lpfc: Use kcalloc()
        ...
      6f2689a7
    • Linus Torvalds's avatar
      Merge tag 'for-5.18/dm-changes' of... · b1f8ccda
      Linus Torvalds authored
      Merge tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper updates from Mike Snitzer:
      
       - Significant refactoring and fixing of how DM core does bio-based IO
         accounting with focus on fixing wildly inaccurate IO stats for
         dm-crypt (and other DM targets that defer bio submission in their own
         workqueues). End result is proper IO accounting, made possible by
         targets being updated to use the new dm_submit_bio_remap() interface.
      
       - Add hipri bio polling support (REQ_POLLED) to bio-based DM.
      
       - Reduce dm_io and dm_target_io structs so that a single dm_io (which
         contains dm_target_io and first clone bio) weighs in at 256 bytes.
         For reference the bio struct is 128 bytes.
      
       - Various other small cleanups, fixes or improvements in DM core and
         targets.
      
       - Update MAINTAINERS with my kernel.org email address to allow
         distinction between my "upstream" and "Red" Hats.
      
      * tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (46 commits)
        dm: consolidate spinlocks in dm_io struct
        dm: reduce size of dm_io and dm_target_io structs
        dm: switch dm_target_io booleans over to proper flags
        dm: switch dm_io booleans over to proper flags
        dm: update email address in MAINTAINERS
        dm: return void from __send_empty_flush
        dm: factor out dm_io_complete
        dm cache: use dm_submit_bio_remap
        dm: simplify dm_sumbit_bio_remap interface
        dm thin: use dm_submit_bio_remap
        dm: add WARN_ON_ONCE to dm_submit_bio_remap
        dm: support bio polling
        block: add ->poll_bio to block_device_operations
        dm mpath: use DMINFO instead of printk with KERN_INFO
        dm: stop using bdevname
        dm-zoned: remove the ->name field in struct dmz_dev
        dm: remove unnecessary local variables in __bind
        dm: requeue IO if mapping table not yet available
        dm io: remove stale comment block for dm_io()
        dm thin metadata: remove unused dm_thin_remove_block and __remove
        ...
      b1f8ccda
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 2dacc1e5
      Linus Torvalds authored
      Pull rdma updates from Jason Gunthorpe:
      
       - Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns
      
       - Minor cleanups: coding style, useless includes and documentation
      
       - Reorganize how multicast processing works in rxe
      
       - Replace a red/black tree with xarray in rxe which improves performance
      
       - DSCP support and HW address handle re-use in irdma
      
       - Simplify the mailbox command handling in hns
      
       - Simplify iser now that FMR is eliminated
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits)
        RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
        IB/iser: Fix error flow in case of registration failure
        IB/iser: Generalize map/unmap dma tasks
        IB/iser: Use iser_fr_desc as registration context
        IB/iser: Remove iser_reg_data_sg helper function
        RDMA/rxe: Use standard names for ref counting
        RDMA/rxe: Replace red-black trees by xarrays
        RDMA/rxe: Shorten pool names in rxe_pool.c
        RDMA/rxe: Move max_elem into rxe_type_info
        RDMA/rxe: Replace obj by elem in declaration
        RDMA/rxe: Delete _locked() APIs for pool objects
        RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
        RDMA/rxe: Replace mr by rkey in responder resources
        RDMA/rxe: Fix ref error in rxe_av.c
        RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT
        RDMA/irdma: Add support for address handle re-use
        RDMA/qib: Fix typos in comments
        RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
        Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
        RDMA/rxe: Remove useless argument for update_state()
        ...
      2dacc1e5
    • Kees Cook's avatar
      selftests: kselftest framework: provide "finished" helper · 25fd2d41
      Kees Cook authored
      Instead of having each test that wants to use ksft_exit() have to figure
      out the internals of kselftest.h, add the helper ksft_finished() that
      makes sure the passes, xfails, and skips are equal to the test plan count.
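
      As a rough illustration of the check described above (a standalone
      model, not the kselftest.h code; the struct and function names are
      hypothetical), a run counts as cleanly finished only when passes,
      expected failures, and skips add up to the planned test count:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters a test harness keeps. */
struct demo_counts {
	unsigned int pass;
	unsigned int fail;
	unsigned int xfail;	/* expected failures */
	unsigned int skip;
};

/*
 * Model of the "finished" check: the run is good only if every planned
 * test ended as a pass, an expected failure, or a skip. Any unexpected
 * failure (or a test that never ran) leaves the sum short of the plan.
 */
static bool demo_finished_ok(unsigned int plan, const struct demo_counts *c)
{
	return plan == c->pass + c->xfail + c->skip;
}
```

      With a plan of 5, counts of 3 passes, 1 xfail and 1 skip satisfy the
      check, while replacing the xfail with an unexpected failure does not.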
      
      Link: https://lkml.kernel.org/r/20220201013717.2464392-1-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      25fd2d41
    • Johannes Weiner's avatar
      mm: madvise: MADV_DONTNEED_LOCKED · 9457056a
      Johannes Weiner authored
      MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT
      and MCL_ONFAULT allowing to mlock without populating, there are valid use
      cases for depopulating locked ranges as well.
      
      Users mlock memory to protect secrets.  There are allocators for secure
      buffers that want in-use memory generally mlocked, but cleared and
      invalidated memory to give up the physical pages.  This could be done with
      explicit munlock -> mlock calls on free -> alloc of course, but that adds
      two unnecessary syscalls, heavy mmap_sem write locks, vma splits and
      re-merges - only to get rid of the backing pages.
      
      Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
      okay with on-demand initial population.  It seems valid to selectively
      free some memory during the lifetime of such a process, without having to
      mess with its overall policy.
      
      Why add a separate flag? Isn't this a pretty niche usecase?
      
      - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
        conceivable that someone, somewhere is relying on mlock to protect
        data from perhaps broader invalidation calls. Changing this behavior
        now could lead to quiet data corruption.
      
      - It also clarifies expectations around MADV_FREE and maybe
        MADV_REMOVE. It avoids the situation where one quietly behaves
        differently from the others. MADV_FREE_LOCKED can be added later.
      
      - The combination of mlock() and madvise() in the first place is
        probably niche. But where it happens, I'd say that dropping pages
        from a locked region once they don't contain secrets or won't page
        anymore is much saner than relying on mlock to protect memory from
        speculative or errant invalidation calls. It's just that we can't
        change the default behavior because of the two previous points.
      
      Given that, an explicit new flag seems to make the most sense.
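
      A minimal userspace sketch of the use case, assuming a kernel that
      carries this change.  The fallback value 24 for MADV_DONTNEED_LOCKED is
      an assumption taken from the asm-generic UAPI header (older libc headers
      may not define it), and demo_drop_locked_page() is a hypothetical helper
      name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: asm-generic value; only used if the libc headers lack it. */
#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24
#endif

/*
 * Returns  0 if the locked page was dropped and reads back as zero,
 *          1 if the kernel rejects the flag with EINVAL (e.g. too old),
 *          2 if mlock() is not permitted in this environment,
 *         -1 on any other failure.
 */
static int demo_drop_locked_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int ret = -1;
	char *buf;

	if (page <= 0)
		return -1;

	buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	if (mlock(buf, page) != 0) {
		ret = 2;
		goto out;
	}

	memset(buf, 0x5a, page);	/* stand-in for secret data */

	/* Give up the backing page without a munlock()/mlock() round trip. */
	if (madvise(buf, page, MADV_DONTNEED_LOCKED) != 0) {
		ret = (errno == EINVAL) ? 1 : -1;
		goto out;
	}

	/* The range is still mlocked; the next access faults in a zero page. */
	ret = (buf[0] == 0 && buf[page - 1] == 0) ? 0 : -1;
out:
	munmap(buf, page);
	return ret;
}
```

      The point of the sketch is the single madvise() call in place of the
      munlock -> madvise -> mlock sequence described above.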
      
      [hannes@cmpxchg.org: fix mips build]
      
      Link: https://lkml.kernel.org/r/20220304171912.305060-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9457056a
    • Mauricio Faria de Oliveira's avatar
      mm: fix race between MADV_FREE reclaim and blkdev direct IO read · 6c8e2a25
      Mauricio Faria de Oliveira authored
      Problem:
      =======
      
      Userspace might read the zero-page instead of actual data from a direct IO
      read on a block device if madvise(MADV_FREE) was called on the buffers
      earlier (this is discussed below), due to a race between page reclaim on
      MADV_FREE and blkdev direct IO read.
      
      - Race condition:
        ==============
      
      During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks
      if the page is not dirty, then discards its rmap PTE(s) (vs.  remap back
      if the page is dirty).
      
      However, after try_to_unmap_one() returns to shrink_page_list(), it might
      keep the page _anyway_ if page_ref_freeze() fails (it expects exactly
      _one_ page reference, from the isolation for page reclaim).
      
      Well, blkdev_direct_IO() gets references for all pages, and on READ
      operations it only sets them dirty _later_.
      
      So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct
      IO read from block devices, and page reclaim happens during
      __blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages()
      returns, but BEFORE the pages are set dirty, the situation happens.
      
      The direct IO read eventually completes.  Now, when userspace reads the
      buffers, the PTE is no longer there and the page fault handler
      do_anonymous_page() services that with the zero-page, NOT the data!
      
      A synthetic reproducer is provided.
      
      - Page faults:
        ===========
      
      If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't
      happen, because that faults-in all pages as writeable, so
      do_anonymous_page() sets up a new page/rmap/PTE, and that is used by
      direct IO.  The userspace reads don't fault as the PTE is there (thus
      zero-page is not used/setup).
      
      But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE
      is no longer there; the subsequent page faults can't help:
      
      The data-read from the block device probably won't generate faults due to
      DMA (no MMU) but even in the case it wouldn't use DMA, that happens on
      different virtual addresses (not user-mapped addresses) because `struct
      bio_vec` stores `struct page` to figure addresses out (which are different
      from user-mapped addresses) for the read.
      
      Thus userspace reads (to user-mapped addresses) still fault, then
      do_anonymous_page() gets another `struct page` that would address/ map to
      other memory than the `struct page` used by `struct bio_vec` for the read.
      (The original `struct page` is not available, since it wasn't freed, as
      page_ref_freeze() failed due to more page refs.  And even if it were
      available, its data cannot be trusted anymore.)
      
      Solution:
      ========
      
      One solution is to check for the expected page reference count in
      try_to_unmap_one().
      
      There should be one reference from the isolation (that is also checked in
      shrink_page_list() with page_ref_freeze()) plus one or more references
      from page mapping(s) (put in discard: label).  Further references mean
      that rmap/PTE cannot be unmapped/nuked.
      
      (Note: there might be more than one reference from mapping due to
      fork()/clone() without CLONE_VM, which use the same `struct page` for
      references, until the copy-on-write page gets copied.)
      
      So, additional page references (e.g., from direct IO read) now prevent the
      rmap/PTE from being unmapped/dropped, similarly to how the page is not
      freed per shrink_page_list()/page_ref_freeze().
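
      The reference-count rule above can be sketched as a small standalone
      predicate.  This is a model under stated assumptions, not the kernel
      code: struct demo_page_state and demo_can_discard_lazyfree() are
      hypothetical names, and the fields stand in for values read under the
      PTE lock.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the page state seen during reclaim. */
struct demo_page_state {
	int ref_count;	/* all references currently held on the page */
	int map_count;	/* references that come from page-table mappings */
	bool dirty;	/* page was written since MADV_FREE */
};

/*
 * Expected references are one from reclaim isolation plus one per
 * mapping. Any further reference (for example a direct IO read that
 * has pinned the page) means the rmap/PTE must be kept.
 */
static bool demo_can_discard_lazyfree(const struct demo_page_state *p)
{
	return p->ref_count == 1 + p->map_count && !p->dirty;
}
```

      For a page mapped once, a reference count of 2 allows the discard, while
      a count of 3 (one extra pin) or a set dirty flag keeps the mapping.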
      
      - Races and Barriers:
        ==================
      
      The new check in try_to_unmap_one() should be safe in races with
      bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's
      done under the PTE lock.
      
      The fast path doesn't take the lock, but it checks if the PTE has changed
      and if so, it drops the reference and leaves the page for the slow path
      (which does take that lock).
      
      The fast path requires synchronization w/ full memory barrier: it writes
      the page reference count first then it reads the PTE later, while
      try_to_unmap() writes PTE first then it reads page refcount.
      
      And a second barrier is needed, as the page dirty flag should not be read
      before the page reference count (as in __remove_mapping()).  (This can be
      a load memory barrier only; no writes are involved.)
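
      The pairing of the first barrier can be modelled in userspace with C11
      atomics.  This is only a sketch: struct demo_page and both functions are
      hypothetical, a PTE value of 0 stands for "not present", the second
      (load) barrier before the dirty-flag read is not modelled, and putting
      the PTE back on failure is a simplification of remapping the page.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_page {
	atomic_int refcount;
	atomic_ulong pte;	/* 0 means "not present" in this model */
};

/* Reclaim side: write the PTE first, full barrier, then read refcount. */
static bool demo_reclaim_try_discard(struct demo_page *pg, int expected_refs)
{
	unsigned long old = atomic_exchange(&pg->pte, 0UL);	/* write PTE */

	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */

	if (atomic_load(&pg->refcount) != expected_refs) {
		atomic_store(&pg->pte, old);	/* extra ref seen: keep mapping */
		return false;
	}
	return true;
}

/* Fast-path side: write refcount first (RMW), then re-read the PTE. */
static bool demo_gup_fast(struct demo_page *pg)
{
	unsigned long pte = atomic_load(&pg->pte);

	if (pte == 0)
		return false;			/* nothing mapped: go slow path */

	atomic_fetch_add(&pg->refcount, 1);	/* inc refcount, full ordering */

	if (atomic_load(&pg->pte) != pte) {	/* PTE changed under us */
		atomic_fetch_sub(&pg->refcount, 1);
		return false;			/* back off: go slow path */
	}
	return true;
}
```

      Because each side writes its own variable before reading the other's,
      with full ordering in between, at least one of them observes the other's
      write: either reclaim sees the extra reference, or the fast path sees
      the changed PTE.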
      
      Call stack/comments:
      
      - try_to_unmap_one()
        - page_vma_mapped_walk()
          - map_pte()			# see pte_offset_map_lock():
              pte_offset_map()
              spin_lock()
      
        - ptep_get_and_clear()	# write PTE
        - smp_mb()			# (new barrier) GUP fast path
        - page_ref_count()		# (new check) read refcount
      
        - page_vma_mapped_walk_done()	# see pte_unmap_unlock():
            pte_unmap()
            spin_unlock()
      
      - bio_iov_iter_get_pages()
        - __bio_iov_iter_get_pages()
          - iov_iter_get_pages()
            - get_user_pages_fast()
              - internal_get_user_pages_fast()
      
                # fast path
                - lockless_pages_from_mm()
                  - gup_{pgd,p4d,pud,pmd,pte}_range()
                      ptep = pte_offset_map()		# not _lock()
                      pte = ptep_get_lockless(ptep)
      
                      page = pte_page(pte)
                      try_grab_compound_head(page)	# inc refcount
                                                  	# (RMW/barrier
                                                   	#  on success)
      
                      if (pte_val(pte) != pte_val(*ptep)) # read PTE
                              put_compound_head(page) # dec refcount
                              			# go slow path
      
                # slow path
                - __gup_longterm_unlocked()
                  - get_user_pages_unlocked()
                    - __get_user_pages_locked()
                      - __get_user_pages()
                        - follow_{page,p4d,pud,pmd}_mask()
                          - follow_page_pte()
                              ptep = pte_offset_map_lock()
                              pte = *ptep
                              page = vm_normal_page(pte)
                              try_grab_page(page)	# inc refcount
                              pte_unmap_unlock()
      
      - Huge Pages:
        ==========
      
      Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE
      (aka lazyfree) pages are PageAnon() && !PageSwapBacked()
      (madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn())
      thus should reach shrink_page_list() -> split_huge_page_to_list() before
      try_to_unmap[_one](), so it deals with normal pages only.
      
      (And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens,
      which should not happen or should be rare, the page refcount should be greater than
      mapcount: the head page is referenced by tail pages.  That also prevents
      checking the head `page` then incorrectly call page_remove_rmap(subpage)
      for a tail page, that isn't even in the shrink_page_list()'s page_list (an
      effect of split huge pmd/pmvw), as it might happen today in this unlikely
      scenario.)
      
      MADV_FREE'd buffers:
      ===================
      
      So, back to the "if MADV_FREE pages are used as buffers" note.  The case
      is arguable, and subject to multiple interpretations.
      
      The madvise(2) manual page on the MADV_FREE advice value says:
      
      1) 'After a successful MADV_FREE ... data will be lost when
         the kernel frees the pages.'
      2) 'the free operation will be canceled if the caller writes
         into the page' / 'subsequent writes ... will succeed and
         then [the] kernel cannot free those dirtied pages'
      3) 'If there is no subsequent write, the kernel can free the
         pages at any time.'
      
      Thoughts, questions, considerations... respectively:
      
      1) Since the kernel didn't actually free the page (page_ref_freeze()
         failed), should the data not have been lost? (on userspace read.)
      2) Should writes performed by the direct IO read be able to cancel
         the free operation?
         - Should the direct IO read be considered as 'the caller' too,
           as it's been requested by 'the caller'?
         - Should the bio technique to dirty pages on return to userspace
           (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
           be considered in another/special way here?
      3) Should an upcoming write from a previously requested direct IO
         read be considered as a subsequent write, so the kernel should
         not free the pages? (as it's known at the time of page reclaim.)
      
      And lastly:
      
      Technically, the last point would seem a reasonable consideration and
      balance, as the madvise(2) manual page apparently (and fairly) seems to
      assume that 'writes' are memory access from the userspace process (not
      explicitly considering writes from the kernel or its corner cases; again,
      fairly).  Plus the kernel fix implementation for the corner case of the
      largely 'non-atomic write' encompassed by a direct IO read operation is
      relatively simple; and it helps.
      
      Reproducer:
      ==========
      
      @ test.c (simplified, but works)
      
      	#define _GNU_SOURCE
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      
      	int main() {
      		int fd, i;
      		char *buf;
      
      		fd = open(DEV, O_RDONLY | O_DIRECT);
      
      		buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                      	   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      
      		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
      			buf[i] = 1; // init to non-zero
      
      		madvise(buf, BUF_SIZE, MADV_FREE);
      
      		read(fd, buf, BUF_SIZE);
      
      		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
      			printf("%p: 0x%x\n", &buf[i], buf[i]);
      
      		return 0;
      	}
      
      @ block/fops.c (formerly fs/block_dev.c)
      
      	+#include <linux/swap.h>
      	...
      	... __blkdev_direct_IO[_simple](...)
      	{
      	...
      	+	if (!strcmp(current->comm, "good"))
      	+		shrink_all_memory(ULONG_MAX);
      	+
               	ret = bio_iov_iter_get_pages(...);
      	+
      	+	if (!strcmp(current->comm, "bad"))
      	+		shrink_all_memory(ULONG_MAX);
      	...
      	}
      
      @ shell
      
              # NUM_PAGES=4
              # PAGE_SIZE=$(getconf PAGE_SIZE)
      
              # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES}
              # DEV=$(losetup -f --show test.img)
      
              # gcc -DDEV=\"$DEV\" \
                    -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \
                    -DPAGE_SIZE=${PAGE_SIZE} \
                     test.c -o test
      
              # od -tx1 $DEV
              0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a
              *
              0040000
      
              # mv test good
              # ./good
              0x7f7c10418000: 0x79
              0x7f7c10419000: 0x79
              0x7f7c1041a000: 0x79
              0x7f7c1041b000: 0x79
      
              # mv good bad
              # ./bad
              0x7fa1b8050000: 0x0
              0x7fa1b8051000: 0x0
              0x7fa1b8052000: 0x0
              0x7fa1b8053000: 0x0
      
      Note: the issue is consistent on v5.17-rc3, but it's intermittent with the
      support of MADV_FREE on v4.5 (60%-70% error; needs swap).  [wrap
      do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c].
      
      - v5.17-rc3:
      
              # for i in {1..1000}; do ./good; done \
                  | cut -d: -f2 | sort | uniq -c
                 4000  0x79
      
              # mv good bad
              # for i in {1..1000}; do ./bad; done \
                  | cut -d: -f2 | sort | uniq -c
                 4000  0x0
      
              # free | grep Swap
              Swap:             0           0           0
      
      - v4.5:
      
              # for i in {1..1000}; do ./good; done \
                  | cut -d: -f2 | sort | uniq -c
                 4000  0x79
      
              # mv good bad
              # for i in {1..1000}; do ./bad; done \
                  | cut -d: -f2 | sort | uniq -c
                 2702  0x0
                 1298  0x79
      
              # swapoff -av
              swapoff /swap
      
              # for i in {1..1000}; do ./bad; done \
                  | cut -d: -f2 | sort | uniq -c
                 4000  0x79
      
      Ceph/TCMalloc:
      =============
      
      For documentation purposes, the use case driving the analysis/fix is Ceph
      on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to
      release unused memory to the system from the mmap'ed page heap (might be
      committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan()
      -> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() ->
      TCMalloc_SystemCommit() -> do nothing.
      
      Note: TCMalloc switched back to MADV_DONTNEED a few commits after the
      release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue
      just 'disappeared' on Ceph on later Ubuntu releases but is still present
      in the kernel, and can be hit by other use cases.
      
      The observed issue seems to be the old Ceph bug #22464 [1], where checksum
      mismatches are observed (and instrumentation with buffer dumps shows
      zero-pages read from mmap'ed/MADV_FREE'd page ranges).
      
      The issue in Ceph was reasonably deemed a kernel bug (comment #50) and
      mostly worked around with a retry mechanism, but other parts of Ceph could
      still hit that (rocksdb).  Anyway, it's less likely to be hit again as
      TCMalloc switched out of MADV_FREE by default.
      
      (Some kernel versions/reports from the Ceph bug, and relation with
      the MADV_FREE introduction/changes; TCMalloc versions not checked.)
      - 4.4 good
      - 4.5 (madv_free: introduction)
      - 4.9 bad
      - 4.10 good? maybe a swapless system
      - 4.12 (madv_free: no longer free instantly on swapless systems)
      - 4.13 bad
      
      [1] https://tracker.ceph.com/issues/22464
      
      Thanks:
      ======
      
      Several people contributed to analysis/discussions/tests/reproducers in
      the first stages when drilling down on ceph/tcmalloc/linux kernel:
      
      - Dan Hill
      - Dan Streetman
      - Dongdong Tao
      - Gavin Guo
      - Gerald Yang
      - Heitor Alves de Siqueira
      - Ioanna Alifieraki
      - Jay Vosburgh
      - Matthew Ruffell
      - Ponnuvel Palaniyappan
      
      Reviews, suggestions, corrections, comments:
      
      - Minchan Kim
      - Yu Zhao
      - Huang, Ying
      - John Hubbard
      - Christoph Hellwig
      
      [mfo@canonical.com: v4]
        Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.com
      Link: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com
      
      Fixes: 802a3a92 ("mm: reclaim MADV_FREE pages")
      Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Dan Hill <daniel.hill@canonical.com>
      Cc: Dan Streetman <dan.streetman@canonical.com>
      Cc: Dongdong Tao <dongdong.tao@canonical.com>
      Cc: Gavin Guo <gavin.guo@canonical.com>
      Cc: Gerald Yang <gerald.yang@canonical.com>
      Cc: Heitor Alves de Siqueira <halves@canonical.com>
      Cc: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
      Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan@canonical.com>
      Cc: <stable@vger.kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c8e2a25
    • Anshuman Khandual's avatar
      mm: generalize ARCH_HAS_FILTER_PGPROT · 24e988c7
      Anshuman Khandual authored
      ARCH_HAS_FILTER_PGPROT config has duplicate definitions on platforms that
      subscribe to it.  Instead make it a generic config option which can be
      selected on applicable platforms when required.
      
      Link: https://lkml.kernel.org/r/1643004823-16441-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24e988c7
    • Hugh Dickins's avatar
      mm: unmap_mapping_range_tree() with i_mmap_rwsem shared · 2c865995
      Hugh Dickins authored
      Revert 48ec833b ("Revert "mm/memory.c: share the i_mmap_rwsem"") to
      reinstate c8475d14 ("mm/memory.c: share the i_mmap_rwsem"): the
      unmap_mapping_range family of functions do the unmapping of user pages
      (ultimately via zap_page_range_single) without modifying the interval tree
      itself, and unmapping races are necessarily guarded by page table lock,
      thus the i_mmap_rwsem should be shared in unmap_mapping_pages() and
      unmap_mapping_folio().
      
      Commit 48ec833b was intended as a short-term measure, allowing the
      other shared lock changes into 3.19 final, before investigating three
      trinity crashes, one of which had been bisected to commit c8475d14:
      
      [1] https://lkml.org/lkml/2014/11/14/342
      https://lore.kernel.org/lkml/5466142C.60100@oracle.com/
      [2] https://lkml.org/lkml/2014/12/22/213
      https://lore.kernel.org/lkml/549832E2.8060609@oracle.com/
      [3] https://lkml.org/lkml/2014/12/9/741
      https://lore.kernel.org/lkml/5487ACC5.1010002@oracle.com/
      
      Two of those were Bad page states: free_pages_prepare() found PG_mlocked
      still set - almost certain to have been fixed by 4.4 commit b87537d9
      ("mm: rmap use pte lock not mmap_sem to set PageMlocked").  The NULL deref
      on rwsem in [2]: unclear, only happened once, not bisected to c8475d14.
      
      No change to the i_mmap_lock_write() around __unmap_hugepage_range_final()
      in unmap_single_vma(): IIRC that's a special usage, helping to serialize
      hugetlbfs page table sharing, not to be dabbled with lightly.  No change
      to other uses of i_mmap_lock_write() by hugetlbfs.
      
      I am not aware of any significant gains from the concurrency allowed by
      this commit: it is submitted more to resolve an ancient misunderstanding.
      
      Link: https://lkml.kernel.org/r/e4a5e356-6c87-47b2-3ce8-c2a95ae84e20@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2c865995
    • Hugh Dickins's avatar
      mm: warn on deleting redirtied only if accounted · 566d3362
      Hugh Dickins authored
      filemap_unaccount_folio() has a WARN_ON_ONCE(folio_test_dirty(folio)).  It
      is good to warn of late dirtying on a persistent filesystem, but late
      dirtying on tmpfs can only lose data which is expected to be thrown away;
      and it's a pity if that warning comes ONCE on tmpfs, then hides others
      which really matter.  Make it conditional on mapping_cap_writeback().
      
      Cleanup: then folio_account_cleaned() no longer needs to check that for
      itself, and so no longer needs to know the mapping.
      
      Link: https://lkml.kernel.org/r/b5a1106c-7226-a5c6-ad41-ad4832cae1f@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jan Kara <jack@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      566d3362
    • David Hildenbrand's avatar
      mm/huge_memory: remove stale locking logic from __split_huge_pmd() · 7f760917
      David Hildenbrand authored
      Let's remove the stale logic that was required for reuse_swap_page().
      
      [akpm@linux-foundation.org: simplification, per Yang Shi]
      
      Link: https://lkml.kernel.org/r/20220131162940.210846-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f760917