1. 03 Apr, 2020 3 commits
    • remoteproc/omap: Fix set_load call in omap_rproc_request_timer · e6d05acd
      Nathan Chancellor authored
      When building arm allyesconfig:
      
      drivers/remoteproc/omap_remoteproc.c:174:44: error: too many arguments
      to function call, expected 2, have 3
              timer->timer_ops->set_load(timer->odt, 0, 0);
              ~~~~~~~~~~~~~~~~~~~~~~~~~~                ^
      1 error generated.
      
      This is due to commit 02e6d546 ("clocksource/drivers/timer-ti-dm:
      Enable autoreload in set_pwm") in the clockevents tree interacting with
      commit e28edc57 ("remoteproc/omap: Request a timer(s) for remoteproc
      usage") from the rpmsg tree.
      
      This should have been fixed during the merge of the remoteproc tree
      since it happened after the clockevents tree merge; however, it does not
      look like my email was noticed by either maintainer and I did not pay
      attention when the pull was sent since I was on CC.
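
      A minimal sketch of the call-site change implied by the error above. The struct definitions and mock_set_load() are simplified, hypothetical stand-ins for the kernel's dmtimer types; only the two-argument shape of set_load() is taken from the compiler message.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's dmtimer types. */
struct omap_dm_timer {
	unsigned int load;
};

struct omap_dm_timer_ops {
	/* Two parameters, matching "expected 2" in the error message. */
	int (*set_load)(struct omap_dm_timer *timer, unsigned int value);
};

struct omap_rproc_timer {
	struct omap_dm_timer *odt;
	const struct omap_dm_timer_ops *timer_ops;
};

/* Mock callback so the sketch can be exercised outside the kernel. */
static int mock_set_load(struct omap_dm_timer *timer, unsigned int value)
{
	timer->load = value;
	return 0;
}

static int request_timer_sketch(struct omap_rproc_timer *timer)
{
	/* Before: timer->timer_ops->set_load(timer->odt, 0, 0);  three arguments */
	return timer->timer_ops->set_load(timer->odt, 0);
}
```

      With the extra argument dropped, the call matches the callback's prototype again and the file builds.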
      
      Fixes: c6570114 ("Merge tag 'rproc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc")
      Link: https://lore.kernel.org/lkml/20200327185055.GA22438@ubuntu-m2-xlarge-x86/
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Acked-by: Suman Anna <s-anna@ti.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · bef7b2a7
      Linus Torvalds authored
      Pull devicetree updates from Rob Herring:
      
       - Unit test for overlays with GPIO hogs
      
       - Improve dma-ranges parsing to handle dma-ranges with multiple entries
      
       - Update dtc to upstream version v1.6.0-2-g87a656ae5ff9
      
       - Improve overlay error reporting
      
       - Device link support for power-domains and hwlocks bindings
      
       - Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
         PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
         ITE Tech
      
       - Add deprecated Marvell vendor prefix 'mrvl'
      
       - A bunch of binding conversions to DT schema continues. Of note, the
         common serial and USB connector bindings are converted.
      
       - Add more Arm CPU compatibles
      
       - Drop Mark Rutland as DT maintainer :(
      
      * tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (106 commits)
        MAINTAINERS: drop an old reference to stm32 pwm timers doc
        MAINTAINERS: dt: update etnaviv file reference
        dt-bindings: usb: dwc2: fix bindings for amlogic, meson-gxbb-usb
        dt-bindings: uniphier-system-bus: fix warning in the example
        dt-bindings: display: meson-vpu: fix indentation of reg-names' "items"
        dt-bindings: iio: Fix adi, ltc2983 uint64-matrix schema constraints
        dt-bindings: power: Fix example for power-domain
        dt-bindings: arm: Add some constraints for PSCI nodes
        of: some unittest overlays not untracked
        of: gpio unittest kfree() wrong object
        dt-bindings: phy: convert phy-rockchip-inno-usb2 bindings to yaml
        dt-bindings: serial: sh-sci: Convert to json-schema
        dt-bindings: serial: Document serialN aliases
        dt-bindings: thermal: tsens: Set 'additionalProperties: false'
        dt-bindings: thermal: tsens: Fix nvmem-cell-names schema
        dt-bindings: vendor-prefixes: Add Beacon vendor prefix
        dt-bindings: vendor-prefixes: Add Topwise
        of: of_private.h: Replace zero-length array with flexible-array member
        docs: dt: fix a broken reference to input.yaml
        docs: dt: fix references to ap806-system-controller.txt
        ...
    • Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 79f51b7b
      Linus Torvalds authored
      Pull SCSI updates from James Bottomley:
       "This series has a huge amount of churn because it pulls in Mauro's doc
        update changing all our txt files to rst ones.
      
        Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
        zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
        some other minor updates.
      
        The major core change is Hannes moving functions out of the aacraid
        driver and into the core"
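
      The scnprintf conversions mentioned above depend on a difference in return values: snprintf() reports the length the output would have had, while scnprintf() reports what was actually stored. A userspace sketch of that behaviour follows; scnprintf_sketch() is a hypothetical name and this is not the kernel's implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace approximation of scnprintf(): return the number
 * of characters actually written to buf, excluding the trailing NUL.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}
```

      Code that advances a write offset by the return value can run past the end of the buffer with snprintf() after a truncation, but not with this variant.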
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
        scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
        scsi: ufs: Do not rely on prefetched data
        scsi: dc395x: remove dc395x_bios_param
        scsi: libiscsi: Fix error count for active session
        scsi: hpsa: correct race condition in offload enabled
        scsi: message: fusion: Replace zero-length array with flexible-array member
        scsi: qedi: Add PCI shutdown handler support
        scsi: qedi: Add MFW error recovery process
        scsi: ufs: Enable block layer runtime PM for well-known logical units
        scsi: ufs-qcom: Override devfreq parameters
        scsi: ufshcd: Let vendor override devfreq parameters
        scsi: ufshcd: Update the set frequency to devfreq
        scsi: ufs: Resume ufs host before accessing ufs device
        scsi: ufs-mediatek: customize the delay for enabling host
        scsi: ufs: make HCE polling more compact to improve initialization latency
        scsi: ufs: allow custom delay prior to host enabling
        scsi: ufs-mediatek: use common delay function
        scsi: ufs: introduce common and flexible delay function
        scsi: ufs: use an enum for host capabilities
        scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
        ...
  2. 02 Apr, 2020 37 commits
    • Merge tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · e109f506
      Linus Torvalds authored
      Pull MTD updates from Miquel Raynal:
       "MTD core changes:
         - Fix issue where write_cached_data() fails but write() still returns
           success
      
         - maps: sa1100-flash: Replace zero-length array with flexible-array
           member
      
         - phram: Fix a double free issue in error path
      
         - Convert fallthrough comments into statements
      
         - MAINTAINERS: Add the IRC channel to the MTD related subsystems
      
        Raw NAND core changes:
         - Add support for manufacturer specific suspend/resume operation
      
         - Add support for manufacturer specific lock/unlock operation
      
         - Replace zero-length array with flexible-array member
      
         - Fix a typo ("manufecturer")
      
         - Ensure nand_soft_waitrdy wait period is enough
      
        Raw NAND controller driver changes:
         - Brcmnand:
             * Add support for flash-edu for dma transfers (+ bindings)
      
         - Cadence:
             * Reinit completion before executing a new command
             * Change bad block marker size
             * Fix the calculation of the available OOB size
             * Get meta data size from registers
      
         - Qualcomm:
             * Use dma_request_chan() instead dma_request_slave_channel()
             * Release resources on failure within qcom_nandc_alloc()
      
         - Allwinner:
             * Use dma_request_chan() instead dma_request_slave_channel()
      
         - Marvell:
             * Use dma_request_chan() instead dma_request_slave_channel()
             * Release DMA channel on error
      
         - Freescale:
             * Use dma_request_chan() instead dma_request_slave_channel()
      
         - Macronix:
             * Add support for Macronix NAND randomizer (+ bindings)
      
         - Ams-delta:
             * Rename structures and functions to gpio_nand*
             * Make the driver custom I/O ready
             * Drop useless local variable
             * Support custom driver initialisation
             * Add module device tables
             * Handle more GPIO pins as optional
             * Make read pulses optional
             * Don't hardcode read/write pulse widths
             * Push inversion handling to gpiolib
             * Enable OF partition info support
             * Drop board specific partition info
             * Use struct gpio_nand_platdata
             * Write protect device during probe
      
         - Ingenic:
             * Use devm_platform_ioremap_resource()
             * Add dependency on MIPS || COMPILE_TEST
      
         - Denali:
             * Deassert write protect pin
      
         - ST:
             * Use dma_request_chan() instead dma_request_slave_channel()
      
        Raw NAND chip driver changes:
         - Toshiba:
             * Support reading the number of bitflips for BENAND (Built-in ECC NAND)
      
         - Macronix:
             * Add support for deep power down mode
             * Add support for block protection
      
        SPI-NAND core changes:
         - Do not erase the block before writing a bad block marker
      
         - Explicitly use MTD_OPS_RAW to write the bad block marker to OOB
      
         - Stop using spinand->oobbuf for buffering bad block markers
      
         - Rework detect procedure for different READ_ID operation
      
        SPI-NAND driver changes:
         - Toshiba:
             * Support for new Kioxia Serial NAND
             * Rename function name to change suffix and prefix (8Gbit)
             * Add comment about Kioxia ID
      
         - Micron:
             * Add new Micron SPI NAND devices with multiple dies
             * Add M70A series Micron SPI NAND devices
             * Identify SPI NAND device with Continuous Read mode
             * Add new Micron SPI NAND devices
             * Describe the SPI NAND device MT29F2G01ABAGD
             * Generalize the OOB layout structure and function names
      
        SPI NOR core changes:
         - Move all the manufacturer specific quirks/code out of the core, to
           make the core logic more readable and thus ease maintenance.
      
         - Move the SFDP logic out of the core, it provides a better
           separation between the SFDP parsing and core logic.
      
         - Trim what is exposed in spi-nor.h. The SPI NOR controllers drivers
           must not be able to use structures that are meant just for the SPI
           NOR core.
      
         - Use the spi-mem direct mapping API to let advanced controllers
           optimize the read/write operations when they support direct
           mapping.
      
         - Add generic formula for the Status Register block protection
           handling. It fixes some long standing locking limitations and eases
           the addition of the 4bit block protection support.
      
         - Add block protection support for flashes with 4 block protection
           bits in the Status Register.
      
        SPI NOR controller drivers changes:
         - The mtk-quadspi driver is replaced by the new spi-mem spi-mtk-nor
           driver.
      
         - Merge tag 'mtk-mtd-spi-move' into spi-nor/next to avoid conflicts.
      
        HyperBus changes:
         - Print error msg when compatible is wrong or missing
      
         - Move mapping of direct access window from core to individual
           drivers"
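
      For the "Convert fallthrough comments into statements" item, a sketch of what such a conversion looks like. The macro below is modeled on the kernel's fallthrough pseudo-keyword but is written here for userspace compilers; stages_remaining() is a hypothetical example function.

```c
/* Define a fallthrough statement, with a no-op fallback for compilers
 * that lack the attribute. */
#ifdef __has_attribute
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count the stages still to run from a given stage. */
static int stages_remaining(int stage)
{
	int n = 0;

	switch (stage) {
	case 0:
		n++;
		fallthrough;	/* previously a fall-through comment */
	case 1:
		n++;
		fallthrough;
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

      Unlike a comment, the statement is visible to the compiler, so an unintended fall-through elsewhere can be flagged by -Wimplicit-fallthrough.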
      
      * tag 'mtd/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (103 commits)
        mtd: Convert fallthrough comments into statements
        mtd: rawnand: toshiba: Support reading the number of bitflips for BENAND (Built-in ECC NAND)
        MAINTAINERS: Add the IRC channel to the MTD related subsystems
        mtd: Fix issue where write_cached_data() fails but write() still returns success
        mtd: maps: sa1100-flash: Replace zero-length array with flexible-array member
        mtd: phram: fix a double free issue in error path
        mtd: spinand: toshiba: Support for new Kioxia Serial NAND
        mtd: spinand: toshiba: Rename function name to change suffix and prefix (8Gbit)
        mtd: rawnand: macronix: Add support for deep power down mode
        mtd: rawnand: Add support for manufacturer specific suspend/resume operation
        mtd: spi-nor: Enable locking for n25q512ax3/n25q512a
        mtd: spi-nor: Add SR 4bit block protection support
        mtd: spi-nor: Add generic formula for SR block protection handling
        mtd: spi-nor: Set all BP bits to one when lock_len == mtd->size
        mtd: spi-nor: controllers: aspeed-smc: Replace zero-length array with flexible-array member
        mtd: spi-nor: Clear WEL bit when erase or program errors occur
        MAINTAINERS: update entry after SPI NOR controller move
        mtd: spi-nor: Trim what is exposed in spi-nor.h
        mtd: spi-nor: Drop the MFR definitions
        mtd: spi-nor: Get rid of the now empty spi_nor_ids[] table
        ...
    • Merge tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma · e964f1e0
      Linus Torvalds authored
      Pull dmaengine updates from Vinod Koul:
       "Core:
         - Some code cleanup and optimization in core by Andy
      
         - Debugfs support for displaying dmaengine channels by Peter
      
        Drivers:
         - New driver for uniphier-xdmac controller
      
         - Updates to stm32 dma, mdma and dmamux drivers and PM support
      
         - More updates to idxd drivers
      
         - Bunch of changes in tegra-apb driver and cleaning up of pm
           functions
      
         - Bunch of spelling fixes and Replace zero-length array patches
      
         - Shutdown hook for fsl-dpaa2-qdma driver
      
         - Support for interleaved transfers for ti-edma and virtualization
           support for k3-dma driver
      
         - Support for reset and updates in xilinx_dma driver
      
         - Improvements and locking updates in at_hdma driver"
      
      * tag 'dmaengine-5.7-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (89 commits)
        dt-bindings: dma: renesas,usb-dmac: add r8a77961 support
        dmaengine: uniphier-xdmac: Remove redandant error log for platform_get_irq
        dmaengine: tegra-apb: Improve DMA synchronization
        dmaengine: tegra-apb: Don't save/restore IRQ flags in interrupt handler
        dmaengine: tegra-apb: mark PM functions as __maybe_unused
        dmaengine: fix spelling mistake "exceds" -> "exceeds"
        dmaengine: sprd: Set request pending flag when DMA controller is active
        dmaengine: ppc4xx: Use scnprintf() for avoiding potential buffer overflow
        dmaengine: idxd: remove global token limit check
        dmaengine: idxd: reflect shadow copy of traffic class programming
        dmaengine: idxd: Merge definition of dsa_batch_desc into dsa_hw_desc
        dmaengine: Create debug directories for DMA devices
        dmaengine: ti: k3-udma: Implement custom dbg_summary_show for debugfs
        dmaengine: Add basic debugfs support
        dmaengine: fsl-dpaa2-qdma: remove set but not used variable 'dpaa2_qdma'
        dmaengine: ti: edma: fix null dereference because of a typo in pointer name
        dmaengine: fsl-dpaa2-qdma: Adding shutdown hook
        dmaengine: uniphier-xdmac: Add UniPhier external DMA controller driver
        dt-bindings: dmaengine: Add UniPhier external DMA controller bindings
        dmaengine: ti: k3-udma: Implement support for atype (for virtualization)
        ...
    • Merge branch 'i2c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5c8db3eb
      Linus Torvalds authored
      Pull i2c updates from Wolfram Sang:
       "I2C has:
      
         - using defines for bus speeds to avoid mistakes in hardcoded values;
           lots of small driver updates because of that. Thanks, Andy!
      
         - API change: i2c_setup_smbus_alert() was renamed to
           i2c_new_smbus_alert_device() and now returns an ERR_PTR. All
           in-tree users have been converted
      
         - in the core, a rare race condition when deleting the cdev has been
           fixed. Thanks, Kevin!
      
         - lots of driver updates. Thanks, everyone!
      
        I also want to mention: The amount of review and testing tags given
        was quite high this time. Thank you to these people, too. I hope we
        can keep it like this!"
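
      A userspace sketch of the error-pointer convention that the renamed SMBus alert function is described as adopting. ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here for illustration, and new_alert_device_sketch() is a hypothetical function, not the i2c core API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace approximations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct alert_device_sketch {
	int irq;
};

/* Hypothetical constructor: returns a device or an encoded errno, never NULL. */
static struct alert_device_sketch *new_alert_device_sketch(int irq)
{
	static struct alert_device_sketch dev;

	if (irq < 0)
		return ERR_PTR(-EINVAL);
	dev.irq = irq;
	return &dev;
}
```

      Callers of such a function test the result with IS_ERR() and extract the code with PTR_ERR() instead of comparing against NULL.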
      
      * 'i2c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (34 commits)
        i2c: rcar: clean up after refactoring i2c_timings
        macintosh: convert to i2c_new_scanned_device
        i2c: drivers: Use generic definitions for bus frequencies
        i2c: algo: Use generic definitions for bus frequencies
        i2c: stm32f7: switch to I²C generic property parsing
        i2c: rcar: Consolidate timings calls in rcar_i2c_clock_calculate()
        i2c: core: Allow override timing properties with 0
        i2c: core: Provide generic definitions for bus frequencies
        i2c: mxs: Use dma_request_chan() instead dma_request_slave_channel()
        i2c: imx: remove duplicate print after platform_get_irq()
        i2c: designware: Fix spelling typos in the comments
        i2c: designware: Discard i2c_dw_read_comp_param() function
        i2c: designware: Detect the FIFO size in the common code
        i2c: dev: Fix the race between the release of i2c_dev and cdev
        i2c: qcom-geni: Drop of_platform.h include
        i2c: qcom-geni: Grow a dev pointer to simplify code
        i2c: qcom-geni: Let firmware specify irq trigger flags
        i2c: stm32f7: do not backup read-only PECR register
        i2c: smbus: remove outdated references to irq level triggers
        i2c: convert SMBus alert setup function to return an ERRPTR
        ...
    • Merge tag 'sound-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 848960e5
      Linus Torvalds authored
      Pull sound updates from Takashi Iwai:
       "This was again a busy development cycle. There are a few ALSA core
        updates (merely API cleanups and sparse fixes), while the majority
        of the other changes are found in the ASoC scene.
      
        Here are some highlights:
      
        ALSA core:
         - More helper macros for sparse warning fixes (e.g. bitwise types)
         - Slight optimization of PCM OSS locks
         - Make common handling for PCM / compress buffers (for SOF)
      
        ASoC:
         - Lots of code refactoring and modernization for (still ongoing)
           componentization works
         - Conversion of SND_SOC_ALL_CODECS to use imply
         - Continued refactoring and fixing of the Intel SOF/SST support,
           including the initial (but still incomplete) SoundWire support
         - SoundWire and more advanced clocking support for Realtek RT5682
         - Support for amlogic GX, Meson 8, Meson 8B and T9015 DAC, Broadcom
           DSL/PON, Ingenic JZ4760 and JZ4770, Realtek RL6231, and TI TAS2563
           and TLV320ADCX140
      
        HD-audio:
         - Optimizations in HDMI jack handling
         - A few new quirks and fixups for Realtek codecs
      
        USB-audio:
         - Delayed registration support
         - New quirks for Motu, Kingston, Presonus"
      
      * tag 'sound-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (415 commits)
        ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor
        Revert "ALSA: uapi: Drop asound.h inclusion from asoc.h"
        ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
        ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
        ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
        ALSA: hda/realtek - a fake key event is triggered by running shutup
        ALSA: hda: default enable CA0132 DSP support
        ASoC: amd: acp3x-pcm-dma: clean up two indentation issues
        ASoC: tlv320adcx140: Remove undocumented property
        ASoC: Intel: sof_sdw: Add Volteer support with RT5682 SNDW helper function
        ASoC: Intel: common: add match table for TGL RT5682 SoundWire driver
        ASoC: Intel: boards: add sof_sdw machine driver
        ASoC: Intel: soc-acpi: update topology and driver name for SoundWire platforms
        ASoC: rt5682: move DAI clock registry to I2S mode
        ASoC: pxa: magician: convert to use i2c_new_client_device()
        ASoC: SOF: Intel: hda-ctrl: add reset cycle before parsing capabilities
        Asoc: SOF: Intel: hda: check SoundWire wakeen interrupt in irq thread
        ASoC: SOF: Intel: hda: add WAKEEN interrupt support for SoundWire
        ASoC: SOF: Intel: hda: add parameter to control SoundWire clock stop quirks
        ASoC: SOF: Intel: hda: merge IPC, stream and SoundWire interrupt handlers
        ...
    • Merge tag 'pinctrl-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · bc3b3f4b
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "This is the bulk of pin control changes for the v5.7 kernel cycle.
      
        There are no core changes this time, only driver developments:
      
         - New driver for the Dialog Semiconductor DA9062 Power Management
           Integrated Circuit (PMIC).
      
         - Renesas SH-PFC has improved consistency, with group and register
           checks in the configuration checker.
      
         - New subdriver for the Qualcomm IPQ6018.
      
         - Add the RGMII pin control functionality to Qualcomm IPQ8064.
      
         - Performance and code quality cleanups in the Mediatek driver.
      
         - Improve the Broadcom BCM2835 support to cover all the GPIOs that
           exist in it.
      
         - The Allwinner/Sunxi driver properly masks non-wakeup IRQs on
           suspend.
      
         - Add some missing groups and functions to the Ingenic driver.
      
         - Convert some of the Freescale device tree bindings to use the new
           and all improved JSON YAML markup.
      
         - Refactorings and support for the SFIO/GPIO in the Tegra194 SoC
           driver.
      
         - Support high impedance mode in the Spreadtrum/Unisoc driver"
      
      * tag 'pinctrl-v5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (64 commits)
        pinctrl: qcom: fix compilation error
        pinctrl: qcom: use scm_call to route GPIO irq to Apps
        pinctrl: sprd: Add pin high impedance mode support
        pinctrl: sprd: Use the correct pin output configuration
        pinctrl: tegra: Add SFIO/GPIO programming on Tegra194
        pinctrl: tegra: Renumber the GG.0 and GG.1 pins
        pinctrl: tegra: Do not add default pin range on Tegra194
        pinctrl: tegra: Pass struct tegra_pmx for pin range check
        pinctrl: tegra: Fix "Scmitt" -> "Schmitt" typo
        pinctrl: tegra: Fix whitespace issues for improved readability
        pinctrl: mediatek: Use scnprintf() for avoiding potential buffer overflow
        pinctrl: freescale: drop the dependency on ARM64 for i.MX8M
        Revert "pinctrl: mvebu: armada-37xx: use use platform api"
        dt-bindings: pinctrl: at91: Fix a typo ("descibe")
        pinctrl: meson: add tsin pinctrl for meson gxbb/gxl/gxm
        pinctrl: sprd: Fix the kconfig warning
        pinctrl: ingenic: add hdmi-ddc pin control group
        pinctrl: sirf/atlas7: Replace zero-length array with flexible-array member
        pinctrl: sprd: Allow the SPRD pinctrl driver building into a module
        pinctrl: Export some needed symbols at module load time
        ...
    • Merge tag 'hwlock-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 11786191
      Linus Torvalds authored
      Pull hwspinlock updates from Bjorn Andersson:
       "This marks all hwspinlock driver COMPILE_TESTable and replaces the
        zero-length array in hwspinlock_device with a flexible-array member"
      
      * tag 'hwlock-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
        hwspinlock: hwspinlock_internal.h: Replace zero-length array with flexible-array member
        hwspinlock: Allow drivers to be built with COMPILE_TEST
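
      A sketch of the zero-length array to flexible-array member change named in this pull. The struct and function names are hypothetical and are not the actual hwspinlock_device layout.

```c
#include <stddef.h>
#include <stdlib.h>

struct lock_sketch {
	int id;
};

struct lock_bank_sketch {
	int num_locks;
	/* Before: struct lock_sketch lock[0];  (GNU zero-length array) */
	struct lock_sketch lock[];	/* C99 flexible-array member */
};

/* Allocate the header and its trailing array in a single allocation. */
static struct lock_bank_sketch *bank_alloc_sketch(int num_locks)
{
	struct lock_bank_sketch *bank;
	int i;

	if (num_locks < 0)
		return NULL;

	bank = malloc(sizeof(*bank) + (size_t)num_locks * sizeof(bank->lock[0]));
	if (!bank)
		return NULL;

	bank->num_locks = num_locks;
	for (i = 0; i < num_locks; i++)
		bank->lock[i].id = i;
	return bank;
}
```

      The allocation pattern is unchanged; the flexible-array form is standard C and must be the last member of the struct, which the compiler enforces.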
    • Merge tag 'rproc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · c6570114
      Linus Torvalds authored
      Pull remoteproc updates from Bjorn Andersson:
      
       - a range of improvements to the OMAP remoteproc driver; among other
         things adding devicetree, suspend/resume and watchdog support, and
         adding support for the remoteprocs in the DRA7xx SoC
      
       - support for 64-bit firmware, extends the ELF loader to support this
         and fixes for a number of race conditions in the recovery handling
      
       - a generic mechanism to allow remoteproc drivers to sync state with
         remote processors during a panic, and uses this to prepare Qualcomm
         remote processors for post mortem analysis
      
       - fixes to cleanly recover from crashes in the modem firmware on
         production Qualcomm devices
      
      * tag 'rproc-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: (37 commits)
        remoteproc/omap: Switch to SPDX license identifiers
        remoteproc/omap: Add watchdog functionality for remote processors
        remoteproc/omap: Report device exceptions and trigger recovery
        remoteproc/omap: Add support for runtime auto-suspend/resume
        remoteproc/omap: Add support for system suspend/resume
        remoteproc/omap: Request a timer(s) for remoteproc usage
        remoteproc/omap: Check for undefined mailbox messages
        remoteproc/omap: Remove the platform_data header
        remoteproc/omap: Add support for DRA7xx remote processors
        remoteproc/omap: Initialize and assign reserved memory node
        remoteproc/omap: Add the rproc ops .da_to_va() implementation
        remoteproc/omap: Add support to parse internal memories from DT
        remoteproc/omap: Add a sanity check for DSP boot address alignment
        remoteproc/omap: Add device tree support
        dt-bindings: remoteproc: Add OMAP remoteproc bindings
        remoteproc: qcom: Introduce panic handler for PAS and ADSP
        remoteproc: qcom: q6v5: Add common panic handler
        remoteproc: Introduce "panic" callback in ops
        remoteproc: Traverse rproc_list under RCU read lock
        remoteproc: Fix NULL pointer dereference in rproc_virtio_notify
        ...
    • Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · ac438771
      Linus Torvalds authored
      Pull percpu updates from Dennis Zhou:
       "This is just a few documentation fixes for percpu refcount and bitmap
        helpers that went in v5.6, and moving my emails to all be at korg"
      
      * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
        percpu: update copyright emails to dennis@kernel.org
        include/bitmap.h: add new functions to documentation
        include/bitmap.h: add missing parameter in docs
        percpu_ref: Fix comment regarding percpu_ref_init flags
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8c1b724d
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "ARM:
         - GICv4.1 support
      
         - 32bit host removal
      
        PPC:
         - secure (encrypted) guests under the Protected Execution Framework
           ultravisor
      
        s390:
         - allow disabling GISA (hardware interrupt injection) and protected
           VMs/ultravisor support.
      
        x86:
         - New dirty bitmap flag that sets all bits in the bitmap when dirty
           page logging is enabled; this is faster because it doesn't require
           bulk modification of the page tables.
      
         - Initial work on making nested SVM event injection more similar to
           VMX, and less buggy.
      
         - Various cleanups to MMU code (though the big ones and related
           optimizations were delayed to 5.8). Instead of using cr3 in
           function names which occasionally means eptp, KVM too has
           standardized on "pgd".
      
         - A large refactoring of CPUID features, which now use an array that
           parallels the core x86_features.
      
         - Some removal of pointer chasing from kvm_x86_ops, which will also
           be switched to static calls as soon as they are available.
      
         - New Tigerlake CPUID features.
      
         - More bugfixes, optimizations and cleanups.
      
        Generic:
         - selftests: cleanups, new MMU notifier stress test, steal-time test
      
         - CSV output for kvm_stat"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
        x86/kvm: fix a missing-prototypes "vmread_error"
        KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
        KVM: VMX: Add a trampoline to fix VMREAD error handling
        KVM: SVM: Annotate svm_x86_ops as __initdata
        KVM: VMX: Annotate vmx_x86_ops as __initdata
        KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
        KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
        KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
        KVM: VMX: Configure runtime hooks using vmx_x86_ops
        KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
        KVM: x86: Move init-only kvm_x86_ops to separate struct
        KVM: Pass kvm_init()'s opaque param to additional arch funcs
        s390/gmap: return proper error code on ksm unsharing
        KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
        KVM: Fix out of range accesses to memslots
        KVM: X86: Micro-optimize IPI fastpath delay
        KVM: X86: Delay read msr data iff writes ICR MSR
        KVM: PPC: Book3S HV: Add a capability for enabling secure guests
        KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
        KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
        ...
    • Merge tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f14a9532
      Linus Torvalds authored
      Pull x86 fix from Ingo Molnar:
       "A single fix addressing Sparse warnings. <asm/bitops.h> is changed
        non-trivially to avoid the warnings, but generated code is not
        supposed to be affected"
      
      * tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix bitops.h warning with a moved cast
    • Merge branch 'next-integrity' of... · 7f218319
      Linus Torvalds authored
      Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
      
      Pull integrity updates from Mimi Zohar:
       "Just a couple of updates for linux-5.7:
      
         - A new Kconfig option to enable IMA architecture specific runtime
           policy rules needed for secure and/or trusted boot, as requested.
      
         - Some message cleanup (eg. pr_fmt, additional error messages)"
      
      * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
        ima: add a new CONFIG for loading arch-specific policies
        integrity: Remove duplicate pr_fmt definitions
        IMA: Add log statements for failure conditions
        IMA: Update KBUILD_MODNAME for IMA files to ima
      7f218319
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 6cad420c
      Linus Torvalds authored
      Merge updates from Andrew Morton:
       "A large amount of MM, plenty more to come.
      
        Subsystems affected by this patch series:
         - tools
         - kthread
         - kbuild
         - scripts
         - ocfs2
         - vfs
         - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
               sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
               hugetlbfs, hugetlb"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
        include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
        mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
        selftests/vm: fix map_hugetlb length used for testing read and write
        mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
        mm/hugetlb.c: clean code by removing unnecessary initialization
        hugetlb_cgroup: add hugetlb_cgroup reservation docs
        hugetlb_cgroup: add hugetlb_cgroup reservation tests
        hugetlb: support file_region coalescing again
        hugetlb_cgroup: support noreserve mappings
        hugetlb_cgroup: add accounting for shared mappings
        hugetlb: disable region_add file_region coalescing
        hugetlb_cgroup: add reservation accounting for private mappings
        mm/hugetlb_cgroup: fix hugetlb_cgroup migration
        hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
        hugetlb_cgroup: add hugetlb_cgroup reservation counter
        hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
        hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
        mm/memblock.c: remove redundant assignment to variable max_addr
        mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
        mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
        ...
      6cad420c
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 7be97138
      Linus Torvalds authored
      Pull xfs updates from Darrick Wong:
       "There's a lot going on this cycle with cleanups in the log code, the
        btree code, and the xattr code.
      
        We're tightening metadata validation and online fsck checking, and
        introducing a common btree rebuilding library so that we can refactor
        xfs_repair and introduce online repair in a future cycle.
      
        We also fixed a few visible bugs -- most notably there's one in
        getdents that we introduced in 5.6; and a fix for hangs when disabling
        quotas.
      
        This series has been running fstests & other QA in the background for
        over a week and looks good so far.
      
        I anticipate sending a second pull request next week. That batch will
        change how xfs interacts with memory reclaim; how the log batches and
        throttles log items; how hard writes near ENOSPC will try to squeeze
        more space out of the filesystem; and hopefully fix the last of the
        umount hangs after a catastrophic failure. That should ease a lot of
        problems when running at the limits, but for now I'm leaving that in
        for-next for another week to make sure we got all the subtleties
        right.
      
        Summary:
      
         - Fix a hard to trigger race between iclog error checking and log
           shutdown.
      
         - Strengthen the AGF verifier.
      
         - Ratelimit some of the more spammy error messages.
      
         - Remove the icdinode uid/gid members and just use the ones in the
           vfs inode.
      
         - Hold ILOCK across insert/collapse range.
      
         - Clean up the extended attribute interfaces.
      
         - Clean up the attr flags mess.
      
         - Restore PF_MEMALLOC after exiting xfsaild thread to avoid
           triggering warnings in the process accounting code.
      
         - Remove the flexibly-sized array from struct xfs_agfl to eliminate
           compiler warnings about unaligned pointers and packed structures.
      
         - Various macro and typedef removals.
      
         - Stale metadata buffers if we decide they're corrupt outside of a
           verifier.
      
         - Check directory data/block/free block owners.
      
         - Fix a UAF when aborting inactivation of a corrupt xattr fork.
      
         - Teach online scrub to report failed directory and attr name lookups
           as a metadata corruption instead of a runtime error.
      
         - Avoid potential buffer overflows in sysfs files by using scnprintf.
      
         - Fix a regression in getdents lookups due to a mistake in pointer
           arithmetic.
      
         - Refactor btree cursor private data structures to use anonymous
           unions.
      
         - Cleanups in the log unmounting code.
      
         - Fix a potential mishandling of ENOMEM errors on multi-block
           directory buffer lookups.
      
         - Fix an incorrect test in the block allocation code.
      
         - Cleanups and name prefix shortening in the scrub code.
      
         - Introduce btree bulk loading code for online repair and scrub.
      
         - Fix a quotaoff log item leak (and hang) when the fs goes down
           midway through a quotaoff operation.
      
         - Remove di_version from the incore inode.
      
         - Refactor some of the log shutdown checking code.
      
         - Record the forcing of the log unmount records in the log force
           counters.
      
         - Fix a longstanding bug where quotacheck would purge the
           administrator's default quota grace interval and warning limits.
      
         - Reduce memory usage when scrubbing directory and xattr trees.
      
         - Don't let fsfreeze race with GETFSMAP or online scrub.
      
         - Handle bio_add_page failures more gracefully in xlog_write_iclog"
      
      * tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (108 commits)
        xfs: prohibit fs freezing when using empty transactions
        xfs: shutdown on failure to add page to log bio
        xfs: directory bestfree check should release buffers
        xfs: drop all altpath buffers at the end of the sibling check
        xfs: preserve default grace interval during quotacheck
        xfs: remove xlog_state_want_sync
        xfs: move the ioerror check out of xlog_state_clean_iclog
        xfs: refactor xlog_state_clean_iclog
        xfs: remove the aborted parameter to xlog_state_done_syncing
        xfs: simplify log shutdown checking in xfs_log_release_iclog
        xfs: simplify the xfs_log_release_iclog calling convention
        xfs: factor out a xlog_wait_on_iclog helper
        xfs: merge xlog_cil_push into xlog_cil_push_work
        xfs: remove the di_version field from struct icdinode
        xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize
        xfs: simplify di_flags2 inheritance in xfs_ialloc
        xfs: only check the superblock version for dinode size calculation
        xfs: add a new xfs_sb_version_has_v3inode helper
        xfs: fix unmount hang and memory leak on shutdown during quotaoff
        xfs: factor out quotaoff intent AIL removal and memory free
        ...
      7be97138
    • Linus Torvalds's avatar
      Merge tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 7db83c07
      Linus Torvalds authored
      Pull hibernation fix from Darrick Wong:
       "Fix a regression where we broke the userspace hibernation driver by
        disallowing writes to the swap device"
      
      * tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        hibernate: Allow uswsusp to write to swap
      7db83c07
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 35a9fafe
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "We're fixing tracepoints and comments in this cycle, so there
        shouldn't be any surprises here.
      
        I anticipate sending a second pull request next week with a single bug
        fix for readahead, but it's still undergoing QA.
      
        Summary:
      
         - Fix a broken tracepoint
      
         - Fix a broken comment"
      
      * tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: fix comments in iomap_dio_rw
        iomap: Remove pgoff from tracepoints
      35a9fafe
    • Linus Torvalds's avatar
      Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9c577491
      Linus Torvalds authored
      Pull vfs pathwalk sanitizing from Al Viro:
       "Massive pathwalk rewrite and cleanups.
      
        Several iterations have been posted; hopefully this thing is getting
        readable and understandable now. Pretty much all parts of pathname
        resolutions are affected...
      
        The branch is identical to what has sat in -next, except for commit
        message in "lift all calls of step_into() out of follow_dotdot/
        follow_dotdot_rcu", crediting Qian Cai for reporting the bug; only
        commit message changed there."
      
      * 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (69 commits)
        lookup_open(): don't bother with fallbacks to lookup+create
        atomic_open(): no need to pass struct open_flags anymore
        open_last_lookups(): move complete_walk() into do_open()
        open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()
        open_last_lookups(): don't abuse complete_walk() when all we want is unlazy
        open_last_lookups(): consolidate fsnotify_create() calls
        take post-lookup part of do_last() out of loop
        link_path_walk(): sample parent's i_uid and i_mode for the last component
        __nd_alloc_stack(): make it return bool
        reserve_stack(): switch to __nd_alloc_stack()
        pick_link(): take reserving space on stack into a new helper
        pick_link(): more straightforward handling of allocation failures
        fold path_to_nameidata() into its only remaining caller
        pick_link(): pass it struct path already with normal refcounting rules
        fs/namei.c: kill follow_mount()
        non-RCU analogue of the previous commit
        helper for mount rootwards traversal
        follow_dotdot(): be lazy about changing nd->path
        follow_dotdot_rcu(): be lazy about changing nd->path
        follow_dotdot{,_rcu}(): massage loops
        ...
      9c577491
    • Qian Cai's avatar
      x86/kvm: fix a missing-prototypes "vmread_error" · 514ccc19
      Qian Cai authored
      Commit 842f4be9 ("KVM: VMX: Add a trampoline to fix VMREAD error
      handling") removed the declaration of vmread_error(), which causes a W=1
      build failure with KVM_WERROR=y. Fix it by adding the declaration back.
      
      arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes]
       asmlinkage void vmread_error(unsigned long field, bool fault)
                       ^~~~~~~~~~~~
      Signed-off-by: Qian Cai <cai@lca.pw>
      Message-Id: <20200402153955.1695-1-cai@lca.pw>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      514ccc19
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · d987ca1c
      Linus Torvalds authored
      Pull exec/proc updates from Eric Biederman:
       "This contains two significant pieces of work: the work to sort out
        proc_flush_task, and the work to solve a deadlock between strace and
        exec.
      
        Fixing proc_flush_task so that it no longer requires a persistent
        mount makes improvements to proc possible. The removal of the
        persistent mount solves an old regression that caused the hidepid
        mount option to only work on remount not on mount. The regression was
        found and reported by the Android folks. This further allows Alexey
        Gladkov's work making proc mount options specific to an individual
        mount of proc to move forward.
      
        The work on exec starts solving a long standing issue with exec that
        it takes mutexes of blocking userspace applications, which makes exec
        extremely deadlock prone. For the moment this adds a second mutex with
        a narrower scope that handles all of the easy cases, which makes the
        tricky cases easy to spot. With a little luck the code to solve those
        deadlocks will be ready by next merge window"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
        signal: Extend exec_id to 64bits
        pidfd: Use new infrastructure to fix deadlocks in execve
        perf: Use new infrastructure to fix deadlocks in execve
        proc: io_accounting: Use new infrastructure to fix deadlocks in execve
        proc: Use new infrastructure to fix deadlocks in execve
        kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
        kernel: doc: remove outdated comment cred.c
        mm: docs: Fix a comment in process_vm_rw_core
        selftests/ptrace: add test cases for dead-locks
        exec: Fix a deadlock in strace
        exec: Add exec_update_mutex to replace cred_guard_mutex
        exec: Move exec_mmap right after de_thread in flush_old_exec
        exec: Move cleanup of posix timers on exec out of de_thread
        exec: Factor unshare_sighand out of de_thread and call it separately
        exec: Only compute current once in flush_old_exec
        pid: Improve the comment about waiting in zap_pid_ns_processes
        proc: Remove the now unnecessary internal mount of proc
        uml: Create a private mount of proc for mconsole
        uml: Don't consult current to find the proc_mnt in mconsole_proc
        proc: Use a list of inodes to flush from proc
        ...
      d987ca1c
    • Matthew Wilcox (Oracle)'s avatar
      include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP · 77d6b909
      Matthew Wilcox (Oracle) authored
      It's even more important to check that we don't have a tail page when
      calling hpage_nr_pages() when THP are disabled.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Link: http://lkml.kernel.org/r/20200318140253.6141-4-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      77d6b909
    • Christophe Leroy's avatar
      mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS · bb297bb2
      Christophe Leroy authored
      When CONFIG_HUGETLB_PAGE is set but not CONFIG_HUGETLBFS, the following
      build failure is encountered:
      
        In file included from arch/powerpc/mm/fault.c:33:0:
        include/linux/hugetlb.h: In function 'hstate_inode':
        include/linux/hugetlb.h:477:9: error: implicit declaration of function 'HUGETLBFS_SB' [-Werror=implicit-function-declaration]
          return HUGETLBFS_SB(i->i_sb)->hstate;
                 ^
        include/linux/hugetlb.h:477:30: error: invalid type argument of '->' (have 'int')
          return HUGETLBFS_SB(i->i_sb)->hstate;
                                      ^
      
      Gate hstate_inode() with CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE.
      
      Fixes: a137e1cc ("hugetlbfs: per mount huge page sizes")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Link: http://lkml.kernel.org/r/7e8c3a3c9a587b9cd8a2f146df32a421b961f3a2.1584432148.git.christophe.leroy@c-s.fr
      Link: https://patchwork.ozlabs.org/patch/1255548/#2386036
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bb297bb2
    • Christophe Leroy's avatar
      selftests/vm: fix map_hugetlb length used for testing read and write · cabc30da
      Christophe Leroy authored
      Commit fa7b9a80 ("tools/selftest/vm: allow choosing mem size and page
      size in map_hugetlb") added the possibility to change the size of memory
      mapped for the test, but left the read and write test using the default
      value.  This is unnoticed when mapping a length greater than the default
      one, but segfaults otherwise.
      
      Fix read_bytes() and write_bytes() by giving them the real length.
      
      Also fix the call to munmap().
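
      The fix described above can be sketched as follows. This is a minimal
      illustration, not the selftest's actual code: the helper signatures and
      the byte pattern are assumptions. The point is that both helpers take
      the length that was really mapped instead of a compile-time default.

```c
#include <stddef.h>

/* Hypothetical helper: fill the first `length` bytes with a repeating pattern. */
void write_bytes(char *addr, size_t length)
{
	for (size_t i = 0; i < length; i++)
		addr[i] = (char)i;
}

/* Hypothetical helper: return 0 if all `length` bytes match the pattern, 1 otherwise. */
int read_bytes(const char *addr, size_t length)
{
	for (size_t i = 0; i < length; i++)
		if (addr[i] != (char)i)
			return 1;
	return 0;
}
```

      A caller would pass the same length to mmap(), write_bytes(),
      read_bytes(), and munmap(), so a mapping smaller than the default is
      never accessed out of bounds.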
      
      Fixes: fa7b9a80 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/9a404a13c871c4bd0ba9ede68f69a1225180dd7e.1580978385.git.christophe.leroy@c-s.fr
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cabc30da
    • Vlastimil Babka's avatar
      mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge() · d4af73e3
      Vlastimil Babka authored
      Commit f1e61557 ("mm: pack compound_dtor and compound_order into one
      word in struct page") changed compound_dtor from a pointer to an array
      index in order to pack it.  To check if a page has the hugetlbfs
      compound_dtor, we can just compare the index directly without fetching the
      function pointer.  Said commit did that with PageHuge() and we can do the
      same with PageHeadHuge() to make the code a bit smaller and faster.
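
      The idea can be illustrated with a simplified model. All names below
      (page_model, dtor_table, the enum values) are hypothetical stand-ins,
      not the kernel's definitions. Once the destructor is stored as an array
      index, comparing the index gives the same answer as loading the function
      pointer from the table and comparing that, without the extra memory
      fetch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's types. */
enum dtor_id { NULL_DTOR, GENERIC_DTOR, HUGETLB_DTOR, NR_DTORS };

struct page_model {
	unsigned char dtor;	/* index into dtor_table, not a pointer */
};

typedef void dtor_fn(struct page_model *page);

void generic_dtor(struct page_model *page) { (void)page; }
void hugetlb_dtor(struct page_model *page) { (void)page; }

dtor_fn *const dtor_table[NR_DTORS] = {
	[NULL_DTOR]    = NULL,
	[GENERIC_DTOR] = generic_dtor,
	[HUGETLB_DTOR] = hugetlb_dtor,
};

/* Pointer style: load the function pointer from the table, then compare. */
bool is_huge_via_pointer(const struct page_model *page)
{
	return dtor_table[page->dtor] == hugetlb_dtor;
}

/* Index style: compare the stored index directly, no table load. */
bool is_huge_via_index(const struct page_model *page)
{
	return page->dtor == HUGETLB_DTOR;
}
```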
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Neha Agarwal <nehaagarwal@google.com>
      Link: http://lkml.kernel.org/r/20200311172440.6988-1-vbabka@suse.cz
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d4af73e3
    • Mateusz Nosek's avatar
      mm/hugetlb.c: clean code by removing unnecessary initialization · 353b2de4
      Mateusz Nosek authored
      Previously variable 'check_addr' was initialized, but was not read later
      before reassigning.  So the initialization can be removed.
      Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Link: http://lkml.kernel.org/r/20200303212354.25226-1-mateusznosek0@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      353b2de4
    • Mina Almasry's avatar
      hugetlb_cgroup: add hugetlb_cgroup reservation docs · 6566704d
      Mina Almasry authored
      Add docs for how to use hugetlb_cgroup reservations, and their behavior.
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-9-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6566704d
    • Mina Almasry's avatar
      hugetlb_cgroup: add hugetlb_cgroup reservation tests · 29750f71
      Mina Almasry authored
      The tests use both shared and private mapped hugetlb memory, and monitor
      the hugetlb usage counter as well as the hugetlb reservation counter.
      They test different configurations such as hugetlb memory usage via
      hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without
      MAP_POPULATE.
      
      Also add test for hugetlb reservation reparenting, since this is a subtle
      issue.
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sandipan Das <sandipan@linux.ibm.com>	[powerpc64]
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-8-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29750f71
    • Mina Almasry's avatar
      hugetlb: support file_region coalescing again · a9b3f867
      Mina Almasry authored
      An earlier patch in this series disabled file_region coalescing in order
      to hang the hugetlb_cgroup uncharge info on the file_region entries.
      
      This patch re-adds support for coalescing of file_region entries.
      Essentially, every time we add an entry, we call a recursive function that
      tries to coalesce the added region with the regions next to it.  The worst
      case call depth for this function is 3: one to coalesce with the next
      region, one to coalesce with the previous region, and one to reach the
      base case.
      
      This is an important performance optimization as private mappings add
      their entries page by page, and we could incur big performance costs for
      large mappings with lots of file_region entries in their resv_map.
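
      The recursion described above can be sketched on a simplified model.
      This is an illustration only: the kernel keeps file_region entries on a
      linked list, whereas this sketch uses a small sorted array, and the
      names (struct region, struct resv, owner, coalesce) are hypothetical.
      Two neighbours merge only when they touch and carry the same owner,
      standing in for the requirement that their uncharge info match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RESV_CAP 16

/* Hypothetical model: half-open range [from, to) tagged with an owner id. */
struct region { long from, to; int owner; };
struct resv { struct region r[RESV_CAP]; size_t n; };

/* Drop entry i by shifting the tail of the array down one slot. */
void remove_at(struct resv *m, size_t i)
{
	assert(i < m->n);
	memmove(&m->r[i], &m->r[i + 1], (m->n - i - 1) * sizeof(m->r[0]));
	m->n--;
}

/*
 * Try to merge entry i with one neighbour, then recurse on the result.
 * If the neighbours were already coalesced among themselves, the call
 * depth is at most 3: merge one side, merge the other, hit the base case.
 */
void coalesce(struct resv *m, size_t i)
{
	if (i > 0 && m->r[i - 1].to == m->r[i].from &&
	    m->r[i - 1].owner == m->r[i].owner) {
		m->r[i - 1].to = m->r[i].to;
		remove_at(m, i);
		coalesce(m, i - 1);
		return;
	}
	if (i + 1 < m->n && m->r[i].to == m->r[i + 1].from &&
	    m->r[i].owner == m->r[i + 1].owner) {
		m->r[i].to = m->r[i + 1].to;
		remove_at(m, i + 1);
		coalesce(m, i);
		return;
	}
	/* Base case: nothing adjacent and compatible, so stop. */
}
```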
      
      [almasrymina@google.com: fix CONFIG_CGROUP_HUGETLB ifdefs]
        Link: http://lkml.kernel.org/r/20200214204544.231482-1-almasrymina@google.com
      [almasrymina@google.com: remove check_coalesce_bug debug code]
        Link: http://lkml.kernel.org/r/20200219233610.13808-1-almasrymina@google.com
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-7-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9b3f867
    • Mina Almasry's avatar
      hugetlb_cgroup: support noreserve mappings · 08cf9faf
      Mina Almasry authored
      Support MAP_NORESERVE accounting as part of the new counter.
      
      For each hugepage allocation, at allocation time we check if there is a
      reservation for this allocation or not.  If there is a reservation for
      this allocation, then this allocation was charged at reservation time, and
      we don't re-account it.  If there is no reservation for this allocation,
      we charge the appropriate hugetlb_cgroup.
      
      The hugetlb_cgroup to uncharge for this allocation is stored in
      page[3].private.  We use new APIs added in an earlier patch to set this
      pointer.
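
      The decision described above can be sketched as follows. This is a
      simplified model with hypothetical names (rsvd_counter, huge_page,
      account_alloc, account_free). The commit stores the pointer in
      page[3].private; the sketch uses a plain struct field instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup's reservation counter. */
struct rsvd_counter { long pages; };

/* Hypothetical stand-in for a huge page; remembers whom to uncharge. */
struct huge_page { struct rsvd_counter *uncharge_to; };

/*
 * At allocation time: a page backed by a reservation was already charged
 * when the reservation was made, so it is not charged again. A page with
 * no reservation (e.g. MAP_NORESERVE) is charged now, and the counter is
 * recorded on the page so that freeing it can undo the charge.
 */
void account_alloc(struct huge_page *page, struct rsvd_counter *cg,
		   bool has_reservation, long nr_pages)
{
	page->uncharge_to = NULL;
	if (has_reservation)
		return;
	cg->pages += nr_pages;
	page->uncharge_to = cg;
}

/* At free time: uncharge only if this page was charged at allocation. */
void account_free(struct huge_page *page, long nr_pages)
{
	if (page->uncharge_to) {
		page->uncharge_to->pages -= nr_pages;
		page->uncharge_to = NULL;
	}
}
```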
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-6-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      08cf9faf
    • Mina Almasry's avatar
      hugetlb_cgroup: add accounting for shared mappings · 075a61d0
      Mina Almasry authored
      For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
      in the resv_map entries, in file_region->reservation_counter.
      
      After a call to region_chg, we charge the appropriate hugetlb_cgroup, and
      if successful, we pass on the hugetlb_cgroup info to a follow up
      region_add call.  When a file_region entry is added to the resv_map via
      region_add, we put the pointer to that cgroup in
      file_region->reservation_counter.  If charging doesn't succeed, we report
      the error to the caller, so that the kernel fails the reservation.
      
      On region_del, which is when the hugetlb memory is unreserved, we also
      uncharge the file_region->reservation_counter.
      
      [akpm@linux-foundation.org: forward declare struct file_region]
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-5-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      075a61d0
    • Mina Almasry's avatar
      hugetlb: disable region_add file_region coalescing · 0db9d74e
      Mina Almasry authored
      A follow up patch in this series adds hugetlb cgroup uncharge info to the
      file_region entries in resv->regions.  The cgroup uncharge info may differ
      for different regions, so they can no longer be coalesced at region_add
      time.  So, disable region coalescing in region_add in this patch.
      
      Behavior change:
      
      Say a resv_map exists like this [0->1], [2->3], and [5->6].
      
      Then a region_chg/add call comes in region_chg/add(f=0, t=5).
      
      Old code would generate resv->regions: [0->5], [5->6].
      New code would generate resv->regions: [0->1], [1->2], [2->3], [3->5],
      [5->6].
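
      The new behavior can be reproduced with a small model. This is a sketch
      under stated assumptions, not the kernel's implementation: the map is a
      sorted array rather than a linked list, ranges are treated as half-open
      [from, to), and the names (struct range, struct range_map,
      add_range_no_coalesce) are hypothetical. Existing entries are left
      untouched and only the uncovered gaps are filled; the return value is
      the number of entries that had to be added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAP_CAP 16

/* Hypothetical model of a reservation map: sorted, non-overlapping ranges. */
struct range { long from, to; };
struct range_map { struct range r[MAP_CAP]; size_t n; };

/* Insert [from, to) at position i, shifting later entries up one slot. */
void insert_at(struct range_map *m, size_t i, long from, long to)
{
	assert(m->n < MAP_CAP && i <= m->n);
	memmove(&m->r[i + 1], &m->r[i], (m->n - i) * sizeof(m->r[0]));
	m->r[i].from = from;
	m->r[i].to = to;
	m->n++;
}

/* Cover [f, t) by adding one entry per gap; never merge with neighbours. */
size_t add_range_no_coalesce(struct range_map *m, long f, long t)
{
	size_t added = 0, i = 0;
	long last = f;	/* everything in [f, last) is already covered */

	while (i < m->n && m->r[i].from < t) {
		if (m->r[i].from > last) {
			/* Gap before this entry: fill it, then step past the filler. */
			insert_at(m, i, last, m->r[i].from);
			added++;
			i++;
		}
		if (m->r[i].to > last)
			last = m->r[i].to;
		i++;
	}
	if (last < t) {
		/* Trailing gap between the last covered point and t. */
		insert_at(m, i, last, t);
		added++;
	}
	return added;
}
```

      With the map from the example, adding (f=0, t=5) inserts two entries,
      [1->2] and [3->5], which is why a single call can now need more than
      one new region.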
      
      Special care needs to be taken to handle the resv->adds_in_progress
      variable correctly.  In the past, only 1 region would be added for every
      region_chg and region_add call.  But now, each call may add multiple
      regions, so we can no longer increment adds_in_progress by 1 in
      region_chg, or decrement adds_in_progress by 1 after region_add or
      region_abort.  Instead, region_chg calls add_reservation_in_range() to
      count the number of regions needed and allocates those, and that info is
      passed to region_add and region_abort to decrement adds_in_progress
      correctly.
      
      We've also modified the assumption that region_add after region_chg never
      fails.  region_chg now pre-allocates at least 1 region for region_add.  If
      region_add needs more regions than region_chg has allocated for it, then
      it may fail.
      
      [almasrymina@google.com: fix file_region entry allocations]
        Link: http://lkml.kernel.org/r/20200219012736.20363-1-almasrymina@google.com
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Link: http://lkml.kernel.org/r/20200211213128.73302-4-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0db9d74e
    • Mina Almasry's avatar
      hugetlb_cgroup: add reservation accounting for private mappings · e9fe92ae
      Mina Almasry authored
      Normally the pointer to the cgroup to uncharge hangs off the struct page,
      and gets queried when it's time to free the page.  With hugetlb_cgroup
      reservations, this is not possible, because a page can be
      reserved by one task and actually faulted in by another task.
      
      The best place to put the hugetlb_cgroup pointer to uncharge for
      reservations is in the resv_map.  But, because the resv_map has different
      semantics for private and shared mappings, the code path to
      charge/uncharge shared and private mappings is different.  This patch
      implements charging and uncharging for private mappings.
      
      For private mappings, the counter to uncharge is in
      resv_map->reservation_counter.  On initializing the resv_map this is set
      to NULL.  On reservation of a region in a private mapping, the task's
      hugetlb_cgroup is charged and the hugetlb_cgroup is placed in
      resv_map->reservation_counter.
      
      On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter.
      
      [akpm@linux-foundation.org: forward declare struct resv_map]
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-3-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9fe92ae
    • Mina Almasry's avatar
      mm/hugetlb_cgroup: fix hugetlb_cgroup migration · 9808895e
      Mina Almasry authored
      Commit c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge
      hugetlb reservations") mistakenly doesn't handle the migration of *both*
      the reservation hugetlb_cgroup and the fault hugetlb_cgroup correctly.
      
      What should happen is that both cgroups should be queried from the old
      page, then both set to NULL on the old page, then both inserted into the
      new page.
      
      The mistake also creates the following warning:
      
      mm/hugetlb_cgroup.c: In function 'hugetlb_cgroup_migrate':
      mm/hugetlb_cgroup.c:777:25: warning: variable 'h_cg' set but not used
      [-Wunused-but-set-variable]
        struct hugetlb_cgroup *h_cg;
                               ^~~~
      
      The solution is to add the missing steps, namely setting the reservation
      hugetlb_cgroup to NULL on the old page, and setting the fault
      hugetlb_cgroup on the new page.
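
      The intended three-step migration (query both, clear both on the old
      page, install both on the new page) can be sketched as below.  This
      is a userspace illustration only: page_model, hugetlb_cgroup_model
      and migrate_cgroups are hypothetical names and do not reproduce the
      kernel's hugetlb_cgroup_migrate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb cgroup. */
struct hugetlb_cgroup_model {
    int id;
};

/* Hypothetical stand-in for a huge page carrying both cgroup pointers. */
struct page_model {
    struct hugetlb_cgroup_model *fault_cg; /* charged at fault time */
    struct hugetlb_cgroup_model *rsvd_cg;  /* charged at reservation time */
};

static void migrate_cgroups(struct page_model *oldp, struct page_model *newp)
{
    assert(oldp != newp);

    /* Step 1: query both cgroups from the old page. */
    struct hugetlb_cgroup_model *fault_cg = oldp->fault_cg;
    struct hugetlb_cgroup_model *rsvd_cg = oldp->rsvd_cg;

    /* Step 2: set both to NULL on the old page. */
    oldp->fault_cg = NULL;
    oldp->rsvd_cg = NULL;

    /* Step 3: insert both into the new page. */
    newp->fault_cg = fault_cg;
    newp->rsvd_cg = rsvd_cg;
}
```

      Skipping either half of step 2 or step 3 for one of the two pointers
      is the kind of omission this commit describes.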
      
      Fixes: c32300516047 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200218194727.46995-1-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9808895e
    • Mina Almasry's avatar
      hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations · 1adc4d41
      Mina Almasry authored
      Augments hugetlb_cgroup_charge_cgroup so that it can charge either the
      hugetlb usage counter or the hugetlb reservation counter.
      
      Adds a new interface to uncharge a hugetlb_cgroup counter via
      hugetlb_cgroup_uncharge_counter.
      
      Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init,
      hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline.
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: http://lkml.kernel.org/r/20200211213128.73302-2-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1adc4d41
    • Mina Almasry's avatar
      hugetlb_cgroup: add hugetlb_cgroup reservation counter · cdc2fcfe
      Mina Almasry authored
      These counters will track hugetlb reservations rather than hugetlb memory
      faulted in.  This patch only adds the counter; the following patches add
      the charging and uncharging of the counter.
      
      This is patch 1 of a 9-patch series.
      
      Problem:
      
      Currently tasks attempting to reserve more hugetlb memory than is
      available get a failure at mmap/shmget time.  This is thanks to Hugetlbfs
      Reservations [1].  However, if a task attempts to reserve more hugetlb
      memory than its hugetlb_cgroup limit allows, the kernel will allow the
      mmap/shmget call, but will SIGBUS the task when it attempts to fault in
      the excess memory.
      
      We have users hitting their hugetlb_cgroup limits and thus we've been
      looking at this failure mode.  We'd like to improve this behavior such
      that users violating the hugetlb_cgroup limits get an error at mmap/shmget
      time, rather than getting SIGBUS'd when they try to fault the excess
      memory in.  This gives the user an opportunity to fall back more
      gracefully to non-hugetlbfs memory, for example.
      
      The underlying problem is that today's hugetlb_cgroup accounting happens
      at hugetlb memory *fault* time, rather than at *reservation* time.  Thus,
      enforcing the hugetlb_cgroup limit only happens at fault time, and the
      offending task gets SIGBUS'd.
      
      Proposed Solution:
      
      A new page counter named
      'hugetlb.xMB.rsvd.[limit|usage|max_usage]_in_bytes'. This counter has
      slightly different semantics than
      'hugetlb.xMB.[limit|usage|max_usage]_in_bytes':
      
      - While usage_in_bytes tracks all *faulted* hugetlb memory,
        rsvd.usage_in_bytes tracks all *reserved* hugetlb memory and hugetlb
        memory faulted in without a prior reservation.
      
      - If a task attempts to reserve more memory than limit_in_bytes allows,
        the kernel will allow it to do so.  But if a task attempts to reserve
        more memory than rsvd.limit_in_bytes, the kernel will fail this
        reservation.
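
      The difference between the two limits can be illustrated with a
      small userspace model.  The names hugetlb_cg_model, try_reserve and
      try_fault are hypothetical and chosen for this sketch only.  The
      model deliberately leaves out the case of memory faulted in without
      a prior reservation, which the text says is also tracked by
      rsvd.usage_in_bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one cgroup with both counters, in bytes. */
struct hugetlb_cg_model {
    long limit, usage;           /* fault-time counter */
    long rsvd_limit, rsvd_usage; /* reservation-time counter */
};

/*
 * Reservation time (mmap/shmget in the text): only the rsvd limit is
 * enforced, so the caller learns about the failure immediately.
 */
static bool try_reserve(struct hugetlb_cg_model *cg, long bytes)
{
    assert(bytes >= 0);
    if (cg->rsvd_usage + bytes > cg->rsvd_limit)
        return false;
    cg->rsvd_usage += bytes;
    return true;
}

/*
 * Fault time: the fault limit is enforced.  A false return here stands
 * for the SIGBUS described in the text.
 */
static bool try_fault(struct hugetlb_cg_model *cg, long bytes)
{
    assert(bytes >= 0);
    if (cg->usage + bytes > cg->limit)
        return false;
    cg->usage += bytes;
    return true;
}
```

      With a fault limit of 2 and a reservation limit of 4, a reservation
      of 4 succeeds even though it exceeds the fault limit, while a
      further reservation of 1 is refused at reservation time.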
      
      This proposal is implemented in this patch series, with tests to verify
      functionality and show the usage.
      
      Alternatives considered:
      
      1. A new cgroup, instead of only a new page_counter attached to the
         existing hugetlb_cgroup.  Adding a new cgroup seemed like a lot of code
         duplication with hugetlb_cgroup.  Keeping hugetlb related page counters
         under hugetlb_cgroup seemed cleaner as well.
      
      2. Instead of adding a new counter, we considered adding a sysctl that
         modifies the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do
         accounting at reservation time rather than fault time.  Adding a new
         page_counter seems better as userspace could, if it wants, choose to
         enforce different cgroups differently: one via limit_in_bytes, and
         another via rsvd.limit_in_bytes.  This could be very useful if you're
         transitioning how hugetlb memory is partitioned on your system one
         cgroup at a time, for example.  Also, someone may find usage for both
         limit_in_bytes and rsvd.limit_in_bytes concurrently, and this approach
         gives them the option to do so.
      
      Testing:
      - Added tests passing.
      - Used libhugetlbfs for regression testing.
      
      [1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20200211213128.73302-1-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cdc2fcfe
    • Mike Kravetz's avatar
      hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race · 87bf91d3
      Mike Kravetz authored
      hugetlbfs page faults can race with truncate and hole punch operations.
      Current code in the page fault path attempts to handle this by 'backing
      out' operations if we encounter the race.  One obvious omission in the
      current code is removing a page newly added to the page cache.  This is
      pretty straightforward to address, but there is a more subtle and
      difficult issue of backing out hugetlb reservations.  To handle this
      correctly, the 'reservation state' before page allocation needs to be
      noted so that it can be properly backed out.  There are four distinct
      possibilities for reservation state: shared/reserved, shared/no-resv,
      private/reserved and private/no-resv.  Backing out a reservation may
      require memory allocation which could fail so that needs to be taken
      into account as well.
      
      Instead of writing the required complicated code for this rare
      occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
      mode for the duration of page fault processing.  Hold i_mmap_rwsem in
      write mode when modifying i_size.  In this way, truncation cannot
      proceed when page faults are being processed.  In addition, i_size
      will not change during fault processing so a single check can be made
      to ensure faults are not beyond (proposed) end of file.  Faults can
      still race with hole punch, but that race is handled by existing code
      and the use of hugetlb_fault_mutex.
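
      The exclusion between faults and truncation can be sketched with a
      toy reader/writer lock.  Everything here is an assumption made for
      illustration: rwsem_model is a single-threaded try-lock model, not
      the kernel's rw_semaphore (which blocks instead of failing), and
      inode_model, try_truncate, fault_begin and fault_end are
      hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock with try-lock semantics only. */
struct rwsem_model {
    int readers;  /* number of read holders */
    bool writer;  /* true while held for write */
};

static bool down_read_trylock_model(struct rwsem_model *sem)
{
    if (sem->writer)
        return false;
    sem->readers++;
    return true;
}

static void up_read_model(struct rwsem_model *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

static bool down_write_trylock_model(struct rwsem_model *sem)
{
    if (sem->writer || sem->readers > 0)
        return false;
    sem->writer = true;
    return true;
}

static void up_write_model(struct rwsem_model *sem)
{
    assert(sem->writer);
    sem->writer = false;
}

/* Hypothetical inode with only the two fields the text discusses. */
struct inode_model {
    struct rwsem_model i_mmap_rwsem;
    long i_size;
};

/* i_size is modified only in write mode; fails while faults are active. */
static bool try_truncate(struct inode_model *inode, long new_size)
{
    if (!down_write_trylock_model(&inode->i_mmap_rwsem))
        return false;
    inode->i_size = new_size;
    up_write_model(&inode->i_mmap_rwsem);
    return true;
}

/*
 * Start fault processing: take the lock in read mode, then make the
 * single end-of-file check.  Returns true with the read lock held.
 */
static bool fault_begin(struct inode_model *inode, long offset)
{
    if (!down_read_trylock_model(&inode->i_mmap_rwsem))
        return false;
    if (offset >= inode->i_size) {
        up_read_model(&inode->i_mmap_rwsem);
        return false;
    }
    return true;
}

/* Finish fault processing and drop the read lock. */
static void fault_end(struct inode_model *inode)
{
    up_read_model(&inode->i_mmap_rwsem);
}
```

      Because i_size cannot change between fault_begin and fault_end, one
      size check at the start is enough in this model, which mirrors the
      simplification the commit message describes.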
      
      With this modification, checks for races with truncation in the page
      fault path can be simplified and removed.  remove_inode_hugepages no
      longer needs to take hugetlb_fault_mutex in the case of truncation.
      Comments are expanded to explain reasoning behind locking.
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87bf91d3
    • Mike Kravetz's avatar
      hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization · c0d0381a
      Mike Kravetz authored
      Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.
      
      While discussing the issue with huge_pte_offset [1], I remembered that
      there were more outstanding hugetlb races.  These issues are:
      
      1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
         invalid via a call to huge_pmd_unshare by another thread.
      2) hugetlbfs page faults can race with truncation causing invalid global
         reserve counts and state.
      
      A previous attempt was made to use i_mmap_rwsem in this manner as
      described at [2].  However, those patches were reverted starting with [3]
      due to locking issues.
      
      To effectively use i_mmap_rwsem to address the above issues it needs to be
      held (in read mode) during page fault processing.  However, during fault
      processing we need to lock the page we will be adding.  Lock ordering
      requires we take page lock before i_mmap_rwsem.  Waiting until after
      taking the page lock is too late in the fault process for the
      synchronization we want to do.
      
      To address this lock ordering issue, the following patches change the lock
      ordering for hugetlb pages.  This is not too invasive as hugetlbfs
      processing is done separately from core mm in many places.  However, I don't
      really like this idea.  Much ugliness is contained in the new routine
      hugetlb_page_mapping_lock_write() of patch 1.
      
      The only other way I can think of to address these issues is by catching
      all the races.  After catching a race, clean up, back out, retry, etc.,
      as needed.  This can get really ugly, especially for huge page
      reservations.  At one time, I started writing some of the reservation
      backout code for page faults and it got so ugly and complicated I went
      down the path of adding synchronization to avoid the races.  Any other
      suggestions would be welcome.
      
      [1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/
      [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/
      [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com
      [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/
      [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/
      
      This patch (of 2):
      
      While looking at BUGs associated with invalid huge page map counts, it was
      discovered and observed that a huge pte pointer could become 'invalid' and
      point to another task's page table.  Consider the following:
      
      A task takes a page fault on a shared hugetlbfs file and calls
      huge_pte_alloc to get a ptep.  Suppose the returned ptep points to a
      shared pmd.
      
      Now, another task truncates the hugetlbfs file.  As part of truncation, it
      unmaps everyone who has the file mapped.  If the range being truncated is
      covered by a shared pmd, huge_pmd_unshare will be called.  For all but the
      last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
      to the pmd.  If the task in the middle of the page fault is not the last
      user, the ptep returned by huge_pte_alloc now points to another task's
      page table or worse.  This leads to bad things such as incorrect page
      map/reference counts or invalid memory references.
      
      To fix, expand the use of i_mmap_rwsem as follows:
      - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
        huge_pmd_share is only called via huge_pte_alloc, so callers of
        huge_pte_alloc take i_mmap_rwsem before calling.  In addition, callers
        of huge_pte_alloc continue to hold the semaphore until finished with
        the ptep.
      - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
      
      One problem with this scheme is that it requires taking i_mmap_rwsem
      before taking the page lock during page faults.  This is not the order
      specified in the rest of mm code.  Handling of hugetlbfs pages is mostly
      isolated today.  Therefore, we use this alternative locking order for
      PageHuge() pages.
      
               mapping->i_mmap_rwsem
                 hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
                   page->flags PG_locked (lock_page)
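
      The ordering above can be expressed as a rank check: a lock may only
      be taken if its rank is higher than that of the most recently taken
      lock.  The following userspace sketch is an assumption-laden
      illustration; hugetlb_lock_rank, lock_tracker, tracker_acquire and
      tracker_release are hypothetical names and are not related to the
      kernel's lockdep.

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks follow the order listed in the text: outermost lock first. */
enum hugetlb_lock_rank {
    RANK_NONE = 0,
    RANK_I_MMAP_RWSEM = 1,        /* mapping->i_mmap_rwsem */
    RANK_HUGETLB_FAULT_MUTEX = 2, /* hugetlb_fault_mutex */
    RANK_PAGE_LOCK = 3,           /* page->flags PG_locked */
};

/* Stack of ranks currently held by one (modelled) task. */
struct lock_tracker {
    enum hugetlb_lock_rank held[3];
    int depth;
};

/* Returns false if taking 'rank' now would violate the ordering. */
static bool tracker_acquire(struct lock_tracker *t,
                            enum hugetlb_lock_rank rank)
{
    enum hugetlb_lock_rank top =
        t->depth ? t->held[t->depth - 1] : RANK_NONE;

    if (rank <= top || t->depth == 3)
        return false;

    t->held[t->depth++] = rank;
    return true;
}

/* Releases the most recently acquired lock. */
static void tracker_release(struct lock_tracker *t)
{
    assert(t->depth > 0);
    t->depth--;
}
```

      In this model, a task that already holds the page lock is refused
      i_mmap_rwsem, which is the inversion the new ordering for PageHuge()
      pages is meant to rule out.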
      
      To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
      introduced to write lock the i_mmap_rwsem associated with a page.
      
      In most cases it is easy to get address_space via vma->vm_file->f_mapping.
      However, in the case of migration or memory errors for anon pages we do
      not have an associated vma.  A new routine _get_hugetlb_page_mapping()
      will use anon_vma to get address_space in these cases.
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c0d0381a
    • Colin Ian King's avatar
      mm/memblock.c: remove redundant assignment to variable max_addr · 49aef717
      Colin Ian King authored
      The variable max_addr is being initialized with a value that is never read
      and it is being updated later with a new value.  The initialization is
      redundant and can be removed.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20200228235003.112718-1-colin.king@canonical.com
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49aef717
    • Randy Dunlap's avatar
      mm: mempolicy: require at least one nodeid for MPOL_PREFERRED · aa9f7d51
      Randy Dunlap authored
      Using an empty (malformed) nodelist that is not caught during mount option
      parsing leads to a stack-out-of-bounds access.
      
      The option string that was used was: "mpol=prefer:,".  However,
      MPOL_PREFERRED requires a single node number, which is not being provided
      here.
      
      Add a check that 'nodes' is not empty after parsing for MPOL_PREFERRED's
      nodeid.
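
      The added check can be illustrated with a simplified userspace
      parser.  This is a sketch under assumptions, not the kernel's
      mempolicy code: parse_nodelist_model and parse_prefer_option_model
      are hypothetical, the parser takes the value after "mpol=", it
      skips empty list items so that "," produces an empty node set, and
      it does not model the requirement that MPOL_PREFERRED names exactly
      one node.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Parse a comma-separated list of node numbers (0..63) into a bitmask.
 * Empty items are skipped, so "," and "" both yield an empty mask.
 * Returns false on a non-digit character or an out-of-range node.
 */
static bool parse_nodelist_model(const char *s, unsigned long long *mask)
{
    *mask = 0;
    while (*s) {
        if (*s == ',') {
            s++;
            continue;
        }
        if (!isdigit((unsigned char)*s))
            return false;

        unsigned node = 0;
        while (isdigit((unsigned char)*s)) {
            node = node * 10 + (unsigned)(*s - '0');
            if (node > 63)
                return false;
            s++;
        }
        *mask |= 1ULL << node;
    }
    return true;
}

/* Accepts "prefer:<nodelist>" and rejects an empty node set. */
static bool parse_prefer_option_model(const char *opt,
                                      unsigned long long *mask)
{
    static const char prefix[] = "prefer:";
    size_t n = sizeof(prefix) - 1;

    if (strncmp(opt, prefix, n) != 0)
        return false;
    if (!parse_nodelist_model(opt + n, mask))
        return false;

    /* The kind of check the commit adds: 'nodes' must not be empty. */
    if (*mask == 0)
        return false;

    return true;
}
```

      In this model "prefer:1" is accepted and "prefer:," is rejected
      after parsing, rather than being passed on with an empty node set.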
      
      Fixes: 095f1fc4 ("mempolicy: rework shmem mpol parsing and display")
      Reported-by: Entropy Moe <3ntr0py1337@gmail.com>
      Reported-by: syzbot+b055b1a6b2b958707a21@syzkaller.appspotmail.com
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: syzbot+b055b1a6b2b958707a21@syzkaller.appspotmail.com
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Link: http://lkml.kernel.org/r/89526377-7eb6-b662-e1d8-4430928abde9@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa9f7d51