1. 07 Jan, 2021 4 commits
  2. 30 Dec, 2020 1 commit
  3. 22 Dec, 2020 4 commits
    • powercap/drivers/dtpm: Add CPU energy model based support · 0e8f68d7
      Daniel Lezcano authored
      With the powercap dtpm controller, we are able to plug devices with
      power limitation features into the tree.
      
      The following patch introduces the CPU power limitation based on the
      energy model and the performance states.
      
      The power limitation is done at the performance domain level. If some
      CPUs are unplugged, the corresponding power will be subtracted from
      the performance domain total power.
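
      The accounting described above can be sketched as follows. This is
      a minimal illustration only, not the code added by the patch: the
      structure and function names are hypothetical, and it assumes every
      CPU of a performance domain contributes the same maximum power.

```c
#include <stdint.h>

/* Hypothetical description of one performance domain (not a kernel struct). */
struct perf_domain_sketch {
	uint64_t cpu_max_power_uw; /* power of one CPU at its highest performance state */
	unsigned int nr_online;    /* CPUs of the domain that are currently plugged */
};

/* Only plugged CPUs contribute to the domain total. */
uint64_t domain_max_power_uw(const struct perf_domain_sketch *pd)
{
	return pd->cpu_max_power_uw * pd->nr_online;
}

/* Unplugging a CPU removes its share from the domain total. */
void domain_cpu_unplug(struct perf_domain_sketch *pd)
{
	if (pd->nr_online > 0)
		pd->nr_online--;
}
```

      With four plugged CPUs of 500000 uW each, the domain total is
      2000000 uW; unplugging one CPU brings it down to 1500000 uW.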
      
      It is up to the platform to initialize the dtpm tree and add the CPU.
      
      Here is an example to create a simple tree with one root node called
      "pkg" and the CPU's performance domains.
      
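      static int dtpm_register_pkg(struct dtpm_descr *descr)
      {
      	struct dtpm *pkg;
      	int ret;
      
      	/* Allocate the root node */
      	pkg = dtpm_alloc(NULL);
      	if (!pkg)
      		return -ENOMEM;
      
      	/* Register it in the tree under the name given by the descriptor */
      	ret = dtpm_register(descr->name, pkg, descr->parent);
      	if (ret)
      		return ret;
      
      	/* Add the CPU's performance domains below the "pkg" node */
      	return dtpm_register_cpu(pkg);
      }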
      
      static struct dtpm_descr descr = {
      	.name = "pkg",
      	.init = dtpm_register_pkg,
      };
      DTPM_DECLARE(descr);

      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • powercap/drivers/dtpm: Add API for dynamic thermal power management · a20d0ef9
      Daniel Lezcano authored
      In the embedded world, the complexity of the SoC leads to an
      increasing number of hotspots which need to be monitored and mitigated
      as a whole in order to prevent the temperature from going above the
      normative and legally stated 'skin temperature'.
      
      Another aspect is to sustain the performance for a given power budget,
      for example in virtual reality, where the user can feel dizzy if the
      GPU performance is capped while a big CPU is processing something
      else. Another example is reducing the battery charging because the
      dissipated power is too high compared with the power consumed by
      other devices.
      
      Userspace is the most suitable place to dynamically act on the
      different devices by limiting their power given an application
      profile: it has the knowledge of the platform.
      
      These userspace daemons are in charge of the Dynamic Thermal Power
      Management (DTPM).
      
      Nowadays, the dtpm daemons are abusing the thermal framework as they
      act on the cooling device state to force a specific and arbitrary
      state without taking care of the governor decisions. Given the closed
      loop of some governors, that can confuse their logic or lead directly
      to a decision conflict.
      
      As cooling device support is limited today to the CPU and the GPU,
      the dtpm daemons have little control over the power dissipation of
      the system. The out-of-tree solutions hack around here and there in
      the drivers and in the frameworks to gain control over the devices.
      The common solution is to declare them as cooling devices.
      
      There is no unified power limitation unit; opaque states are used.
      
      This patch provides a way to create a hierarchy of constraints using
      the powercap framework. The devices which are registered as power
      limit-able devices are represented in this hierarchy as a tree. They
      are linked together with intermediate nodes which are just there to
      propagate the constraint to the children.
      
      The leaves of the tree are the real devices, the intermediate nodes
      are virtual, aggregating the children constraints and power
      characteristics.
      
      Each node has a weight on a 2^10 basis, in order to reflect the
      percentage of power distribution among the child nodes. This
      percentage is used to dispatch the power limit to the children.
      
      The weight is computed against the max power of the siblings.
      
      This simple approach allows a fair distribution of the power
      limit.
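
      The weight arithmetic can be illustrated with the sketch below. It
      is not the patch's implementation: the function names are
      hypothetical, "the max power of the siblings" is interpreted here
      as the sum of the max power of all children of the same parent,
      and plain truncating division is used, so rounding may differ from
      the kernel code.

```c
#include <stdint.h>

#define WEIGHT_SCALE 1024u /* 2^10 basis: a weight of 1024 stands for 100% */

/*
 * Weight of one child: its max power relative to the summed max power
 * of all children of the same parent (assumed interpretation).
 */
uint32_t child_weight(uint64_t child_max_uw, uint64_t siblings_max_uw)
{
	if (siblings_max_uw == 0)
		return 0;
	return (uint32_t)((child_max_uw * WEIGHT_SCALE) / siblings_max_uw);
}

/* Share of the parent's power limit dispatched to a child of that weight. */
uint64_t child_limit_uw(uint64_t parent_limit_uw, uint32_t weight)
{
	return (parent_limit_uw * weight) / WEIGHT_SCALE;
}
```

      For two children with a max power of 3 W and 1 W, the weights are
      768 and 256. A 2 W limit set on the parent is then dispatched as
      1.5 W and 0.5 W.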

      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Tested-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • Documentation/powercap/dtpm: Add documentation for dtpm · f5ad1c74
      Daniel Lezcano authored
      The dynamic thermal and power management is a technique to dynamically
      adjust the power consumption of different devices in order to ensure a
      global thermal constraint.
      
      A userspace daemon usually monitors the temperature and the
      power to take immediate action on the device.
      
      The DTPM framework provides a unified API to userspace to act on the
      power.
      
      Document this framework.

      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • units: Add Watt units · 2ee5f8f0
      Daniel Lezcano authored
      As the temperature units already exist, let's add the Watt macro definitions.
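
      A minimal sketch of what such macros allow is shown below. The
      macro names and values are an assumption about what the patch
      defines, and the helper function is hypothetical; they are
      redefined locally so the example stands on its own.

```c
#include <stdint.h>

/* Assumed conversion factors (names not confirmed by this commit message). */
#define MILLIWATT_PER_WATT       1000L
#define MICROWATT_PER_MILLIWATT  1000L
#define MICROWATT_PER_WATT       1000000L

/* Hypothetical helper: convert a power given in milliwatts to microwatts. */
uint64_t milliwatt_to_microwatt(uint64_t mw)
{
	return mw * MICROWATT_PER_MILLIWATT;
}
```

      Using named factors instead of bare 1000 or 1000000 literals keeps
      the unit of each power value explicit at the point of conversion.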

      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  4. 16 Dec, 2020 5 commits
    • Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b4ec8054
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "These update cpufreq (core and drivers), cpuidle (polling state
        implementation and the PSCI driver), the OPP (operating performance
        points) framework, devfreq (core and drivers), the power capping RAPL
        (Running Average Power Limit) driver, the Energy Model support, the
        generic power domains (genpd) framework, the ACPI device power
        management, the core system-wide suspend code and power management
        utilities.
      
        Specifics:
      
         - Use local_clock() instead of jiffies in the cpufreq statistics to
           improve accuracy (Viresh Kumar).
      
         - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
           drivers (Viresh Kumar).
      
         - Clean up the cpufreq core, the intel_pstate driver and the
           schedutil cpufreq governor (Rafael Wysocki).
      
         - Fix up error code paths in the sti-cpufreq and mediatek cpufreq
           drivers (Yangtao Li, Qinglang Miao).
      
         - Fix cpufreq_online() to return error codes instead of success (0)
           in all cases when it fails (Wang ShaoBo).
      
         - Add mt8167 support to the mediatek cpufreq driver and blacklist
           mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
      
         - Modify the tegra194 cpufreq driver to always return values from the
           frequency table as the current frequency and clean up that driver
           (Sumit Gupta, Jon Hunter).
      
         - Modify the arm_scmi cpufreq driver to allow it to discover the
           power scale present in the performance protocol and provide this
           information to the Energy Model (Lukasz Luba).
      
         - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
           Rohár).
      
         - Clean up the CPPC cpufreq driver (Ionela Voinescu).
      
         - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
           Bergmann).
      
         - Rework the polling interval selection for the polling state in
           cpuidle (Mel Gorman).
      
         - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
           (Ulf Hansson).
      
         - Modify the OPP framework to support empty (node-less) OPP tables in
           DT for passing dependency information (Nicola Mazzucato).
      
         - Fix potential lockdep issue in the OPP core and clean up the OPP
           core (Viresh Kumar).
      
         - Modify dev_pm_opp_put_regulators() to accept a NULL argument and
           update its users accordingly (Viresh Kumar).
      
         - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
      
         - Add support for governor feature flags to devfreq, make devfreq
           sysfs file permissions depend on the governor and clean up the
           devfreq core (Chanwoo Choi).
      
         - Clean up the tegra20 devfreq driver and deprecate it to allow
           another driver based on EMC_STAT to be used instead of it (Dmitry
           Osipenko).
      
         - Add interconnect support to the tegra30 devfreq driver, allow it to
           take the interconnect and OPP information from DT and clean it up
           (Dmitry Osipenko).
      
         - Add interconnect support to the exynos-bus devfreq driver along
           with interconnect properties documentation (Sylwester Nawrocki).
      
         - Add support for AMD Fam17h and Fam19h processors to the RAPL power
           capping driver (Victor Ding, Kim Phillips).
      
         - Fix handling of overly long constraint names in the powercap
           framework (Lukasz Luba).
      
         - Fix the wakeup configuration handling for bridges in the ACPI
           device power management core (Rafael Wysocki).
      
         - Add support for using an abstract scale for power units in the
           Energy Model (EM) and document it (Lukasz Luba).
      
         - Add em_cpu_energy() micro-optimization to the EM (Pavankumar
           Kondeti).
      
         - Modify the generic power domains (genpd) framework to support
           suspend-to-idle (Ulf Hansson).
      
         - Fix creation of debugfs nodes in genpd (Thierry Strudel).
      
         - Clean up genpd (Lina Iyer).
      
         - Clean up the core system-wide suspend code and make it print driver
           flags for devices with debug enabled (Alex Shi, Patrice Chotard,
           Chen Yu).
      
         - Modify the ACPI system reboot code to make it prepare for system
           power off to avoid confusing the platform firmware (Kai-Heng Feng).
      
         - Update the pm-graph (multiple changes, mostly usability-related)
           and cpupower (online and offline CPU information support) PM
           utilities (Todd Brandt, Brahadambal Srinivasan)"
      
      * tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
        cpufreq: Fix cpufreq_online() return value on errors
        cpufreq: Fix up several kerneldoc comments
        cpufreq: stats: Use local_clock() instead of jiffies
        cpufreq: schedutil: Simplify sugov_update_next_freq()
        cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
        PM: domains: create debugfs nodes when adding power domains
        opp: of: Allow empty opp-table with opp-shared
        dt-bindings: opp: Allow empty OPP tables
        media: venus: dev_pm_opp_put_*() accepts NULL argument
        drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
        drm/lima: dev_pm_opp_put_*() accepts NULL argument
        PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
        cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
        cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
        opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
        opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
        cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
        opp: Reduce the size of critical section in _opp_kref_release()
        PM / EM: Micro optimization in em_cpu_energy
        cpufreq: arm_scmi: Discover the power scale in performance protocol
        ...
    • Merge tag 'thermal-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux · b109bc72
      Linus Torvalds authored
      Pull thermal updates from Daniel Lezcano:
      
       - Add upper and lower limits clamps for the cooling device state in the
         power allocator governor (Michael Kao)
      
       - Add upper and lower limits support for the power allocator governor
         (Lukasz Luba)
      
       - Optimize conditions testing for the trip points (Bernard Zhao)
      
       - Replace spin_lock_irqsave by spin_lock in hard IRQ on the rcar driver
         (Tian Tao)
      
       - Add MT8516 dt-bindings and device reset optional support (Fabien
         Parent)
      
       - Add a quiescent period to cool down the PCH when entering S0iX
         (Sumeet Pawnikar)
      
       - Use bitmap API instead of re-inventing the wheel on sun8i (Yangtao
         Li)
      
       - Remove useless NULL check in the hwmon driver (Bernard Zhao)
      
       - Update the current state in the cpufreq cooling device only if the
         frequency change is effective (Zhuguangqing)
      
       - Improve the schema validation for the rcar DT bindings (Geert
         Uytterhoeven)
      
       - Fix the user time unit in the documentation (Viresh Kumar)
      
       - Add PCI ids for Lewisburg PCH (Andres Freund)
      
       - Add hwmon support on amlogic (Martin Blumenstingl)
      
       - Fix build failure for the PCH entering S0iX (Randy Dunlap)
      
       - Improve the k_* coefficient for the power allocator governor (Lukasz
         Luba)
      
       - Fix missing const on a sysfs attribute (Rikard Falkeborn)
      
       - Remove broken interrupt support on rcar to be replaced by a new one
         (Niklas Söderlund)
      
       - Improve the error code handling at init time on imx8mm (Fabio
         Estevam)
      
       - Compute interval validity once instead of at each temperature reading
         iteration on acerhdf (Daniel Lezcano)
      
       - Add r8a779a0 support (Niklas Söderlund)
      
       - Add PCI ids for AlderLake PCH and mmio refactoring (Srinivas
         Pandruvada)
      
       - Add RFIM and mailbox support on int340x (Srinivas Pandruvada)
      
       - Use macro for temperature calculation on PCH (Sumeet Pawnikar)
      
       - Simplify return conditions at probe time on Broadcom (Zheng Yongjun)
      
       - Fix workload name on PCH (Srinivas Pandruvada)
      
       - Migrate the devfreq cooling device code to the energy model API
         (Lukasz Luba)
      
       - Emit a warning if the thermal_zone_device_update is called without
         the .get_temp() ops (Daniel Lezcano)
      
       - Add critical and hot ops for the thermal zone (Daniel Lezcano)
      
       - Remove notification usage when critical is reached on rcar (Daniel
         Lezcano)
      
       - Fix devfreq build when ENERGY_MODEL is not set (Lukasz Luba)
      
      * tag 'thermal-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (45 commits)
        thermal/drivers/devfreq_cooling: Fix the build when !ENERGY_MODEL
        thermal/drivers/rcar: Remove notification usage
        thermal/core: Add critical and hot ops
        thermal/core: Emit a warning if the thermal zone is updated without ops
        drm/panfrost: Register devfreq cooling and attempt to add Energy Model
        thermal: devfreq_cooling: remove old power model and use EM
        thermal: devfreq_cooling: add new registration functions with Energy Model
        thermal: devfreq_cooling: use a copy of device status
        thermal: devfreq_cooling: change tracing function and arguments
        thermal: int340x: processor_thermal: Correct workload type name
        thermal: broadcom: simplify the return expression of bcm2711_thermal_probe()
        thermal: intel: pch: use macro for temperature calculation
        thermal: int340x: processor_thermal: Add mailbox driver
        thermal: int340x: processor_thermal: Add RFIM driver
        thermal: int340x: processor_thermal: Add AlderLake PCI device id
        thermal: int340x: processor_thermal: Refactor MMIO interface
        thermal: rcar_gen3_thermal: Add r8a779a0 support
        dt-bindings: thermal: rcar-gen3-thermal: Add r8a779a0 support
        platform/x86/drivers/acerhdf: Check the interval value when it is set
        platform/x86/drivers/acerhdf: Use module_param_cb to set/get polling interval
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · ee249d30
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - support for inhibiting input devices at request from userspace. If a
         device implements open/close methods, it can also put device into low
         power state. This is needed, for example, to disable keyboard and
         touchpad on convertibles when they are transitioned into tablet mode
      
       - now that ordinary input devices can be configured for polling mode,
         dedicated input polling device implementation has been removed
      
       - GTCO tablet driver has been removed, as it used problematic custom
         HID parser, devices are EOL, and there is no interest from the
         manufacturer
      
       - a new driver for Dialog DA7280 haptic chips has been introduced
      
       - a new driver for power button on Dell Wyse 3020
      
       - support for eKTF2132 in ektf2127 driver
      
       - support for SC2721 and SC2730 in sc27xx-vibra driver
      
       - enhancements for Atmel touchscreens, ADS7846 touchscreens, Elan
         touchpads, ADP5589, ST1232 touchscreen, TM2 touchkey drivers
      
       - fixes and cleanups to allow clean builds with W=1
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (86 commits)
        Input: da7280 - fix spelling mistake "sequemce" -> "sequence"
        Input: cyapa_gen6 - fix out-of-bounds stack access
        Input: sc27xx - add support for sc2730 and sc2721
        dt-bindings: input: Add compatible string for SC2721 and SC2730
        dt-bindings: input: Convert sc27xx-vibra.txt to json-schema
        Input: stmpe - add axis inversion and swapping capability
        Input: adp5589-keys - do not explicitly control IRQ for wakeup
        Input: adp5589-keys - do not unconditionally configure as wakeup source
        Input: ipx4xx-beeper - convert comma to semicolon
        Input: parkbd - convert comma to semicolon
        Input: new da7280 haptic driver
        dt-bindings: input: Add document bindings for DA7280
        MAINTAINERS: da7280 updates to the Dialog Semiconductor search terms
        Input: elantech - fix protocol errors for some trackpoints in SMBus mode
        Input: elan_i2c - add new trackpoint report type 0x5F
        Input: elants - document some registers and values
        Input: atmel_mxt_ts - simplify the return expression of mxt_send_bootloader_cmd()
        Input: imx_keypad - add COMPILE_TEST support
        Input: applespi - use new structure for SPI transfer delays
        Input: synaptics-rmi4 - use new structure for SPI transfer delays
        ...
    • Merge tag 'platform-drivers-x86-v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 · 61f91425
      Linus Torvalds authored
      
      Pull x86 platform driver updates from Hans de Goede:
       "Highlights:
      
         - New driver for changing BIOS settings from within Linux on Dell
           devices. This introduces a new generic sysfs API for this. Lenovo
           is working on also supporting this API on their devices
      
         - New Intel PMT telemetry and crashlog drivers
      
         - Support for SW_TABLET_MODE reporting for the acer-wmi and intel-hid
           drivers
      
         - Preparation work for improving support for Microsoft Surface
           hardware
      
         - Various fixes / improvements / quirks for the panasonic-laptop and
           others"
      
      * tag 'platform-drivers-x86-v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (81 commits)
        platform/x86: ISST: Mark mmio_range_devid_0 and mmio_range_devid_1 with static keyword
        platform/x86: intel-hid: add Rocket Lake ACPI device ID
        x86/platform: classmate-laptop: add WiFi media button
        platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system
        platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems
        tools/power/x86/intel-speed-select: Update version for v5.11
        tools/power/x86/intel-speed-select: Account for missing sysfs for die_id
        tools/power/x86/intel-speed-select: Read TRL from mailbox
        platform/x86: intel-hid: Do not create SW_TABLET_MODE input-dev when a KIOX010A ACPI dev is present
        platform/x86: intel-hid: Add alternative method to enable switches
        platform/x86: intel-hid: Add support for SW_TABLET_MODE
        platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models
        platform/x86: ISST: Change PCI device macros
        platform/x86: ISST: Allow configurable offset range
        platform/x86: ISST: Check for unaligned mmio address
        acer-wireless: send an EV_SYN/SYN_REPORT between state changes
        platform/x86: dell-wmi-sysman: work around for BIOS bug
        platform/x86: mlx-platform: remove an unused variable
        platform/x86: thinkpad_acpi: remove trailing semicolon in macro definition
        platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init
        ...
    • Merge tag 'hwmon-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 0f974581
      Linus Torvalds authored
      Pull hwmon updates from Guenter Roeck:
       "New drivers:
         - SB-TSI sensors
         - Linear Technology LTC2992
         - Delta power supplies Q54SJ108A2
         - Maxim MAX127
         - Corsair PSU
         - STMicroelectronics PM6764 Voltage Regulator
      
        New chip support:
         - P10 added to fsi/occ driver
         - NCT6687D added to nct6683 driver
         - Intel-based Xserves added to applesmc driver
         - AMD family 19h model 01h added to amd_energy driver
      
        And various minor bug fixes and improvements"
      
      * tag 'hwmon-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (41 commits)
        dt-bindings: (hwmon/sbtsi_temp) Add SB-TSI hwmon driver bindings
        hwmon: (sbtsi) Add documentation
        hwmon: (sbtsi) Add basic support for SB-TSI sensors
        hwmon: (iio_hwmon) Drop bogus __refdata annotation
        hwmon: (xgene) Drop bogus __refdata annotation
        dt-bindings: hwmon: convert AD ADM1275 bindings to dt-schema
        hwmon: (occ) Add new temperature sensor type
        fsi: occ: Add support for P10
        dt-bindings: fsi: Add P10 OCC device documentation
        dt-bindings: hwmon: convert TI ADS7828 bindings to dt-schema
        dt-bindings: hwmon: convert AD AD741x bindings to dt-schema
        dt-bindings: hwmon: convert TI INA2xx bindings to dt-schema
        hwmon: (ltc2992) Fix less than zero comparisons with an unsigned integer
        hwmon: (pmbus/q54sj108a2) Correct title underline length
        dt-bindings: hwmon: Add documentation for ltc2992
        hwmon: (ltc2992) Add support for GPIOs.
        hwmon: (ltc2992) Add support
        hwmon: (pmbus) Driver for Delta power supplies Q54SJ108A2
        hwmon: Add driver for STMicroelectronics PM6764 Voltage Regulator
        hwmon: (nct6683) Support NCT6687D.
        ...
  5. 15 Dec, 2020 26 commits
    • Merge tag 'mmc-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · ce51c2b7
      Linus Torvalds authored
      Pull MMC updates from Ulf Hansson:
       "MMC core:
         - Initial support for SD express card/host
      
        MMC host:
         - mxc: Convert the driver to DT-only
         - mtk-sd: Add HS400 enhanced strobe support
         - mtk-sd: Add support for the MT8192 SoC variant
         - sdhci-acpi: Allow changing HS200/HS400 driver strength for AMDI0040
         - sdhci-esdhc-imx: Convert the driver to DT-only
         - sdhci-pci-gli: Improve performance for HS400 mode for GL9763E
         - sdhci-pci-gli: Reduce power consumption for GL9755
         - sdhci-xenon: Introduce ACPI support
         - tmio: Fix command error processing
         - tmio: Inform the core about the max_busy_timeout
         - tmio/renesas_sdhi: Support custom calculation of busy-wait time
         - renesas_sdhi: Reset SCC only when available
         - rtsx_pci: Add SD Express mode support for RTS5261
         - rtsx_pci: Various fixes and improvements for RTS5261
      
        MEMSTICK:
         - Minor fixes/improvements"
      
      * tag 'mmc-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (72 commits)
        dt-bindings: mmc: eliminate yamllint warnings
        mmc: sdhci-xenon: introduce ACPI support
        mmc: sdhci-xenon: use clk only with DT
        mmc: sdhci-xenon: switch to device_* API
        mmc: sdhci-xenon: use match data for controllers variants
        dt-bindings: mmc: Fix xlnx,mio-bank property values for arasan driver
        mmc: renesas_sdhi: populate hook for longer busy_wait
        mmc: tmio: add hook for custom busy_wait calculation
        mmc: tmio: set max_busy_timeout
        dt-bindings: mmc: imx: fix the wrongly dropped imx8qm compatible string
        mmc: sdhci-pci-gli: Disable slow mode in HS400 mode for GL9763E
        mmc: sdhci: Use more concise device_property_read_u64
        memstick: r592: Fix error return in r592_probe()
        mmc: mxc: Convert the driver to DT-only
        mmc: mxs: Remove the unused .id_table
        mmc: sdhci-of-arasan: Fix fall-through warnings for Clang
        mmc: sdhci-pci-gli: Reduce power consumption for GL9755
        mmc: mediatek: depend on COMMON_CLK to fix compile tests
        mmc: pxamci: Fix error return code in pxamci_probe
        mmc: sdhci: Update firmware interface API
        ...
    • Merge branch 'i2c/for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 9d0d8867
      Linus Torvalds authored
      Pull i2c updates from Wolfram Sang:
       "A bit smaller this time with mostly usual driver updates. Slave
        support for imx stands out a little"
      
      * 'i2c/for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (30 commits)
        i2c: remove check that can never be true
        i2c: Warn when device removing fails
        dt-bindings: i2c: Update DT binding docs to support SiFive FU740 SoC
        dt-bindings: i2c: Add compatible string for AM64 SoC
        i2c: designware: Make register offsets all of the same width
        i2c: designware: Switch header to use BIT() and GENMASK()
        i2c: pxa: move to generic GPIO recovery
        i2c: sh_mobile: Mark adapter suspended during suspend
        i2c: owl: Add compatible for the Actions Semi S500 I2C controller
        dt-bindings: i2c: owl: Convert Actions Semi Owl binding to a schema
        i2c: imx: support slave mode for imx I2C driver
        i2c: ismt: Adding support for I2C_SMBUS_BLOCK_PROC_CALL
        i2c: ocores: Avoid false-positive error log message.
        Revert "i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630"
        i2c: mxs: Remove unneeded platform_device_id
        i2c: pca-platform: drop two members from driver data that are assigned to only
        i2c: imx: Remove unused .id_table support
        i2c: nvidia-gpu: drop empty stub for runtime pm
        dt-bindings: i2c: mellanox,i2c-mlxbf: convert txt to YAML schema
        i2c: mv64xxx: Add bus error recovery
        ...
    • Merge tag 'spi-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 605ea5aa
      Linus Torvalds authored
      Pull spi updates from Mark Brown:
       "The big change this release has been some excellent work from Lukas
        Wunner which closes a bunch of holes in the cleanup paths for drivers,
        mainly introduced as a result of devm conversions causing bad
        interactions with the support SPI has for allocating the bus and
        driver data together.
      
        Together with some of the other work done it feels like we've turned
        the corner on several long standing pain points with the API.
      
        Summary:
      
         - Many cleanups around probe/remove and error handling from Lukas
           Wunner and Uwe Kleine-König, and further fixes around PM from Zhang
           Qilong.
      
         - Provide a mask for which bits of the mode can safely be configured
           by drivers and use that to fix an issue with the ADS7846 driver.
      
         - Documentation of the expected interactions between SPI and GPIO
           level chip select polarity configuration from H. Nikolaus Schaller,
           hopefully we're pretty much at the end of sorting out the
           interactions there. Thanks to Nikolaus, Sven Van Asbroeck and Linus
           Walleij for this.
      
         - DMA support for Allwinner sun6i controllers.
      
         - Support for Canaan K210 Designware implementations and Intel Alder
           Lake"
      
      * tag 'spi-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (69 commits)
        spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors
        spi: Limit the spi device max speed to controller's max speed
        spi: spi-geni-qcom: Use the new method of gpio CS control
        platform/chrome: cros_ec_spi: Drop bits_per_word assignment
        platform/chrome: cros_ec_spi: Don't overwrite spi::mode
        spi: dw: Add support for the Canaan K210 SoC SPI
        spi: dw: Add support for 32-bits max xfer size
        dt-bindings: spi: dw-apb-ssi: Add Canaan K210 SPI controller
        spi: Update DT binding docs to support SiFive FU740 SoC
        spi: atmel-quadspi: Fix use-after-free on unbind
        spi: npcm-fiu: Disable clock in probe error path
        spi: ar934x: Don't leak SPI master in probe error path
        spi: mt7621: Don't leak SPI master in probe error path
        spi: mt7621: Disable clock in probe error path
        media: netup_unidvb: Don't leak SPI master in probe error path
        spi: sc18is602: Don't leak SPI master in probe error path
        spi: rb4xx: Don't leak SPI master in probe error path
        spi: gpio: Don't leak SPI master in probe error path
        spi: spi-mtk-nor: Don't leak SPI master in probe error path
        spi: mxic: Don't leak SPI master in probe error path
        ...
    • Merge tag 'regulator-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 2dda5700
      Linus Torvalds authored
      Pull regulator updates from Mark Brown:
       "This has been a quiet release for the regulator API, a few new drivers
        and the usual fixes and cleanup traffic but not much else going on:
      
         - Optimisations for the handling of voltage enumeration, especially
           with sparse selector sets, from Claudiu Beznea.
      
         - Support for several ARM SCMI regulators, Dialog DA9121, NXP PF8x00,
           Qualcomm PMX55, PM8350 and PM8350c
      
        The addition of the SCMI regulator driver (which controls regulators
        via system firmware) means that we've pulled in the support for the
        underlying firmware operations from the firmware tree"
      
      * tag 'regulator-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (53 commits)
        regulator: mc13892-regulator: convert comma to semicolon
        regulator: pfuze100: Convert the driver to DT-only
        regulator: max14577: Add proper module aliases strings
        regulator: da9121: Potential Oops in da9121_assign_chip_model()
        regulator: da9121: Fix index used for DT property
        regulator: da9121: Remove uninitialised string variable
        regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x
        regulator: qcom-rpmh: Add support for PM8350/PM8350c
        regulator: dt-bindings: Add PM8350x compatibles
        regulator: da9121: include linux/gpio/consumer.h
        regulator: da9121: Mark some symbols with static keyword
        regulator: da9121: Request IRQ directly and free in release function to avoid masking race
        regulator: da9121: add interrupt support
        regulator: da9121: add mode support
        regulator: da9121: add current support
        regulator: da9121: Update registration to support multiple buck variants
        regulator: da9121: Add support for device variants via devicetree
        regulator: da9121: Add device variant descriptors
        regulator: da9121: Add device variant regmaps
        regulator: da9121: Add device variants
        ...
      2dda5700
    • Linus Torvalds's avatar
      Merge tag 'regmap-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · a45f1d43
      Linus Torvalds authored
      Pull regmap updates from Mark Brown:
       "This is quite a busy release for regmap with two substantial features
        being added:
      
          - Support for register maps using Soundwire 1.2 multi-byte
            operations, allowing atomic support for registers larger than a
            single byte.
      
          - Support for relaxed I/O without barriers in MMIO regmaps, allowing
            them to be used efficiently on systems where default MMIO
            operations include barriers.
      
        There was also an addition and revert of use of the new Soundwire
        support for RT715 due to build issues with the driver built in: my
        tests only covered building it as a module. The patch wasn't just
        dropped as it had already been merged elsewhere"
      
      * tag 'regmap-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        ASoC: rt715: Fix build
        regmap: sdw: add required header files
        regmap: Remove duplicate `type` field from regmap `regcache_sync` trace event
        regmap: Fix order of regmap write log
        regmap: mmio: add config option to allow relaxed MMIO accesses
      a45f1d43
    • Linus Torvalds's avatar
      Merge tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2cffa11e
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "Generic interrupt and irqchips subsystem updates. Unusually, there is
        not a single completely new irq chip driver, just new DT bindings and
        extensions of existing drivers to accommodate new variants!
      
        Core:
      
         - Consolidation and robustness changes for irq time accounting
      
         - Cleanup and consolidation of irq stats
      
         - Remove the fasteoi IPI flow which has been proved useless
      
         - Provide an interface for converting legacy interrupt mechanisms
           into irqdomains
      
        Drivers:
      
         - Preliminary support for managed interrupts on platform devices
      
         - Correctly identify allocation of MSIs proxied by another device
      
         - Generalise the Ocelot support to new SoCs
      
         - Improve GICv4.1 vcpu entry, matching the corresponding KVM
           optimisation
      
         - Work around spurious interrupts on Qualcomm PDC
      
         - Random fixes and cleanups"
      
      * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
        irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
        driver core: platform: Add devm_platform_get_irqs_affinity()
        ACPI: Drop acpi_dev_irqresource_disabled()
        resource: Add irqresource_disabled()
        genirq/affinity: Add irq_update_affinity_desc()
        irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge
        irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device
        platform-msi: Track shared domain allocation
        irqchip/ti-sci-intr: Fix freeing of irqs
        irqchip/ti-sci-inta: Fix printing of inta id on probe success
        drivers/irqchip: Remove EZChip NPS interrupt controller
        Revert "genirq: Add fasteoi IPI flow"
        irqchip/hip04: Make IPIs use handle_percpu_devid_irq()
        irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq()
        irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()
        irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()
        irqchip/ocelot: Add support for Jaguar2 platforms
        irqchip/ocelot: Add support for Serval platforms
        irqchip/ocelot: Add support for Luton platforms
        irqchip/ocelot: prepare to support more SoC
        ...
      2cffa11e
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 5b200f57
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
       "More MM work: a memcg scalability improvement"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/lru: revise the comments of lru_lock
        mm/lru: introduce relock_page_lruvec()
        mm/lru: replace pgdat lru_lock with lruvec lock
        mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
        mm/compaction: do page isolation first in compaction
        mm/lru: introduce TestClearPageLRU()
        mm/mlock: remove __munlock_isolate_lru_page()
        mm/mlock: remove lru_lock on TestClearPageMlocked
        mm/vmscan: remove lruvec reget in move_pages_to_lru
        mm/lru: move lock into lru_note_cost
        mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
        mm/memcg: add debug checking in lock_page_memcg
        mm: page_idle_get_page() does not need lru_lock
        mm/rmap: stop store reordering issue on page->mapping
        mm/vmscan: remove unnecessary lruvec adding
        mm/thp: narrow lru locking
        mm/thp: simplify lru_add_page_tail()
        mm/thp: use head for head page in lru_add_page_tail()
        mm/thp: move lru_add_page_tail() to huge_memory.c
      5b200f57
    • Hugh Dickins's avatar
      mm/lru: revise the comments of lru_lock · 15b44736
      Hugh Dickins authored
      Now that pgdat->lru_lock has been changed to lruvec->lru_lock, it's
      time to fix the comments in the code that are no longer correct.  Also
      fix some ancient comments that still refer to zone->lru_lock.
      
      I struggled to understand the comment above move_pages_to_lru() (surely
      it never calls page_referenced()), and eventually realized that most of
      it had got separated from shrink_active_list(): move that comment back.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      15b44736
    • Alexander Duyck's avatar
      mm/lru: introduce relock_page_lruvec() · 2a5e4e34
      Alexander Duyck authored
      Add relock_page_lruvec() to replace repeated identical code.  No
      functional change.
      
      When testing for relock we can avoid the need for RCU locking if we simply
      compare the page pgdat and memcg pointers versus those that the lruvec is
      holding.  By doing this we can avoid the extra pointer walks and accesses
      of the memory cgroup.
      
      In addition we can avoid the checks entirely if lruvec is currently NULL.
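
      The relock test described above can be illustrated with a small
      userspace model.  This is only a sketch under stated assumptions: the
      *_model types and functions are hypothetical stand-ins, not the kernel
      structures or the actual relock_page_lruvec() code, and a boolean
      replaces the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct pgdat_model { int node_id; };
struct memcg_model { int id; };

struct lruvec_model {
	struct pgdat_model *pgdat;
	struct memcg_model *memcg;
	bool held;		/* stands in for lruvec->lru_lock */
	int acquisitions;	/* how many times the "lock" was taken */
};

struct page_model {
	struct pgdat_model *pgdat;
	struct memcg_model *memcg;
	struct lruvec_model *lruvec;	/* lruvec the page belongs to */
};

/* Pointer comparison only: no RCU section, no walk through the memcg. */
static bool lruvec_holds_page_model(const struct lruvec_model *lruvec,
				    const struct page_model *page)
{
	return lruvec->pgdat == page->pgdat && lruvec->memcg == page->memcg;
}

/* Return the lruvec that is "locked" for this page after the call. */
static struct lruvec_model *relock_page_lruvec_model(struct page_model *page,
						     struct lruvec_model *locked)
{
	if (locked) {			/* NULL skips the checks entirely */
		if (lruvec_holds_page_model(locked, page))
			return locked;	/* same lruvec: keep holding the lock */
		locked->held = false;	/* different lruvec: drop the old lock */
	}
	assert(page->lruvec != NULL);
	page->lruvec->held = true;
	page->lruvec->acquisitions++;
	return page->lruvec;
}
```

      In this model, a run of pages from the same memcg and node takes the
      lock once, and only a page from a different lruvec forces an
      unlock/lock pair.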
      
      [alex.shi@linux.alibaba.com: use page_memcg()]
        Link: https://lkml.kernel.org/r/66d8e79d-7ec6-bfbc-1c82-bf32db3ae5b7@linux.alibaba.com
      
      Link: https://lkml.kernel.org/r/1604566549-62481-19-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a5e4e34
    • Alex Shi's avatar
      mm/lru: replace pgdat lru_lock with lruvec lock · 6168d0da
      Alex Shi authored
      This patch moves the per-node lru_lock into lruvec, giving each memcg
      its own lru_lock per node.  On a large machine, memcgs therefore no
      longer have to contend for the per-node pgdat->lru_lock; each can
      proceed quickly with its own lru_lock.

      Now that the memcg charge happens before LRU insertion, page isolation
      can serialize changes to a page's memcg, so the per-memcg lruvec lock
      is stable and can replace the per-node lru lock.
      
      In isolate_migratepages_block(), compact_unlock_should_abort and
      lock_page_lruvec_irqsave are open coded to work with compact_control.
      Also add a debug function to the locking path, which may give some
      clues if something goes wrong.

      Daniel Jordan's testing shows a 62% improvement on a modified
      readtwice case on his 2P * 10 core * 2 HT Broadwell box.
      https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/
      
      Hugh Dickins helped polish the patch, thanks!
      
      [alex.shi@linux.alibaba.com: fix comment typo]
        Link: https://lkml.kernel.org/r/5b085715-292a-4b43-50b3-d73dc90d1de5@linux.alibaba.com
      [alex.shi@linux.alibaba.com: use page_memcg()]
        Link: https://lkml.kernel.org/r/5a4c2b72-7ee8-2478-fc0e-85eb83aafec4@linux.alibaba.com
      
      Link: https://lkml.kernel.org/r/1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rong Chen <rong.a.chen@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6168d0da
    • Alex Shi's avatar
      mm/swap.c: serialize memcg changes in pagevec_lru_move_fn · fc574c23
      Alex Shi authored
      Hugh Dickins found a memcg change bug in the original version: if we
      want to change pgdat->lru_lock to the memcg's lruvec lock, we have to
      serialize mem_cgroup_move_account during pagevec_lru_move_fn.  The
      possible bad scenario looks like this:
      
      	cpu 0					cpu 1
      lruvec = mem_cgroup_page_lruvec()
      					if (!isolate_lru_page())
      						mem_cgroup_move_account
      
      spin_lock_irqsave(&lruvec->lru_lock <== wrong lock.
      
      So we need TestClearPageLRU to block isolate_lru_page(), which
      serializes the memcg change.  As a consequence, the PageLRU check in
      the move_fn callees is removed.
      
      __pagevec_lru_add_fn() is different from the others, because the pages it
      deals with are, by definition, not yet on the lru.  TestClearPageLRU is
      not needed and would not work, so __pagevec_lru_add() goes its own way.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-17-git-send-email-alex.shi@linux.alibaba.com
      Reported-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc574c23
    • Alex Shi's avatar
      mm/compaction: do page isolation first in compaction · 9df41314
      Alex Shi authored
      Currently, compaction takes the lru_lock and then does page isolation,
      which works fine with pgdat->lru_lock, since any page isolation would
      compete for the lru_lock.  If we want to change to the memcg lru_lock,
      we have to isolate the page before taking the lru_lock, so that
      isolation blocks the page's memcg change, which relies on page
      isolation too.  Then we can safely use the per-memcg lru_lock later.

      The new page isolation uses the previously introduced
      TestClearPageLRU() plus pgdat lru locking, which will be changed to
      the memcg lru lock later.

      Hugh Dickins <hughd@google.com> fixed the following bugs in an early
      version of this patch:
      
      Fix lots of crashes under compaction load: isolate_migratepages_block()
      must clean up appropriately when rejecting a page, setting PageLRU again
      if it had been cleared; and a put_page() after get_page_unless_zero()
      cannot safely be done while holding locked_lruvec - it may turn out to be
      the final put_page(), which will take an lruvec lock when PageLRU.
      
      And move __isolate_lru_page_prepare back after get_page_unless_zero to
      make trylock_page() safe: trylock_page() is not safe to use at this time:
      its setting PG_locked can race with the page being freed or allocated
      ("Bad page"), and can also erase flags being set by one of those "sole
      owners" of a freshly allocated page who use non-atomic __SetPageFlag().
      
      Link: https://lkml.kernel.org/r/1604566549-62481-16-git-send-email-alex.shi@linux.alibaba.com
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9df41314
    • Alex Shi's avatar
      mm/lru: introduce TestClearPageLRU() · d25b5bd8
      Alex Shi authored
      Currently lru_lock still guards both the lru list and the page's lru
      bit, which is OK.  But if we want to use a specific lruvec lock on the
      page, we need to pin down the page's lruvec/memcg during locking.  Just
      taking the lruvec lock first may be undermined by the page's memcg
      charge/migration.  To fix this problem, we clear the lru bit outside
      the lock and use it as a pin-down action that blocks page isolation
      during a memcg change.
      
      So the standard steps of page isolation are now as follows:
      	1, get_page(); 	       #pin the page so it cannot be freed
      	2, TestClearPageLRU(); #block other isolation like memcg change
      	3, spin_lock on lru_lock; #serialize lru list access
      	4, delete page from lru list;
      
      This patch starts with the first part: TestClearPageLRU, which
      combines the PageLRU check and ClearPageLRU into a single macro
      function.  This function will be used as the page isolation
      precondition, to prevent other isolations elsewhere.  There may then be
      !PageLRU pages on the lru list, so the BUG() checks need to be removed
      accordingly.

      There are 2 rules for the lru bit now:
      1, the lru bit still indicates whether a page is on the lru list; only
         at some temporary moment (isolating) may the page have no lru bit
         while it is on the lru list.  But the page must still be on the lru
         list when the lru bit is set.
      2, the lru bit has to be removed before the page is deleted from the
         lru list.
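
      The isolation steps and the two lru bit rules above can be sketched in
      userspace with C11 atomics.  This is a model under stated assumptions,
      not the kernel implementation: struct page_model, PG_LRU_MODEL and the
      function names are hypothetical, and the lru_lock is only marked by a
      comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LRU_MODEL 0x1u	/* hypothetical bit, not the kernel's PG_lru */

struct page_model {
	atomic_uint flags;
	atomic_int refcount;
	bool on_list;		/* guarded by lru_lock in the real code */
};

/* Atomically clear the lru bit and report whether it was set before. */
static bool test_clear_page_lru_model(struct page_model *page)
{
	unsigned int old = atomic_fetch_and(&page->flags, ~PG_LRU_MODEL);

	return (old & PG_LRU_MODEL) != 0;
}

/* Return true if this caller won the right to isolate the page. */
static bool isolate_page_model(struct page_model *page)
{
	atomic_fetch_add(&page->refcount, 1);		/* 1. pin the page */
	if (!test_clear_page_lru_model(page)) {		/* 2. claim the lru bit */
		atomic_fetch_sub(&page->refcount, 1);	/* lost the race: unpin */
		return false;
	}
	/* 3. the lru_lock would be taken here */
	assert(page->on_list);	/* rule 1: lru bit set implies on the list */
	page->on_list = false;	/* 4. delete from the lru list (rule 2 held) */
	/* the lru_lock would be released here */
	return true;
}
```

      Because the test and the clear are one atomic operation, only one of
      several concurrent callers can see the bit set, so a second isolation
      attempt backs off before it ever touches the list.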
      
      As Andrew Morton mentioned, this change would dirty the cacheline for
      a page which isn't on the LRU.  But the loss is acceptable according to
      the report from Rong Chen <rong.a.chen@intel.com>:
      https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/
      
      Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d25b5bd8
    • Alex Shi's avatar
      mm/mlock: remove __munlock_isolate_lru_page() · 13805a88
      Alex Shi authored
      __munlock_isolate_lru_page() has only one caller, so remove it to
      clean up and simplify the code.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-14-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      13805a88
    • Alex Shi's avatar
      mm/mlock: remove lru_lock on TestClearPageMlocked · 3db19aa3
      Alex Shi authored
      In munlock_vma_page(), the comments mentioned that lru_lock is needed
      for serialization with split_huge_pages.  But the page must be
      PageLocked, as must the pages in the split_huge_page family of
      functions.  Thus PageLocked is enough to serialize both functions.

      Furthermore, Hugh Dickins pointed out that before splitting in
      split_huge_page_to_list, unmap_page() has been called on the page to
      remove the pmd/ptes, which protects the page from munlock.  Thus there
      is no need to guard __split_huge_page_tail for mlock clearing; just
      keep the lru_lock there for isolation purposes.

      LKP found a preempt issue with __mod_zone_page_state, which needs to
      be changed to mod_zone_page_state.  Thanks!
      
      Link: https://lkml.kernel.org/r/1604566549-62481-13-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3db19aa3
    • Alex Shi's avatar
      mm/vmscan: remove lruvec reget in move_pages_to_lru · afca9157
      Alex Shi authored
      An isolated page shouldn't be recharged by memcg, since memcg
      migration isn't possible at that time.  All pages were isolated from
      the same lruvec (and isolation inhibits memcg migration).  So remove
      the unnecessary regetting of the lruvec.
      
      Thanks to Alexander Duyck for pointing this out.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-12-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      afca9157
    • Alex Shi's avatar
      mm/lru: move lock into lru_note_cost · 75cc3c91
      Alex Shi authored
      We have to move lru_lock into lru_note_cost, since it cycles up the
      memcg tree, in preparation for the future per-lruvec lru_lock
      replacement.  It's a bit ugly and may cost a bit more locking, but the
      benefit from multiple memcg locking could cover the loss.
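
      Why the lock has to move inside the loop can be illustrated with a
      userspace model that walks up a tree of lruvecs, locking each level
      separately.  This is a sketch under stated assumptions only: struct
      lruvec_model, its fields and lru_note_cost_model() are hypothetical,
      and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: one lock per lruvec, parent pointer up the tree. */
struct lruvec_model {
	pthread_mutex_t lru_lock;
	unsigned long cost;
	struct lruvec_model *parent;	/* NULL at the root */
};

/*
 * Each level has its own lock, so no single lock held by the caller
 * could cover the whole walk: lock and unlock once per level.
 */
static void lru_note_cost_model(struct lruvec_model *lruvec,
				unsigned long nr_pages)
{
	for (; lruvec; lruvec = lruvec->parent) {
		int err = pthread_mutex_lock(&lruvec->lru_lock);

		assert(err == 0);
		lruvec->cost += nr_pages;
		err = pthread_mutex_unlock(&lruvec->lru_lock);
		assert(err == 0);
	}
}
```

      The extra lock/unlock pair per level is the additional locking cost
      the message mentions.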
      
      Link: https://lkml.kernel.org/r/1604566549-62481-11-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn · c7c7b80c
      Alex Shi authored
      Fold the PGROTATED event collection into the pagevec_move_tail_fn()
      callback, as the other callbacks passed to pagevec_lru_move_fn() do.
      This saves the pagevec_move_tail() function call.  Now all users of
      pagevec_lru_move_fn() look the same and its 3rd parameter is no longer
      needed.
      
      This just simplifies the calling. No functional change.
      
      [lkp@intel.com: found a build issue in the original patch, thanks]
      
      Link: https://lkml.kernel.org/r/1604566549-62481-10-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcg: add debug checking in lock_page_memcg · 20ad50d6
      Alex Shi authored
      Add a debug check in lock_page_memcg(), so that we get an alarm if
      anything is wrong here.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-9-git-send-email-alex.shi@linux.alibaba.com
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: page_idle_get_page() does not need lru_lock · 880fc6ba
      Hugh Dickins authored
      It is necessary for page_idle_get_page() to recheck PageLRU() after
      get_page_unless_zero(), but holding lru_lock around that serves no
      useful purpose, and adds to lru_lock contention: delete it.
      
      See https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop for the
      discussion that led to lru_lock there; but __page_set_anon_rmap() now uses
      WRITE_ONCE(), and I see no other risk in page_idle_clear_pte_refs() using
      rmap_walk() (beyond the risk of racing PageAnon->PageKsm, mostly but not
      entirely prevented by page_count() check in ksm.c's write_protect_page():
      that risk being shared with page_referenced() and not helped by lru_lock).
      
      Link: https://lkml.kernel.org/r/1604566549-62481-8-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: "Huang, Ying" <ying.huang@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: stop store reordering issue on page->mapping · 16f5e707
      Alex Shi authored
      Hugh Dickins and Minchan Kim observed a long-standing issue, discussed
      in the thread below, but the fix mentioned in
      
        https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop/
      
      was actually missed.
      
      The store reordering may cause a problem in this scenario:
      
        CPU 0                                   CPU 1
        do_anonymous_page
          page_add_new_anon_rmap()
            page->mapping = anon_vma + PAGE_MAPPING_ANON
          lru_cache_add_inactive_or_unevictable()
            spin_lock(lruvec->lock)
            SetPageLRU()
            spin_unlock(lruvec->lock)
                                                /* idle tracking judged it as an
                                                 * LRU page, so it passes the
                                                 * page in to
                                                 * page_idle_clear_pte_refs
                                                 */
                                                page_idle_clear_pte_refs
                                                  rmap_walk
                                                    if PageAnon(page)
      
      Johannes gave detailed examples of how the store reordering could cause
      trouble: "The concern is that the SetPageLRU may get reordered before
      the 'page->mapping' setting.  That would let CPU 1 observe one of the
      following at page->mapping after observing PageLRU set on the page:
      
      1. anon_vma + PAGE_MAPPING_ANON
      
         That's the in-order scenario and is fine.
      
      2. NULL
      
         That's possible if the page->mapping store gets reordered to occur
         after SetPageLRU. That's fine too because we check for it.
      
      3. anon_vma without the PAGE_MAPPING_ANON bit
      
         That would be a problem and could lead to all kinds of undesirable
         behavior including crashes and data corruption.
      
         Is it possible? AFAICT the compiler is allowed to tear the store to
         page->mapping and I don't see anything that would prevent it.
      
      That said, I also don't see how the reader testing PageLRU under the
      lru_lock would prevent that in the first place.  AFAICT we need that
      WRITE_ONCE() around the page->mapping assignment."
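      
      As an illustration only, the following userspace sketch shows the idea
      of publishing the pointer and its flag bit in a single store.  All
      DEMO_/demo_ names are hypothetical stand-ins, not the kernel's
      definitions, and the macro is a simplified approximation of
      WRITE_ONCE() that assumes a GNU-compatible compiler (__typeof__) and an
      aligned, word-sized field:

```c
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's WRITE_ONCE() (assumption). */
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for PAGE_MAPPING_ANON: low bit of the pointer. */
#define DEMO_MAPPING_ANON ((uintptr_t)0x1)

struct demo_page {
	uintptr_t mapping;	/* pointer value with the flag in bit 0 */
};

/*
 * Compute the full value first, then store it with one volatile access,
 * so the compiler is not free to write the pointer and the flag bit
 * separately.
 */
static void demo_set_anon_mapping(struct demo_page *page, void *anon_vma)
{
	uintptr_t v = (uintptr_t)anon_vma + DEMO_MAPPING_ANON;

	DEMO_WRITE_ONCE(page->mapping, v);
}

/* Reader side: load the word once, then test the flag bit. */
static int demo_mapping_is_anon(const struct demo_page *page)
{
	uintptr_t v = *(const volatile uintptr_t *)&page->mapping;

	return (v & DEMO_MAPPING_ANON) != 0;
}
```

      With such a store, a reader is expected to see either the old value or
      the complete new one, which corresponds to cases 1 and 2 above rather
      than case 3.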
      
      [alex.shi@linux.alibaba.com: updated for comments change from Johannes]
        Link: https://lkml.kernel.org/r/e66ef2e5-c74c-6498-e8b3-56c37b9d2d15@linux.alibaba.com
      
      Link: https://lkml.kernel.org/r/1604566549-62481-7-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmscan: remove unnecessary lruvec adding · 3d06afab
      Alex Shi authored
      We don't have to add a freeable page to the lru and then remove it
      again.  This change saves a couple of operations and makes the move
      clearer.
      
      The SetPageLRU needs to be kept before put_page_testzero for list
      integrity, otherwise:
      
        #0 move_pages_to_lru             #1 release_pages
        if !put_page_testzero
                                         if (put_page_testzero())
                                            !PageLRU //skip lru_lock
           SetPageLRU()
           list_add(&page->lru,)
                                            list_add(&page->lru,)
      
      [akpm@linux-foundation.org: coding style fixes]
      
      Link: https://lkml.kernel.org/r/1604566549-62481-6-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: narrow lru locking · b6769834
      Alex Shi authored
      lru_lock and page cache xa_lock have no obvious reason to be taken one
      way round or the other: until now, lru_lock has been taken before page
      cache xa_lock, when splitting a THP; but nothing else takes them
      together.  Reverse that ordering: let's narrow the lru locking - but
      leave local_irq_disable to block interrupts throughout, like before.
      
      Hugh Dickins' point: split_huge_page_to_list() was already silly, to be
      using the _irqsave variant: it's just been taking sleeping locks, so
      would already be broken if entered with interrupts enabled.  So we can
      save passing flags argument down to __split_huge_page().
      
      Why change the lock ordering here? That was hard to decide.  One reason:
      when this series reaches per-memcg lru locking, it relies on the THP's
      memcg to be stable when taking the lru_lock: that is now done after the
      THP's refcount has been frozen, which ensures page memcg cannot change.
      
      Another reason: previously, lock_page_memcg()'s move_lock was presumed
      to nest inside lru_lock; but now lru_lock must nest inside (page cache
      lock inside) move_lock, so it becomes possible to use lock_page_memcg()
      to stabilize page memcg before taking its lru_lock.  That is not the
      mechanism used in this series, but it is an option we want to keep open.
      
      [hughd@google.com: rewrite commit log]
      
      Link: https://lkml.kernel.org/r/1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: simplify lru_add_page_tail() · 6dbb5741
      Alex Shi authored
      Simplify lru_add_page_tail(): there are actually only two cases
      possible: split_huge_page_to_list(), with list supplied and head
      isolated from lru by its caller; or split_huge_page(), with NULL list
      and head on lru - because when head is racily isolated from lru, the
      isolator's reference will stop the split from getting any further than
      its page_ref_freeze().
      
      So decide between the two cases by "list", but add VM_WARN_ON()s to
      verify that they match our lru expectations.
      
      [Hugh Dickins: rewrite commit log]
      
      Link: https://lkml.kernel.org/r/1604566549-62481-4-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: use head for head page in lru_add_page_tail() · 94866635
      Alex Shi authored
      Since the first parameter is only used for the head page, it's better
      to make that explicit in its name.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-3-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/thp: move lru_add_page_tail() to huge_memory.c · 88dcb9a3
      Alex Shi authored
      Patch series "per memcg lru lock", v21.
      
      This patchset includes 3 parts:
      
       1) some code cleanup and minor optimization as preparation
      
       2) use TestClearPageLRU as page isolation's precondition
      
       3) replace per node lru_lock with per memcg per node lru_lock
      
      Currently there is one lru_lock per node, pgdat->lru_lock, which
      guards the lru lists, although the lru lists were moved into memcg a
      long time ago.  Still using a per-node lru_lock is clearly unscalable:
      pages in every memcg have to compete with each other for a single
      lru_lock.  This patchset uses a per-lruvec/memcg lru_lock to replace
      the per-node lru lock for guarding the lru lists, which makes it
      scalable for memcgs and gives a performance gain.
      
      Currently lru_lock still guards both the lru list and the page's lru
      bit, which is OK.  But if we want to use a specific lruvec lock on the
      page, we need to pin down the page's lruvec/memcg during locking.  Just
      taking the lruvec lock first may be undermined by the page's memcg
      charge/migration.  To fix this problem, we can split out the clearing
      of the page's lru bit and use it as the pin-down action that blocks
      memcg changes.  That's the reason for the new atomic function
      TestClearPageLRU.  So now isolating a page needs both actions:
      TestClearPageLRU and holding the lru_lock.
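      
      As a rough userspace sketch of the test-and-clear idea (not the kernel
      implementation), the C11 code below clears a flag bit atomically and
      reports whether it was previously set, so that only one of several
      racing isolators can win.  The names DEMO_PG_LRU, struct demo_page and
      demo_test_clear_lru() are hypothetical stand-ins:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the LRU bit in page->flags. */
#define DEMO_PG_LRU (1UL << 0)

struct demo_page {
	atomic_ulong flags;
};

/*
 * Atomically clear the LRU bit and return whether it was set before.
 * If several callers race on the same page, exactly one of them sees
 * "true" and may go on to take the lru lock and isolate the page.
 */
static bool demo_test_clear_lru(struct demo_page *page)
{
	unsigned long old = atomic_fetch_and(&page->flags, ~DEMO_PG_LRU);

	return (old & DEMO_PG_LRU) != 0;
}
```

      A caller that gets "false" back would simply skip the page, since
      someone else has already claimed it or it was never on an lru list.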
      
      The typical user of this is isolate_migratepages_block() in
      compaction.c: we have to take the lru bit before the lru lock, which
      serializes page isolation against memcg page charge/migration, since
      the latter would change the page's lruvec and the lru_lock in it.
      
      The above solution was suggested by Johannes Weiner and is based on his
      new memcg charge path, which led to this patchset.  (Hugh Dickins
      tested and contributed much code, from the compaction fix to general
      code polish, thanks a lot!).
      
      Daniel Jordan's testing shows a 62% improvement on a modified readtwice
      case on his 2P * 10 core * 2 HT Broadwell box on v18, which is not much
      different from this v20.
      
       https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/
      
      Thanks to Hugh Dickins and Konstantin Khlebnikov, who both brought up
      this idea 8 years ago, and to others who gave comments as well: Daniel
      Jordan, Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc.
      
      Thanks for testing support from Intel 0day and Rong Chen, Fengguang Wu,
      and Yun Wang.  Hugh Dickins also shared his kbuild-swap case.
      
      This patch (of 19):
      
      lru_add_page_tail() is only used in huge_memory.c; defining it in
      another file under a CONFIG_TRANSPARENT_HUGEPAGE macro restriction just
      looks weird.
      
      Let's move it to the THP code, and make it static as Hugh Dickins
      suggested.
      
      Link: https://lkml.kernel.org/r/1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com
      Link: https://lkml.kernel.org/r/1604566549-62481-2-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>