1. 27 Aug, 2020 2 commits
    • Thomas Gleixner's avatar
      x86/irq: Unbreak interrupt affinity setting · e027ffff
      Thomas Gleixner authored
      Several people reported that 5.8 broke the interrupt affinity setting
      mechanism.
      
      The consolidation of the entry code reused the regular exception entry code
      for device interrupts and changed the way how the vector number is conveyed
      from ptregs->orig_ax to a function argument.
      
      The low level entry uses the hardware error code slot to push the vector
      number onto the stack which is retrieved from there into a function
      argument and the slot on stack is set to -1.
      
      The reason for setting it to -1 is that the error code slot is at the
      position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
      indicates that the entry came via a syscall. If it's not set to a negative
      value then a signal delivery on return to userspace would try to restart a
      syscall. But there are other places which rely on pt_regs::orig_ax being a
      valid indicator for syscall entry.
      
      But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
      interrupt affinity setting mechanism, which was overlooked when this change
      was made.
      
      Moving interrupts on x86 happens in several steps. A new vector on a
      different CPU is allocated and the relevant interrupt source is
      reprogrammed to that. But that's racy and there might be an interrupt
      already in flight to the old vector. So the old vector is preserved until
      the first interrupt arrives on the new vector and the new target CPU. Once
      that happens the old vector is cleaned up, but this cleanup still depends
      on the vector number being stored in pt_regs::orig_ax, which is now -1.
      
      That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
      always false. As a consequence the interrupt is moved once, but then it
      cannot be moved anymore because the cleanup of the old vector never
      happens.
      
      There would be several ways to convey the vector information to that place
      in the guts of the interrupt handling, but on deeper inspection it turned
      out that this check is pointless and a leftover from the old affinity model
      of X86 which supported multi-CPU affinities. Under this model it was
      possible that an interrupt had an old and a new vector on the same CPU, so
      the vector match was required.
      
      Under the new model the effective affinity of an interrupt is always a
      single CPU from the requested affinity mask. If the affinity mask changes
      then either the interrupt stays on the CPU and on the same vector when that
      CPU is still in the new affinity mask or it is moved to a different CPU, but
      it is never moved to a different vector on the same CPU.
      
      Ergo the cleanup check for the matching vector number is not required and
      can be removed which makes the dependency on pt_regs:orig_ax go away.
      
      The remaining check for new_cpu == smp_processsor_id() is completely
      sufficient. If it matches then the interrupt was successfully migrated and
      the cleanup can proceed.
      
      For paranoia sake add a warning into the vector assignment code to
      validate that the assumption of never moving to a different vector on
      the same CPU holds.
      
      Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs")
      Reported-by: default avatarAlex bykov <alex.bykov@scylladb.com>
      Reported-by: default avatarAvi Kivity <avi@scylladb.com>
      Reported-by: default avatarAlexander Graf <graf@amazon.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarAlexander Graf <graf@amazon.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
      e027ffff
    • Ashok Raj's avatar
      x86/hotplug: Silence APIC only after all interrupts are migrated · 52d6b926
      Ashok Raj authored
      There is a race when taking a CPU offline. Current code looks like this:
      
      native_cpu_disable()
      {
      	...
      	apic_soft_disable();
      	/*
      	 * Any existing set bits for pending interrupt to
      	 * this CPU are preserved and will be sent via IPI
      	 * to another CPU by fixup_irqs().
      	 */
      	cpu_disable_common();
      	{
      		....
      		/*
      		 * Race window happens here. Once local APIC has been
      		 * disabled any new interrupts from the device to
      		 * the old CPU are lost
      		 */
      		fixup_irqs(); // Too late to capture anything in IRR.
      		...
      	}
      }
      
      The fix is to disable the APIC *after* cpu_disable_common().
      
      Testing was done with a USB NIC that provided a source of frequent
      interrupts. A script migrated interrupts to a specific CPU and
      then took that CPU offline.
      
      Fixes: 60dcaad5 ("x86/hotplug: Silence APIC and NMI when CPU is dead")
      Reported-by: default avatarEvan Green <evgreen@chromium.org>
      Signed-off-by: default avatarAshok Raj <ashok.raj@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Tested-by: default avatarEvan Green <evgreen@chromium.org>
      Reviewed-by: default avatarEvan Green <evgreen@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
      Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
      52d6b926
  2. 26 Aug, 2020 1 commit
  3. 21 Aug, 2020 1 commit
    • Sean Christopherson's avatar
      x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM · 6a3ea3e6
      Sean Christopherson authored
      KVM has an optmization to avoid expensive MRS read/writes on
      VMENTER/EXIT. It caches the MSR values and restores them either when
      leaving the run loop, on preemption or when going out to user space.
      
      The affected MSRs are not required for kernel context operations. This
      changed with the recently introduced mechanism to handle FSGSBASE in the
      paranoid entry code which has to retrieve the kernel GSBASE value by
      accessing per CPU memory. The mechanism needs to retrieve the CPU number
      and uses either LSL or RDPID if the processor supports it.
      
      Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
      lazily restored MSRs, which means between the point where the guest value
      is written and the point of restore, MSR_TSC_AUX contains a random number.
      
      If an NMI or any other exception which uses the paranoid entry path happens
      in such a context, then RDPID returns the random guest MSR_TSC_AUX value.
      
      As a consequence this reads from the wrong memory location to retrieve the
      kernel GSBASE value. Kernel GS is used to for all regular this_cpu_*()
      operations. If the GSBASE in the exception handler points to the per CPU
      memory of a different CPU then this has the obvious consequences of data
      corruption and crashes.
      
      As the paranoid entry path is the only place which accesses MSR_TSX_AUX
      (via RDPID) and the fallback via LSL is not significantly slower, remove
      the RDPID alternative from the entry path and always use LSL.
      
      The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT
      which would be inflicting massive overhead on that code path.
      
      [ tglx: Rewrote changelog ]
      
      Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      Reported-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Debugged-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Suggested-by: default avatarAndy Lutomirski <luto@kernel.org>
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
      6a3ea3e6
  4. 15 Aug, 2020 36 commits
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 50f6c7db
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes and small updates all around the place:
      
         - Fix mitigation state sysfs output
      
         - Fix an FPU xstate/sxave code assumption bug triggered by
           Architectural LBR support
      
         - Fix Lightning Mountain SoC TSC frequency enumeration bug
      
         - Fix kexec debug output
      
         - Fix kexec memory range assumption bug
      
         - Fix a boundary condition in the crash kernel code
      
         - Optimize porgatory.ro generation a bit
      
         - Enable ACRN guests to use X2APIC mode
      
         - Reduce a __text_poke() IRQs-off critical section for the benefit of
           PREEMPT_RT"
      
      * tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/alternatives: Acquire pte lock with interrupts enabled
        x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
        x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
        x86/purgatory: Don't generate debug info for purgatory.ro
        x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
        kexec_file: Correctly output debugging information for the PT_LOAD ELF header
        kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
        x86/crash: Correct the address boundary of function parameters
        x86/acrn: Remove redundant chars from ACRN signature
        x86/acrn: Allow ACRN guest to use X2APIC mode
      50f6c7db
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1195d58f
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes: fix a new tracepoint's output value, and fix the formatting
        of show-state syslog printouts"
      
      * tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/debug: Fix the alignment of the show-state debug output
        sched: Fix use of count for nr_running tracepoint
      1195d58f
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f5faaaa
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc fixes, an expansion of perf syscall access to CAP_PERFMON
        privileged tools, plus a RAPL HW-enablement for Intel SPR platforms"
      
      * tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/rapl: Add support for Intel SPR platform
        perf/x86/rapl: Support multiple RAPL unit quirks
        perf/x86/rapl: Fix missing psys sysfs attributes
        hw_breakpoint: Remove unused __register_perf_hw_breakpoint() declaration
        kprobes: Remove show_registers() function prototype
        perf/core: Take over CAP_SYS_PTRACE creds to CAP_PERFMON capability
      7f5faaaa
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · eb1319af
      Linus Torvalds authored
      Pull locking fixlets from Ingo Molnar:
       "A documentation fix and a 'fallthrough' macro update"
      
      * tag 'locking-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Convert to use the preferred 'fallthrough' macro
        Documentation/locking/locktypes: Fix a typo
      eb1319af
    • Linus Torvalds's avatar
      Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux · 410520d0
      Linus Torvalds authored
      Pull 9p updates from Dominique Martinet:
      
       - some code cleanup
      
       - a couple of static analysis fixes
      
       - setattr: try to pick a fid associated with the file rather than the
         dentry, which might sometimes matter
      
      * tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux:
        9p: Remove unneeded cast from memory allocation
        9p: remove unused code in 9p
        net/9p: Fix sparse endian warning in trans_fd.c
        9p: Fix memory leak in v9fs_mount
        9p: retrieve fid from file when file instance exist.
      410520d0
    • Linus Torvalds's avatar
      Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · f6513bd3
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small cifs/smb3 fixes, one for stable fixing mkdir path with
        the 'idsfromsid' mount option"
      
      * tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        SMB3: Fix mkdir when idsfromsid configured on mount
        cifs: Convert to use the fallthrough macro
        cifs: Fix an error pointer dereference in cifs_mount()
      f6513bd3
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 37711e5e
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Stable fixes:
         - pNFS: Don't return layout segments that are being used for I/O
         - pNFS: Don't move layout segments off the active list when being used for I/O
      
        Features:
         - NFS: Add support for user xattrs through the NFSv4.2 protocol
         - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
         - NFSv4.0 allow nconnect for v4.0
      
        Bugfixes and cleanups:
         - nfs: ensure correct writeback errors are returned on close()
         - nfs: nfs_file_write() should check for writeback errors
         - nfs: Fix getxattr kernel panic and memory overflow
         - NFS: Fix the pNFS/flexfiles mirrored read failover code
         - SUNRPC: dont update timeout value on connection reset
         - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
         - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"
      
      * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
        NFS: Fix flexfiles read failover
        fs: nfs: delete repeated words in comments
        rpc_pipefs: convert comma to semicolon
        nfs: Fix getxattr kernel panic and memory overflow
        NFS: Don't return layout segments that are in use
        NFS: Don't move layouts to plh_return_segs list while in use
        NFS: Add layout segment info to pnfs read/write/commit tracepoints
        NFS: Add tracepoints for layouterror and layoutstats.
        NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
        SUNRPC dont update timeout value on connection reset
        nfs: nfs_file_write() should check for writeback errors
        nfs: ensure correct writeback errors are returned on close()
        NFSv4.2: xattr cache: get rid of cache discard work queue
        NFS: remove redundant initialization of variable result
        NFSv4.0 allow nconnect for v4.0
        freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
        sunrpc: destroy rpc_inode_cachep after unregister_filesystem
        NFSv4.2: add client side xattr caching.
        NFSv4.2: hook in the user extended attribute handlers
        NFSv4.2: add the extended attribute proc functions.
        ...
      37711e5e
    • Linus Torvalds's avatar
      Merge tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 6ffdcde4
      Linus Torvalds authored
      Pull edac fix from Tony Luck:
       "Fix for the ie31200 driver that missed the first pull"
      
      * tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/ie31200: Fallback if host bridge device is already initialized
      6ffdcde4
    • Linus Torvalds's avatar
      Merge tag 'devicetree-fixes-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · b07175dc
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
       "Another round of 'allOf' removals and whitespace clean-ups of schemas"
      
      * tag 'devicetree-fixes-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: Remove more cases of 'allOf' containing a '$ref'
        dt-bindings: Whitespace clean-ups in schema files
      b07175dc
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 341323fa
      Linus Torvalds authored
      Pull more ACPI updates from Rafael Wysocki:
       "Add new hardware support to the ACPI driver for AMD SoCs, the x86 clk
        driver and the Designware i2c driver (changes from Akshu Agrawal and
        Pu Wen)"
      
      * tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        clk: x86: Support RV architecture
        ACPI: APD: Add a fmw property is_raven
        clk: x86: Change name from ST to FCH
        ACPI: APD: Change name from ST to FCH
        i2c: designware: Add device HID for Hygon I2C controller
      341323fa
    • Linus Torvalds's avatar
      Merge tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1a5d9dbb
      Linus Torvalds authored
      Pull one more power management update from Rafael Wysocki:
       "Modify the intel_pstate driver to allow it to work in the passive mode
        with hardware-managed P-states (HWP) enabled"
      
      * tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Implement passive mode with HWP enabled
      1a5d9dbb
    • Linus Torvalds's avatar
      Merge tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 884e0d3d
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "Core Frameworks
         - Make better attempt at matching device with the correct OF node
         - Allow batch removal of hierarchical sub-devices
      
        New Drivers
         - Add STM32 Clocksource driver
         - Add support for Khadas System Control Microcontroller
      
        Driver Removal
         - Remove unused driver for TI's SMSC ECE1099
      
        New Device Support
         - Add support for Intel Emmitsburg PCH to Intel LPSS PCI
         - Add support for Intel Tiger Lake PCH-H to Intel LPSS PCI
         - Add support for Dialog DA revision to Dialog DA9063
      
        New Functionality
         - Add support for AXP803 to be probed by I2C
      
        Fix-ups
         - Numerous W=1 warning fixes
         - Device Tree changes (stm32-lptimer, gateworks-gsc, khadas,mcu, stmfx, cros-ec, j721e-system-controller)
         - Enabled Regmap 'fast I/O' in stm32-lptimer
         - Change BUG_ON to WARN_ON in arizona-core
         - Remove superfluous code/initialisation (madera, max14577)
         - Trivial formatting/spelling issues (madera-core, madera-i2c, da9055, max77693-private)
         - Switch to of_platform_populate() in sprd-sc27xx-spi
         - Expand out set/get brightness/pwm macros in lm3533-ctrlbank
         - Disable IRQs on suspend in motorola-cpcap
         - Clean-up error handling in intel_soc_pmic_mrfld
         - Ensure correct removal order of sub-devices in madera
         - Many s/HTTP/HTTPS/ link changes
         - Ensure name used with Regmap is unique in syscon
      
        Bug Fixes
         - Properly 'put' clock on unbind and error in arizona-core
         - Fix revision handling in da9063
         - Fix 'assignment of read-only location' error in kempld-core
         - Avoid using the Regmap API when atomic in rn5t618
         - Redefine volatile register description in rn5t618
         - Use locking to protect event handler in dln2"
      
      * tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (76 commits)
        mfd: syscon: Use a unique name with regmap_config
        mfd: Replace HTTP links with HTTPS ones
        mfd: dln2: Run event handler loop under spinlock
        mfd: madera: Improve handling of regulator unbinding
        mfd: mfd-core: Add mechanism for removal of a subset of children
        mfd: intel_soc_pmic_mrfld: Simplify the return expression of intel_scu_ipc_dev_iowrite8()
        mfd: max14577: Remove redundant initialization of variable current_bits
        mfd: rn5t618: Fix caching of battery related registers
        mfd: max77693-private: Drop a duplicated word
        mfd: da9055: pdata.h: Drop a duplicated word
        mfd: rn5t618: Make restart handler atomic safe
        mfd: kempld-core: Fix 'assignment of read-only location' error
        mfd: axp20x: Allow the AXP803 to be probed by I2C
        mfd: da9063: Add support for latest DA silicon revision
        mfd: da9063: Fix revision handling to correctly select reg tables
        dt-bindings: mfd: st,stmfx: Remove I2C unit name
        dt-bindings: mfd: ti,j721e-system-controller.yaml: Add J721e system controller
        mfd: motorola-cpcap: Disable interrupt for suspend
        mfd: smsc-ece1099: Remove driver
        mfd: core: Add OF_MFD_CELL_REG() helper
        ...
      884e0d3d
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 18737f42
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
       "Subsystems affected by this patch series: mm/hotfixes, lz4, exec,
        mailmap, mm/thp, autofs, sysctl, mm/kmemleak, mm/misc and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
        virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
        ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
        rtl818x: constify ioreadX() iomem argument (as in generic implementation)
        iomap: constify ioreadX() iomem argument (as in generic implementation)
        sh: use generic strncpy()
        sh: clkfwk: remove r8/r16/r32
        include/asm-generic/vmlinux.lds.h: align ro_after_init
        mm: annotate a data race in page_zonenum()
        mm/swap.c: annotate data races for lru_rotate_pvecs
        mm/rmap: annotate a data race at tlb_flush_batched
        mm/mempool: fix a data race in mempool_free()
        mm/list_lru: fix a data race in list_lru_count_one
        mm/memcontrol: fix a data race in scan count
        mm/page_counter: fix various data races at memsw
        mm/swapfile: fix and annotate various data races
        mm/filemap.c: fix a data race in filemap_fault()
        mm/swap_state: mark various intentional data races
        mm/page_io: mark various intentional data races
        mm/frontswap: mark various intentional data races
        mm/kmemleak: silence KCSAN splats in checksum
        ...
      18737f42
    • Krzysztof Kozlowski's avatar
      virtio: pci: constify ioreadX() iomem argument (as in generic implementation) · fe0580ac
      Krzysztof Kozlowski authored
      The ioreadX() helpers have inconsistent interface.  On some architectures
      void *__iomem address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-5-krzk@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fe0580ac
    • Krzysztof Kozlowski's avatar
      ntb: intel: constify ioreadX() iomem argument (as in generic implementation) · 58184e95
      Krzysztof Kozlowski authored
      The ioreadX() helpers have inconsistent interface.  On some architectures
      void *__iomem address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: default avatarDave Jiang <dave.jiang@intel.com>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-4-krzk@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      58184e95
    • Krzysztof Kozlowski's avatar
      rtl818x: constify ioreadX() iomem argument (as in generic implementation) · 5ca6ad7d
      Krzysztof Kozlowski authored
      The ioreadX() helpers have inconsistent interface.  On some architectures
      void *__iomem address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-3-krzk@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5ca6ad7d
    • Krzysztof Kozlowski's avatar
      iomap: constify ioreadX() iomem argument (as in generic implementation) · 8f28ca6b
      Krzysztof Kozlowski authored
      Patch series "iomap: Constify ioreadX() iomem argument", v3.
      
      The ioread8/16/32() and others have inconsistent interface among the
      architectures: some taking address as const, some not.
      
      It seems there is nothing really stopping all of them to take pointer to
      const.
      
      This patch (of 4):
      
      The ioreadX() and ioreadX_rep() helpers have inconsistent interface.  On
      some architectures void *__iomem address argument is a pointer to const,
      on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      
      [krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
        Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
      [akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
        Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.comSuggested-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
      Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f28ca6b
    • Kuninori Morimoto's avatar
      sh: use generic strncpy() · f9e7ff9c
      Kuninori Morimoto authored
      Current SH will get below warning at strncpy()
      
      In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
                       from ${LINUX}/include/linux/string.h:20,
                       from ${LINUX}/include/linux/bitmap.h:9,
                       from ${LINUX}/include/linux/nodemask.h:95,
                       from ${LINUX}/include/linux/mmzone.h:17,
                       from ${LINUX}/include/linux/gfp.h:6,
                       from ${LINUX}/innclude/linux/slab.h:15,
                       from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
      ${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
      ${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
        80 is above array bounds of 'char[26]' [-Warray-bounds]
         : "0" (__dest), "1" (__src), "r" (__src+__n)
                                           ~~~~~^~~~
      
      In general, strncpy() should behave like below.
      
      	char dest[10];
      	char *src = "12345";
      
      	strncpy(dest, src, 10);
      	// dest = {'1', '2', '3', '4', '5',
      	           '\0','\0','\0','\0','\0'}
      
      But, current SH strnpy() has 2 issues.
      1st is it will access to out-of-memory (= src + 10).
      2nd is it needs big fixup for it, and maintenance __asm__
      code is difficult.
      
      To solve these issues, this patch simply uses generic strncpy()
      instead of architecture specific one.
      Signed-off-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://marc.info/?l=linux-renesas-soc&m=157664657013309Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f9e7ff9c
    • Kuninori Morimoto's avatar
      sh: clkfwk: remove r8/r16/r32 · a8e3943b
      Kuninori Morimoto authored
      SH will get below warning
      
      ${LINUX}/drivers/sh/clk/cpg.c: In function 'r8':
      ${LINUX}/drivers/sh/clk/cpg.c:41:17: warning: passing argument 1 of 'ioread8'
       discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        return ioread8(addr);
                       ^~~~
      In file included from ${LINUX}/arch/sh/include/asm/io.h:21,
                       from ${LINUX}/include/linux/io.h:13,
                       from ${LINUX}/drivers/sh/clk/cpg.c:14:
      ${LINUX}/include/asm-generic/iomap.h:29:29: note: expected 'void *' but
      argument is of type 'const void *'
       extern unsigned int ioread8(void __iomem *);
                                   ^~~~~~~~~~~~~~
      
      We don't need "const" for r8/r16/r32.  And we don't need r8/r16/r32
      themselvs.  This patch cleanup these.
      Signed-off-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      X-MARC-Message: https://marc.info/?l=linux-renesas-soc&m=157852973916903Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a8e3943b
    • Romain Naour's avatar
      include/asm-generic/vmlinux.lds.h: align ro_after_init · 7f897acb
      Romain Naour authored
      Since the patch [1], building the kernel using a toolchain built with
      binutils 2.33.1 prevents booting a sh4 system under Qemu.  Apply the patch
      provided by Alan Modra [2] that fix alignment of rodata.
      
      [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
      [2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.htmlSigned-off-by: default avatarRomain Naour <romain.naour@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Link: https://marc.info/?l=linux-sh&m=158429470221261Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7f897acb
    • Qian Cai's avatar
      mm: annotate a data race in page_zonenum() · c403f6a3
      Qian Cai authored
       BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
      
       write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
        page_cpupid_xchg_last+0x51/0x80
        page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
        wp_page_reuse+0x3e/0xc0
        wp_page_reuse at mm/memory.c:2453
        do_wp_page+0x472/0x7b0
        do_wp_page at mm/memory.c:2798
        __handle_mm_fault+0xcb0/0xd00
        handle_pte_fault at mm/memory.c:4049
        (inlined by) __handle_mm_fault at mm/memory.c:4163
        handle_mm_fault+0xfc/0x2f0
        handle_mm_fault at mm/memory.c:4200
        do_page_fault+0x263/0x6f9
        do_user_addr_fault at arch/x86/mm/fault.c:1465
        (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
        page_fault+0x34/0x40
      
       read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
        put_page+0x15a/0x1f0
        page_zonenum at include/linux/mm.h:923
        (inlined by) is_zone_device_page at include/linux/mm.h:929
        (inlined by) page_is_devmap_managed at include/linux/mm.h:948
        (inlined by) put_page at include/linux/mm.h:1023
        wp_page_copy+0x571/0x930
        wp_page_copy at mm/memory.c:2615
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xcb0/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A page never changes its zone number. The zone number happens to be
      stored in the same word as other bits which are modified, but the zone
      number bits will never be modified by any other write, so it can accept
      a reload of the zone bits after an intervening write and it don't need
      to use READ_ONCE(). Thus, annotate this data race using
      ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
      writes to it.
      Suggested-by: default avatarMarco Elver <elver@google.com>
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c403f6a3
    • Qian Cai's avatar
      mm/swap.c: annotate data races for lru_rotate_pvecs · 7e0cc01e
      Qian Cai authored
      Read to lru_add_pvec->nr could be interrupted and then write to the same
      variable.  The write has local interrupt disabled, but the plain reads
      result in data races.  However, it is unlikely the compilers could do much
      damage here given that lru_add_pvec->nr is a "unsigned char" and there is
      an existing compiler barrier.  Thus, annotate the reads using the
      data_race() macro.  The data races were reported by KCSAN,
      
       BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page
      
       write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23:
        rotate_reclaimable_page+0x2df/0x490
        pagevec_add at include/linux/pagevec.h:81
        (inlined by) rotate_reclaimable_page at mm/swap.c:259
        end_page_writeback+0x1b5/0x2b0
        end_swap_bio_write+0x1d0/0x280
        bio_endio+0x297/0x560
        dec_pending+0x218/0x430 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x297/0x560
        blk_update_request+0x201/0x920
        scsi_end_request+0x6b/0x4a0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x1ed/0x2a0
        scsi_softirq_done+0x1c9/0x1d0
        blk_done_softirq+0x181/0x1d0
        __do_softirq+0xd9/0x57c
        irq_exit+0xa2/0xc0
        do_IRQ+0x8b/0x190
        ret_from_intr+0x0/0x42
        delay_tsc+0x46/0x80
        __const_udelay+0x3c/0x40
        __udelay+0x10/0x20
        kcsan_setup_watchpoint+0x202/0x3a0
        __tsan_read1+0xc2/0x100
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23:
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain_cpu at mm/swap.c:602
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       2 locks held by oom02/37761:
        #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault
        #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part
       irq event stamp: 1949217
       trace_hardirqs_on_thunk+0x1a/0x1c
       __do_softirq+0x2e7/0x57c
       __do_softirq+0x34c/0x57c
       irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6
       Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMarco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7e0cc01e
    • Qian Cai's avatar
      mm/rmap: annotate a data race at tlb_flush_batched · 9c1177b6
      Qian Cai authored
      mm->tlb_flush_batched could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one
      
       write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
        try_to_unmap_one+0x59a/0x1ab0
        set_tlb_ubc_flush_pending at mm/rmap.c:635
        (inlined by) try_to_unmap_one at mm/rmap.c:1538
        rmap_walk_anon+0x296/0x650
        rmap_walk+0xdf/0x100
        try_to_unmap+0x18a/0x2f0
        shrink_page_list+0xef6/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
        flush_tlb_batched_pending+0x29/0x90
        flush_tlb_batched_pending at mm/rmap.c:682
        change_p4d_range+0x5dd/0x1030
        change_pte_range at mm/mprotect.c:44
        (inlined by) change_pmd_range at mm/mprotect.c:212
        (inlined by) change_pud_range at mm/mprotect.c:240
        (inlined by) change_p4d_range at mm/mprotect.c:260
        change_protection+0x222/0x310
        change_prot_numa+0x3e/0x60
        task_numa_work+0x219/0x350
        task_work_run+0xed/0x140
        prepare_exit_to_usermode+0x2cc/0x2e0
        ret_from_intr+0x32/0x42
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      flush_tlb_batched_pending() is under PTL but the write is not, but
      mm->tlb_flush_batched is only a bool type, so the value is unlikely to be
      shattered.  Thus, mark it as an intentional data race by using the data
      race macro.
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9c1177b6
    • Qian Cai's avatar
      mm/mempool: fix a data race in mempool_free() · abe1de42
      Qian Cai authored
      mempool_t pool.curr_nr could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in mempool_free / remove_element
      
       write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
        remove_element+0x4a/0x1c0
        remove_element at mm/mempool.c:132
        mempool_alloc+0x102/0x210
        (inlined by) mempool_alloc at mm/mempool.c:399
        bio_alloc_bioset+0x106/0x2c0
        get_swap_bio+0x49/0x230
        __swap_writepage+0x680/0xc30
        swap_writepage+0x9c/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        <snip>
      
       read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
        mempool_free+0x3e/0x150
        mempool_free at mm/mempool.c:492
        bio_free+0x192/0x280
        bio_put+0x91/0xd0
        end_swap_bio_write+0x1d8/0x280
        bio_endio+0x2c2/0x5b0
        dec_pending+0x22b/0x440 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x2c2/0x5b0
        blk_update_request+0x217/0x940
        scsi_end_request+0x6b/0x4d0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x223/0x310
        scsi_softirq_done+0x1d5/0x210
        blk_mq_complete_request+0x224/0x250
        scsi_mq_done+0xc2/0x250
        pqi_raid_io_complete+0x5a/0x70 [smartpqi]
        pqi_irq_handler+0x150/0x1410 [smartpqi]
        __handle_irq_event_percpu+0x90/0x540
        handle_irq_event_percpu+0x49/0xd0
        handle_irq_event+0x85/0xca
        handle_edge_irq+0x13f/0x3e0
        do_IRQ+0x86/0x190
        <snip>
      
      Since the write is under pool->lock but the read is done as lockless.
      Even though the commit 5b990546 ("mempool: fix and document
      synchronization and memory barrier usage") introduced the smp_wmb() and
      smp_rmb() pair to improve the situation, it is adequate to protect it
      from data races which could lead to a logic bug, so fix it by adding
      READ_ONCE() for the read.
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      abe1de42
    • Qian Cai's avatar
      mm/list_lru: fix a data race in list_lru_count_one · a1f45935
      Qian Cai authored
      struct list_lru_one l.nr_items could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move
      
       write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
        list_lru_isolate_move+0xf9/0x130
        list_lru_isolate_move at mm/list_lru.c:180
        inode_lru_isolate+0x12b/0x2a0
        __list_lru_walk_one+0x122/0x3d0
        list_lru_walk_one+0x75/0xa0
        prune_icache_sb+0x8b/0xc0
        super_cache_scan+0x1b8/0x250
        do_shrink_slab+0x256/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
        list_lru_count_one+0x116/0x2f0
        list_lru_count_one at mm/list_lru.c:193
        super_cache_count+0xe8/0x170
        do_shrink_slab+0x95/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A shattered l.nr_items could affect the shrinker behaviour due to a data
      race. Fix it by adding READ_ONCE() for the read. Since the writes are
      aligned and up to word-size, assume those are safe from data races to
      avoid readability issues of writing WRITE_ONCE(var, var + val).
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a1f45935
    • Qian Cai's avatar
      mm/memcontrol: fix a data race in scan count · e0e3f42f
      Qian Cai authored
      struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
      accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size
      
       write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
        mem_cgroup_update_lru_size+0x11c/0x1d0
        mem_cgroup_update_lru_size at mm/memcontrol.c:1266
        isolate_lru_pages+0x6a9/0xf30
        shrink_active_list+0x123/0xcc0
        shrink_lruvec+0x8fd/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
        lruvec_lru_size+0xbb/0x270
        mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
        (inlined by) lruvec_lru_size at mm/vmscan.c:326
        shrink_lruvec+0x1d0/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_current+0xa6/0x120
        alloc_slab_page+0x3b1/0x540
        allocate_slab+0x70/0x660
        new_slab+0x46/0x70
        ___slab_alloc+0x4ad/0x7d0
        __slab_alloc+0x43/0x70
        kmem_cache_alloc+0x2c3/0x420
        getname_flags+0x4c/0x230
        getname+0x22/0x30
        do_sys_openat2+0x205/0x3b0
        do_sys_open+0x9a/0xf0
        __x64_sys_openat+0x62/0x80
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The write is under lru_lock, but the read is done as lockless.  The scan
      count is used to determine how aggressively the anon and file LRU lists
      should be scanned.  Load tearing could generate an inefficient heuristic,
      so fix it by adding READ_ONCE() for the read.
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e0e3f42f
    • Qian Cai's avatar
      mm/page_counter: fix various data races at memsw · 6e4bd50f
      Qian Cai authored
      Commit 3e32cb2e ("mm: memcontrol: lockless page counters") could had
      memcg->memsw->watermark and memcg->memsw->failcnt been accessed
      concurrently as reported by KCSAN,
      
       BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
      
       read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
        page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
        try_charge+0x131/0xd50 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x58/0x140
        __memcg_kmem_charge+0xcc/0x280
        __alloc_pages_nodemask+0x1e1/0x450
        alloc_pages_current+0xa6/0x120
        pte_alloc_one+0x17/0xd0
        __pte_alloc+0x3a/0x1f0
        copy_p4d_range+0xc36/0x1990
        copy_page_range+0x21d/0x360
        dup_mmap+0x5f5/0x7a0
        dup_mm+0xa2/0x240
        copy_process+0x1b3f/0x3460
        _do_fork+0xaa/0xa20
        __x64_sys_clone+0x13b/0x170
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
        page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
        try_charge+0x131/0xd50 mm/memcontrol.c:2405
        mem_cgroup_try_charge+0x159/0x460
        mem_cgroup_try_charge_delay+0x3d/0xa0
        wp_page_copy+0x14d/0x930
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xce6/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
      
       write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
        page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
        try_charge+0x185/0xbf0 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
        __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
        __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780
      
       read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
        page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
        try_charge+0x185/0xbf0 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
        __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
        __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780
      
      Since watermark could be compared or set to garbage due to a data race
      which would change the code logic, fix it by adding a pair of READ_ONCE()
      and WRITE_ONCE() in those places.
      
      The "failcnt" counter is tolerant of some degree of inaccuracy and is only
      used to report stats, a data race will not be harmful, thus mark it as an
      intentional data race using the data_race() macro.
      
      Fixes: 3e32cb2e ("mm: memcontrol: lockless page counters")
      Reported-by: syzbot+f36cfe60b1006a94f9dc@syzkaller.appspotmail.com
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e4bd50f
    • Qian Cai's avatar
      mm/swapfile: fix and annotate various data races · a449bf58
      Qian Cai authored
      swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
      be accessed concurrently separately as noticed by KCSAN,
      
      === si.highest_bit ===
      
       write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
        swap_range_alloc+0x81/0x130
        swap_range_alloc at mm/swapfile.c:681
        scan_swap_map_slots+0x371/0xb90
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
        scan_swap_map_slots+0x4a6/0xb90
        scan_swap_map_slots at mm/swapfile.c:892
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      === si.swap_map[offset] ===
      
       write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
        __swap_entry_free_locked+0x8c/0x100
        __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
        __swap_entry_free.constprop.20+0x69/0xb0
        free_swap_and_cache+0x53/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
       read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
        _swap_info_get+0x81/0xa0
        _swap_info_get at mm/swapfile.c:1140
        free_swap_and_cache+0x40/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
      === si.flags ===
      
       write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
        _swap_info_get+0x41/0xa0
        __swap_info_get at mm/swapfile.c:1114
        put_swap_page+0x84/0x490
        __remove_mapping+0x384/0x5f0
        shrink_page_list+0xff1/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
      The writes are under si->lock but the reads are not. For si.highest_bit
      and si.swap_map[offset], data race could trigger logic bugs, so fix them
      by having WRITE_ONCE() for the writes and READ_ONCE() for the reads
      except those isolated reads where they compare against zero which a data
      race would cause no harm. Thus, annotate them as intentional data races
      using the data_race() macro.
      
      For si.flags, the readers are only interested in a single bit where a
      data race there would cause no issue there.
      
      [cai@lca.pw: add a missing annotation for si->flags in memory.c]
        Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pwSigned-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a449bf58
    • Kirill A. Shutemov's avatar
      mm/filemap.c: fix a data race in filemap_fault() · e630bfac
      Kirill A. Shutemov authored
      struct file_ra_state ra.mmap_miss could be accessed concurrently during
      page faults as noticed by KCSAN,
      
       BUG: KCSAN: data-race in filemap_fault / filemap_map_pages
      
       write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
        filemap_fault+0x920/0xfc0
        do_sync_mmap_readahead at mm/filemap.c:2384
        (inlined by) filemap_fault at mm/filemap.c:2486
        __xfs_filemap_fault+0x112/0x3e0 [xfs]
        xfs_filemap_fault+0x74/0x90 [xfs]
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
        filemap_map_pages+0xc2e/0xd80
        filemap_map_pages at mm/filemap.c:2625
        do_fault+0x3da/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G        W    L 5.5.0-next-20200210+ #1
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      ra.mmap_miss is used to contribute the readahead decisions, a data race
      could be undesirable.  Both the read and write is only under non-exclusive
      mmap_sem, two concurrent writers could even underflow the counter.  Fix
      the underflow by writing to a local variable before committing a final
      store to ra.mmap_miss given a small inaccuracy of the counter should be
      acceptable.
      Signed-off-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarQian Cai <cai@lca.pw>
      Reviewed-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e630bfac
    • Qian Cai's avatar
      mm/swap_state: mark various intentional data races · b96a3db2
      Qian Cai authored
      swap_cache_info.* could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache
      
       write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
        lookup_swap_cache+0x12e/0x460
        lookup_swap_cache at mm/swap_state.c:322
        do_swap_page+0x112/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
        lookup_swap_cache+0x117/0x460
        lookup_swap_cache at mm/swap_state.c:322
        shmem_swapin_page+0xc7/0x9e0
        shmem_getpage_gfp+0x2ca/0x16c0
        shmem_fault+0xef/0x3c0
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
       write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
         __delete_from_swap_cache+0x681/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
       read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
         __delete_from_swap_cache+0x66e/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
      Both the read and write are done as lockless. Since swap_cache_info.*
      are only used to print out counter information, even if any of them
      missed a few incremental due to data races, it will be harmless, so just
      mark it as an intentional data race using the data_race() macro.
      
      While at it, fix a checkpatch.pl warning,
      
      WARNING: Single statement macros should not use a do {} while (0) loop
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b96a3db2
    • Qian Cai's avatar
      mm/page_io: mark various intentional data races · 7b37e226
      Qian Cai authored
      struct swap_info_struct si.flags could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage
      
       write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1740/0x2820
        shrink_inactive_list+0x316/0x8b0
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
        swap_readpage+0x204/0x6a0
        swap_readpage at mm/page_io.c:380
        read_swap_cache_async+0xa2/0xb0
        swapin_readahead+0x6a0/0x890
        do_swap_page+0x465/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 7 PID: 5422 Comm: gmain Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      Other reads,
      
       read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
        __swap_writepage+0x140/0xc20
        __swap_writepage at mm/page_io.c:289
      
       read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
        swap_set_page_dirty+0x44/0x1f4
        swap_set_page_dirty at mm/page_io.c:442
      
      The write is under &si->lock, but the reads are done as lockless.  Since
      the reads only check for a specific bit in the flag, it is harmless even
      if load tearing happens.  Thus, just mark them as intentional data races
      using the data_race() macro.
      
      [cai@lca.pw: add a missing annotation]
        Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pwSigned-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b37e226
    • Qian Cai's avatar
      mm/frontswap: mark various intentional data races · 96bdd2bc
      Qian Cai authored
      There are a few information counters that are intentionally not protected
      against increment races, so just annotate them using the data_race()
      macro.
      
       BUG: KCSAN: data-race in __frontswap_store / __frontswap_store
      
       write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
        __frontswap_store+0x2d0/0x344
        inc_frontswap_failed_stores at mm/frontswap.c:70
        (inlined by) __frontswap_store at mm/frontswap.c:280
        swap_writepage+0x83/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
        __frontswap_store+0x2b9/0x344
        inc_frontswap_failed_stores at mm/frontswap.c:70
        (inlined by) __frontswap_store at mm/frontswap.c:280
        swap_writepage+0x83/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96bdd2bc
    • Qian Cai's avatar
      mm/kmemleak: silence KCSAN splats in checksum · 69d0b54d
      Qian Cai authored
      Even if KCSAN is disabled for kmemleak, update_checksum() could still call
      crc32() (which is outside of kmemleak.c) to dereference object->pointer.
      Thus, the value of object->pointer could be accessed concurrently as
      noticed by KCSAN,
      
       BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock
      
       write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
        do_raw_spin_lock+0x114/0x200
        debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
        (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
        _raw_spin_lock+0x40/0x50
        __handle_mm_fault+0xa9e/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
        crc32_le_base+0x67/0x350
        crc32_le_base+0x67/0x350:
        crc32_body at lib/crc32.c:106
        (inlined by) crc32_le_generic at lib/crc32.c:179
        (inlined by) crc32_le at lib/crc32.c:197
        kmemleak_scan+0x528/0xd90
        update_checksum at mm/kmemleak.c:1172
        (inlined by) kmemleak_scan at mm/kmemleak.c:1497
        kmemleak_scan_thread+0xcc/0xfa
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
      If a shattered value was returned due to a data race, it will be corrected
      in the next scan.  Thus, let KCSAN ignore all reads in the region to
      silence KCSAN in case the write side is non-atomic.
      Suggested-by: default avatarMarco Elver <elver@google.com>
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMarco Elver <elver@google.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pwSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      69d0b54d
    • Xiaoming Ni's avatar
      all arch: remove system call sys_sysctl · 88db0aa2
      Xiaoming Ni authored
      Since commit 61a47c1a ("sysctl: Remove the sysctl system call"),
      sys_sysctl is actually unavailable: any input can only return an error.
      
      We have been warning about people using the sysctl system call for years
      and believe there are no more users.  Even if there are users of this
      interface if they have not complained or fixed their code by now they
      probably are not going to, so there is no point in warning them any
      longer.
      
      So completely remove sys_sysctl on all architectures.
      
      [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
       Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.comSigned-off-by: default avatarXiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: chenzefeng <chenzefeng2@huawei.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kars de Jong <jongk@linux-m68k.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Sven Schnelle <svens@stackframe.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      88db0aa2
    • Randy Dunlap's avatar
    • Matthew Wilcox (Oracle)'s avatar
      mm: introduce offset_in_thp · ee6c400f
      Matthew Wilcox (Oracle) authored
      Mirroring offset_in_page(), this gives you the offset within this
      particular page, no matter what size page it is.  It optimises down to
      offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.
      Signed-off-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: default avatarZi Yan <ziy@nvidia.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-8-willy@infradead.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ee6c400f