- 08 Feb, 2021 2 commits
-
-
Rafael J. Wysocki authored
If the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale-invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver, the scale-invariant utilization falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, which causes the schedutil governor to select a frequency below cpuinfo.max_freq. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies which prevents "boost" frequencies from being used in some workloads. While this issue is related to scale-invariance, it may be amplified by commit db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle which made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. If CPPC is available, it can be used to address this issue by extending the frequency tables created by acpi_cpufreq to cover the entire available frequency range (including "boost" frequencies) for each CPU, but if CPPC is not there, acpi_cpufreq has no idea what the maximum "boost" frequency is and the frequency tables created by it cannot be extended in a meaningful way, so in that case make it ask the arch scale-invariance code to to use the "nominal" performance level for CPU utilization scaling in order to avoid the issue at hand. Fixes: db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
-
Rafael J. Wysocki authored
A severe performance regression on AMD EPYC processors when using the schedutil scaling governor was discovered by Phoronix.com and attributed to the following commits: 41ea6672 ("x86, sched: Calculate frequency invariance for AMD systems") 976df7e5 ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") The source of the problem is that the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale- invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver. This effectively causes the scale-invariant utilization to fall below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so the schedutil governor selects a frequency below cpuinfo.max_freq then. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies. However, if the cpuinfo.max_freq value coming from acpi_cpufreq was higher, the schedutil governor would select higher frequencies which in turn would allow acpi_cpufreq to set more adequate performance levels and to get to the "boost" range of CPU frequencies more often. This issue affects any systems where acpi_cpufreq is used and the "boost" (or "turbo") frequencies are enabled, not just AMD EPYC. Moreover, commit db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. To address this issue, extend the frequency table constructed by acpi_cpufreq for each CPU to cover the entire range of available frequencies (including the "boost" ones) if CPPC is available and indicates that "boost" (or "turbo") frequencies are enabled. That causes cpuinfo.max_freq to become the maximum "boost" frequency of the given CPU (instead of the maximum frequency returned by the ACPI _PSS object that corresponds to the "nominal" performance level). Fixes: 41ea6672 ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: 976df7e5 ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") Fixes: db865272 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1 Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/Reported-by: Michael Larabel <Michael@phoronix.com> Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Tested-by: Michael Larabel <Michael@phoronix.com>
-
- 07 Feb, 2021 9 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds authored
Pull libnvdimm fixes from Dan Williams: "A fix for a crash scenario that has been present since the initial merge, a minor regression in sysfs attribute visibility, and a fix for some flexible array warnings. The bulk of this pull is an update to the libnvdimm unit test infrastructure to test non-ACPI platforms. Given there is zero regression risk for test updates, and the tests enable validation of bits headed towards the next merge window, I saw no reason to hold the new tests back. Santosh originally submitted this before the v5.11 window opened. Summary: - Fix a crash when sysfs accesses race 'dimm' driver probe/remove. - Fix a regression in 'resource' attribute visibility necessary for mapping badblocks and other physical address interrogations. - Fix some flexible array warnings - Expand the unit test infrastructure for non-ACPI platforms" * tag 'libnvdimm-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/dimm: Avoid race between probe and available_slots_show() ndtest: Add papr health related flags ndtest: Add nvdimm control functions ndtest: Add regions and mappings to the test buses ndtest: Add dimm attributes ndtest: Add dimms to the two buses ndtest: Add compatability string to treat it as PAPR family testing/nvdimm: Add test module for non-nfit platforms libnvdimm/namespace: Fix visibility of namespace resource attribute libnvdimm/pmem: Remove unused header ACPI: NFIT: Fix flexible_array.cocci warnings
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: "Fix a 32 vs 64-bit padding issue in the new benchmark code (Barry Song)" * tag 'dma-mapping-5.11-2' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: benchmark: use u8 for reserved field in uAPI structure
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Borislav Petkov: - Prevent device managed IRQ allocation helpers from returning IRQ 0 - A fix for MSI activation of PCI endpoints with multiple MSIs * tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent [devm_]irq_alloc_desc from returning irq 0 genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull syscall entry fixes from Borislav Petkov: - For syscall user dispatch, separate prctl operation from syscall redirection range specification before the API has been made official in 5.11. - Ensure tasks using the generic syscall code do trap after returning from a syscall when single-stepping is requested. * tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Use different define for selector variable in SUD entry: Ensure trap after single-step on system call return
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fix from Borislav Petkov: "Revert an attempt to not spread IRQ threads on isolated CPUs which has a bunch of problems" * tag 'sched_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fixes from Borislav Petkov: "Two more timers-related fixes for v5.11: - Use a freezable workqueue for RTC sync because the sync can happen at any time and trigger suspend assertion checks in the i2c subsystem. - Correct a previous RTC validation change to check only bit 6 in register D because some Intel machines use bits 0-5" * tag 'timers_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ntp: Use freezable workqueue for RTC synchronization rtc: mc146818: Dont test for bit 0-5 in Register D
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: "I hope this is the last batch of x86/urgent updates for this round: - Remove superfluous EFI PGD range checks which lead to those assertions failing with certain kernel configs and LLVM. - Disable setting breakpoints on facilities involved in #DB exception handling to avoid infinite loops. - Add extra serialization to non-serializing MSRs (IA32_TSC_DEADLINE and x2 APIC MSRs) to adhere to SDM's recommendation and avoid any theoretical issues. - Re-add the EPB MSR reading on turbostat so that it works on older kernels which don't have the corresponding EPB sysfs file. - Add Alder Lake to the list of CPUs which support split lock. - Fix %dr6 register handling in order to be able to set watchpoints with gdb again. - Disable CET instrumentation in the kernel so that gcc doesn't add ENDBR64 to kernel code and thus confuse tracing" * tag 'x86_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Remove EFI PGD build time checks x86/debug: Prevent data breakpoints on cpu_dr7 x86/debug: Prevent data breakpoints on __per_cpu_offset x86/apic: Add extra serialization for non-serializing MSRs tools/power/turbostat: Fallback to an MSR read for EPB x86/split_lock: Enable the split lock feature on another Alder Lake CPU x86/debug: Fix DR6 handling x86/build: Disable CET instrumentation in the kernel
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Use the 'python3' command to invoke python scripts because some distributions do not provide the 'python' command any more. - Clean-up and update documents - Use pkg-config to search libcrypto - Fix duplicated debug flags - Ignore some more stubs in scripts/kallsyms.c * tag 'kbuild-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kallsyms: fix nonconverging kallsyms table with lld kbuild: fix duplicated flags in DEBUG_CFLAGS scripts/clang-tools: switch explicitly to Python 3 kbuild: remove PYTHON variable Documentation/llvm: Add a section about supported architectures Revert "checkpatch: add check for keyword 'boolean' in Kconfig definitions" scripts: use pkg-config to locate libcrypto kconfig: mconf: fix HOSTCC call doc: gcc-plugins: update gcc-plugins.rst kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc Documentation/Kbuild: Remove references to gcc-plugin.sh scripts: switch explicitly to Python 3
-
- 06 Feb, 2021 10 commits
-
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cifs fixes from Steve French: "Three small smb3 fixes for stable" * tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: report error instead of invalid when revalidating a dentry fails smb3: fix crediting for compounding when only one request in flight smb3: Fix out-of-bounds bug in SMB2_negotiate()
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V fixes from Palmer Dabbelt: "A handful of fixes for this week: - A fix to avoid evalating the VA twice in virt_addr_valid, which fixes some WARNs under DEBUG_VIRTUAL. - Two fixes related to STRICT_KERNEL_RWX: one that fixes some permissions when strict is disabled, and one to fix some alignment issues when strict is enabled. - A fix to disallow the selection of MAXPHYSMEM_2GB on RV32, which isn't valid any more but may still show up in some oldconfigs. We still have the HiFive Unleashed ethernet phy reset regression, so there will likely be something coming next week" * tag 'riscv-for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Define MAXPHYSMEM_1GB only for RV32 riscv: Align on L1_CACHE_BYTES when STRICT_KERNEL_RWX RISC-V: Fix .init section permission update riscv: virt_addr_valid must check the address belongs to linear mapping
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - A fix for a change we made to __kernel_sigtramp_rt64() which confused glibc's backtrace logic, and also changed the semantics of that symbol, which was arguably an ABI break. - A fix for a stack overwrite in our VSX instruction emulation. - A couple of fixes for the Makefile logic in the new C VDSO. Thanks to Masahiro Yamada, Naveen N. Rao, Raoni Fassina Firmino, and Ravi Bangoria. * tag 'powerpc-5.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics powerpc/vdso64: remove meaningless vgettimeofday.o build rule powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o powerpc/sstep: Fix array out of bound warning
-
git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds authored
Pull ARM fixes from Russell King: - Fix latent bug with DC21285 (Footbridge PCI bridge) configuration accessors that affects GCC >= 4.9.2 - Fix misplaced tegra_uart_config in decompressor - Ensure signal page contents are initialised - Fix kexec oops * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: kexec: fix oops after TLB are invalidated ARM: ensure the signal page contains defined contents ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor ARM: footbridge: fix dc21285 PCI configuration accessors
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fixes from Greg KH: "Here are some small, last-minute, USB driver fixes for 5.11-rc7 They all resolve issues reported, or are a few new device ids for some drivers. They include: - new device ids for some usb-serial drivers - xhci fixes for a variety of reported problems - dwc3 driver bugfixes - dwc2 driver bugfixes - usblp driver bugfix - thunderbolt bugfix - few other tiny fixes All have been in linux-next with no reported issues" * tag 'usb-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: dwc2: Fix endpoint direction check in ep_from_windex usb: dwc3: fix clock issue during resume in OTG mode xhci: fix bounce buffer usage for non-sg list case usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720 usb: xhci-mtk: break loop when find the endpoint to drop usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop() USB: gadget: legacy: fix an error code in eth_bind() thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link() USB: serial: option: Adding support for Cinterion MV31 usb: xhci-mtk: fix unreleased bandwidth data usb: gadget: aspeed: add missing of_node_put USB: usblp: don't call usb_set_interface if there's a single alt USB: serial: cp210x: add pid/vid for WSDA-200-USB USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input fixes from Dmitry Torokhov: "Nothing terribly interesting, just a few fixups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - sync supported devices with fork on GitHub Input: ariel-pwrbutton - remove unused variable ariel_pwrbutton_id_table Input: goodix - add support for Goodix GT9286 chip dt-bindings: input: touchscreen: goodix: Add binding for GT9286 IC dt-bindings: input: adc-keys: clarify description Input: ili210x - implement pressure reporting for ILI251x Input: i8042 - unbreak Pegatron C15B Input: st1232 - wait until device is ready before reading resolution Input: st1232 - do not read more bytes than needed Input: st1232 - fix off-by-one error in resolution handling
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fix from James Bottomley: "One fix in drivers (lpfc) that stops an oops on resource exhaustion" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: lpfc: Fix EEH encountering oops with NVMe traffic
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "A few small regression fixes: - NVMe pull request from Christoph: - more quirks for buggy devices (Thorsten Leemhuis, Claus Stovgaard) - update the email address for Keith (Keith Busch) - fix an out of bounds access in nvmet-tcp (Sagi Grimberg) - Regression fix for BFQ shallow depth calculations introduced in this merge window (Lin)" * tag 'block-5.11-2021-02-05' of git://git.kernel.dk/linux-block: nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs bfq-iosched: Revert "bfq: Fix computation of shallow depth" update the email address for Keith Bush nvme-pci: ignore the subsysem NQN on Phison E16 nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: "Two small fixes that should go into 5.11: - task_work resource drop fix (Pavel) - identity COW fix (Xiaoguang)" * tag 'io_uring-5.11-2021-02-05' of git://git.kernel.dk/linux-block: io_uring: drop mm/files between task_work_submit io_uring: don't modify identity's files uncess identity is cowed
-
Borislav Petkov authored
With CONFIG_X86_5LEVEL, CONFIG_UBSAN and CONFIG_UBSAN_UNSIGNED_OVERFLOW enabled, clang fails the build with x86_64-linux-ld: arch/x86/platform/efi/efi_64.o: in function `efi_sync_low_kernel_mappings': efi_64.c:(.text+0x22c): undefined reference to `__compiletime_assert_354' which happens due to -fsanitize=unsigned-integer-overflow being enabled: -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation (see -fsanitize=implicit-conversion). and that fires when the (intentional) EFI_VA_START/END defines overflow an unsigned long, leading to the assertion expressions not getting optimized away (on GCC they do)... However, those checks are superfluous: the runtime services mapping code already makes sure the ranges don't overshoot EFI_VA_END as the EFI mapping range is hardcoded. On each runtime services call, it is switched to the EFI-specific PGD and even if mappings manage to escape that last PGD, this won't remain unnoticed for long. So rip them out. See https://github.com/ClangBuiltLinux/linux/issues/256 for more info. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: http://lkml.kernel.org/r/20210107223424.4135538-1-arnd@kernel.org
-
- 05 Feb, 2021 19 commits
-
-
Gabriel Krisman Bertazi authored
Michael Kerrisk suggested that, from an API perspective, it is a bad idea to share the PR_SYS_DISPATCH_ defines between the prctl operation and the selector variable. Therefore, define two new constants to be used by SUD's selector variable and update the corresponding documentation and test cases. While this changes the API syscall user dispatch has never been part of a Linux release, it will show up for the first time in 5.11. Suggested-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210205184321.2062251-1-krisman@collabora.com
-
Gabriel Krisman Bertazi authored
Commit 29915524 ("entry: Drop usage of TIF flags in the generic syscall code") introduced a bug on architectures using the generic syscall entry code, in which processes stopped by PTRACE_SYSCALL do not trap on syscall return after receiving a TIF_SINGLESTEP. The reason is that the meaning of TIF_SINGLESTEP flag is overloaded to cause the trap after a system call is executed, but since the above commit, the syscall call handler only checks for the SYSCALL_WORK flags on the exit work. Split the meaning of TIF_SINGLESTEP such that it only means single-step mode, and create a new type of SYSCALL_WORK to request a trap immediately after a syscall in single-step mode. In the current implementation, the SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag for simplicity. Update x86 to flip this bit when a tracer enables single stepping. Fixes: 29915524 ("entry: Drop usage of TIF flags in the generic syscall code") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Huey <me@kylehuey.com> Link: https://lore.kernel.org/r/87h7mtc9pr.fsf_-_@collabora.com
-
Thomas Gleixner authored
This reverts commit 1abdfe70. This change is broken and not solving any problem it claims to solve. Robin reported that cpumask_local_spread() now returns any cpu out of cpu_possible_mask in case that NOHZ_FULL is disabled (runtime or compile time). It can also return any offline or not-present CPU in the housekeeping mask. Before that it was returning a CPU out of online_cpu_mask. While the function is racy against CPU hotplug if the caller does not protect against it, the actual use cases are not caring much about it as they use it mostly as hint for: - the user space affinity hint which is unused by the kernel - memory node selection which is just suboptimal - network queue affinity which might fail but is handled gracefully But the occasional fail vs. hotplug is very different from returning anything from possible_cpu_mask which can have a large amount of offline CPUs obviously. The changelog of the commit claims: "The current implementation of cpumask_local_spread() does not respect the isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task, it will return it to the caller for pinning of its IRQ threads. Having these unwanted IRQ threads on an isolated CPU adds up to a latency overhead." The only correct part of this changelog is: "The current implementation of cpumask_local_spread() does not respect the isolated CPUs." Everything else is just disjunct from reality. Reported-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: abelits@marvell.com Cc: davem@davemloft.net Link: https://lore.kernel.org/r/87y2g26tnt.fsf@nanos.tec.linutronix.de
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (hugetlb, compaction, vmalloc, shmem, memblock, pagecache, kasan, and hugetlb), mailmap, gcov, ubsan, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS/.mailmap: use my @kernel.org address mm: hugetlb: fix missing put_page in gather_surplus_pages() ubsan: implement __ubsan_handle_alignment_assumption kasan: make addr_has_metadata() return true for valid addresses kasan: add explicit preconditions to kasan_report() mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() mailmap: add entries for Manivannan Sadhasivam mailmap: fix name/email for Viresh Kumar memblock: do not start bottom-up allocations with kernel_end mm: thp: fix MADV_REMOVE deadlock on shmem THP init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov mm/vmalloc: separate put pages and flush VM flags mm, compaction: move high_pfn to the for loop scope mm: migrate: do not migrate HugeTLB page whose refcount is one mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active mm: hugetlb: fix a race between isolating and freeing page mm: hugetlb: fix a race between freeing and dissolving the page mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
-
Hans de Goede authored
Since commit a85a6c86 ("driver core: platform: Clarify that IRQ 0 is invalid"), having a linux-irq with number 0 will trigger a WARN() when calling platform_get_irq*() to retrieve that linux-irq. Since [devm_]irq_alloc_desc allocs a single irq and since irq 0 is not used on some systems, it can return 0, triggering that WARN(). This happens e.g. on Intel Bay Trail and Cherry Trail devices using the LPE audio engine for HDMI audio: 0 is an invalid IRQ number WARNING: CPU: 3 PID: 472 at drivers/base/platform.c:238 platform_get_irq_optional+0x108/0x180 Modules linked in: snd_hdmi_lpe_audio(+) ... Call Trace: platform_get_irq+0x17/0x30 hdmi_lpe_audio_probe+0x4a/0x6c0 [snd_hdmi_lpe_audio] ---[ end trace ceece38854223a0b ]--- Change the 'from' parameter passed to __[devm_]irq_alloc_descs() by the [devm_]irq_alloc_desc macros from 0 to 1, so that these macros will no longer return 0. Fixes: a85a6c86 ("driver core: platform: Clarify that IRQ 0 is invalid") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201221185647.226146-1-hdegoede@redhat.com
-
Aurelien Aptel authored
Assuming - //HOST/a is mounted on /mnt - //HOST/b is mounted on /mnt/b On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS. This triggers the following chain of events: => the dentry revalidation fail => dentry is put and released => superblock associated with the dentry is put => /mnt/b is unmounted This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Suggested-by: Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> CC: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
-
Lai Jiangshan authored
local_db_save() is called at the start of exc_debug_kernel(), reads DR7 and disables breakpoints to prevent recursion. When running in a guest (X86_FEATURE_HYPERVISOR), local_db_save() reads the per-cpu variable cpu_dr7 to check whether a breakpoint is active or not before it accesses DR7. A data breakpoint on cpu_dr7 therefore results in infinite #DB recursion. Disallow data breakpoints on cpu_dr7 to prevent that. Fixes: 84b6a349("x86/entry: Optimize local_db_save() for virt") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210204152708.21308-2-jiangshanlai@gmail.com
-
Lai Jiangshan authored
When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU GSBASE value via __per_cpu_offset or pcpu_unit_offsets. When a data breakpoint is set on __per_cpu_offset[cpu] (read-write operation), the specific CPU will be stuck in an infinite #DB loop. RCU will try to send an NMI to the specific CPU, but it is not working either since NMI also relies on paranoid_entry(). Which means it's undebuggable. Fixes: eaad9812("x86/entry/64: Introduce the FIND_PERCPU_BASE macro") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210204152708.21308-1-jiangshanlai@gmail.com
-
Nathan Chancellor authored
Use my @kernel.org for all points of contact so that I am always accessible. Link: https://lkml.kernel.org/r/20210126212730.2097108-1-nathan@kernel.orgSigned-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Muchun Song authored
The VM_BUG_ON_PAGE avoids the generation of any code, even if that expression has side-effects when !CONFIG_DEBUG_VM. Link: https://lkml.kernel.org/r/20210126031009.96266-1-songmuchun@bytedance.com Fixes: e5dfaceb ("mm/hugetlb.c: just use put_page_testzero() instead of page_count()") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nathan Chancellor authored
When building ARCH=mips 32r2el_defconfig with CONFIG_UBSAN_ALIGNMENT: ld.lld: error: undefined symbol: __ubsan_handle_alignment_assumption referenced by slab.h:557 (include/linux/slab.h:557) main.o:(do_initcalls) in archive init/built-in.a referenced by slab.h:448 (include/linux/slab.h:448) do_mounts_rd.o:(rd_load_image) in archive init/built-in.a referenced by slab.h:448 (include/linux/slab.h:448) do_mounts_rd.o:(identify_ramdisk_image) in archive init/built-in.a referenced 1579 more times Implement this for the kernel based on LLVM's handleAlignmentAssumptionImpl because the kernel is not linked against the compiler runtime. Link: https://github.com/ClangBuiltLinux/linux/issues/1245 Link: https://github.com/llvm/llvm-project/blob/llvmorg-11.0.1/compiler-rt/lib/ubsan/ubsan_handlers.cpp#L151-L190 Link: https://lkml.kernel.org/r/20210127224451.2587372-1-nathan@kernel.orgSigned-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vincenzo Frascino authored
Currently, addr_has_metadata() returns true for every address. An invalid address (e.g. NULL) passed to the function when, KASAN_HW_TAGS is enabled, leads to a kernel panic. Make addr_has_metadata() return true for valid addresses only. Note: KASAN_HW_TAGS support for vmalloc will be added with a future patch. Link: https://lkml.kernel.org/r/20210126134409.47894-3-vincenzo.frascino@arm.com Fixes: 2e903b91 ("kasan, arm64: implement HW_TAGS runtime") Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vincenzo Frascino authored
Patch series "kasan: Fix metadata detection for KASAN_HW_TAGS", v5. With the introduction of KASAN_HW_TAGS, kasan_report() currently assumes that every location in memory has valid metadata associated. This is due to the fact that addr_has_metadata() returns always true. As a consequence of this, an invalid address (e.g. NULL pointer address) passed to kasan_report() when KASAN_HW_TAGS is enabled, leads to a kernel panic. Example below, based on arm64: BUG: KASAN: invalid-access in 0x0 Read at addr 0000000000000000 by task swapper/0/1 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 ... Call trace: mte_get_mem_tag+0x24/0x40 kasan_report+0x1a4/0x410 alsa_sound_last_init+0x8c/0xa4 do_one_initcall+0x50/0x1b0 kernel_init_freeable+0x1d4/0x23c kernel_init+0x14/0x118 ret_from_fork+0x10/0x34 Code: d65f03c0 9000f021 f9428021 b6cfff61 (d9600000) ---[ end trace 377c8bb45bdd3a1a ]--- hrtimer: interrupt took 48694256 ns note: swapper/0[1] exited with preempt_count 1 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b SMP: stopping secondary CPUs Kernel Offset: 0x35abaf140000 from 0xffff800010000000 PHYS_OFFSET: 0x40000000 CPU features: 0x0a7e0152,61c0a030 Memory Limit: none ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- This series fixes the behavior of addr_has_metadata() that now returns true only when the address is valid. This patch (of 2): With the introduction of KASAN_HW_TAGS, kasan_report() accesses the metadata only when addr_has_metadata() succeeds. Add a comment to make sure that the preconditions to the function are explicitly clarified. Link: https://lkml.kernel.org/r/20210126134409.47894-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/20210126134409.47894-2-vincenzo.frascino@arm.comSigned-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Waiman Long authored
Commit 3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked() causing the following splat: page dumped because: VM_BUG_ON_PAGE(page_memcg(page)) pages's memcg:ffff8889a4116000 ------------[ cut here ]------------ kernel BUG at mm/memcontrol.c:2924! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1 Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017 RIP: commit_charge+0xf4/0x130 Call Trace: mem_cgroup_charge+0x175/0x770 __add_to_page_cache_locked+0x712/0xad0 add_to_page_cache_lru+0xc5/0x1f0 cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles] __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache] __nfs_readpages_from_fscache+0x16d/0x630 [nfs] nfs_readpages+0x24e/0x540 [nfs] read_pages+0x5b1/0xc40 page_cache_ra_unbounded+0x460/0x750 generic_file_buffered_read_get_pages+0x290/0x1710 generic_file_buffered_read+0x2a9/0xc30 nfs_file_read+0x13f/0x230 [nfs] new_sync_read+0x3af/0x610 vfs_read+0x339/0x4b0 ksys_read+0xf1/0x1c0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Before that commit, there was a try_charge() and commit_charge() in __add_to_page_cache_locked(). These two separated charge functions were replaced by a single mem_cgroup_charge(). However, it forgot to add a matching mem_cgroup_uncharge() when the xarray insertion failed with the page released back to the pool. Fix this by adding a mem_cgroup_uncharge() call when insertion error happens. Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com Fixes: 3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <smuchun@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manivannan Sadhasivam authored
Map my personal and work addresses to korg mail address. Link: https://lkml.kernel.org/r/20210201104640.108556-1-manivannan.sadhasivam@linaro.orgSigned-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Viresh Kumar authored
For some of the patches the email id was misspelled to linaro.com instead of linaro.org and for others Viresh Kumar was written as "viresh kumar" (all small). Fix both with help of mailmap entries. Link: https://lkml.kernel.org/r/d6b80b210d7fe0ddc1d4d0b22eff9708c72ef8b3.1612178938.git.viresh.kumar@linaro.orgSigned-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roman Gushchin authored
With kaslr the kernel image is placed at a random place, so starting the bottom-up allocation with the kernel_end can result in an allocation failure and a warning like this one: hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node ------------[ cut here ]------------ memblock: bottom-up allocation failed, memory hotremove may be affected WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:memblock_find_in_range_node+0x178/0x25a Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000 R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 Call Trace: memblock_alloc_range_nid+0x8d/0x11e cma_declare_contiguous_nid+0x2c4/0x38c hugetlb_cma_reserve+0xdc/0x128 flush_tlb_one_kernel+0xc/0x20 native_set_fixmap+0x82/0xd0 flat_get_apic_id+0x5/0x10 register_lapic_address+0x8e/0x97 setup_arch+0x8a5/0xc3f start_kernel+0x66/0x547 load_ucode_bsp+0x4c/0xcd secondary_startup_64_no_verify+0xb0/0xbb random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 ---[ end trace f151227d0b39be70 ]--- At the same time, the kernel image is protected with memblock_reserve(), so we can just start searching at PAGE_SIZE. In this case the bottom-up allocation has the same chances to success as a top-down allocation, so there is no reason to fallback in the case of a failure. All together it simplifies the logic. Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com Fixes: 8fabc623 ("powerpc: Ensure that swiotlb buffer is allocated from low memory") Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Wonhyuk Yang <vvghjk1234@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Sergey reported deadlock between kswapd correctly doing its usual lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page). This happened when shmem_fallocate(punch hole)'s unmap_mapping_range() reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock could occur when partially truncating a mapped huge tmpfs file, or using fallocate(FALLOC_FL_PUNCH_HOLE) on it. __split_huge_pmd()'s page lock was added in 5.8, to make sure that any concurrent use of reuse_swap_page() (holding page lock) could not catch the anon THP's mapcounts and swapcounts while they were being split. Fortunately, reuse_swap_page() is never applied to a shmem or file THP (not even by khugepaged, which checks PageSwapCache before calling), and anonymous THPs are never created in shmem or file areas: so that __split_huge_pmd()'s page lock can only be necessary for anonymous THPs, on which there is no risk of deadlock with i_mmap_rwsem. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils Fixes: c444eb56 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()") Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Berg authored
On ARCH=um, loading a module doesn't result in its constructors getting called, which breaks module gcov since the debugfs files are never registered. On the other hand, in-kernel constructors have already been called by the dynamic linker, so we can't call them again. Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be selected, but avoiding the in-kernel constructor calls. Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we really do want CONSTRUCTORS, just not kernel binary ones. Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeidSigned-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-