1. 25 Mar, 2022 5 commits
  2. 24 Mar, 2022 35 commits
    • Merge branch 'akpm' (patches from Andrew) · 52deda95
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
       "Various misc subsystems, before getting into the post-linux-next
        material.
      
        41 patches.
      
        Subsystems affected by this patch series: procfs, misc, core-kernel,
        lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
        taskstats, panic, kcov, resource, and ubsan"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
        Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
        kernel/resource: fix kfree() of bootmem memory again
        kcov: properly handle subsequent mmap calls
        kcov: split ioctl handling into locked and unlocked parts
        panic: move panic_print before kmsg dumpers
        panic: add option to dump all CPUs backtraces in panic_print
        docs: sysctl/kernel: add missing bit to panic_print
        taskstats: remove unneeded dead assignment
        kasan: no need to unset panic_on_warn in end_report()
        ubsan: no need to unset panic_on_warn in ubsan_epilogue()
        panic: unset panic_on_warn inside panic()
        docs: kdump: add scp example to write out the dump file
        docs: kdump: update description about sysfs file system support
        arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
        x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
        riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
        kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
        cgroup: use irqsave in cgroup_rstat_flush_locked().
        fat: use pointer to simple type in put_user()
        minix: fix bug when opening a file with O_DIRECT
        ...
    • Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 169e7776
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "The sprinkling of SPI drivers is because we added a new one and Mark
        sent us a SPI driver interface conversion pull request.
      
        Core
        ----
      
         - Introduce XDP multi-buffer support, allowing the use of XDP with
           jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
      
         - Speed up netns dismantling (5x) and lower the memory cost a little.
           Remove unnecessary per-netns sockets. Scope some lists to a netns.
           Cut down RCU syncing. Use batch methods. Allow netdev registration
           to complete out of order.
      
         - Support distinguishing timestamp types (ingress vs egress) and
           maintaining them across packet scrubbing points (e.g. redirect).
      
         - Continue the work of annotating packet drop reasons throughout the
           stack.
      
         - Switch netdev error counters from an atomic to dynamically
           allocated per-CPU counters.
      
         - Rework a few preempt_disable(), local_irq_save() and busy waiting
           sections problematic on PREEMPT_RT.
      
         - Extend the ref_tracker to allow catching use-after-free bugs.
      
        BPF
        ---
      
         - Introduce "packing allocator" for BPF JIT images. JITed code is
           marked read only, and used to be allocated at page granularity.
           Custom allocator allows for more efficient memory use, lower iTLB
           pressure and prevents identity mapping huge pages from getting
           split.
      
         - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
           the correct probe read access method, add appropriate helpers.
      
         - Convert the BPF preload to use light skeleton and drop the
           user-mode-driver dependency.
      
         - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
           its use as a packet generator.
      
         - Allow local storage memory to be allocated with GFP_KERNEL if
           called from a hook allowed to sleep.
      
         - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
           bits to come later).
      
         - Add unstable conntrack lookup helpers for BPF by using the BPF
           kfunc infra.
      
         - Allow cgroup BPF progs to return custom errors to user space.
      
         - Add support for AF_UNIX iterator batching.
      
         - Allow iterator programs to use sleepable helpers.
      
         - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
      
         - Add BTFGen support to bpftool, which allows using CO-RE in kernels
           without BTF info.
      
         - Large number of libbpf API improvements, cleanups and deprecations.
      
        Protocols
        ---------
      
         - Micro-optimize UDPv6 Tx, gaining up to 5% in tests on a dummy
           netdev.
      
         - Adjust TSO packet sizes based on min_rtt, allowing very low latency
           links (data centers) to always send full-sized TSO super-frames.
      
         - Make IPv6 flow label changes (AKA hash rethink) more configurable,
           via sysctl and setsockopt. Distinguish between server and client
           behavior.
      
         - VxLAN support for "collect metadata" devices to terminate only
           configured VNIs. This is similar to VLAN filtering in the bridge.
      
         - Support inserting IPv6 IOAM information to a fraction of frames.
      
         - Add protocol attribute to IP addresses to allow identifying where
           given address comes from (kernel-generated, DHCP etc.)
      
         - Support setting socket and IPv6 options via cmsg on ping6 sockets.
      
         - Reject misuse of ECN bits in IP headers as part of DSCP/TOS.
           Define dscp_t and stop taking ECN bits into account in fib-rules.
      
         - Add support for locked bridge ports (for 802.1X).
      
         - tun: support NAPI for packets received from batched XDP buffs,
           doubling the performance in some scenarios.
      
         - IPv6 extension header handling in Open vSwitch.
      
         - Support IPv6 control message load balancing in bonding, prevent
           neighbor solicitation and advertisement from using the wrong port.
           Support NS/NA monitor selection similar to existing ARP monitor.
      
         - SMC
            - improve performance with TCP_CORK and sendfile()
            - support auto-corking
            - support TCP_NODELAY
      
         - MCTP (Management Component Transport Protocol)
            - add user space tag control interface
            - I2C binding driver (as specified by DMTF DSP0237)
      
         - Multi-BSSID beacon handling in AP mode for WiFi.
      
         - Bluetooth:
            - handle MSFT Monitor Device Event
            - add MGMT Adv Monitor Device Found/Lost events
      
         - Multi-Path TCP:
            - add support for the SO_SNDTIMEO socket option
            - lots of selftest cleanups and improvements
      
         - Increase the max PDU size in CAN ISOTP to 64 kB.
      
        Driver API
        ----------
      
         - Add HW counters for SW netdevs, a mechanism for devices which
           offload packet forwarding to report packet statistics back to
           software interfaces such as tunnels.
      
         - Select the default NIC queue count as a fraction of the number of
           physical CPU cores, instead of hard-coding it to 8.
      
         - Expose devlink instance locks to drivers. Allow the device layer of
           drivers to use that lock directly instead of creating their own,
           which always runs into ordering issues in devlink callbacks.
      
         - Add header/data split indication to guide user space enabling of
           TCP zero-copy Rx.
      
         - Allow configuring completion queue event size.
      
         - Refactor page_pool to enable fragmenting after allocation.
      
         - Add allocation and page reuse statistics to page_pool.
      
         - Improve Multiple Spanning Trees support in the bridge to allow
           reuse of topologies across VLANs, saving HW resources in switches.
      
         - DSA (Distributed Switch Architecture):
            - replay and offload of host VLAN entries
            - offload of static and local FDB entries on LAG interfaces
            - FDB isolation and unicast filtering
      
        New hardware / drivers
        ----------------------
      
         - Ethernet:
            - LAN937x T1 PHYs
            - Davicom DM9051 SPI NIC driver
            - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
            - Microchip ksz8563 switches
            - Netronome NFP3800 SmartNICs
            - Fungible SmartNICs
            - MediaTek MT8195 switches
      
         - WiFi:
            - mt76: MediaTek mt7916
            - mt76: MediaTek mt7921u USB adapters
            - brcmfmac: Broadcom BCM43454/6
      
         - Mobile:
            - iosm: Intel M.2 7360 WWAN card
      
        Drivers
        -------
      
         - Convert many drivers to the new phylink API built for split PCS
           designs but also simplifying other cases.
      
         - Intel Ethernet NICs:
            - add TTY for GNSS module for E810T device
            - improve AF_XDP performance
            - GTP-C and GTP-U filter offload
            - QinQ VLAN support
      
         - Mellanox Ethernet NICs (mlx5):
            - support xdp->data_meta
            - multi-buffer XDP
            - offload tc push_eth and pop_eth actions
      
         - Netronome Ethernet NICs (nfp):
            - flow-independent tc action hardware offload (police / meter)
            - AF_XDP
      
         - Other Ethernet NICs:
            - at803x: fiber and SFP support
            - xgmac: mdio: preamble suppression and custom MDC frequencies
            - r8169: enable ASPM L1.2 if system vendor flags it as safe
            - macb/gem: ZynqMP SGMII
            - hns3: add TX push mode
            - dpaa2-eth: software TSO
            - lan743x: multi-queue, mdio, SGMII, PTP
            - axienet: NAPI and GRO support
      
         - Mellanox Ethernet switches (mlxsw):
            - source and dest IP address rewrites
            - RJ45 ports
      
         - Marvell Ethernet switches (prestera):
            - basic routing offload
            - multi-chain TC ACL offload
      
         - NXP embedded Ethernet switches (ocelot & felix):
            - PTP over UDP with the ocelot-8021q DSA tagging protocol
            - basic QoS classification on Felix DSA switch using dcbnl
            - port mirroring for ocelot switches
      
         - Microchip high-speed industrial Ethernet (sparx5):
            - offloading of bridge port flooding flags
            - PTP Hardware Clock
      
         - Other embedded switches:
            - lan966x: PTP Hardware Clock
            - qca8k: mdio read/write operations via crafted Ethernet packets
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
            - enable RX PPDU stats in monitor co-exist mode
      
         - Intel WiFi (iwlwifi):
            - UHB TAS enablement via BIOS
            - band disablement via BIOS
            - channel switch offload
            - 32 Rx AMPDU sessions in newer devices
      
         - MediaTek WiFi (mt76):
            - background radar detection
            - thermal management improvements on mt7915
            - SAR support for more mt76 platforms
            - MBSSID and 6 GHz band on mt7915
      
         - RealTek WiFi:
            - rtw89: AP mode
            - rtw89: 160 MHz channels and 6 GHz band
            - rtw89: hardware scan
      
         - Bluetooth:
            - mt7921s: wake on Bluetooth, SCO over I2S, wide-band speech (WBS)
      
         - Microchip CAN (mcp251xfd):
            - multiple RX-FIFOs and runtime configurable RX/TX rings
            - internal PLL, runtime PM handling simplification
            - improve chip detection and error handling after wakeup"
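The DSCP/TOS item under Protocols relies on how the IP DS field (the old TOS byte) is split: the upper six bits carry the DSCP and the lower two bits carry ECN (RFC 2474, RFC 3168). A minimal user-space sketch of why a DSCP-keyed rule lookup should mask off the ECN bits; `dsfield_to_dscp()` and `rule_matches()` are hypothetical helpers for illustration, not the kernel's dscp_t API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper 6 bits of the DS field are the DSCP, lower 2 bits are ECN. */
#define DSCP_MASK 0xFCu
#define ECN_MASK  0x03u

/* Hypothetical helper: keep only the DSCP bits, still in DS-field position. */
static uint8_t dsfield_to_dscp(uint8_t dsfield)
{
	return (uint8_t)(dsfield & DSCP_MASK);
}

/* Hypothetical rule check: ignoring ECN means a router that marks
 * congestion (ECN bits changed) cannot change which rule matches. */
static bool rule_matches(uint8_t rule_dscp, uint8_t pkt_dsfield)
{
	assert((rule_dscp & ECN_MASK) == 0); /* rules hold DSCP bits only */
	return dsfield_to_dscp(pkt_dsfield) == rule_dscp;
}
```

For example, DSCP 46 (EF) sits in the DS field as 0xB8; with the ECN "CE" mark set the byte becomes 0xBB, and both still match an EF rule.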
      
      * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
        llc: fix netdevice reference leaks in llc_ui_bind()
        drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
        ice: don't allow to run ice_send_event_to_aux() in atomic ctx
        ice: fix 'scheduling while atomic' on aux critical err interrupt
        net/sched: fix incorrect vlan_push_eth dest field
        net: bridge: mst: Restrict info size queries to bridge ports
        net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
        drivers: net: xgene: Fix regression in CRC stripping
        net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
        net: dsa: fix missing host-filtered multicast addresses
        net/mlx5e: Fix build warning, detected write beyond size of field
        iwlwifi: mvm: Don't fail if PPAG isn't supported
        selftests/bpf: Fix kprobe_multi test.
        Revert "rethook: x86: Add rethook x86 implementation"
        Revert "arm64: rethook: Add arm64 rethook implementation"
        Revert "powerpc: Add rethook support"
        Revert "ARM: rethook: Add rethook arm implementation"
        netdevice: add missing dm_private kdoc
        net: bridge: mst: prevent NULL deref in br_mst_info_size()
        selftests: forwarding: Use same VRF for port and VLAN upper
        ...
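The switch of netdev error counters from an atomic to per-CPU counters, listed under Core in the pull message above, follows a common pattern: each CPU increments only its own slot, so the hot path touches no shared cache line, and readers sum all slots. A single-threaded user-space sketch of that layout; `NR_SLOTS`, the 64-byte cache-line size and the function names are assumptions, not the kernel's per-CPU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS  8     /* assumption: one slot per CPU */
#define CACHELINE 64    /* assumption: 64-byte cache lines */

/* Pad each counter to its own cache line to avoid false sharing. */
struct pcpu_slot {
	uint64_t count;
	char pad[CACHELINE - sizeof(uint64_t)];
};

struct pcpu_counter {
	struct pcpu_slot slot[NR_SLOTS];
};

/* Hot path: only the caller's own slot is written. */
static void pcpu_inc(struct pcpu_counter *c, unsigned int cpu)
{
	assert(cpu < NR_SLOTS);
	c->slot[cpu].count++;
}

/* Slow path: a reader sums every slot to obtain the total. */
static uint64_t pcpu_sum(const struct pcpu_counter *c)
{
	uint64_t total = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		total += c->slot[i].count;
	return total;
}
```

The trade-off is memory (one cache line per CPU per counter) in exchange for contention-free increments; a real concurrent version would also need the reader to tolerate updates racing with the sum.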
    • Merge tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio · 7403e6d8
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
      
       - Introduce new device migration uAPI and implement device specific
         mlx5 vfio-pci variant driver supporting new protocol (Jason
         Gunthorpe, Yishai Hadas, Leon Romanovsky)
      
       - New HiSilicon acc vfio-pci variant driver, also supporting migration
         interface (Shameer Kolothum, Longfang Liu)
      
       - D3hot fixes for vfio-pci-core (Abhishek Sahu)
      
       - Document new vfio-pci variant driver acceptance criteria
         (Alex Williamson)
      
       - Fix UML build unresolved ioport_{un}map() functions
         (Alex Williamson)
      
       - Fix MAINTAINERS due to header movement (Lukas Bulwahn)
      
      * tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio: (31 commits)
        vfio-pci: Provide reviewers and acceptance criteria for variant drivers
        MAINTAINERS: adjust entry for header movement in hisilicon qm driver
        hisi_acc_vfio_pci: Use its own PCI reset_done error handler
        hisi_acc_vfio_pci: Add support for VFIO live migration
        crypto: hisilicon/qm: Set the VF QM state register
        hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver
        hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
        hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
        hisi_acc_qm: Move VF PCI device IDs to common header
        crypto: hisilicon/qm: Move few definitions to common header
        crypto: hisilicon/qm: Move the QM header to include/linux
        vfio/mlx5: Fix to not use 0 as NULL pointer
        PCI/IOV: Fix wrong kernel-doc identifier
        vfio/mlx5: Use its own PCI reset_done error handler
        vfio/pci: Expose vfio_pci_core_aer_err_detected()
        vfio/mlx5: Implement vfio_pci driver for mlx5 devices
        vfio/mlx5: Expose migration commands over mlx5 device
        vfio: Remove migration protocol v1 documentation
        vfio: Extend the device migration protocol with RUNNING_P2P
        vfio: Define device migration protocol v2
        ...
    • Merge tag 'hyperv-next-signed-20220322' of... · 66711cfe
      Linus Torvalds authored
      Merge tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv updates from Wei Liu:
       "Minor patches from various people"
      
      * tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/hyperv: Output host build info as normal Windows version number
        hv_balloon: rate-limit "Unhandled message" warning
        drivers: hv: log when enabling crash_kexec_post_notifiers
        hv_utils: Add comment about max VMbus packet size in VSS driver
        Drivers: hv: Compare cpumasks and not their weights in init_vp_index()
        Drivers: hv: Rename 'alloced' to 'allocated'
        Drivers: hv: vmbus: Use struct_size() helper in kmalloc()
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 1ebdbeb0
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "ARM:
         - Proper emulation of the OSLock feature of the debug architecture
      
         - Scalability improvements for the MMU lock when dirty logging is on
      
         - New VMID allocator, which will eventually help with SVA in VMs
      
         - Better support for PMUs in heterogeneous systems
      
         - PSCI 1.1 support, enabling support for SYSTEM_RESET2
      
         - Implement CONFIG_DEBUG_LIST at EL2
      
         - Make CONFIG_ARM64_ERRATUM_2077057 default y
      
         - Reduce the overhead of VM exit when no interrupt is pending
      
         - Remove traces of 32bit ARM host support from the documentation
      
         - Updated vgic selftests
      
         - Various cleanups, doc updates and spelling fixes
      
        RISC-V:
         - Prevent KVM_COMPAT from being selected
      
         - Optimize __kvm_riscv_switch_to() implementation
      
         - RISC-V SBI v0.3 support
      
        s390:
         - memop selftest
      
         - fix SCK locking
      
         - adapter interruptions virtualization for secure guests
      
         - add Claudio Imbrenda as maintainer
      
         - first step to do proper storage key checking
      
        x86:
         - Continue switching kvm_x86_ops to static_call(); introduce
           static_call_cond() and __static_call_ret0 when applicable.
      
         - Cleanup unused arguments in several functions
      
         - Synthesize AMD 0x80000021 leaf
      
         - Fixes and optimization for Hyper-V sparse-bank hypercalls
      
         - Implement Hyper-V's enlightened MSR bitmap for nested SVM
      
         - Remove MMU auditing
      
         - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
           page tracking is enabled
      
         - Cleanup the implementation of the guest PGD cache
      
         - Preparation for the implementation of Intel IPI virtualization
      
         - Fix some segment descriptor checks in the emulator
      
         - Allow AMD AVIC support on systems with physical APIC ID above 255
      
         - Better API to disable virtualization quirks
      
         - Fixes and optimizations for the zapping of page tables:
      
            - Zap roots in two passes, avoiding RCU read-side critical
              sections that last too long for very large guests backed by 4
              KiB SPTEs.
      
            - Zap invalid and defunct roots asynchronously via
              concurrency-managed work queue.
      
            - Allow yielding when zapping TDP MMU roots in response to the
              root's last reference being put.
      
            - Batch more TLB flushes with an RCU trick. Whoever frees the
              paging structure now holds RCU as a proxy for all vCPUs running
              in the guest, i.e. it prolongs the grace period on their behalf.
              It then kicks the vCPUs out of guest mode before doing
              rcu_read_unlock().
      
        Generic:
         - Introduce __vcalloc and use it for very large allocations that need
           memcg accounting"
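The Generic item introduces __vcalloc for very large, zeroed array allocations. As with calloc(), the central safety point of an array allocator is refusing a request whose n * size would overflow rather than silently allocating too little. A user-space sketch of that check, assuming GCC/Clang's __builtin_mul_overflow and plain malloc in place of the kernel's vmalloc-based, memcg-accounted allocation; `checked_array_alloc()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc-style helper: zeroed memory for n elements of
 * `size` bytes, or NULL if n * size would overflow size_t. */
static void *checked_array_alloc(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;            /* refuse instead of under-allocating */
	p = malloc(bytes ? bytes : 1);
	if (p)
		memset(p, 0, bytes);
	return p;
}
```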
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
        KVM: use kvcalloc for array allocations
        KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
        kvm: x86: Require const tsc for RT
        KVM: x86: synthesize CPUID leaf 0x80000021h if useful
        KVM: x86: add support for CPUID leaf 0x80000021
        KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
        Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
        kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
        KVM: arm64: fix typos in comments
        KVM: arm64: Generalise VM features into a set of flags
        KVM: s390: selftests: Add error memop tests
        KVM: s390: selftests: Add more copy memop tests
        KVM: s390: selftests: Add named stages for memop test
        KVM: s390: selftests: Add macro as abstraction for MEM_OP
        KVM: s390: selftests: Split memop tests
        KVM: s390x: fix SCK locking
        RISC-V: KVM: Implement SBI HSM suspend call
        RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
        RISC-V: Add SBI HSM suspend related defines
        RISC-V: KVM: Implement SBI v0.3 SRST extension
        ...
    • Merge tag 'tomoyo-pr-20220322' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · efee6c79
      Linus Torvalds authored
      Pull tomoyo update from Tetsuo Handa:
       "Avoid unnecessarily leaking kernel command line arguments"
      
      * tag 'tomoyo-pr-20220322' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        TOMOYO: fix __setup handlers return values
    • Merge tag 'flexible-array-transformations-5.18-rc1' of... · 3ce62cf4
      Linus Torvalds authored
      Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull flexible-array transformations from Gustavo Silva:
       "Treewide patch that replaces zero-length arrays with flexible-array
        members.
      
        This has been baking in linux-next for a whole development cycle"
      
      * tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
        treewide: Replace zero-length arrays with flexible-array members
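The treewide patch replaces the old GNU zero-length array idiom (`char data[0];`) with a C99 flexible array member (`char data[];`), a standard form that compilers can check (for example, a flexible array member that is not the last struct member is rejected). A minimal user-space sketch; `struct msg` and `msg_alloc()` are hypothetical, and the size expression roughly mirrors what the kernel's struct_size() helper computes, without its overflow handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct msg {
	size_t len;
	/* Old GNU idiom: char data[0];  C99 flexible array member: */
	char data[];
};

/* Hypothetical allocator: fixed header plus `len` trailing bytes in one
 * block, i.e. sizeof(*m) + len * sizeof(m->data[0]). */
static struct msg *msg_alloc(const char *src, size_t len)
{
	struct msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->len = len;
	memcpy(m->data, src, len);
	return m;
}
```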
    • Merge tag 'prlimit-tasklist_lock-for-v5.18' of... · cd4699c5
      Linus Torvalds authored
      Merge tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull tasklist_lock optimizations from Eric Biederman:
       "prlimit and getpriority tasklist_lock optimizations
      
        The tasklist_lock popped up as a scalability bottleneck on some
        testing workloads. The readlocks in do_prlimit and set/getpriority are
        not necessary in all cases.
      
        Based on a cycles profile, it looked like ~87% of the time was spent
        in the kernel, ~42% of which was just trying to get *some* spinlock
        (queued_spin_lock_slowpath, not necessarily the tasklist_lock).
      
        The big offenders (with rough percentages in cycles of the overall
        trace):
         - do_wait 11%
         - setpriority 8% (done previously in commit 7f8ca0ed)
         - kill 8%
         - do_exit 5%
         - clone 3%
         - prlimit64 2%   (this patchset)
         - getrlimit 1%   (this patchset)
      
        I can't easily test this patchset on the original workload for various
        reasons. Instead, I used the microbenchmark below to at least verify
        there was some improvement. This patchset had a 28% speedup (12% from
        baseline to set/getprio, then another 14% for prlimit).
      
        This series used to do the setpriority case, but an almost identical
        change was merged as commit 7f8ca0ed ("kernel/sys.c: only take
        tasklist_lock for get/setpriority(PRIO_PGRP)") so that has been
        dropped from here.
      
        One interesting thing is that my libc's getrlimit() was calling
        prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64
        had no effect - it essentially optimized the older syscalls only. I
        didn't do that in this patchset, but figured I'd mention it since it
        was an option from the previous patch's discussion"
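The quoted 28% only reconciles with the 12% and 14% steps if the two gains compound rather than add. Assuming each figure is a throughput ratio relative to the previous step (the message does not say how the speedup was measured):

```latex
\[
  \underbrace{1.12}_{\text{set/getprio}} \times \underbrace{1.14}_{\text{prlimit}}
  = 1.2768 \approx 1.28
  \quad\Longrightarrow\quad \text{about } 28\% \text{ over baseline.}
\]
```

Adding the steps would give 26%, and reading them as run-time reductions would give 1 - 0.88 × 0.86 ≈ 24%, so the compounded-throughput reading is the one that matches the reported figure.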
      
      microbenchmark.c:
      -----------------
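      	#include <signal.h>
      	#include <stdlib.h>
      	#include <sys/resource.h>
      	#include <sys/types.h>
      	#include <sys/wait.h>
      	#include <unistd.h>
      
      	int main(int argc, char **argv)
      	{
      		pid_t child;
      		struct rlimit rlim[1];
      
      		/* Six forks: up to 2^6 = 64 processes run the loop below. */
      		fork(); fork(); fork(); fork(); fork(); fork();
      
      		for (int i = 0; i < 5000; i++) {
      			child = fork();
      			if (child < 0)
      				exit(1);
      			if (child > 0) {
      				/* Parent: let the child spin briefly, then reap it. */
      				usleep(1000);
      				kill(child, SIGTERM);
      				waitpid(child, NULL, 0);
      			} else {
      				/* Child: hammer the syscalls until SIGTERM kills it. */
      				for (;;) {
      					setpriority(PRIO_PROCESS, 0,
      						    getpriority(PRIO_PROCESS, 0));
      					getrlimit(RLIMIT_CPU, rlim);
      				}
      			}
      		}
      
      		return 0;
      	}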
      
      Link: https://lore.kernel.org/lkml/20211213220401.1039578-1-brho@google.com/ [v1]
      Link: https://lore.kernel.org/lkml/20220105212828.197013-1-brho@google.com/ [v2]
      Link: https://lore.kernel.org/lkml/20220106172041.522167-1-brho@google.com/ [v3]
      
      * tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        prlimit: do not grab the tasklist_lock
        prlimit: make do_prlimit() static
    • Merge tag 'fs.rt.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 2e2d4650
      Linus Torvalds authored
      Pull mount attributes PREEMPT_RT update from Christian Brauner:
       "This contains Sebastian's fix to make changing mount
        attributes/getting write access compatible with CONFIG_PREEMPT_RT.
      
        The change only applies when users explicitly opt-in to real-time via
        CONFIG_PREEMPT_RT otherwise things are exactly as before. We've waited
        quite a long time with this to make sure folks could take a good look"
      
      * tag 'fs.rt.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fs/namespace: Boost the mount_lock.lock owner instead of spinning on PREEMPT_RT.
    • Merge tag 'fs.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 15f2e3d6
      Linus Torvalds authored
      Pull mount_setattr updates from Christian Brauner:
       "This contains a few more patches to massage the mount_setattr()
        codepaths and one minor fix to reuse a helper we added some time back.
      
        The final two patches do similar cleanups in different ways. One patch
        is mine and the other is Al's who was nice enough to give me a branch
        for it.
      
        Since his came in later and my branch had been sitting in -next for
        quite some time we just put his on top instead of swapping them"
      
      * tag 'fs.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        mount_setattr(): clean the control flow and calling conventions
        fs: clean up mount_setattr control flow
        fs: don't open-code mnt_hold_writers()
        fs: simplify check in mount_setattr_commit()
        fs: add mnt_allow_writers() and simplify mount_setattr_prepare()
    • Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" · b027471a
      Marco Elver authored
      This reverts commit ea91a1d4.
      
      Since df05c0e9 ("Documentation: Raise the minimum supported version
      of LLVM to 11.0.0") the minimum Clang version is now 11.0, which fixed
      the UBSAN/KCSAN vs. KCOV incompatibilities.
      
      Link: https://bugs.llvm.org/show_bug.cgi?id=45831
      Link: https://lkml.kernel.org/r/YaodyZzu0MTCJcvO@elver.google.com
      Link: https://lkml.kernel.org/r/20220128105631.509772-1-elver@google.com
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/resource: fix kfree() of bootmem memory again · 0cbcc929
      Miaohe Lin authored
      Since commit ebff7d8f ("mem hotunplug: fix kfree() of bootmem
      memory"), a resource allocated during boot may be returned by
      alloc_resource(), and such a resource must be released with
      free_resource().  However, many callers use kfree() directly, which
      results in a kernel BUG.  In order to fix this without fixing every call
      site, just leak a couple of bytes in such corner cases.
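The fix trades a few leaked bytes for safety: the release path frees an object only when it came from the regular allocator and otherwise leaves it alone. A user-space sketch of that decision, assuming a hypothetical static `boot_pool` stands in for early-boot memory and calloc()/free() stand in for the normal allocator; the address-range check is this sketch's own device, not how the kernel tells the two kinds of memory apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BOOT_POOL_LEN 4

struct res {
	unsigned long start, end;
};

/* Hypothetical stand-in for memory handed out before the heap exists. */
static struct res boot_pool[BOOT_POOL_LEN];
static size_t boot_used;
static size_t leaked;           /* bytes deliberately left unfreed */

static bool from_boot_pool(const struct res *r)
{
	uintptr_t p = (uintptr_t)r;

	return p >= (uintptr_t)&boot_pool[0] &&
	       p < (uintptr_t)&boot_pool[BOOT_POOL_LEN];
}

/* early == true models an allocation made during boot. */
static struct res *res_alloc(bool early)
{
	if (early)
		return boot_used < BOOT_POOL_LEN ? &boot_pool[boot_used++] : NULL;
	return calloc(1, sizeof(struct res));
}

/* Free only heap objects; boot-pool objects are leaked on purpose. */
static void res_free(struct res *r)
{
	if (!r)
		return;
	if (from_boot_pool(r)) {
		leaked += sizeof(*r);
		return;
	}
	free(r);
}
```

Because every caller goes through one release function that tolerates both kinds of memory, no call site has to know where its object came from.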
      
      Link: https://lkml.kernel.org/r/20220217083619.19305-1-linmiaohe@huawei.com
      Fixes: ebff7d8f ("mem hotunplug: fix kfree() of bootmem memory")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kcov: properly handle subsequent mmap calls · b3d7fe86
      Aleksandr Nogikh authored
      Allocate the kcov buffer during KCOV_MODE_INIT in order to untie mmapping
      of a kcov instance and the actual coverage collection process. Modify
      kcov_mmap, so that it can be reliably used any number of times once
      KCOV_MODE_INIT has succeeded.
      
      These changes to the user-facing interface of the tool only weaken the
      preconditions, so all existing user space code should remain compatible
      with the new version.
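A user-space sketch of what the relaxed precondition allows: after the KCOV_INIT_TRACE ioctl puts the descriptor into KCOV_MODE_INIT, the same kcov descriptor is mmapped twice and both mappings see the same buffer. The ioctl number, buffer size and debugfs path follow the kcov documentation; the sketch assumes a CONFIG_KCOV kernel that includes this change with debugfs at /sys/kernel/debug, and `kcov_map_twice()` is a hypothetical name. It returns -1 when kcov is unavailable.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values as given in the kcov documentation. */
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define COVER_SIZE      (64 << 10)   /* number of unsigned long entries */

/* 0: both mmaps share one buffer; -1: kcov unavailable or a call failed;
 * 1: the mappings unexpectedly differ. */
static int kcov_map_twice(void)
{
	size_t bytes = COVER_SIZE * sizeof(unsigned long);
	unsigned long *a, *b;
	int fd, ret = -1;

	fd = open("/sys/kernel/debug/kcov", O_RDWR);
	if (fd == -1)
		return -1;
	if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
		goto out_close;
	a = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED)
		goto out_close;
	/* Per the series description, older kernels "succeeded" here without
	 * mapping the buffer; with this change both mappings share memory. */
	b = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (b == MAP_FAILED)
		goto out_unmap_a;
	a[0] = 42;                      /* tracing is not enabled, so the
	                                   kernel does not write here */
	ret = (b[0] == 42) ? 0 : 1;
	munmap(b, bytes);
out_unmap_a:
	munmap(a, bytes);
out_close:
	close(fd);
	return ret;
}
```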
      
      Link: https://lkml.kernel.org/r/20220117153634.150357-3-nogikh@google.com
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Taras Madan <tarasmadan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kcov: split ioctl handling into locked and unlocked parts · 17581aa1
      Aleksandr Nogikh authored
      Patch series "kcov: improve mmap processing", v3.
      
      Subsequent mmaps of the same kcov descriptor currently do not update the
      virtual memory of the task and yet return 0 (success).  This is
      counter-intuitive and may lead to unexpected memory access errors.
      
      Also, this unnecessarily limits the functionality of kcov to only the
      simplest usage scenarios.  Kcov instances are effectively forever attached
      to their first address spaces, and it becomes impossible to, e.g., reuse
      the same kcov handle in forked child processes without mmapping the
      memory first.  This is exactly what we tried to do in syzkaller, which is
      how we inadvertently came upon this behavior.
      
      This patch series addresses the problem described above.
      
      This patch (of 3):
      
      Currently all ioctls are de facto processed under a spinlock in order to
      serialise them.  This, however, prohibits the use of vmalloc and other
      memory management functions in the implementations of those ioctls,
      unnecessarily complicating any further changes to the code.
      
      Let all ioctls first be processed by kcov_ioctl(), which executes the
      ones that are not compatible with a spinlock itself and passes control
      to kcov_ioctl_locked() for all the others.  KCOV_REMOTE_ENABLE is
      processed in both kcov_ioctl() and kcov_ioctl_locked(), as its steps
      are easily separable.
      
      Although it is still compatible with a spinlock, move KCOV_INIT_TRACE
      handling to kcov_ioctl(), so that the changes from the next commit are
      easier to follow.
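      The shape of this split can be sketched in user space, with a pthread
      mutex standing in for the kcov spinlock and calloc() standing in for
      vmalloc().  All names here (struct tracer, CMD_INIT_TRACE, CMD_ENABLE,
      tracer_ioctl()) are hypothetical, and the sketch shows where the series
      is heading, with allocation done before the lock is taken, rather than
      the exact code of this first patch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical command numbers standing in for KCOV_INIT_TRACE/KCOV_ENABLE. */
enum { CMD_INIT_TRACE = 1, CMD_ENABLE = 2 };

struct tracer {
    pthread_mutex_t lock; /* stands in for the kcov spinlock */
    unsigned long *area;  /* coverage buffer, set up by CMD_INIT_TRACE */
    unsigned long size;
    int enabled;
};

/* Commands that are fine to run with the lock held. */
static int tracer_ioctl_locked(struct tracer *t, unsigned int cmd)
{
    switch (cmd) {
    case CMD_ENABLE:
        if (!t->area)
            return -EINVAL;
        t->enabled = 1;
        return 0;
    default:
        return -ENOTTY;
    }
}

/* Entry point: allocation happens before any lock is taken. */
int tracer_ioctl(struct tracer *t, unsigned int cmd, unsigned long arg)
{
    int ret;

    if (cmd == CMD_INIT_TRACE) {
        /* The analogue of vmalloc(): not allowed under a spinlock. */
        unsigned long *area = calloc(arg, sizeof(*area));

        if (!area)
            return -ENOMEM;
        pthread_mutex_lock(&t->lock);
        if (t->area) {
            /* Already initialised: undo the allocation. */
            pthread_mutex_unlock(&t->lock);
            free(area);
            return -EBUSY;
        }
        t->area = area;
        t->size = arg;
        pthread_mutex_unlock(&t->lock);
        return 0;
    }

    pthread_mutex_lock(&t->lock);
    ret = tracer_ioctl_locked(t, cmd);
    pthread_mutex_unlock(&t->lock);
    return ret;
}
```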
      
      Link: https://lkml.kernel.org/r/20220117153634.150357-1-nogikh@google.com
      Link: https://lkml.kernel.org/r/20220117153634.150357-2-nogikh@google.com
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Taras Madan <tarasmadan@google.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Guilherme G. Piccoli
      panic: move panic_print before kmsg dumpers · f953f140
      Guilherme G. Piccoli authored
      The panic_print setting allows users to collect more information in a
      panic event, like memory stats, tasks, CPUs backtraces, etc.  This is an
      interesting debug mechanism, but currently the print event happens *after*
      kmsg_dump(), meaning that pstore, for example, cannot collect a dmesg with
      the panic_print extra information.
      
      This patch changes that in 2 steps:
      
      (a) Besides the extra information dump, the panic_print setting can
          replay the existing kernel log buffer to the console (bit 5).
          This replay only makes sense at the end of the panic() function,
          so a new boolean parameter to panic_print_sys_info() now
          distinguishes the two situations.

      (b) With the above change, we can safely call panic_print_sys_info()
          before kmsg_dump(), so that the extra information is also dumped
          when using pstore or other kmsg dumpers.
      
      The additional messages from panic_print could overwrite the oldest
      messages when the buffer is full.  The only reasonable solution is to use
      a large enough log buffer, hence we added advice about that to the
      kernel parameters documentation.
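      A simplified user-space model of the new ordering, assuming the
      documented panic_print bit values (bit 0 tasks, bit 1 memory, bit 5 log
      replay).  The function names mirror panic_print_sys_info() and
      kmsg_dump(), but the bodies only record which step ran so that the order
      can be checked; simulate_panic() and log_event() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Bit values as documented for panic_print in the sysctl admin guide. */
#define PANIC_PRINT_TASK_INFO      (1u << 0)
#define PANIC_PRINT_MEM_INFO       (1u << 1)
#define PANIC_PRINT_ALL_PRINTK_MSG (1u << 5)

static char events[128]; /* records the order in which steps ran */

static void log_event(const char *what)
{
    strncat(events, what, sizeof(events) - strlen(events) - 1);
    strncat(events, ";", sizeof(events) - strlen(events) - 1);
}

/* Modelled on panic_print_sys_info(bool console_flush). */
static void panic_print_sys_info(unsigned int panic_print, bool console_flush)
{
    if (console_flush) {
        /* Replaying the log only makes sense at the very end. */
        if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
            log_event("replay");
        return;
    }
    if (panic_print & PANIC_PRINT_TASK_INFO)
        log_event("tasks");
    if (panic_print & PANIC_PRINT_MEM_INFO)
        log_event("mem");
}

/* Simplified tail of panic(): extra info is printed before the dumpers run. */
const char *simulate_panic(unsigned int panic_print)
{
    events[0] = '\0';
    panic_print_sys_info(panic_print, false);
    log_event("kmsg_dump"); /* pstore and other dumpers now see the extra info */
    panic_print_sys_info(panic_print, true);
    return events;
}
```

      For example, simulate_panic(0x23) yields "tasks;mem;kmsg_dump;replay;":
      the extra information precedes the dumpers and only the replay stays at
      the end.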
      
      Link: https://lkml.kernel.org/r/20220214141308.841525-1-gpiccoli@igalia.com
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Guilherme G. Piccoli
      panic: add option to dump all CPUs backtraces in panic_print · 8d470a45
      Guilherme G. Piccoli authored
      Currently the "panic_print" parameter/sysctl allows some interesting debug
      information to be printed during a panic event.  This is useful, for
      example, in cases where the user cannot kdump due to resource limits, or
      if the user collects panic logs in a serial output (or pstore) and
      prefers a fast reboot instead of a kdump.

      However, there is no way to see all CPUs' backtraces in a panic using
      "panic_print" on architectures that support that.  We do have the
      "oops_all_cpu_backtrace" sysctl, but although the two partially overlap
      in functionality, they are orthogonal in nature: "panic_print" is a
      panic tuning (and there are panics without oopses, like direct calls to
      panic() or possibly other paths that don't go through oops_enter()),
      whereas the original purpose of "oops_all_cpu_backtrace" is to provide
      more information on oopses for users who want to keep running the
      kernel even after an oops, i.e. in non-panic scenarios.
      
      So, we hereby introduce an additional bit for "panic_print" to allow
      dumping the CPUs backtraces during a panic event.
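      The resulting bit layout, as documented for panic_print in the sysctl
      admin guide, can be decoded with a small helper; decode_panic_print() is
      hypothetical.  Bit 6 (value 64) is the one added here, so booting with
      panic_print=64, or writing 64 to /proc/sys/kernel/panic_print, should
      request the all-CPU backtraces.

```c
#include <stddef.h>

/* Bit meanings as documented for panic_print in the sysctl admin guide. */
static const char *const panic_print_bits[] = {
    "all tasks info",              /* bit 0 */
    "system memory info",          /* bit 1 */
    "timer info",                  /* bit 2 */
    "locks info (CONFIG_LOCKDEP)", /* bit 3 */
    "ftrace buffer",               /* bit 4 */
    "all printk messages",         /* bit 5 */
    "all CPUs backtraces",         /* bit 6, added by this patch */
};

#define N_PANIC_PRINT_BITS (sizeof(panic_print_bits) / sizeof(panic_print_bits[0]))

/*
 * Hypothetical helper: store the descriptions of the bits set in @mask
 * into @out (at most @max entries) and return how many were stored.
 * Bits beyond the documented ones are ignored.
 */
size_t decode_panic_print(unsigned long mask, const char **out, size_t max)
{
    size_t n = 0;

    for (size_t bit = 0; bit < N_PANIC_PRINT_BITS && n < max; bit++)
        if (mask & (1UL << bit))
            out[n++] = panic_print_bits[bit];
    return n;
}
```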
      
      Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
      Reviewed-by: Feng Tang <feng.tang@intel.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Guilherme G. Piccoli
      docs: sysctl/kernel: add missing bit to panic_print · a1ff1de0
      Guilherme G. Piccoli authored
      Patch series "Some improvements on panic_print".
      
      This is a mix of a documentation fix with some additions to the
      "panic_print" sysctl / parameter.  The goal is to be able to collect
      all CPUs' backtraces during a panic event and also to enable
      "panic_print" in a kdump event; the reasoning and design choices are
      detailed in the patches.
      
      This patch (of 3):
      
      Commit de6da1e8 ("panic: add an option to replay all the printk
      message in buffer") added a new bit to the sysctl/kernel parameter
      "panic_print", but the documentation was added only in
      kernel-parameters.txt, not in the sysctl guide.
      
      Fix it here by adding bit 5 to the sysctl admin-guide documentation.
      
      [rdunlap@infradead.org: fix table format warning]
        Link: https://lkml.kernel.org/r/20220109055635.6999-1-rdunlap@infradead.org
      
      Link: https://lkml.kernel.org/r/20211109202848.610874-1-gpiccoli@igalia.com
      Link: https://lkml.kernel.org/r/20211109202848.610874-2-gpiccoli@igalia.com
      Fixes: de6da1e8 ("panic: add an option to replay all the printk message in buffer")
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
      Reviewed-by: Feng Tang <feng.tang@intel.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Lukas Bulwahn
      taskstats: remove unneeded dead assignment · 92333baa
      Lukas Bulwahn authored
      Running 'make clang-analyzer' on x86_64 defconfig caught my attention with:
      
        kernel/taskstats.c:120:2: warning: Value stored to 'rc' is never read \
        [clang-analyzer-deadcode.DeadStores]
                rc = 0;
                ^
      
      Commit d94a0415 ("taskstats: free skb, avoid returns in
      send_cpu_listeners") made send_cpu_listeners() not return a value;
      hence, the rc variable is now only used within the loop, where it is
      always assigned before being read and needs no other initialisation.
      
      So, simply remove this unneeded dead initializing assignment.
      
      As compilers will detect this unneeded assignment and optimize this anyway,
      the resulting object code is identical before and after this change.
      
      No functional change. No change to object code.
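      The shape of the change, reduced to a self-contained example:
      send_one() is a hypothetical stand-in for the per-listener send in
      send_cpu_listeners().  Both versions behave identically; the first only
      contains the store that clang-analyzer flags, and the second also
      reflects the follow-up of reducing the scope of rc.

```c
/* Hypothetical stand-in for sending to one listener: odd ones "fail". */
static int send_one(int listener)
{
    return listener % 2 ? -1 : 0;
}

/* Before: the "rc = 0" store is overwritten before it is ever read. */
int count_failures_before(int nr_listeners)
{
    int rc;
    int failures = 0;

    rc = 0; /* dead store, as flagged by clang-analyzer */
    for (int i = 0; i < nr_listeners; i++) {
        rc = send_one(i);
        if (rc)
            failures++;
    }
    return failures;
}

/* After: no dead store, and rc is scoped to the loop body. */
int count_failures_after(int nr_listeners)
{
    int failures = 0;

    for (int i = 0; i < nr_listeners; i++) {
        int rc = send_one(i);

        if (rc)
            failures++;
    }
    return failures;
}
```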
      
      [akpm@linux-foundation.org: reduce scope of `rc']
      
      Link: https://lkml.kernel.org/r/20220307093942.21310-1-lukas.bulwahn@gmail.com
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tiezhu Yang
      kasan: no need to unset panic_on_warn in end_report() · e7ce7500
      Tiezhu Yang authored
      panic_on_warn is unset inside panic(), so there is no need to unset it
      before calling panic() in end_report().
      
      Link: https://lkml.kernel.org/r/1644324666-15947-6-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Xuefeng Li <lixuefeng@loongson.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tiezhu Yang
      ubsan: no need to unset panic_on_warn in ubsan_epilogue() · d83ce027
      Tiezhu Yang authored
      panic_on_warn is unset inside panic(), so there is no need to unset it
      before calling panic() in ubsan_epilogue().
      
      Link: https://lkml.kernel.org/r/1644324666-15947-5-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Xuefeng Li <lixuefeng@loongson.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tiezhu Yang
      panic: unset panic_on_warn inside panic() · 1a2383e8
      Tiezhu Yang authored
      In the current code, the following three places need to unset
      panic_on_warn before calling panic() to avoid recursive panics:
      
      kernel/kcsan/report.c: print_report()
      kernel/sched/core.c: __schedule_bug()
      mm/kfence/report.c: kfence_report_error()
      
      To avoid copy-pasting "panic_on_warn = 0" all over the place, it is
      better to move it inside panic() and then remove it from the other
      places.
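      A user-space model of the recursion this avoids, assuming a warning can
      be raised somewhere on the panic path.  sim_panic() and sim_warn() are
      hypothetical stand-ins; unlike the real panic(), sim_panic() returns, so
      the number of calls can be counted.  Without the reset at the top of
      sim_panic(), the two functions would recurse without bound.

```c
static int panic_on_warn;
static int panic_calls;

static void sim_warn(const char *msg);

/* Modelled on panic() after this patch: clear panic_on_warn first. */
static void sim_panic(const char *msg)
{
    panic_on_warn = 0; /* done once here instead of in every caller */
    panic_calls++;
    sim_warn(msg);     /* something on the panic path warns */
}

/* Modelled on the WARN path: escalate to panic if panic_on_warn is set. */
static void sim_warn(const char *msg)
{
    if (panic_on_warn)
        sim_panic(msg);
}

/* A report path (e.g. KCSAN or KFENCE) that panics directly. */
int report_and_panic(void)
{
    panic_on_warn = 1;
    panic_calls = 0;
    sim_panic("report");
    return panic_calls; /* 1 means no recursive panic */
}
```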
      
      Link: https://lkml.kernel.org/r/1644324666-15947-4-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Xuefeng Li <lixuefeng@loongson.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tiezhu Yang
      docs: kdump: add scp example to write out the dump file · ae6694c1
      Tiezhu Yang authored
      In addition to the cp and makedumpfile examples, add an scp example for
      writing out the dump file.
      
      Link: https://lkml.kernel.org/r/1644324666-15947-3-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Marco Elver <elver@google.com>
      Cc: Xuefeng Li <lixuefeng@loongson.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Tiezhu Yang
      docs: kdump: update description about sysfs file system support · b2377d4b
      Tiezhu Yang authored
      Patch series "Update doc and fix some issues about kdump", v2.
      
      This patch (of 5):
      
      After commit 6a108a14 ("kconfig: rename CONFIG_EMBEDDED to
      CONFIG_EXPERT"), the "Configure standard kernel features (for small
      systems)" option no longer exists; use "Configure standard kernel
      features (expert users)" instead.
      
      Link: https://lkml.kernel.org/r/1644324666-15947-1-git-send-email-yangtiezhu@loongson.cn
      Link: https://lkml.kernel.org/r/1644324666-15947-2-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Marco Elver <elver@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Xuefeng Li <lixuefeng@loongson.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Jisheng Zhang
      arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef · d339f158
      Jisheng Zhang authored
      Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
      check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
      increase compile coverage.
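      A simplified copy of the preprocessor trick behind IS_ENABLED(),
      modelled on include/linux/kconfig.h but with the module (=m) half
      omitted and the helper macros renamed; the CONFIG_ symbols and the two
      functions are hypothetical stand-ins for what Kconfig and the arch code
      would provide.  Because the check is an ordinary if, the compiler parses
      and type-checks the branch even when the option is off and then drops it
      as dead code; that is the compile coverage gained over #ifdef.

```c
/*
 * Simplified from include/linux/kconfig.h: IS_ENABLED(opt) is 1 if opt is
 * defined to 1, otherwise 0. Module (=m) handling is omitted.
 */
#define PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val
#define IS_ENABLED(option) IS_DEFINED_(option)
#define IS_DEFINED_(val) IS_DEFINED__(PLACEHOLDER_##val)
/*
 * "0," from PLACEHOLDER_1 shifts the 1 into second place; any other token
 * leaves the 0 there. The extra trailing 0 keeps "..." non-empty.
 */
#define IS_DEFINED__(arg_or_junk) TAKE_SECOND(arg_or_junk 1, 0, 0)

/* Hypothetical Kconfig output: one option on, one left undefined. */
#define CONFIG_KEXEC_CORE 1
/* CONFIG_DEMO_OFF is deliberately not defined. */

/* Hypothetical arch hook in the style of the code touched by this series. */
int crash_reserve_wanted(void)
{
    /* Always compiled and type-checked; dead code when the option is 0. */
    if (!IS_ENABLED(CONFIG_KEXEC_CORE))
        return 0;
    return 1;
}

int demo_off_wanted(void)
{
    if (IS_ENABLED(CONFIG_DEMO_OFF))
        return 1;
    return 0;
}
```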
      
      Link: https://lkml.kernel.org/r/20211206160514.2000-5-jszhang@kernel.org
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Jisheng Zhang
      x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef · 4ece09be
      Jisheng Zhang authored
      Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
      check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
      increase compile coverage.
      
      Link: https://lkml.kernel.org/r/20211206160514.2000-4-jszhang@kernel.org
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Jisheng Zhang
      riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef · d414cb37
      Jisheng Zhang authored
      Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
      check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
      increase compile coverage.
      
      Link: https://lkml.kernel.org/r/20211206160514.2000-3-jszhang@kernel.org
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Jisheng Zhang
      kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible · f05fa109
      Jisheng Zhang authored
      Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2.
      
      Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by
      a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
      increase compile coverage.
      
      I only modified x86, arm, arm64 and riscv; other architectures such as
      sh, powerpc and s390 are better off keeping their kexec code as-is, so
      they are not touched.
      
      This patch (of 5):
      
      Make the forward declarations of crashk_res, crashk_low_res and
      crash_notes always visible.  Code referring to these symbols can then just
      check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional
      compilation using an #ifdef, thus preparing to increase compile coverage
      and simplify the code.
      
      Link: https://lkml.kernel.org/r/20211206160514.2000-1-jszhang@kernel.org
      Link: https://lkml.kernel.org/r/20211206160514.2000-2-jszhang@kernel.org
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Sebastian Andrzej Siewior
      cgroup: use irqsave in cgroup_rstat_flush_locked(). · b1e2c8df
      Sebastian Andrzej Siewior authored
      All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock
      either with spin_lock_irq() or spin_lock_irqsave().
      cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock, which
      is a raw_spin_lock.  This lock is also acquired in
      cgroup_rstat_updated() in IRQ context and therefore requires the
      _irqsave() locking suffix in cgroup_rstat_flush_locked().

      Since there is no difference between spinlock_t and raw_spinlock_t on
      !RT, lockdep does not complain here.  On RT, spin_lock_irq() does not
      disable interrupts (spinlock_t is a sleeping lock there), so lockdep
      complains because interrupts are not disabled here and a deadlock is
      possible.

      Acquire the raw_spinlock_t with interrupts disabled.
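      Interrupt masking has no direct user-space equivalent, but blocking a
      signal is a reasonable analogy.  This sketch (an assumption for
      illustration, not kernel code) treats SIGUSR1 as the interrupt:
      flush_locked() blocks it with a saved mask and afterwards restores
      exactly that mask, the pattern of raw_spin_lock_irqsave() and
      raw_spin_unlock_irqrestore().  run_flush() shows that this is correct
      both when the caller has already blocked the signal (like
      spin_lock_irq() on !RT) and when it has not (like spin_lock_irq() on
      RT).  All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>

/* Set by flush_locked(): was SIGUSR1 blocked inside the critical section? */
static int blocked_inside;

/* Returns 1 if SIGUSR1 is currently blocked in the calling thread. */
int usr1_blocked(void)
{
    sigset_t cur;

    pthread_sigmask(SIG_BLOCK, NULL, &cur); /* NULL set: only query */
    return sigismember(&cur, SIGUSR1) == 1;
}

/* Analogue of raw_spin_lock_irqsave(): save the old mask, then block. */
static void block_usr1_save(sigset_t *saved)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, saved);
}

/* Analogue of raw_spin_unlock_irqrestore(): restore exactly the old mask. */
static void restore_mask(const sigset_t *saved)
{
    pthread_sigmask(SIG_SETMASK, saved, NULL);
}

/* Inner critical section, correct whether or not the caller blocked. */
static void flush_locked(void)
{
    sigset_t saved;

    block_usr1_save(&saved);
    blocked_inside = usr1_blocked();
    /* ... touch state that a SIGUSR1 handler would also touch ... */
    restore_mask(&saved);
}

/* Returns whether SIGUSR1 is still blocked right after flush_locked(). */
int run_flush(int caller_blocks)
{
    sigset_t outer;
    int after;

    if (caller_blocks)
        block_usr1_save(&outer);
    flush_locked();
    after = usr1_blocked();
    if (caller_blocks)
        restore_mask(&outer);
    return after;
}
```

      A plain block/unblock pair inside flush_locked() would instead unblock
      the signal behind the back of a caller that expects it to stay blocked.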
      
      Link: https://lkml.kernel.org/r/20220301122143.1521823-2-bigeasy@linutronix.de
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Subject: cgroup: add a comment to cgroup_rstat_flush_locked().
      
      Add a comment explaining why spin_lock_irq() -> raw_spin_lock_irqsave() is needed.
      
      Link: https://lkml.kernel.org/r/Yh+DOK73hfVV5ThX@linutronix.de
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Helge Deller
      fat: use pointer to simple type in put_user() · 2cd50532
      Helge Deller authored
      The put_user(val,ptr) macro wants a pointer to a simple type, but in
      fat_ioctl_filldir() the d_name field references an "array of chars".  Be
      more accurate and explicitly give the pointer to the first character of
      the d_name[] array.
      
      I noticed that issue while trying to optimize the parisc put_user()
      macro and used an intermediate variable to store the pointer.  In that
      case I got this error:
      
        In file included from include/linux/uaccess.h:11,
                         from include/linux/compat.h:17,
                         from fs/fat/dir.c:18:
        fs/fat/dir.c: In function `fat_ioctl_filldir':
        fs/fat/dir.c:725:33: error: invalid initializer
          725 |                 if (put_user(0, d2->d_name)                     ||         \
              |                                 ^~
        include/asm/uaccess.h:152:33: note: in definition of macro `__put_user'
          152 |         __typeof__(ptr) __ptr = ptr;                            \
              |                                 ^~~
        fs/fat/dir.c:759:1: note: in expansion of macro `FAT_IOCTL_FILLDIR_FUNC'
          759 | FAT_IOCTL_FILLDIR_FUNC(fat_ioctl_filldir, __fat_dirent)
      
      Andreas Schwab <schwab@linux-m68k.org> suggested using
      
         __typeof__(&*(ptr)) __ptr = ptr;
      
      instead.  This works, but nevertheless it's probably reasonable to fix
      the original caller too.
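      The difference can be reproduced outside the kernel with GNU C
      (statement expressions and __typeof__).  In this sketch, store_plain()
      copies the __typeof__(ptr) line quoted above and store_decayed() uses
      the suggested &*(ptr) form; both macro names and struct fake_dirent are
      hypothetical, and nothing is copied to user space.  Calling
      store_plain() with the bare d_name array would fail to compile with the
      same "invalid initializer" error, so the code only calls it the way this
      patch does.

```c
/* Only valid when ptr is a real pointer: for an array, p_ would be an
 * array too, and an array cannot be initialised from a pointer. */
#define store_plain(val, ptr) ({          \
    __typeof__(ptr) p_ = (ptr);           \
    *p_ = (val);                          \
    0;                                    \
})

/* Andreas Schwab's suggestion: &*(ptr) is always a pointer, so arrays decay. */
#define store_decayed(val, ptr) ({        \
    __typeof__(&*(ptr)) p_ = (ptr);       \
    *p_ = (val);                          \
    0;                                    \
})

struct fake_dirent {
    long d_ino;
    char d_name[16];
};

/* Caller-side fix, as in this patch: pass a pointer to the first char. */
int clear_name(struct fake_dirent *d)
{
    return store_plain(0, &d->d_name[0]);
}

/* Macro-side fix: the array itself can be passed. */
int clear_name_decayed(struct fake_dirent *d)
{
    return store_decayed(0, d->d_name);
}
```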
      
      Link: https://lkml.kernel.org/r/Ygo+A9MREmC1H3kr@p100
      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Qinghua Jin
      minix: fix bug when opening a file with O_DIRECT · 9ce3c0d2
      Qinghua Jin authored
      Testcase:
      1. create a minix file system and mount it
      2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT
      3. open fails with -EINVAL but leaves an empty file behind.  No other
         open() failure leaves a file behind like this.

      It is hard to check the direct_IO op before creating the inode.  Just as
      ext4 and btrfs do, this patch resolves the issue by allowing the file to
      be created with O_DIRECT but returning an error when writing to it.
      
      Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com
      Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com>
      Reported-by: Colin Ian King <colin.king@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andrei Vagin
      fs/pipe.c: local vars have to match types of proper pipe_inode_info fields · aeb213cd
      Andrei Vagin authored
      head, tail and ring_size are declared as unsigned int, so all local
      variables that operate on these fields have to be unsigned as well to
      avoid signed integer overflow.

      Right now, this isn't an issue because the maximum pipe size is limited
      to 1U<<31.
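      Why unsigned arithmetic is the right choice can be shown with helpers
      modelled on pipe_occupancy() and pipe_full() from
      include/linux/pipe_fs_i.h (treat the exact signatures here as an
      approximation).  head and tail are free-running counters, and unsigned
      subtraction is defined modulo 2^32, so the occupancy stays correct even
      after head wraps past UINT_MAX; the same computation in int could
      overflow, which is undefined behaviour.

```c
#include <limits.h>

/* Number of occupied slots; well defined even after head wraps around. */
static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail)
{
    return head - tail;
}

/* Nonzero when at least @limit slots are in use. */
static inline int pipe_full(unsigned int head, unsigned int tail,
                            unsigned int limit)
{
    return pipe_occupancy(head, tail) >= limit;
}
```

      For example, with tail = UINT_MAX - 2 and head = 5 (head has wrapped),
      the occupancy is 8.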
      
      Link: https://lkml.kernel.org/r/20220106171946.36128-1-avagin@gmail.com
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andrei Vagin
      fs/pipe: use kvcalloc to allocate a pipe_buffer array · 5a519c8f
      Andrei Vagin authored
      Right now, kcalloc is used to allocate a pipe_buffer array.  The size of
      the pipe_buffer struct is 40 bytes.  kcalloc can only reliably allocate
      chunks of up to order PAGE_ALLOC_COSTLY_ORDER (3), i.e. 8 contiguous
      pages.  With 4KB pages that is 32KB, or 819 pipe_buffer entries of one
      page each, so the maximum pipe size is about 3.2MB in this case.
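      The arithmetic behind that figure, as a small helper, assuming 4 KiB
      pages, PAGE_ALLOC_COSTLY_ORDER == 3, a 40-byte struct pipe_buffer and
      one page of pipe data per pipe_buffer; max_pipe_bytes() is a
      hypothetical name.

```c
#include <stddef.h>

/*
 * Largest pipe, in bytes, whose pipe_buffer array fits in one
 * order-@order allocation. Each pipe_buffer describes one page.
 */
size_t max_pipe_bytes(size_t page_size, unsigned int order, size_t buf_size)
{
    size_t array_bytes = page_size << order; /* 4096 << 3 = 32768 */
    size_t nr_bufs = array_bytes / buf_size; /* 32768 / 40 = 819 */

    return nr_bufs * page_size;              /* 819 * 4096 = 3354624 */
}
```

      3354624 bytes is a little under 3.2 MiB, which is where the 3.2MB
      limit comes from.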
      
      In CRIU, we use pipes to dump process memory.  CRIU freezes a target
      process, injects parasite code into it and then this code splices
      memory into pipes.  If the maximum pipe size is small, we need to do
      many iterations or create many pipes.

      kvcalloc attempts to allocate physically contiguous memory but, upon
      failure, falls back to non-contiguous (vmalloc) allocation, so it isn't
      limited by PAGE_ALLOC_COSTLY_ORDER.
      
      The maximum pipe size for non-root users is limited by the
      /proc/sys/fs/pipe-max-size sysctl that is 1MB by default, so only the
      root user will be able to trigger vmalloc allocations.
      
      Link: https://lkml.kernel.org/r/20220104171058.22580-1-avagin@gmail.com
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Randy Dunlap
      init/main.c: return 1 from handled __setup() functions · f9a40b08
      Randy Dunlap authored
      initcall_blacklist() should return 1 to indicate that it handled its
      cmdline arguments.
      
      set_debug_rodata() should return 1 to indicate that it handled its
      cmdline arguments.  Print a warning if the option string is invalid.
      
      This prevents these strings from being added to the 'init' program's
      environment as they are not init arguments/parameters.
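      A simplified user-space model of why the return value matters: a
      parameter that no __setup() handler claims, and that contains '=', is
      passed on to init's environment.  The dispatcher, struct setup_entry and
      unhandled_rodata() are hypothetical; set_debug_rodata() here only
      understands "on" and "off" and skips the warning, unlike the real
      handler.

```c
#include <string.h>

/* Hypothetical handler type mirroring __setup(): nonzero means "handled". */
typedef int (*setup_fn)(char *val);

struct setup_entry {
    const char *prefix; /* e.g. "rodata=" */
    setup_fn fn;
};

static int rodata_enabled = 1;

/* After this patch: claim the option even when the value is invalid. */
static int set_debug_rodata(char *str)
{
    if (strcmp(str, "on") == 0)
        rodata_enabled = 1;
    else if (strcmp(str, "off") == 0)
        rodata_enabled = 0;
    /* else: the real handler prints a warning; omitted here */
    return 1;
}

/* For comparison: a handler that returns 0, i.e. "not handled". */
static int unhandled_rodata(char *str)
{
    (void)str;
    return 0;
}

/*
 * Simplified dispatch: returns 1 if @param would be passed on to init's
 * environment, i.e. no handler consumed it and it contains '='.
 */
int lands_in_init_env(char *param, const struct setup_entry *tab, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tab[i].prefix);

        if (strncmp(param, tab[i].prefix, len) == 0 && tab[i].fn(param + len))
            return 0; /* consumed by the handler */
    }
    return strchr(param, '=') != NULL;
}
```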
      
      Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Randy Dunlap
      init.h: improve __setup and early_param documentation · abc7da58
      Randy Dunlap authored
      Igor noted in [1] that there are quite a few __setup() handling
      functions that return incorrect values.  Doing this can be harmless, but
      it can also cause strings to be added to init's argument or environment
      list, polluting them.
      
      Since __setup() handling and return values are not documented, first add
      documentation for that.  Also add more documentation for early_param()
      handling and return values.
      
      For __setup() functions, returning 0 (not handled) has questionable
      value if it is just a malformed option value, as in
      
        rodata=junk
      
      since returning 0 would just cause "rodata=junk" to be added to init's
      environment unnecessarily:
      
        Run /sbin/init as init process
          with arguments:
            /sbin/init
          with environment:
            HOME=/
            TERM=linux
            splash=native
            rodata=junk
      
      Also, there are no recommendations on whether to print a warning when an
      unknown parameter value is seen.  I am not addressing that here.
      
      [1] lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
      
      Link: https://lkml.kernel.org/r/20220221050852.1147-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mark-PK Tsai
      init: use ktime_us_delta() to make initcall_debug log more precise · 105e8c2e
      Mark-PK Tsai authored
      Use ktime_us_delta() to make the initcall_debug log more precise than
      right shifting the result of ktime_to_ns() by 10 bits, which divides by
      1024 instead of 1000 and so reports durations about 2.3% too low.
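      A quick check of the arithmetic, with hypothetical helper names:
      shifting right by 10 divides by 1024, whereas ktime_us_delta() converts
      nanoseconds to microseconds with a true division by 1000.

```c
#include <stdint.h>

/* The old shortcut: ">> 10" divides by 1024, not 1000. */
int64_t ns_to_us_shift(int64_t ns)
{
    return ns >> 10;
}

/* What a nanosecond-to-microsecond conversion should compute. */
int64_t ns_to_us_div(int64_t ns)
{
    return ns / 1000;
}
```

      For a 1 ms initcall (1000000 ns), the shift reports 976 us while the
      division reports 1000 us.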
      
      Link: https://lkml.kernel.org/r/20220209053350.15771-1-mark-pk.tsai@mediatek.com
      Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
      Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
      Tested-by: Andrew Halaney <ahalaney@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Valentin Schneider <valentin.schneider@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: YJ Chiang <yj.chiang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>