1. 08 Dec, 2023 6 commits
    • Merge tag 'riscv-for-linus-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 4df7c5fd
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A pair of fixes to the new module load-time relocation code
      
       - A fix for hwprobe overflowing on rv32
      
       - A fix to correctly decode C.SWSP and C.SDSP, which manifests in
         misaligned access handling
      
       - A fix for a boot-time shadow call stack initialization ordering issue
      
       - A fix for Andes' errata probing, which was calling
         riscv_noncoherent_supported() too late in the boot process and
         triggering an oops
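      The summary does not say what was wrong with the decoding. As a
      reference for the field layout only, here is a minimal sketch based on
      the RISC-V compressed-instruction encoding: C.SWSP and C.SDSP are
      CSS-format stores whose source register rs2 sits in bits [6:2], unlike
      the CI-format loads C.LWSP/C.LDSP, which keep their register in bits
      [11:7]. The struct and function names are hypothetical, not the
      kernel's, and C.SDSP is shown with its RV64 encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of a CSS-format stack-pointer-relative store. */
struct css_store {
	unsigned int rs2;    /* source register index, bits [6:2] */
	unsigned int offset; /* zero-extended byte offset from sp */
};

/* C.SWSP: funct3 = 110 in bits [15:13], op = 10 in bits [1:0]. */
static bool is_c_swsp(uint16_t inst)
{
	return (inst & 0xe003) == 0xc002;
}

/* C.SDSP (RV64): funct3 = 111, op = 10. */
static bool is_c_sdsp(uint16_t inst)
{
	return (inst & 0xe003) == 0xe002;
}

/* C.SWSP: inst[12:9] = offset[5:2], inst[8:7] = offset[7:6]. */
static struct css_store decode_c_swsp(uint16_t inst)
{
	struct css_store s;

	s.rs2 = (inst >> 2) & 0x1f;
	s.offset = (((inst >> 9) & 0xfu) << 2) | (((inst >> 7) & 0x3u) << 6);
	return s;
}

/* C.SDSP: inst[12:10] = offset[5:3], inst[9:7] = offset[8:6]. */
static struct css_store decode_c_sdsp(uint16_t inst)
{
	struct css_store s;

	s.rs2 = (inst >> 2) & 0x1f;
	s.offset = (((inst >> 10) & 0x7u) << 3) | (((inst >> 7) & 0x7u) << 6);
	return s;
}
```

      For example, 0xC606 (sw ra,12(sp)) decodes to rs2 = 1 (ra) and offset
      12, and 0xE406 (sd ra,8(sp)) decodes to rs2 = 1 and offset 8.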
      
      * tag 'riscv-for-linus-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: errata: andes: Probe for IOCP only once in boot stage
        riscv: Fix SMP when shadow call stacks are enabled
        dt-bindings: perf: riscv,pmu: drop unneeded quotes
        riscv: fix misaligned access handling of C.SWSP and C.SDSP
        RISC-V: hwprobe: Always use u64 for extension bits
        Support rv32 ULEB128 test
        riscv: Correct type casting in module loading
        riscv: Safely remove entries from relocation list
    • Merge tag 'soc-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · a6adef89
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Most of the changes are devicetree fixes for NXP, Mediatek and Rockchip
        Arm machines as well as Microchip RISC-V, and most of these address
        build-time warnings for spec violations and other minor issues. One of
        the Mediatek warnings was enabled by default and prevented a clean
        build.
      
        The ones that address serious runtime issues are all on the i.MX
        platform:
      
         - a boot time panic on imx8qm
      
         - USB hanging under load on imx8
      
         - regressions on the imx93 ethernet phy
      
        Code fixes include a minor error-handling fix for the i.MX PMU driver
        and a number of firmware driver fixes:
      
         - OP-TEE fix for supplicant-based device enumeration, and a new sysfs
           attribute needed to fix a race against userspace
      
         - Arm SCMI fix for possible truncation/overflow in the frequency
           computations
      
         - Multiple FF-A fixes for the newly added notification support"
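      The SCMI items concern truncation/overflow in frequency computations,
      one of them fixed by promoting the multiplier type. A minimal sketch of
      that class of bug, assuming a 32-bit performance level and a 32-bit
      multiplier whose product is a frequency in Hz; the function names and
      values are hypothetical and this is not the SCMI driver's code.

```c
#include <stdint.h>

/* Buggy pattern: both operands are 32-bit, so the product wraps modulo
 * 2^32 before it is widened to the 64-bit return type. */
static uint64_t freq_hz_truncating(uint32_t level, uint32_t multiplier)
{
	return level * multiplier;
}

/* Fixed pattern: promoting one operand makes the multiply 64-bit. */
static uint64_t freq_hz_promoted(uint32_t level, uint32_t multiplier)
{
	return (uint64_t)level * multiplier;
}
```

      With level = 5000000 and multiplier = 1000, the promoted version returns
      5000000000, while the 32-bit product wraps to 705032704.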
      
      * tag 'soc-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (55 commits)
        MAINTAINERS: change the S32G2 maintainer's email address.
        arm64: dts: rockchip: Fix eMMC Data Strobe PD on rk3588
        ARM: dts: imx28-xea: Pass the 'model' property
        ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
        MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry
        arm64: dts: imx8-apalis: set wifi regulator to always-on
        ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
        arm64: dts: imx8ulp: update gpio node name to align with register address
        arm64: dts: imx93: update gpio node name to align with register address
        arm64: dts: imx93: correct mediamix power
        arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup
        arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells
        arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells
        dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles
        ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock
        arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3
        arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru
        arm64: dts: rockchip: drop interrupt-names property from rk3588s dfi
        firmware: arm_scmi: Fix possible frequency truncation when using level indexing mode
        firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
        ...
    • Merge tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 17894c2a
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Snapshot buffer issues:
      
         1. When instances started allowing latency tracers, they began using
            a snapshot buffer (another buffer that is not written to but
            swapped with the main buffer that is). The snapshot buffer needs
            to be the same size as the main buffer. But when the snapshot
            buffers were added to instances, the code that makes the snapshot
            the same size as the main buffer still only did so for the main
            buffer and not for the instances.
      
         2. Need to stop the current tracer when resizing the buffers.
            Otherwise there can be a race if the tracer decides to make a
            snapshot between resizing the main buffer and the snapshot buffer.
      
         3. When a tracer is "stopped" it disables both the main buffer and
            the snapshot buffer. This needs to be done for instances and not
            only the main buffer, now that instances also have a snapshot
            buffer.
      
       - Buffered event for filtering issues:
      
         When filtering is enabled, events may often be dropped, so it is
         quicker to copy the event into a temp buffer and then either write it
         into the main buffer if it is not filtered or simply drop it if it
         is, than to write the event into the ring buffer and then try to
         discard it. This temp buffer is allocated dynamically and needs
         special synchronization. But there were some issues with that:
      
         1. When disabling the filter and freeing the buffer, a call to all
            CPUs is required to stop each per_cpu usage. But the code called
            smp_call_function_many() which does not include the current CPU.
            If the task is migrated to another CPU when it enables the CPUs
            via smp_call_function_many(), it will not enable the one it is
            currently on and this causes issues later on. Use
            on_each_cpu_mask() instead, which includes the current CPU.
      
         2. When the allocation of the buffered event fails, it can give a
            warning. But the buffered event is just an optimization (it's
            still OK to write to the ring buffer and free it). Do not WARN in
            this case.
      
         3. The freeing of the buffered event requires synchronization. First
            a counter is decremented to zero so that no new uses of it will
            happen. Then the buffered event pointer is set to NULL, and
            finally the buffered event is freed. There's a synchronize_rcu()
            between the counter decrement and setting the variable to NULL,
            but only an smp_wmb() between that and the freeing of the buffer.
            It is theoretically possible that a user missed seeing the
            decrement but uses the buffer after it is freed. Another
            synchronize_rcu() is needed in place of that smp_wmb().
      
       - ring buffer timestamps on 32 bit machines
      
         The ring buffer timestamp on 32-bit machines has to break the 64-bit
         number into multiple values because cmpxchg is required on it, and a
         64-bit cmpxchg on 32-bit architectures is very slow. The code used
         to use two 32-bit values, making a 60-bit timestamp where the other
         4 bits were used as counters for synchronization. It later became
         known that the timestamp on 32-bit still needs all 64 bits in some
         cases, so 3 words were created to handle the 64 bits. But issues
         arose with this:
      
          1. The synchronization logic still only compared the counters of
             the first two words, but not that of the third, so the
             synchronization could fail silently.
      
          2. A check on discard of an event could race if an event happened
             between the discard and updating one of the counters. The counter
             needs to be updated (forcing an absolute timestamp rather than a
             delta) before the actual discard happens.
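      A simplified, single-threaded model of the three-word layout and of the
      read-side check from item 1, assuming (as described above) 30 value
      bits plus a 2-bit counter in each of the two low words and the
      remaining top 4 bits in a third word. It is not the kernel's ring
      buffer code; all names are hypothetical. A read is accepted only when
      all three counters agree.

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_SHIFT     30                     /* value bits per word */
#define TS_VAL_MASK  ((1u << TS_SHIFT) - 1) /* 0x3fffffff */
#define TS_MSB_SHIFT 60                     /* bits held by the third word */

/* Each word: 2-bit counter in bits [31:30], value bits in [29:0]. */
struct ts_model {
	uint32_t bottom; /* timestamp bits 0..29 */
	uint32_t top;    /* timestamp bits 30..59 */
	uint32_t msb;    /* timestamp bits 60..63 */
};

static uint32_t word_cnt(uint32_t w) { return w >> TS_SHIFT; }
static uint32_t word_val(uint32_t w) { return w & TS_VAL_MASK; }

static uint32_t make_word(uint32_t cnt, uint32_t val)
{
	return ((cnt & 3u) << TS_SHIFT) | (val & TS_VAL_MASK);
}

/* Writer: store all three parts tagged with the same counter value. */
static void ts_set(struct ts_model *t, uint64_t val, uint32_t cnt)
{
	t->msb = make_word(cnt, (uint32_t)(val >> TS_MSB_SHIFT));
	t->top = make_word(cnt, (uint32_t)(val >> TS_SHIFT));
	t->bottom = make_word(cnt, (uint32_t)val);
}

/* Reader: reject the value unless all three counters match. */
static bool ts_read(const struct ts_model *t, uint64_t *ret)
{
	uint32_t c = word_cnt(t->top);

	if (c != word_cnt(t->bottom) || c != word_cnt(t->msb))
		return false;

	*ret = ((uint64_t)word_val(t->msb) << TS_MSB_SHIFT) |
	       ((uint64_t)word_val(t->top) << TS_SHIFT) |
	       word_val(t->bottom);
	return true;
}
```

      Bumping only the third word's counter, as a torn update might, makes
      the read fail; a check limited to the first two words would have
      accepted it.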
      
      * tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Test last update in 32bit version of __rb_time_read()
        ring-buffer: Force absolute timestamp on discard of event
        tracing: Fix a possible race when disabling buffered events
        tracing: Fix a warning when allocating buffered events fails
        tracing: Fix incomplete locking when disabling buffered events
        tracing: Disable snapshot buffer when stopping instance tracers
        tracing: Stop current tracer when resizing buffer
        tracing: Always update snapshot buffer size
    • Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of... · 8e819a76
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "31 hotfixes. Ten of these address pre-6.6 issues and are marked
        cc:stable. The remainder address post-6.6 issues or aren't considered
        serious enough to justify backporting"
      
      * tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
        mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
        nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
        mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
        scripts/gdb: fix lx-device-list-bus and lx-device-list-class
        MAINTAINERS: drop Antti Palosaari
        highmem: fix a memory copy problem in memcpy_from_folio
        nilfs2: fix missing error check for sb_set_blocksize call
        kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
        units: add missing header
        drivers/base/cpu: crash data showing should depends on KEXEC_CORE
        mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
        scripts/gdb/tasks: fix lx-ps command error
        mm/Kconfig: make userfaultfd a menuconfig
        selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
        mm/damon/core: copy nr_accesses when splitting region
        lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
        checkstack: fix printed address
        mm/memory_hotplug: fix error handling in add_memory_resource()
        mm/memory_hotplug: add missing mem_hotplug_lock
        .mailmap: add a new address mapping for Chester Lin
        ...
    • Merge tag 'v6.7-rockchip-dtsfixes1' of... · fd1e5745
      Arnd Bergmann authored
      Merge tag 'v6.7-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes
      
      Devicetree fixes for the 6.7-cycle.
      
      All over the place this time: from adapting the size of the vdec nodes
      on rk3328 and rk3399, fixing some wrong pinctrl settings on rk3128 and
      the Turing RK1 board, emmc-settings fixes on rk3588 and interrupt-name
      mishaps, down to some dt-cleanups.
      
      This also adds the missing rockchip,rk3588-pmugrf compatible to the SoC
      GRF binding, which I somehow messed up during the pull requests for
      -rc1. At least with it included, the dt-checker is happier.
      
      * tag 'v6.7-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
        arm64: dts: rockchip: Fix eMMC Data Strobe PD on rk3588
        arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru
        arm64: dts: rockchip: drop interrupt-names property from rk3588s dfi
        arm64: dts: rockchip: Fix Turing RK1 interrupt pinctrls
        ARM: dts: rockchip: Fix sdmmc_pwren's pinmux setting for RK3128
        arm64: dts: rockchip: minor whitespace cleanup around '='
        ARM: dts: rockchip: minor whitespace cleanup around '='
        dt-bindings: soc: rockchip: grf: add rockchip,rk3588-pmugrf
        arm64: dts: rockchip: fix rk356x pcie msg interrupt name
        arm64: dts: rockchip: Expand reg size of vdec node for RK3399
        arm64: dts: rockchip: Expand reg size of vdec node for RK3328
      
      Link: https://lore.kernel.org/r/2709704.mvXUDI8C0e@phil
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • Merge tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5e3f5b81
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff
      
        Current release - new code bugs:
      
         - tcp: assorted fixes to the new Auth Option support
      
        Older releases - regressions:
      
         - tcp: fix mid stream window clamp
      
         - tls: fix incorrect splice handling
      
         - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit()
      
         - dsa: mv88e6xxx: restore USXGMII support for 6393X
      
         - arcnet: restore support for multiple Sohard Arcnet cards
      
        Older releases - always broken:
      
         - tcp: do not accept ACK of bytes we never sent
      
         - require admin privileges to receive packet traces via netlink
      
         - packet: move reference count in packet_sock to atomic_long_t
      
         - bpf:
            - fix incorrect branch offset comparison with cpu=v4
            - fix prog_array_map_poke_run map poke update
      
         - netfilter:
            - three fixes for crashes on bad admin commands
            - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref
            - nf_tables: fix 'exist' matching on bigendian arches
      
         - leds: netdev: fix RTNL handling to prevent potential deadlock
      
         - eth: tg3: prevent races in error/reset handling
      
         - eth: r8169: fix rtl8125b PAUSE storm when suspended
      
         - eth: r8152: improve reset and surprise removal handling
      
         - eth: hns: fix race between changing features and sending
      
         - eth: nfp: fix sleep in atomic for bonding offload"
      
      * tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
        net/smc: fix missing byte order conversion in CLC handshake
        net: dsa: microchip: provide a list of valid protocols for xmit handler
        drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
        psample: Require 'CAP_NET_ADMIN' when joining "packets" group
        bpf: sockmap, updating the sg structure should also update curr
        net: tls, update curr on splice as well
        nfp: flower: fix for take a mutex lock in soft irq context and rcu lock
        net: dsa: mv88e6xxx: Restore USXGMII support for 6393X
        tcp: do not accept ACK of bytes we never sent
        selftests/bpf: Add test for early update in prog_array_map_poke_run
        bpf: Fix prog_array_map_poke_run map poke update
        netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
        netfilter: nf_tables: validate family when identifying table via handle
        netfilter: nf_tables: bail out on mismatching dynset and set expressions
        netfilter: nf_tables: fix 'exist' matching on bigendian arches
        netfilter: nft_set_pipapo: skip inactive elements during set walk
        netfilter: bpf: fix bad registration on nf_defrag
        leds: trigger: netdev: fix RTNL handling to prevent potential deadlock
        octeontx2-af: Update Tx link register range
        ...
  2. 07 Dec, 2023 34 commits
    • Merge tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 9ace34a8
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "Just one fix.
      
        Commit f5d39b02 ("freezer,sched: Rewrite core freezer logic")
        changed how freezing state is recorded, which made cgroup_freezing()
        disagree with the actual state of the task while thawing, triggering
        a warning. Fix it by updating cgroup_freezing()"
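      A minimal sketch of the check the shortlog title describes ("Check if
      not frozen"), assuming the freezer state is kept as separate "freezing"
      and "frozen" bits; the flag names, values, and function names are
      hypothetical, not the cgroup freezer's.

```c
#include <stdbool.h>

/* Hypothetical state bits for a freezer cgroup. */
enum {
	FREEZER_FREEZING = 1u << 0, /* freezing has been requested */
	FREEZER_FROZEN   = 1u << 1, /* the cgroup has actually frozen */
};

/* Before: report "freezing" whenever the FREEZING bit is set. */
static bool is_freezing_old(unsigned int state)
{
	return state & FREEZER_FREEZING;
}

/* After: a cgroup that is already frozen is no longer "freezing". */
static bool is_freezing_new(unsigned int state)
{
	return (state & FREEZER_FREEZING) && !(state & FREEZER_FROZEN);
}
```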
      
      * tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup_freezer: cgroup_freezing: Check if not frozen
    • Merge tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · e0348c1f
      Linus Torvalds authored
      Pull workqueue fix from Tejun Heo:
       "Just one patch to fix a bug which can crash the kernel if the
        housekeeping and wq_unbound_cpu cpumask configuration combination
        leaves the latter empty"
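      The message does not describe the mechanism. One way to guarantee a
      non-empty mask, sketched here under that assumption with CPU masks
      modeled as 64-bit bitmaps and a hypothetical helper name, is to ignore
      any restriction whose intersection with the current mask would be
      empty.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

/* Narrow the unbound-workqueue mask, but refuse any restriction that
 * would leave no CPU at all. Returns the resulting mask. */
static cpumask64 restrict_unbound_mask(cpumask64 unbound,
				       cpumask64 restrict_to)
{
	if ((unbound & restrict_to) == 0) {
		fprintf(stderr, "restriction leaves no CPU, ignoring\n");
		return unbound;
	}
	return unbound & restrict_to;
}
```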
      
      * tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: Make sure that wq_unbound_cpumask is never empty
    • Merge tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 4388ae22
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "An incremental fix for the fix introduced during the merge window for
        caching of the selector for windowed register ranges. We were
        incorrectly leaking an error code in the case where the last selector
        accessed was for some reason not cached"
      
      * tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: fix bogus error on regcache_sync success
    • Merge tag 'devicetree-fixes-for-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · d5c0b601
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix dt-extract-compatibles for builds with an in-tree build directory
      
       - Drop Xinlei Lee <xinlei.lee@mediatek.com> bouncing email
      
       - Fix the of_reconfig_get_state_change() return value documentation
      
       - Add missing #power-domain-cells property to QCom MPM
      
       - Fix warnings in i.MX LCDIF and adi,adv7533
      
      * tag 'devicetree-fixes-for-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: display: adi,adv75xx: Document #sound-dai-cells
        dt-bindings: lcdif: Properly describe the i.MX23 interrupts
        dt-bindings: interrupt-controller: Allow #power-domain-cells
        of: dynamic: Fix of_reconfig_get_state_change() return value documentation
        dt-bindings: display: mediatek: dsi: remove Xinlei's mail
        dt: dt-extract-compatibles: Don't follow symlinks when walking tree
    • Merge tag 'platform-drivers-x86-v6.7-3' of... · 33d42bde
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Ilpo Järvinen:
      
       - Fix i8042 filter resource handling, input, and suspend issues in
         asus-wmi
      
       - Skip zero instance WMI blocks to avoid issues with some laptops
      
       - Differentiate dev/production keys in mlxbf-bootctl
      
       - Correct surface serdev related return value to avoid leaking errno
         into userspace
      
       - Error checking fixes
      
      * tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/mellanox: Check devm_hwmon_device_register_with_groups() return value
        platform/mellanox: Add null pointer checks for devm_kasprintf()
        mlxbf-bootctl: correctly identify secure boot with development keys
        platform/x86: wmi: Skip blocks with zero instances
        platform/surface: aggregator: fix recv_buf() return value
        platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend
        platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd
        platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter
        platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
    • Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f35e4663
      Linus Torvalds authored
      Pull x86 int80 fixes from Dave Hansen:
       "Avoid VMM misuse of 'int 0x80' handling in TDX and SEV guests.
      
        It also has the very nice side effect of getting rid of a bunch of
        assembly entry code"
      
      * tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tdx: Allow 32-bit emulation by default
        x86/entry: Do not allow external 0x80 interrupts
        x86/entry: Convert INT 0x80 emulation to IDTENTRY
        x86/coco: Disable 32-bit emulation by default on TDX and SEV
    • vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning · b0a930e8
      Stefano Garzarella authored
      After backporting commit 581512a6 ("vsock/virtio: MSG_ZEROCOPY
      flag support") in CentOS Stream 9, CI reported the following error:
      
          In file included from ./include/linux/kernel.h:17,
                           from ./include/linux/list.h:9,
                           from ./include/linux/preempt.h:11,
                           from ./include/linux/spinlock.h:56,
                           from net/vmw_vsock/virtio_transport_common.c:9:
          net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy’:
          ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
             20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                |                                   ^~
          ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
             26 |                 (__typecheck(x, y) && __no_side_effects(x, y))
                |                  ^~~~~~~~~~~
          ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
             36 |         __builtin_choose_expr(__safe_cmp(x, y), \
                |                               ^~~~~~~~~~
          ./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
             45 | #define min(x, y)       __careful_cmp(x, y, <)
                |                         ^~~~~~~~~~~~~
          net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min’
             63 |                 int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
      
      We could solve it by using min_t(), but this operation seems entirely
      unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(),
      which performs almost the same check, returning at most MAX_SKB_FRAGS
      elements. So, let's eliminate this unnecessary comparison.
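      For reference, a simplified re-creation of the kind of type check that
      fires in the log above: it compares pointers to the two operand types
      inside sizeof(), so nothing is evaluated at runtime, but mismatched
      types produce the "distinct pointer types" diagnostic. The macro names
      are hypothetical and only approximate include/linux/minmax.h; MIN_T
      mimics what a min_t()-style fix would have done, which the patch
      avoids by dropping the comparison altogether.

```c
/* Type check in the spirit of minmax.h: comparing pointers to the two
 * operand types makes GCC/Clang warn "comparison of distinct pointer
 * types lacks a cast" when the types differ. sizeof() keeps the
 * comparison unevaluated, so it costs nothing at runtime. */
#define TYPECHECK(x, y) (!!(sizeof((__typeof__(x) *)1 == (__typeof__(y) *)1)))

/* Hypothetical strict min(): with an int and an unsigned long operand
 * this produces the warning seen above (an error under -Werror). */
#define STRICT_MIN(x, y) (TYPECHECK(x, y) ? ((x) < (y) ? (x) : (y)) : 0)

/* min_t()-style variant: cast both operands to an explicit type first. */
#define MIN_T(type, x, y) \
	((type)(x) < (type)(y) ? (type)(x) : (type)(y))
```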
      
      Fixes: 581512a6 ("vsock/virtio: MSG_ZEROCOPY flag support")
      Cc: avkrasnov@salutedevices.com
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
      Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: fix missing byte order conversion in CLC handshake · c5a10397
      Wen Gu authored
      The byte order conversions of the ISM GID and DMB token are missing in
      the process of CLC accept and confirm. Fix it.
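      In the kernel such conversions are normally done with helpers such as
      cpu_to_be64()/be64_to_cpu(). A portable userspace sketch of the same
      idea, assuming for illustration that the ISM GID and DMB token are
      64-bit values carried big-endian in the CLC messages; the helper names
      are hypothetical.

```c
#include <stdint.h>

/* Serialize a 64-bit value (e.g. a GID or token) most significant byte
 * first, independent of host endianness. */
static void put_be64(uint8_t out[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		out[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Inverse of put_be64(): rebuild the host value from wire bytes. */
static uint64_t get_be64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}
```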
      
      Fixes: 3d9725a6 ("net/smc: common routine for CLC accept and confirm")
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Link: https://lore.kernel.org/r/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: microchip: provide a list of valid protocols for xmit handler · 1499b892
      Sean Nyekjaer authored
      Provide a list of valid protocols for which the driver will provide
      its deferred xmit handler.

      The DSA_TAG_PROTO_KSZ8795 protocol does not provide a "connect" method,
      therefore ksz_connect() does not allocate ksz_tagger_data for it.
      
      This avoids the following null pointer dereference:
       ksz_connect_tag_protocol from dsa_register_switch+0x9ac/0xee0
       dsa_register_switch from ksz_switch_register+0x65c/0x828
       ksz_switch_register from ksz_spi_probe+0x11c/0x168
       ksz_spi_probe from spi_probe+0x84/0xa8
       spi_probe from really_probe+0xc8/0x2d8
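      A minimal model of the approach: switch on the tag protocol, leave the
      tagger data alone for KSZ8795 (whose tagger has no "connect" method),
      set the deferred xmit handler only for a protocol known to allocate
      it, and reject the rest. The enum values, struct, and function names
      are hypothetical stand-ins for the driver's.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical tag protocol identifiers. */
enum tag_proto {
	TAG_PROTO_KSZ8795, /* no "connect" method: no tagger data */
	TAG_PROTO_KSZ9477, /* "connect" allocates tagger data */
	TAG_PROTO_OTHER,   /* not supported by this driver */
};

struct tagger_data {
	void (*xmit_work_fn)(void);
};

static void deferred_xmit(void) { }

/* Only dereference td for protocols whose "connect" step allocated it. */
static int connect_tag_protocol(enum tag_proto proto, struct tagger_data *td)
{
	switch (proto) {
	case TAG_PROTO_KSZ8795:
		return 0; /* td was never allocated: do not touch it */
	case TAG_PROTO_KSZ9477:
		td->xmit_work_fn = deferred_xmit;
		return 0;
	default:
		return -EPROTONOSUPPORT;
	}
}
```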
      
      Fixes: ab32f56a ("net: dsa: microchip: ptp: add packet transmission timestamping")
      Signed-off-by: Sean Nyekjaer <sean@geanix.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20231206071655.1626479-1-sean@geanix.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'generic-netlink-multicast-fixes' · a041adee
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      Generic netlink multicast fixes
      
      Restrict two generic netlink multicast groups - in the "psample" and
      "NET_DM" families - to be root-only with the appropriate capabilities.
      See individual patches for more details.
      ====================
      
      Link: https://lore.kernel.org/r/20231206213102.1824398-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group · e0378187
      Ido Schimmel authored
      The "NET_DM" generic netlink family notifies drop locations over the
      "events" multicast group. This is problematic since by default generic
      netlink allows non-root users to listen to these notifications.
      
      Fix by adding a new field to the generic netlink multicast group
      structure that, when set, prevents non-root users or root without the
      'CAP_SYS_ADMIN' capability (in the user namespace owning the network
      namespace) from joining the group. Set this field for the "events"
      group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the
      nature of the information that is shared over this group.
      
      Note that the capability check in this case will always be performed
      against the initial user namespace since the family is not netns aware
      and only operates in the initial network namespace.
      
      A new field is added to the structure rather than using the "flags"
      field because the existing field uses uAPI flags and it is inappropriate
      to add a new uAPI flag for an internal kernel check. In net-next we can
      rework the "flags" field to use internal flags and fold the new field
      into it. But for now, in order to reduce the amount of changes, add a
      new field.
      
      Since the information can only be consumed by root, mark the control
      plane operations that start and stop the tracing as root-only using the
      'GENL_ADMIN_PERM' flag.
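      A userspace model of the join-time check described above, with the
      new restriction kept in a separate internal bit next to the
      uAPI-style "flags" field. The struct, the capability booleans, and the
      GRP_UNS_ADMIN_PERM stand-in are hypothetical; the kernel queries
      capabilities against a user namespace rather than taking booleans.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for a uAPI-style "namespace admin" flag (illustrative value). */
#define GRP_UNS_ADMIN_PERM 0x10u

/* Hypothetical multicast group descriptor. */
struct mcast_group {
	const char *name;
	unsigned int flags;             /* uAPI-style flags */
	unsigned int cap_sys_admin : 1; /* internal, kernel-only restriction */
};

/* Hypothetical caller credentials, already resolved for the relevant
 * user namespace. */
struct caller_caps {
	bool net_admin;
	bool sys_admin;
};

/* Return 0 if the caller may join the group, -EPERM otherwise. */
static int check_join(const struct mcast_group *grp,
		      const struct caller_caps *caps)
{
	if ((grp->flags & GRP_UNS_ADMIN_PERM) && !caps->net_admin)
		return -EPERM;
	if (grp->cap_sys_admin && !caps->sys_admin)
		return -EPERM;
	return 0;
}
```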
      
      Tested using [1].
      
      Before:
      
       # capsh -- -c ./dm_repo
       # capsh --drop=cap_sys_admin -- -c ./dm_repo
      
      After:
      
       # capsh -- -c ./dm_repo
       # capsh --drop=cap_sys_admin -- -c ./dm_repo
       Failed to join "events" multicast group
      
      [1]
       $ cat dm.c
       #include <stdio.h>
       #include <netlink/genl/ctrl.h>
       #include <netlink/genl/genl.h>
       #include <netlink/socket.h>
      
       int main(int argc, char **argv)
       {
       	struct nl_sock *sk;
       	int grp, err;
      
       	sk = nl_socket_alloc();
       	if (!sk) {
       		fprintf(stderr, "Failed to allocate socket\n");
       		return -1;
       	}
      
       	err = genl_connect(sk);
       	if (err) {
       		fprintf(stderr, "Failed to connect socket\n");
       		return err;
       	}
      
       	grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events");
       	if (grp < 0) {
       		fprintf(stderr,
       			"Failed to resolve \"events\" multicast group\n");
       		return grp;
       	}
      
       	err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
       	if (err) {
       		fprintf(stderr, "Failed to join \"events\" multicast group\n");
       		return err;
       	}
      
       	return 0;
       }
       $ gcc -I/usr/include/libnl3 -o dm_repo dm.c -lnl-genl-3 -lnl-3
      
      Fixes: 9a8afc8d ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
      Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • psample: Require 'CAP_NET_ADMIN' when joining "packets" group · 44ec98ea
      Ido Schimmel authored
      The "psample" generic netlink family notifies sampled packets over the
      "packets" multicast group. This is problematic since by default generic
      netlink allows non-root users to listen to these notifications.
      
      Fix by marking the group with the 'GENL_UNS_ADMIN_PERM' flag. This will
      prevent non-root users or root without the 'CAP_NET_ADMIN' capability
      (in the user namespace owning the network namespace) from joining the
      group.
      
      Tested using [1].
      
      Before:
      
       # capsh -- -c ./psample_repo
       # capsh --drop=cap_net_admin -- -c ./psample_repo
      
      After:
      
       # capsh -- -c ./psample_repo
       # capsh --drop=cap_net_admin -- -c ./psample_repo
       Failed to join "packets" multicast group
      
      [1]
       $ cat psample.c
       #include <stdio.h>
       #include <netlink/genl/ctrl.h>
       #include <netlink/genl/genl.h>
       #include <netlink/socket.h>
      
       int join_grp(struct nl_sock *sk, const char *grp_name)
       {
       	int grp, err;
      
       	grp = genl_ctrl_resolve_grp(sk, "psample", grp_name);
       	if (grp < 0) {
       		fprintf(stderr, "Failed to resolve \"%s\" multicast group\n",
       			grp_name);
       		return grp;
       	}
      
       	err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
       	if (err) {
       		fprintf(stderr, "Failed to join \"%s\" multicast group\n",
       			grp_name);
       		return err;
       	}
      
       	return 0;
       }
      
       int main(int argc, char **argv)
       {
       	struct nl_sock *sk;
       	int err;
      
       	sk = nl_socket_alloc();
       	if (!sk) {
       		fprintf(stderr, "Failed to allocate socket\n");
       		return -1;
       	}
      
       	err = genl_connect(sk);
       	if (err) {
       		fprintf(stderr, "Failed to connect socket\n");
       		return err;
       	}
      
       	err = join_grp(sk, "config");
       	if (err)
       		return err;
      
       	err = join_grp(sk, "packets");
       	if (err)
       		return err;
      
       	return 0;
       }
       $ gcc -I/usr/include/libnl3 -o psample_repo psample.c -lnl-genl-3 -lnl-3
      
      Fixes: 6ae0a628 ("net: Introduce psample, a new genetlink channel for packet sampling")
      Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20231206213102.1824398-2-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'fixes-for-ktls' · 4a02609d
      Jakub Kicinski authored
      John Fastabend says:
      
      ====================
      Couple fixes for TLS and BPF interactions.
      ====================
      
      Link: https://lore.kernel.org/r/20231206232706.374377-1-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bpf: sockmap, updating the sg structure should also update curr · bb9aefde
      John Fastabend authored
      The curr pointer should be updated when the sg structure is shifted.
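      
      As a rough illustration of why curr has to follow a shift, the sketch
      below models the scatterlist as a flat array with hypothetical 'end'
      and 'curr' indexes. The real sk_msg uses a ring with wrap-around,
      which is left out here. Removing entry i shifts the later entries
      left, so a cursor that pointed past i must move back by one to keep
      referring to the same element.

```c
#include <stddef.h>

/* Hypothetical, simplified model of an sk_msg-style scatterlist:
 * a flat array (no ring wrap-around) with an end index and a cursor. */
struct sg_model {
	int data[16];
	size_t end;  /* one past the last used entry */
	size_t curr; /* entry the next copy operation will use */
};

/* Remove entry i by shifting the later entries left by one slot. */
static void sg_shift_left(struct sg_model *m, size_t i)
{
	for (size_t j = i; j + 1 < m->end; j++)
		m->data[j] = m->data[j + 1];
	m->end--;

	/* Keep curr on the same element; without this it would now
	 * point one entry too far (or past the end). */
	if (m->curr > i)
		m->curr--;
}
```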
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20231206232706.374377-3-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: tls, update curr on splice as well · c5a59500
      John Fastabend authored
      The curr pointer must also be updated on the splice similar to how
      we do this for other copy types.
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Reported-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • x86/tdx: Allow 32-bit emulation by default · f4116bfc
      Kirill A. Shutemov authored
      32-bit emulation was disabled on TDX to prevent a possible attack by
      a VMM injecting an interrupt on vector 0x80.
      
      Now that int80_emulation() has a check for external interrupts, the
      limitation can be lifted.
      
      To distinguish software interrupts from external ones, int80_emulation()
      checks the APIC ISR bit relevant to the 0x80 vector. For
      software interrupts, this bit will be 0.
      
      On TDX, the VAPIC state (including ISR) is protected and cannot be
      manipulated by the VMM. The ISR bit is set by the microcode flow during
      the handling of posted interrupts.
      
      [ dhansen: more changelog tweaks ]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+
    • x86/entry: Do not allow external 0x80 interrupts · 55617fb9
      Thomas Gleixner authored
      The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
      kernel expects to receive a software interrupt as a result of the INT
      0x80 instruction. However, an external interrupt on the same vector
      also triggers the same codepath.
      
      An external interrupt on vector 0x80 will currently be interpreted as a
      32-bit system call, with the kernel assuming that it came from user
      context.
      
      Panic on external interrupts on the vector.
      
      To distinguish software interrupts from external ones, the kernel checks
      the APIC ISR bit relevant to the 0x80 vector. For software interrupts,
      this bit will be 0.
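      
      The ISR check amounts to locating one bit in the local APIC's 256-bit
      In-Service Register, which is exposed as eight 32-bit registers spaced
      0x10 apart starting at offset 0x100. The sketch below shows that
      arithmetic for vector 0x80 (register 4, bit 0). The helper names are
      hypothetical, and the ISR contents are passed in as an array rather
      than read from the APIC.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_ISR_BASE 0x100u /* offset of the first ISR register */

/* Offset of the 32-bit ISR register holding the bit for 'vector'. */
static uint32_t isr_reg_offset(unsigned int vector)
{
	return APIC_ISR_BASE + (vector / 32) * 0x10;
}

/* True if 'vector' is marked in service. A software INT 0x80 does not
 * go through the APIC, so its bit stays 0; an external interrupt being
 * serviced on vector 0x80 has it set. */
static bool isr_bit_set(const uint32_t isr[8], unsigned int vector)
{
	return (isr[vector / 32] >> (vector % 32)) & 1u;
}
```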
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+
    • x86/entry: Convert INT 0x80 emulation to IDTENTRY · be5341eb
      Thomas Gleixner authored
      There is no real reason to have a separate ASM entry point implementation
      for the legacy INT 0x80 syscall emulation on 64-bit.
      
      IDTENTRY provides all the functionality needed with the only difference
      that it does not:
      
        - save the syscall number (AX) into pt_regs::orig_ax
        - set pt_regs::ax to -ENOSYS
      
      Both can be done safely in the C code of an IDTENTRY before invoking any of
      the syscall related functions which depend on this convention.
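      
      The two steps the ASM entry used to perform can be written in C
      roughly as follows. The struct is a hypothetical two-field stand-in
      for pt_regs, not the kernel's definition, and the sketch leaves out
      everything else the real handler does.

```c
#include <errno.h>

/* Hypothetical stand-in for the two pt_regs fields involved. */
struct regs_sketch {
	long ax;      /* syscall number on entry, return value on exit */
	long orig_ax; /* syscall number preserved for later inspection */
};

/* Establish the syscall-entry convention before any syscall code runs. */
static void int80_prepare_regs(struct regs_sketch *regs)
{
	regs->orig_ax = regs->ax; /* save the syscall number */
	regs->ax = -ENOSYS;       /* default result until a handler sets one */
}
```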
      
      Aside from the ASM code reduction, this prepares for detecting and
      handling a vector 0x80 interrupt injected via the local APIC.
      
      [ kirill.shutemov: More verbose comments ]
      Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+
    • x86/coco: Disable 32-bit emulation by default on TDX and SEV · b82a8dbd
      Kirill A. Shutemov authored
      The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
      kernel expects to receive a software interrupt as a result of the INT
      0x80 instruction. However, an external interrupt on the same vector
      triggers the same handler.
      
      The kernel interprets an external interrupt on vector 0x80 as a 32-bit
      system call that came from userspace.
      
      A VMM can inject external interrupts on any arbitrary vector at any
      time.  This remains true even for TDX and SEV guests where the VMM is
      untrusted.
      
      Put together, this allows an untrusted VMM to trigger int80 syscall
      handling at any given point. The content of the guest register file at
      that moment defines what syscall is triggered and its arguments. It
      opens the guest OS to manipulation from the VMM side.
      
      Disable 32-bit emulation by default for TDX and SEV. Users can override
      it with the ia32_emulation=y command line option.
      
      [ dhansen: reword the changelog ]
      Reported-by: Supraja Sridhara <supraja.sridhara@inf.ethz.ch>
      Reported-by: Benedict Schlüter <benedict.schlueter@inf.ethz.ch>
      Reported-by: Mark Kuhne <mark.kuhne@inf.ethz.ch>
      Reported-by: Andrin Bertschi <andrin.bertschi@inf.ethz.ch>
      Reported-by: Shweta Shinde <shweta.shinde@inf.ethz.ch>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # v6.0+: 1da5c9bc x86: Introduce ia32_enabled()
      Cc: <stable@vger.kernel.org> # v6.0+
    • Merge tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 4de75d3e
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Incorrect nf_defrag registration for bpf link infra, from D. Wythe.
      
      2) Skip inactive elements in pipapo set backend walk to avoid double
         deactivation, from Florian Westphal.
      
      3) Fix NFT_*_F_PRESENT check with big endian arch, also from Florian.
      
      4) Bail out if the number of expressions in NFTA_DYNSET_EXPRESSIONS does
         not match the stateful expressions in the set declaration.
      
      5) Honor family in table lookup by handle. Broken since 4.16.
      
      6) Use sk_callback_lock to protect access to sk->sk_socket in xt_owner.
         sock_orphan() might zap this pointer, from Phil Sutter.
      
      All of these fixes address issues that have been present for several
      releases.
      
      * tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
        netfilter: nf_tables: validate family when identifying table via handle
        netfilter: nf_tables: bail out on mismatching dynset and set expressions
        netfilter: nf_tables: fix 'exist' matching on bigendian arches
        netfilter: nft_set_pipapo: skip inactive elements during set walk
        netfilter: bpf: fix bad registration on nf_defrag
      ====================
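      
      Item 3 is a byte-order problem. A minimal sketch, assuming the rule
      compares only the first byte of a 32-bit register while the flag was
      written as a full 32-bit value: that byte is 1 on little-endian hosts
      but 0 on big-endian ones. Zeroing the register and storing the flag
      as a single byte reads back the same on both. The helper names are
      hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Write the flag as a 32-bit value: the byte at the lowest address is
 * 1 on little-endian hosts but 0 on big-endian hosts. */
static void reg_store32(uint32_t *reg, uint32_t v)
{
	*reg = v;
}

/* Zero the whole register, then place the flag in its first byte,
 * which reads back identically regardless of host byte order. */
static void reg_store_u8(uint32_t *reg, uint8_t v)
{
	*reg = 0;
	memcpy(reg, &v, 1);
}

/* A 1-byte equality compare that inspects only the first byte. */
static int cmp_first_byte(const uint32_t *reg, uint8_t expected)
{
	uint8_t b;

	memcpy(&b, reg, 1);
	return b == expected;
}
```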
      
      Link: https://lore.kernel.org/r/20231206180357.959930-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c85e5594
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-12-06
      
      We've added 4 non-merge commits during the last 6 day(s) which contain
      a total of 7 files changed, 185 insertions(+), 55 deletions(-).
      
      The main changes are:
      
      1) Fix race found by syzkaller on prog_array_map_poke_run when
         a BPF program's kallsym symbols were still missing, from Jiri Olsa.
      
      2) Fix BPF verifier's branch offset comparison for BPF_JMP32 | BPF_JA,
         from Yonghong Song.
      
      3) Fix xsk's poll handling to only set mask on bound xsk sockets,
         from Yewon Choi.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add test for early update in prog_array_map_poke_run
        bpf: Fix prog_array_map_poke_run map poke update
        xsk: Skip polling event check for unbound socket
        bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
      ====================
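      
      For item 2, the relevant detail is that the cpu=v4 'gotol'
      instruction (BPF_JMP32 | BPF_JA) carries its 32-bit jump offset in the
      imm field, whereas other jumps use the 16-bit off field. Below is a
      sketch of a branch-target helper that respects this. The struct is a
      hypothetical subset of struct bpf_insn; the opcode constants follow
      the BPF instruction encoding.

```c
#include <stdint.h>

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)
#define BPF_JMP   0x05
#define BPF_JMP32 0x06
#define BPF_JA    0x00

/* Hypothetical subset of struct bpf_insn. */
struct insn_sketch {
	uint8_t code;
	int16_t off;
	int32_t imm;
};

/* Index of the instruction that a jump at index 'idx' lands on. */
static int64_t jump_target(const struct insn_sketch *insn, int64_t idx)
{
	int64_t delta;

	if (BPF_CLASS(insn->code) == BPF_JMP32 && BPF_OP(insn->code) == BPF_JA)
		delta = insn->imm; /* gotol: 32-bit offset in imm */
	else
		delta = insn->off; /* other jumps: 16-bit offset in off */
	return idx + delta + 1;
}
```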
      
      Link: https://lore.kernel.org/r/20231206220528.12093-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'imx-fixes-6.7' of... · 7c9bb190
      Arnd Bergmann authored
      Merge tag 'imx-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
      
      i.MX fixes for 6.7:
      
      - A MAINTAINERS update to reinstate freescale ARM64 DT directory in i.MX
        entry.
      - A series from Alexander Stein to fix #pwm-cells for imx8-ss.
      - A series from Haibo Chen to fix GPIO node name for i.MX93 and
        i.MX8ULP.
      - Add parkmode-disable-ss-quirk for DWC3 on i.MX8MP and i.MX8MQ to fix
        an issue where the controller may hang when processing transactions
        under heavy USB traffic from multiple endpoints.
      - Fix mediamix block power on/off for i.MX93 by correcting the power
        domain clock to be 'nic_media'.
      - A couple of Ethernet PHY clock regression fixes for the imx6ul-pico
        and imx6q-skov boards.
      - Fix the edma3 power domain for i.MX8QM to avoid a panic during
        startup.
      
      * tag 'imx-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
        ARM: dts: imx28-xea: Pass the 'model' property
        ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
        MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry
        arm64: dts: imx8-apalis: set wifi regulator to always-on
        ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
        arm64: dts: imx8ulp: update gpio node name to align with register address
        arm64: dts: imx93: update gpio node name to align with register address
        arm64: dts: imx93: correct mediamix power
        arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup
        arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells
        arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells
        dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles
        ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock
        arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3
        ARM: dts: imx6q: skov: fix ethernet clock regression
        arm64: dt: imx93: tqma9352-mba93xxla: Fix LPUART2 pad config
      
      Link: https://lore.kernel.org/r/20231207005202.GF270430@dragon
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • nfp: flower: fix for take a mutex lock in soft irq context and rcu lock · 0ad722bd
      Hui Zhou authored
      The neighbour event callback calls nfp_tun_write_neigh(), which takes
      a mutex lock even though the callback runs in soft irq context. Use a
      workqueue to process the neighbour event instead.
      
      Move the nfp_tun_write_neigh() call out of the rcu_read_lock/unlock()
      section in nfp_tunnel_request_route_v4() and
      nfp_tunnel_request_route_v6().
      
      Fixes: abc21095 ("nfp: flower: tunnel neigh support bond offload")
      CC: stable@vger.kernel.org # 6.2+
      Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 55b224d9
      Linus Torvalds authored
      Pull parisc fix from Helge Deller:
       "A single line patch for parisc which fixes the build in tinyconfig
        configurations:
      
         - Fix asm operand number out of range build error in bug table"
      
      * tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix asm operand number out of range build error in bug table
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 803a809d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-12-05 (ice, i40e, iavf)
      
      This series contains updates to ice, i40e and iavf drivers.
      
      Michal fixes incorrect usage of VF MSIX value and index calculation for
      ice.
      
      Marcin restores disabling of Rx VLAN filtering which was inadvertently
      removed for ice.
      
      Ivan Vecera corrects improper messaging of MFS port for i40e.
      
      Jake fixes incorrect checking of coalesce values on iavf.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero
        i40e: Fix unexpected MFS warning message
        ice: Restore fix disabling RX VLAN filtering
        ice: change vfs.num_msix_per to vf->num_msix
      ====================
      
      Link: https://lore.kernel.org/r/20231205211918.2123019-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: Restore USXGMII support for 6393X · 0c7ed1f9
      Tobias Waldekranz authored
      In 4a562127, USXGMII support was added for 6393X, but this was
      lost in the PCS conversion (the blamed commit), most likely because
      these efforts were more or less done in parallel.
      
      Restore this feature by porting Michal's patch to fit the new
      implementation.
      
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Tested-by: Michal Smulski <michal.smulski@ooma.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Fixes: e5b732a2 ("net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Link: https://lore.kernel.org/r/20231205221359.3926018-1-tobias@waldekranz.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: do not accept ACK of bytes we never sent · 3d501dd3
      Eric Dumazet authored
      This patch is based on a detailed report and ideas from Yepeng Pan
      and Christian Rossow.
      
      ACK seq validation is currently following RFC 5961 5.2 guidelines:
      
         The ACK value is considered acceptable only if
         it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
         SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
         above condition MUST be discarded and an ACK sent back.  It needs to
         be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
         duplicate (SEG.ACK < SND.UNA), it can be ignored.  If the ACK
         acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
         ACK, drop the segment, and return".  The "ignored" above implies that
         the processing of the incoming data segment continues, which means
         the ACK value is treated as acceptable.  This mitigation makes the
         ACK check more stringent since any ACK < SND.UNA wouldn't be
         accepted, instead only ACKs that are in the range ((SND.UNA -
         MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
      
      This can be refined for new (and possibly spoofed) flows
      by not accepting ACKs for bytes that were never sent.
      
      This greatly improves TCP security at a small cost.
      
      I added a Fixes: tag to make sure this patch will reach stable trees,
      even if the 'blamed' patch was adhering to the RFC.
      
      tp->bytes_acked was added in linux-4.2
      
      Following packetdrill test (courtesy of Yepeng Pan) shows
      the issue at hand:
      
      0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1024) = 0
      
      // ---------------- Handshake ------------------- //
      
      // when window scale is set to 14 the window size can be extended to
      // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
      // with ack number in (Server_ISN+1-1073725440, Server_ISN+1),
      // though this ack number acknowledges some data never
      // sent by the server.
      
      +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14>
      +0 > S. 0:0(0) ack 1 <...>
      +0 < . 1:1(0) ack 1 win 65535
      +0 accept(3, ..., ...) = 4
      
      // For the established connection, we send an ACK packet,
      // the ack packet uses ack number 1 - 1073725300 + 2^32,
      // where 2^32 is used to wrap around.
      // Note: we used 1073725300 instead of 1073725440 to avoid possible
      // edge cases.
      // 1 - 1073725300 + 2^32 = 3221241997
      
      // Oops, old kernels happily accept this packet.
      +0 < . 1:1001(1000) ack 3221241997 win 65535
      
      // After the kernel fix the following will be replaced by a challenge ACK,
      // and prior malicious frame would be dropped.
      +0 > . 1:1(0) ack 1001
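      
      The arithmetic in the packetdrill comments can be checked with a small
      model of the acceptance test. This is a sketch, not the kernel's
      tcp_ack(): it uses the usual wrap-around comparison on 32-bit sequence
      numbers and assumes, as the mention of tp->bytes_acked suggests, that
      the refined check caps the tolerated window at the number of bytes
      acknowledged so far.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-number comparison with 32-bit wrap-around (relies on the
 * usual two's complement conversion to int32_t). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* RFC 5961 5.2: accept if (SND.UNA - window) <= SEG.ACK <= SND.NXT.
 * With 'refined' set, the window is capped at bytes_acked, so an ACK
 * for data that was never sent falls outside the range. */
static bool ack_acceptable(uint32_t snd_una, uint32_t snd_nxt, uint32_t ack,
			   uint32_t max_snd_wnd, uint64_t bytes_acked,
			   bool refined)
{
	uint32_t window = max_snd_wnd;

	if (refined && bytes_acked < window)
		window = (uint32_t)bytes_acked;
	return !seq_before(ack, snd_una - window) && !seq_before(snd_nxt, ack);
}
```

      With the values from the test above (SND.UNA = SND.NXT = 1,
      MAX.SND.WND = 1073725440, nothing acknowledged yet), ack 3221241997
      passes the plain check but fails the refined one.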
      
      Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yepeng Pan <yepeng.pan@cispa.de>
      Reported-by: Christian Rossow <rossow@cispa.de>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20231205161841.2702925-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range() · b2f557a2
      Jiexun Wang authored
      I conducted real-time testing and observed that
      madvise_cold_or_pageout_pte_range() causes significant latency under
      memory pressure, which can be effectively reduced by adding cond_resched()
      within the loop.
      
      I tested on the LicheePi 4A board using cyclictest for latency testing
      and ftrace for latency tracing.  The board uses a TH1520 processor and
      has 8GB of memory.  The kernel version is 6.5.0 with the PREEMPT_RT
      patch applied.
      
      The script I tested is as follows:
      
      echo wakeup_rt > /sys/kernel/tracing/current_tracer
      echo 1 > /sys/kernel/tracing/tracing_on
      echo 0 > /sys/kernel/tracing/tracing_max_latency
      stress-ng --vm 8 --vm-bytes 2G &
      cyclictest --mlockall --smp --priority=99 --distance=0 --duration=30m
      echo 0 > /sys/kernel/tracing/tracing_on
      cat /sys/kernel/tracing/trace 
      
      The tracing results before modification are as follows:
      
      # tracer: wakeup_rt
      #
      # wakeup_rt latency trace v1.1.5 on 6.5.0-rt6-r1208-00003-g999d221864bf
      # --------------------------------------------------------------------
      # latency: 2552 us, #6/6, CPU#3 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:4)
      #    -----------------
      #    | task: cyclictest-196 (uid:0 nice:0 policy:1 rt_prio:99)
      #    -----------------
      #
      #                    _--------=> CPU#
      #                   / _-------=> irqs-off/BH-disabled
      #                  | / _------=> need-resched
      #                  || / _-----=> need-resched-lazy
      #                  ||| / _----=> hardirq/softirq
      #                  |||| / _---=> preempt-depth
      #                  ||||| / _--=> preempt-lazy-depth
      #                  |||||| / _-=> migrate-disable
      #                  ||||||| /     delay
      #  cmd     pid     |||||||| time  |   caller
      #     \   /        ||||||||  \    |    /
      stress-n-206       3dn.h512    2us :      206:120:R   + [003]     196:  0:R cyclictest
      stress-n-206       3dn.h512    7us : <stack trace>
       => __ftrace_trace_stack
       => __trace_stack
       => probe_wakeup
       => ttwu_do_activate
       => try_to_wake_up
       => wake_up_process
       => hrtimer_wakeup
       => __hrtimer_run_queues
       => hrtimer_interrupt
       => riscv_timer_interrupt
       => handle_percpu_devid_irq
       => generic_handle_domain_irq
       => riscv_intc_irq
       => handle_riscv_irq
       => do_irq
      stress-n-206       3dn.h512    9us#: 0
      stress-n-206       3d...3.. 2544us : __schedule
      stress-n-206       3d...3.. 2545us :      206:120:R ==> [003]     196:  0:R cyclictest
      stress-n-206       3d...3.. 2551us : <stack trace>
       => __ftrace_trace_stack
       => __trace_stack
       => probe_wakeup_sched_switch
       => __schedule
       => preempt_schedule
       => migrate_enable
       => rt_spin_unlock
       => madvise_cold_or_pageout_pte_range
       => walk_pgd_range
       => __walk_page_range
       => walk_page_range
       => madvise_pageout
       => madvise_vma_behavior
       => do_madvise
       => sys_madvise
       => do_trap_ecall_u
       => ret_from_exception
      
      The tracing results after modification are as follows:
      
      # tracer: wakeup_rt
      #
      # wakeup_rt latency trace v1.1.5 on 6.5.0-rt6-r1208-00004-gca3876fc69a6-dirty
      # --------------------------------------------------------------------
      # latency: 1689 us, #6/6, CPU#0 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:4)
      #    -----------------
      #    | task: cyclictest-217 (uid:0 nice:0 policy:1 rt_prio:99)
      #    -----------------
      #
      #                    _--------=> CPU#
      #                   / _-------=> irqs-off/BH-disabled
      #                  | / _------=> need-resched
      #                  || / _-----=> need-resched-lazy
      #                  ||| / _----=> hardirq/softirq
      #                  |||| / _---=> preempt-depth
      #                  ||||| / _--=> preempt-lazy-depth
      #                  |||||| / _-=> migrate-disable
      #                  ||||||| /     delay
      #  cmd     pid     |||||||| time  |   caller
      #     \   /        ||||||||  \    |    /
      stress-n-232       0dn.h413    1us+:      232:120:R   + [000]     217:  0:R cyclictest
      stress-n-232       0dn.h413   12us : <stack trace>
       => __ftrace_trace_stack
       => __trace_stack
       => probe_wakeup
       => ttwu_do_activate
       => try_to_wake_up
       => wake_up_process
       => hrtimer_wakeup
       => __hrtimer_run_queues
       => hrtimer_interrupt
       => riscv_timer_interrupt
       => handle_percpu_devid_irq
       => generic_handle_domain_irq
       => riscv_intc_irq
       => handle_riscv_irq
       => do_irq
      stress-n-232       0dn.h413   19us#: 0
      stress-n-232       0d...3.. 1671us : __schedule
      stress-n-232       0d...3.. 1676us+:      232:120:R ==> [000]     217:  0:R cyclictest
      stress-n-232       0d...3.. 1687us : <stack trace>
       => __ftrace_trace_stack
       => __trace_stack
       => probe_wakeup_sched_switch
       => __schedule
       => preempt_schedule
       => migrate_enable
       => free_unref_page_list
       => release_pages
       => free_pages_and_swap_cache
       => tlb_batch_pages_flush
       => tlb_flush_mmu
       => unmap_page_range
       => unmap_vmas
       => unmap_region
       => do_vmi_align_munmap.constprop.0
       => do_vmi_munmap
       => __vm_munmap
       => sys_munmap
       => do_trap_ecall_u
       => ret_from_exception
      
      After the modification, the cause of maximum latency is no longer
      madvise_cold_or_pageout_pte_range(), so this modification can reduce the
      latency caused by madvise_cold_or_pageout_pte_range().
      
      
      Currently the madvise_cold_or_pageout_pte_range() function exhibits
      significant latency under memory pressure, which can be effectively
      reduced by adding cond_resched() within the loop.
      
      When the batch_count reaches SWAP_CLUSTER_MAX, we reschedule
      the task to ensure fairness and avoid long lock holding times.
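      
      The batching pattern can be sketched as follows. SWAP_CLUSTER_MAX is
      32 in the kernel. The resched callback is a hypothetical hook standing
      in for dropping the page table lock and calling cond_resched(), so the
      loop can be exercised outside the kernel; for simplicity the sketch
      offers to yield at every batch boundary.

```c
#include <stddef.h>

#define SWAP_CLUSTER_MAX 32

typedef void (*resched_fn)(void *ctx);

/* Walk 'nr_entries' page table entries, offering to reschedule after
 * every SWAP_CLUSTER_MAX entries. Returns how many times it did so. */
static size_t walk_entries(size_t nr_entries, resched_fn resched, void *ctx)
{
	size_t batch_count = 0, resched_calls = 0;

	for (size_t i = 0; i < nr_entries; i++) {
		/* ... per-entry work on the page would go here ... */

		if (++batch_count == SWAP_CLUSTER_MAX) {
			batch_count = 0;
			/* In the kernel, the PTE lock would be released
			 * before rescheduling and re-taken afterwards. */
			if (resched)
				resched(ctx);
			resched_calls++;
		}
	}
	return resched_calls;
}
```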
      
      Link: https://lkml.kernel.org/r/85363861af65fac66c7a98c251906afc0d9c8098.1695291046.git.wangjiexun@tinylab.org
      Signed-off-by: Jiexun Wang <wangjiexun@tinylab.org>
      Cc: Zhangjin Wu <falcon@tinylab.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() · 675abf8d
      Ryusuke Konishi authored
      If nilfs2 reads a disk image with corrupted segment usage metadata, and
      its segment usage information is marked as an error for the segment at the
      write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs
      during log writing.
      
      Segments newly allocated for writing with nilfs_sufile_alloc() will not
      have this error flag set, but this unexpected situation will occur if the
      segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active
      segment) was marked in error.
      
      Fix this issue by inserting a sanity check to treat it as a file system
      corruption.
      
      Since error returns are not allowed during the execution phase where
      nilfs_sufile_set_segment_usage() is used, this inserts the sanity check
      into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the
      segment usage record to be updated and sets it up in a dirty state for
      writing.
      
      In addition, nilfs_sufile_set_segment_usage() is also called when
      canceling log writing and undoing the segment usage update, so to
      avoid issuing the same kernel warning in that case, skip the error
      flag check in nilfs_sufile_set_segment_usage() on cancellation.
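      
      A minimal model of where the check sits, under assumptions: the flag
      bit and struct are hypothetical stand-ins for the segment usage
      record, and -EIO is simply the error this sketch reports. The point is
      the split between a preparation step that may fail and reports
      corruption, and a later update that cannot fail and only skips its
      warning when cancelling.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SU_FLAG_ERROR (1u << 2) /* hypothetical "segment in error" bit */

struct su_sketch {
	uint32_t flags;
	bool dirty;
};

/* Preparation phase (allowed to fail): reject a segment marked in error. */
static int su_mark_dirty(struct su_sketch *su)
{
	if (su->flags & SU_FLAG_ERROR)
		return -EIO; /* treat as file system corruption */
	su->dirty = true;
	return 0;
}

/* Execution phase (must not fail): report whether a sanity warning
 * would fire; the check is skipped when undoing a cancelled write. */
static bool su_update_would_warn(const struct su_sketch *su, bool cancel)
{
	return !cancel && (su->flags & SU_FLAG_ERROR);
}
```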
      
      Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI · 4a3ef6be
      Sidhartha Kumar authored
      After commit a08c7193 "mm/filemap: remove hugetlb special casing in
      filemap.c", hugetlb pages are stored in the page cache in base page sized
      indexes.  This leads to multi-index stores in the xarray, which are only
      supported through CONFIG_XARRAY_MULTI.  The other page cache user of
      multi-index stores, THP, selects XARRAY_MULTI.  Have CONFIG_HUGETLB_PAGE
      follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE
      && !CONFIG_XARRAY_MULTI config.
      
      Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com
      Fixes: a08c7193 ("mm/filemap: remove hugetlb special casing in filemap.c")
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • scripts/gdb: fix lx-device-list-bus and lx-device-list-class · 801a2b1b
      Florian Fainelli authored
      After the conversion to bus_to_subsys() and class_to_subsys(), the gdb
      scripts listing the system buses and classes, respectively, were broken.
      Fix them by returning the subsys_priv pointer and having the various
      callers dereference either the 'bus' or 'class' structure member
      accordingly.
      
      Link: https://lkml.kernel.org/r/20231130043317.174188-1-florian.fainelli@broadcom.com
      Fixes: 7b884b7f ("driver core: class.c: convert to only use class_to_subsys")
      Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Tested-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Kieran Bingham <kbingham@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Bagas Sanjaya
      MAINTAINERS: drop Antti Palosaari · bc220fe7
      Bagas Sanjaya authored
      He is currently inactive (his last message was two years ago [1]).
      His media tree [2] is also dormant (latest activity was 6 years ago), yet
      his site is still online [3].
      
      Drop him from MAINTAINERS and add a CREDITS entry for him. We thank him
      for maintaining various DVB drivers.
      
      [1]: https://lore.kernel.org/all/660772b3-0597-02db-ed94-c6a9be04e8e8@iki.fi/
      [2]: https://git.linuxtv.org/anttip/media_tree.git/
      [3]: https://palosaari.fi/linux/
      
      Link: https://lkml.kernel.org/r/20231130083848.5396-1-bagasdotme@gmail.com
      Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Acked-by: Antti Palosaari <crope@iki.fi>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Su Hui
      highmem: fix a memory copy problem in memcpy_from_folio · 73424d00
      Su Hui authored
      The Clang static analyzer complains that the value stored to 'from' is
      never read.  Because the loop advances 'from' instead of 'to',
      memcpy_from_folio() only copies the last chunk of memory from the folio
      to the destination.  Fix this typo by replacing 'from += chunk' with
      'to += chunk'.
      
      Link: https://lkml.kernel.org/r/20231130034017.1210429-1-suhui@nfschina.com
      Fixes: b23d03ef ("highmem: add memcpy_to_folio() and memcpy_from_folio()")
      Signed-off-by: Su Hui <suhui@nfschina.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Ryusuke Konishi
      nilfs2: fix missing error check for sb_set_blocksize call · d61d0ab5
      Ryusuke Konishi authored
      When mounting a filesystem image with a block size larger than the page
      size, nilfs2 repeatedly outputs long error messages with stack traces to
      the kernel log, such as the following:
      
       getblk(): invalid block size 8192 requested
       logical block size: 512
       ...
       Call Trace:
        dump_stack_lvl+0x92/0xd4
        dump_stack+0xd/0x10
        bdev_getblk+0x33a/0x354
        __breadahead+0x11/0x80
        nilfs_search_super_root+0xe2/0x704 [nilfs2]
        load_nilfs+0x72/0x504 [nilfs2]
        nilfs_mount+0x30f/0x518 [nilfs2]
        legacy_get_tree+0x1b/0x40
        vfs_get_tree+0x18/0xc4
        path_mount+0x786/0xa88
        __ia32_sys_mount+0x147/0x1a8
        __do_fast_syscall_32+0x56/0xc8
        do_fast_syscall_32+0x29/0x58
        do_SYSENTER_32+0x15/0x18
        entry_SYSENTER_32+0x98/0xf1
       ...
      
      This overloads the system logger.  To make matters worse, it sometimes
      crashes the kernel with a memory access violation.
      
      This is because the return value of the sb_set_blocksize() call, which
      should be checked for errors, is not checked.
      
      The latter issue is caused by out-of-bounds memory accesses: buffers
      read with the initial minimum block size, which remains in the
      super_block structure because sb_set_blocksize() failed to update it,
      are accessed based on the larger requested block size.
      
      Since the nilfs2 mkfs tool does not accept block sizes larger than the
      system page size, this has been overlooked.  However, this situation can
      still arise if the tool is intentionally modified, or if a filesystem
      image created on a system with a large page size is mounted on a system
      with a smaller page size.
      
      Fix this issue by inserting the expected error handling for the call to
      sb_set_blocksize().
      
      Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>