1. 31 Jul, 2024 20 commits
  2. 25 Jul, 2024 20 commits
    • Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1722389b
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and netfilter.
      
        A lot of networking people were at a conference last week, busy
        catching COVID, so relatively short PR.
      
        Current release - regressions:
      
         - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
      
        Current release - new code bugs:
      
         - l2tp: protect session IDR and tunnel session list with one lock,
           make sure the state is coherent to avoid a warning
      
         - eth: bnxt_en: update xdp_rxq_info in queue restart logic
      
         - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
      
        Previous releases - regressions:
      
         - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
           the field reuses previously un-validated pad
      
        Previous releases - always broken:
      
         - tap/tun: drop short frames to prevent crashes later in the stack
      
         - eth: ice: add a per-VF limit on number of FDIR filters
      
         - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"
      
      * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
        tun: add missing verification for short frame
        tap: add missing verification for short frame
        mISDN: Fix a use after free in hfcmulti_tx()
        gve: Fix an edge case for TSO skb validity check
        bnxt_en: update xdp_rxq_info in queue restart logic
        tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
        selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
        xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
        bpf: Fix a segment issue when downgrading gso_size
        net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
        MAINTAINERS: make Breno the netconsole maintainer
        MAINTAINERS: Update bonding entry
        net: nexthop: Initialize all fields in dumped nexthops
        net: stmmac: Correct byte order of perfect_match
        selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
        tipc: Return non-zero value from tipc_udp_addr2str() on error
        netfilter: nft_set_pipapo_avx2: disable softinterrupts
        ice: Fix recipe read procedure
        ice: Add a per-VF limit on number of FDIR filters
        net: bonding: correctly annotate RCU in bond_should_notify_peers()
        ...
    • Merge tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 8bf10009
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - trivial printk changes
      
      The bigger "real" printk work is still being discussed.
      
      * tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        vsprintf: add missing MODULE_DESCRIPTION() macro
        printk: Rename console_replay_all() and update context
    • Merge tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl · b4856250
      Linus Torvalds authored
      Pull sysctl constification from Joel Granados:
       "Treewide constification of the ctl_table argument of proc_handlers
        using a coccinelle script and some manual code formatting fixups.
      
        This is a prerequisite to moving the static ctl_table structs into
        read-only data section which will ensure that proc_handler function
        pointers cannot be modified"
      
      * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
        sysctl: treewide: constify the ctl_table argument of proc_handlers
    • Merge tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · bba959f4
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Wipe screen_info after allocating it from the heap - used by arm32
         and EFI zboot, other EFI architectures allocate it statically
      
       - Revert to allocating boot_params from the heap on x86 when entering
         via the native PE entrypoint, to work around a regression on older
         Dell hardware
      
      * tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        x86/efistub: Revert to heap allocated boot_params for PE entrypoint
        efi/libstub: Zero initialize heap allocated struct screen_info
    • Merge tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · 9b219936
      Linus Torvalds authored
      Pull kgdb updates from Daniel Thompson:
       "Three small changes this cycle:
      
         - Clean up an architecture abstraction that is no longer needed
           because all the architectures have converged.
      
         - Actually use the prompt argument to kdb_position_cursor() instead
           of ignoring it (functionally this fix is a nop but that was due to
           luck rather than good judgement)
      
         - Fix a -Wformat-security warning"
      
      * tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Get rid of redundant kdb_curr_task()
        kdb: Use the passed prompt in kdb_position_cursor()
        kdb: address -Wformat-security warnings
    • Merge tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 28e7241c
      Linus Torvalds authored
      Pull MIPS updates from Thomas Bogendoerfer:
      
       - Use improved timer sync for Loongson64
      
       - Fix address of GCR_ACCESS register
      
       - Add missing MODULE_DESCRIPTION
      
      * tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        mips: sibyte: add missing MODULE_DESCRIPTION() macro
        MIPS: SMP-CPS: Fix address for GCR_ACCESS register for CM3 and later
        MIPS: Loongson64: Switch to SYNC_R4K
    • Merge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · f6464295
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       "The gettimeofday() and clock_gettime() syscalls are now available as
        vDSO functions, and Dave added a patch which allows NVMe cards to be used
        in the PCI slots as a fast and easy alternative to SCSI discs.
      
        Summary:
      
         - add gettimeofday() and clock_gettime() vDSO functions
      
         - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
           with PCIe NVME card to function in parisc machines
      
         - allow users to reduce kernel unaligned runtime warnings
      
         - minor code cleanups"
      
      * tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
        parisc: Use max() to calculate parisc_tlb_flush_threshold
        parisc: Fix warning at drivers/pci/msi/msi.h:121
        parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions
        parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions
        parisc: Clean up unistd.h file
    • Merge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · f9bcc61a
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Support for preemption
      
       - i386 Rust support
      
       - Huge cleanup by Benjamin Berg
      
       - UBSAN support
      
       - Removal of dead code
      
      * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits)
        um: vector: always reset vp->opened
        um: vector: remove vp->lock
        um: register power-off handler
        um: line: always fill *error_out in setup_one_line()
        um: remove pcap driver from documentation
        um: Enable preemption in UML
        um: refactor TLB update handling
        um: simplify and consolidate TLB updates
        um: remove force_flush_all from fork_handler
        um: Do not flush MM in flush_thread
        um: Delay flushing syscalls until the thread is restarted
        um: remove copy_context_skas0
        um: remove LDT support
        um: compress memory related stub syscalls while adding them
        um: Rework syscall handling
        um: Add generic stub_syscall6 function
        um: Create signal stack memory assignment in stub_data
        um: Remove stub-data.h include from common-offsets.h
        um: time-travel: fix signal blocking race/hang
        um: time-travel: remove time_exit()
        ...
    • Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · c2a96b7f
      Linus Torvalds authored
      Pull driver core updates from Greg KH:
       "Here is the big set of driver core changes for 6.11-rc1.
      
        Lots of stuff in here, with not a huge diffstat, but apis are evolving
        which required lots of files to be touched. Highlights of the changes
        in here are:
      
         - platform remove callback api final fixups (Uwe took many releases
           to get here, finally!)
      
         - Rust bindings for basic firmware apis and initial driver-core
           interactions.
      
           It's not all that useful for a "write a whole driver in rust" type
           of thing, but the firmware bindings do help out the phy rust
           drivers, and the driver core bindings give a solid base on which
           others can start their work.
      
           There is still a long way to go here before we have a multitude of
           rust drivers being added, but it's a great first step.
      
         - driver core const api changes.
      
           This reached across all bus types, and there are some fix-ups for
           some not-common bus types that linux-next and 0-day testing shook
           out.
      
           This work is being done to help make the rust bindings more safe,
           as well as the C code, moving toward the end-goal of allowing us to
           put driver structures into read-only memory. We aren't there yet,
           but are getting closer.
      
         - minor devres cleanups and fixes found by code inspection
      
         - arch_topology minor changes
      
         - other minor driver core cleanups
      
        All of these have been in linux-next for a very long time with no
        reported problems"
      
      * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
        ARM: sa1100: make match function take a const pointer
        sysfs/cpu: Make crash_hotplug attribute world-readable
        dio: Have dio_bus_match() callback take a const *
        zorro: make match function take a const pointer
        driver core: module: make module_[add|remove]_driver take a const *
        driver core: make driver_find_device() take a const *
        driver core: make driver_[create|remove]_file take a const *
        firmware_loader: fix soundness issue in `request_internal`
        firmware_loader: annotate doctests as `no_run`
        devres: Correct code style for functions that return a pointer type
        devres: Initialize an uninitialized struct member
        devres: Fix memory leakage caused by driver API devm_free_percpu()
        devres: Fix devm_krealloc() wasting memory
        driver core: platform: Switch to use kmemdup_array()
        driver core: have match() callback in struct bus_type take a const *
        MAINTAINERS: add Rust device abstractions to DRIVER CORE
        device: rust: improve safety comments
        MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
        MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
        firmware: rust: improve safety comments
        ...
    • Merge tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog · b2eed733
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - make watchdog_class const
      
       - rework of the rzg2l_wdt driver
      
       - other small fixes and improvements
      
      * tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog:
        dt-bindings: watchdog: dlg,da9062-watchdog: Drop blank space
        watchdog: rzn1: Convert comma to semicolon
        watchdog: lenovo_se10_wdt: Convert comma to semicolon
        dt-bindings: watchdog: renesas,wdt: Document RZ/G3S support
        watchdog: rzg2l_wdt: Add suspend/resume support
        watchdog: rzg2l_wdt: Rely on the reset driver for doing proper reset
        watchdog: rzg2l_wdt: Remove comparison with zero
        watchdog: rzg2l_wdt: Remove reset de-assert from probe
        watchdog: rzg2l_wdt: Check return status of pm_runtime_put()
        watchdog: rzg2l_wdt: Use pm_runtime_resume_and_get()
        watchdog: rzg2l_wdt: Make the driver depend on PM
        watchdog: rzg2l_wdt: Restrict the driver to ARCH_RZG2L and ARCH_R9A09G011
        watchdog: imx7ulp_wdt: keep already running watchdog enabled
        watchdog: starfive: Add missing clk_disable_unprepare()
        watchdog: Make watchdog_class const
    • Merge tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping · 9cf601e8
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
      
       - fix the order of actions in dmam_free_coherent (Lance Richardson)
      
      * tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping:
        dma: fix call order in dmam_free_coherent
    • Merge branch 'tap-tun-harden-by-dropping-short-frame' · af65ea42
      Jakub Kicinski authored
      Dongli Zhang says:
      
      ====================
      tap/tun: harden by dropping short frame
      
      This is to harden all of tap/tun to avoid any short frame smaller than the
      Ethernet header (ETH_HLEN).
      
      While the xen-netback already rejects short frame smaller than ETH_HLEN ...
      
      static void xenvif_tx_build_gops(struct xenvif_queue *queue,
                                       int budget,
                                       unsigned *copy_ops,
                                       unsigned *map_ops)
      {
      ... ...
                      if (unlikely(txreq.size < ETH_HLEN)) {
                              netdev_dbg(queue->vif->dev,
                                         "Bad packet size: %d\n", txreq.size);
                              xenvif_tx_err(queue, &txreq, extra_count, idx);
                              break;
                      }
      
      ... the short frame may not be dropped by vhost-net/tap/tun.
      
      This fixes CVE-2024-41090 and CVE-2024-41091.
      ====================
      
      Link: https://patch.msgid.link/20240724170452.16837-1-dongli.zhang@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
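The guard this series adds can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel patch: `frame_ok()` and `parse_ethertype()` are hypothetical helpers, and `ETH_HLEN` is redefined locally as 14 (two 6-byte MAC addresses plus a 2-byte EtherType) so the sketch builds outside the kernel tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Ethernet header: 6-byte destination + 6-byte source + 2-byte EtherType.
 * Redefined here only so the sketch compiles outside the kernel. */
#define ETH_HLEN 14

/* Hypothetical helper: true only if the buffer is long enough to hold
 * a complete Ethernet header. */
static bool frame_ok(size_t len)
{
	return len >= ETH_HLEN;
}

/* Hypothetical receive step: reject short frames before reading the header.
 * Returns the EtherType in host byte order, or -1 if the frame is dropped. */
static int parse_ethertype(const unsigned char *buf, size_t len)
{
	if (!frame_ok(len))
		return -1;	/* too short: reading buf[12..13] could overrun */

	return (buf[12] << 8) | buf[13];
}
```

The point of the sketch is the ordering: the length check comes before any code that assumes a full header is present.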
    • tun: add missing verification for short frame · 04958480
      Dongli Zhang authored
      The cited commit did not validate the frame length in the tun_xdp_one()
      path, which could cause a corrupted skb to be sent down the stack. Even
      before the skb is transmitted, tun_xdp_one()-->eth_type_trans() may access
      the Ethernet header even though the frame can be shorter than ETH_HLEN.
      Once transmitted, this could either cause out-of-bounds access beyond the
      actual length, or confuse the lower layers with an incorrect or
      inconsistent header length in the skb metadata.

      In the alternative path, tun_get_user() already prevents frames shorter
      than the Ethernet header from being transmitted for IFF_TAP.

      Drop any frame shorter than the Ethernet header, just as tun_get_user()
      does.
      
      CVE: CVE-2024-41091
      Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/
      Fixes: 043d222f ("tuntap: accept an array of XDP buffs through sendmsg()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Jason Wang <jasowang@redhat.com>
      Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tap: add missing verification for short frame · ed7f2afd
      Si-Wei Liu authored
      The cited commit did not validate the frame length in the
      tap_get_user_xdp() path, which could cause a corrupted skb to be sent
      down the stack. Even before the skb is transmitted,
      tap_get_user_xdp()-->skb_set_network_header() may assume the size is more
      than ETH_HLEN. Once transmitted, this could either cause out-of-bounds
      access beyond the actual length, or confuse the lower layers with an
      incorrect or inconsistent header length in the skb metadata.

      In the alternative path, tap_get_user() already prevents frames shorter
      than the Ethernet header from being transmitted.

      Drop any frame shorter than the Ethernet header, just as tap_get_user()
      does.
      
      CVE: CVE-2024-41090
      Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/
      Fixes: 0efac277 ("tap: accept an array of XDP buffs through sendmsg()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Jason Wang <jasowang@redhat.com>
      Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mISDN: Fix a use after free in hfcmulti_tx() · 61ab7514
      Dan Carpenter authored
      Don't dereference *sp after calling dev_kfree_skb(*sp).
      
      Fixes: af69fb3a ("Add mISDN HFC multiport driver")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/8be65f5a-c2dd-4ba0-8a10-bfe5980b8cfb@stanley.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
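The rule in this commit (do not dereference `*sp` after freeing it) can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct buf` stands in for an skb, `free()` stands in for `dev_kfree_skb()`, and `complete_and_free()` is not a function from the driver.

```c
#include <stdlib.h>

/* Stand-in for an skb; the real struct sk_buff is far larger. */
struct buf {
	size_t len;
};

/* Hypothetical completion routine: frees *sp and reports how many bytes
 * it held. The length is copied out first, because *sp must not be
 * dereferenced once it has been freed. */
static size_t complete_and_free(struct buf **sp)
{
	size_t len;

	if (!*sp)
		return 0;

	len = (*sp)->len;	/* read everything needed before the free */
	free(*sp);
	*sp = NULL;		/* do not leave a stale pointer behind */

	return len;
}
```

Copying the needed fields into locals before the free is one common way to avoid this class of bug; the actual patch may restructure the code differently.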
    • gve: Fix an edge case for TSO skb validity check · 36e3b949
      Bailey Forrest authored
      The NIC requires each TSO segment to not span more than 10
      descriptors. NIC further requires each descriptor to not exceed
      16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO).
      
      The descriptors for an skb are generated by
      gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format.
      gve_tx_add_skb_no_copy_dqo() loops through each skb frag and
      generates a descriptor for the entire frag if the frag size is
      not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is
      greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s)
      of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for
      the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO).
      
      gve_can_send_tso() checks if the descriptors thus generated for an
      skb would meet the requirement that each TSO-segment not span more
      than 10 descriptors. However, the current code misses an edge case
      when a TSO segment spans multiple descriptors within a large frag.
      This change fixes the edge case.
      
      gve_can_send_tso() relies on the assumption that max gso size (9728)
      is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb
      fragment a TSO segment can never span more than 2 descriptors.
      
      Fixes: a57e5de4 ("gve: DQO: Add TX path")
      Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
      Signed-off-by: Bailey Forrest <bcf@google.com>
      Reviewed-by: Jeroen de Borst <jeroendb@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
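The descriptor arithmetic described in this commit message can be worked through with a short sketch. It is not the driver code: `MAX_BUF_SIZE` mirrors the 16KB - 1 limit quoted above, and `descs_for_frag()` and `descs_spanned()` are hypothetical helpers written only to show the counting.

```c
/* 16KB - 1, the per-descriptor limit quoted in the commit message. */
#define MAX_BUF_SIZE ((16u * 1024u) - 1u)

/* Descriptors needed for one fragment: one per full-size chunk, plus one
 * more for any remainder (ceiling division). */
static unsigned int descs_for_frag(unsigned int frag_size)
{
	return (frag_size + MAX_BUF_SIZE - 1) / MAX_BUF_SIZE;
}

/* Number of descriptors touched by a segment of seg_len bytes (seg_len > 0)
 * that starts at byte offset off inside a single fragment. */
static unsigned int descs_spanned(unsigned int off, unsigned int seg_len)
{
	unsigned int first = off / MAX_BUF_SIZE;
	unsigned int last = (off + seg_len - 1) / MAX_BUF_SIZE;

	return last - first + 1;
}
```

Because the maximum gso size (9728) is smaller than the descriptor limit, a segment that lies within one fragment can cross at most one descriptor boundary, which is why the message says it never spans more than 2 descriptors there.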
    • bnxt_en: update xdp_rxq_info in queue restart logic · b537633c
      Taehee Yoo authored
      When netdev_rx_queue_restart() restarts queues, the bnxt_en driver
      updates (creates and deletes) a page_pool.
      But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still
      connected to an old page_pool.
      So, bnxt_rx_ring_info->page_pool indicates a new page_pool, but
      bnxt_rx_ring_info->xdp_rxq is still connected to an old page_pool.
      
      An old page_pool is no longer used so it is supposed to be
      deleted by page_pool_destroy() but it isn't.
      Because the xdp_rxq_info is holding the reference count for it and the
      xdp_rxq_info is not updated, an old page_pool will not be deleted in
      the queue restart logic.
      
      Before restarting 1 queue:
      ./tools/net/ynl/samples/page-pool
      enp10s0f1np1[6] page pools: 4 (zombies: 0)
      	refs: 8192 bytes: 33554432 (refs: 0 bytes: 0)
      	recycling: 0.0% (alloc: 128:8048 recycle: 0:0)
      
      After restarting 1 queue:
      ./tools/net/ynl/samples/page-pool
      enp10s0f1np1[6] page pools: 5 (zombies: 0)
      	refs: 10240 bytes: 41943040 (refs: 0 bytes: 0)
      	recycling: 20.0% (alloc: 160:10080 recycle: 1920:128)
      
      Before restarting queues, an interface has 4 page_pools.
      After restarting one queue, an interface has 5 page_pools, but it
      should be 4, not 5.
      The reason is that queue restarting logic creates a new page_pool and
      an old page_pool is not deleted due to the absence of an update of
      xdp_rxq_info logic.
      
      Fixes: 2d694c27 ("bnxt_en: implement netdev_queue_mgmt_ops")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: David Wei <dw@davidwei.uk>
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
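The leak described above comes from a second reference that is never moved to the new pool. The toy model below is an assumption-laden sketch, not the bnxt_en code: `struct pool`, `struct ring`, `ring_restart()` and the `live_pools` counter are all hypothetical, and a plain integer stands in for the kernel's reference counting.

```c
#include <stdlib.h>

/* Toy reference-counted pool; live_pools counts pools not yet freed. */
struct pool {
	int refs;
};

static int live_pools;

static struct pool *pool_create(void)
{
	struct pool *p = calloc(1, sizeof(*p));

	if (!p)
		abort();
	p->refs = 1;
	live_pools++;
	return p;
}

static void pool_get(struct pool *p)
{
	p->refs++;
}

static void pool_put(struct pool *p)
{
	if (--p->refs == 0) {
		live_pools--;
		free(p);
	}
}

/* The ring owns one reference; the rxq info holds a second one. */
struct ring {
	struct pool *page_pool;
	struct pool *xdp_rxq_pool;
};

static void ring_init(struct ring *r)
{
	r->page_pool = pool_create();
	r->xdp_rxq_pool = r->page_pool;
	pool_get(r->xdp_rxq_pool);
}

/* Restart the ring with a fresh pool. With update_rxq_info == 0 the rxq
 * info keeps its reference to the old pool, so the old pool is never freed. */
static void ring_restart(struct ring *r, int update_rxq_info)
{
	struct pool *old = r->page_pool;

	r->page_pool = pool_create();
	if (update_rxq_info) {
		pool_put(r->xdp_rxq_pool);
		r->xdp_rxq_pool = r->page_pool;
		pool_get(r->xdp_rxq_pool);
	}
	pool_put(old);
}
```

In this model a restart that skips the rxq info update leaves one extra live pool behind, which matches the 4 to 5 page pool increase shown in the output above.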
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · f7578df9
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-07-25
      
      We've added 14 non-merge commits during the last 8 day(s) which contain
      a total of 19 files changed, 177 insertions(+), 70 deletions(-).
      
      The main changes are:
      
      1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
         BPF sockhash. Also add test coverage for this case, from Michal Luczaj.
      
      2) Fix a segmentation issue when downgrading gso_size in the BPF helper
         bpf_skb_adjust_room(), from Fred Li.
      
      3) Fix a compiler warning in resolve_btfids due to a missing type cast,
         from Liwei Song.
      
      4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
         boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.
      
      5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
         from Stanislav Fomichev.
      
      6) Fix function prototype BTF dumping in libbpf for prototypes that have
         no input arguments, from Andrii Nakryiko.
      
      7) Fix stacktrace symbol resolution in perf script for BPF programs
         containing subprograms, from Hou Tao.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
        xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
        bpf: Fix a segment issue when downgrading gso_size
        tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
        bpf, events: Use prog to emit ksymbol event for main program
        selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
        selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
        selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
        af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
        bpftool: Fix typo in usage help
        libbpf: Fix no-args func prototype BTF dumping syntax
        MAINTAINERS: Update powerpc BPF JIT maintainers
        MAINTAINERS: Update email address of Naveen
        selftests/bpf: fexit_sleep: Fix stack allocation for arm64
      ====================
      
      Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: process the 3rd ACK with sk_socket for TFO/MPTCP · c1668292
      Matthieu Baerts (NGI0) authored
      The 'Fixes' commit recently changed the behaviour of TCP by skipping the
      processing of the 3rd ACK when a sk->sk_socket is set. The goal was to
      skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an
      unnecessary ACK in case of simultaneous connect(). Unfortunately, that
      had an impact on TFO and MPTCP.
      
      I started to look at the impact on MPTCP, because the MPTCP CI found
      some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni
      suggested that I look at the impact on TFO with "plain" TCP.
      
      For MPTCP, when receiving the 3rd ACK of a request adding a new path
      (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that
      has been created when the MPTCP connection got established before with
      the first path. The newly added 'goto' will then skip the processing of
      the segment text (step 7) and not go through tcp_data_queue() where the
      MPTCP options are validated, and some actions are triggered, e.g.
      sending the MPJ 4th ACK [2] as demonstrated by the new errors when
      running a packetdrill test [3] establishing a second subflow.
      
      This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
      delayed. Still, we don't want to have this behaviour as it delays the
      switch to the fully established mode, and invalid MPTCP options in this
      3rd ACK will not be caught any more. This modification also affects the
      MPTCP + TFO feature, and is the reason why the selftests have been
      unstable for the last few days [4].
      
      For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer
      passing: if the 3rd ACK contains data, and the connection is accept()ed
      before receiving them, these data would no longer be processed, and thus
      not ACKed.
      
      One last thing about MPTCP, in case of simultaneous connect(), a
      fallback to TCP will be done, which seems fine:
      
        `../common/defaults.sh`
      
         0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
        +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
      
        +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
        +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
        +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
        +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
        +0 >  . 1:1(0) ack 1           <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>
      
      Simultaneous SYN-data crossing is also not supported by TFO, see [6].
      
      Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only:
      that's a more generic solution than the one initially proposed, and
      also enough to fix the issues described above.
      
      Later on, Eric Dumazet mentioned that an ACK should still be sent in
      reaction to the second SYN+ACK that is received: not sending a DUPACK
      here seems wrong and could hurt:
      
         0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
        +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
      
        +0 > S  0:0(0)                <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8>
        +0 < S  0:0(0)       win 1000 <mss 1000, sackOK, nop, nop>
        +0 > S. 0:0(0) ack 1          <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8>
        +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop>
        +0 >  . 1:1(0) ack 1          <nop, nop, sack 0:1>  // <== Here
      
      So in this version, the 'goto consume' is dropped, to always send an ACK
      when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be
      seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of
      simultaneous SYN crossing: that's what is expected here.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1]
      Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2]
      Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3]
      Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4]
      Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5]
      Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6]
      Fixes: 23e89e8e ("tcp: Don't drop SYN+ACK for simultaneous connect().")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test · 9b9969c4
      Stanislav Fomichev authored
      This flag is now required to use tx_metadata_len.
      
      Fixes: 40808a23 ("selftests/bpf: Add TX side to xdp_metadata")
      Reported-by: Julian Schindel <mail@arctic-alpaca.de>
      Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
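The flag gating mentioned here can be sketched as follows. This is only an illustration of the idea that a length field takes effect when an explicit flag is set: `struct umem_reg_sketch`, `UMEM_TX_METADATA_LEN_FLAG` and its bit value, and `effective_tx_metadata_len()` are assumptions made for the example, not the uapi definitions.

```c
#include <stdint.h>

/* Illustrative flag bit. The value is an assumption for this sketch;
 * consult the uapi header for the real XDP_UMEM_TX_METADATA_LEN. */
#define UMEM_TX_METADATA_LEN_FLAG (1u << 2)

/* Hypothetical, trimmed-down registration struct. */
struct umem_reg_sketch {
	uint32_t flags;
	uint32_t tx_metadata_len;
};

/* The length is honoured only when the caller opted in with the flag.
 * Without the flag the field is treated like padding and ignored, so
 * callers that left garbage there are not affected. */
static uint32_t effective_tx_metadata_len(const struct umem_reg_sketch *reg)
{
	if (reg->flags & UMEM_TX_METADATA_LEN_FLAG)
		return reg->tx_metadata_len;

	return 0;
}
```

Gating on a flag is a common way to give meaning to a field that reuses bytes which were not validated before, since existing callers never set the new flag.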