1. 26 Feb, 2024 1 commit
    • net: veth: clear GRO when clearing XDP even when down · fe9f8013
      Jakub Kicinski authored
      veth sets NETIF_F_GRO automatically when XDP is enabled,
      because both features use the same NAPI machinery.
      
      The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which
      is called both on ndo_stop and when XDP is turned off.
      To keep the flag from being cleared when the device is brought
      down, the clearing is skipped when IFF_UP is not set.
      Bringing the device down should indeed not modify its features.
      
      Unfortunately, this means that clearing is also skipped when
      XDP is disabled _while_ the device is down. And there's nothing
      on the open path to bring the device features back into sync.
      IOW if user enables XDP, disables it and then brings the device
      up we'll end up with a stray GRO flag set but no NAPI instances.
      
      We don't depend on the GRO flag on the datapath, so the datapath
      won't crash. We will crash (or hang), however, next time features
      are sync'ed (either by user via ethtool or peer changing its config).
      The GRO flag will go away, and veth will try to disable the NAPIs.
      But the open path never created them: XDP was off and the GRO flag
      was a stray. If NAPI was initialized before, we'll hang in napi_disable().
      If it never was, we'll crash trying to stop an uninitialized hrtimer.
      
      Move the GRO flag updates to the XDP enable / disable paths,
      instead of mixing them with the ndo_open / ndo_close paths.
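
The failure sequence and the effect of moving the flag update can be pictured with a toy state machine. This is a sketch under stated assumptions, not the veth code: `struct toy_dev` and every `toy_*` function are hypothetical names, the open path is assumed to create NAPI only when XDP is attached, and a GRO request made directly by the user is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the veth driver. */
struct toy_dev {
	bool up;       /* stands in for IFF_UP */
	bool xdp;      /* an XDP program is attached */
	bool gro_flag; /* stands in for NETIF_F_GRO */
	bool napi;     /* NAPI instances exist */
};

/* Open path: NAPI is created only when XDP is attached (assumption). */
void toy_open(struct toy_dev *d)
{
	d->up = true;
	d->napi = d->xdp;
}

/* Stop path: tears down NAPI but leaves the feature flags alone. */
void toy_stop(struct toy_dev *d)
{
	d->up = false;
	d->napi = false;
}

/* Enabling XDP sets the GRO flag; NAPI is created only if the device is up. */
void toy_xdp_enable(struct toy_dev *d)
{
	d->xdp = true;
	d->gro_flag = true;
	if (d->up)
		d->napi = true;
}

/* Old behaviour: flag clearing is skipped while the device is down. */
void toy_xdp_disable_old(struct toy_dev *d)
{
	d->xdp = false;
	if (d->up) {
		d->gro_flag = false;
		d->napi = false;
	}
}

/* New behaviour: the flag follows XDP regardless of the up/down state. */
void toy_xdp_disable_new(struct toy_dev *d)
{
	d->xdp = false;
	d->gro_flag = false;
	d->napi = false;
}

/* A stray flag: GRO advertised on an up device with no NAPI behind it. */
bool toy_has_stray_gro(const struct toy_dev *d)
{
	return d->up && d->gro_flag && !d->napi;
}
```

Starting from a down device, the sequence enable XDP, disable XDP, open leaves `toy_has_stray_gro()` true with `toy_xdp_disable_old()` and false with `toy_xdp_disable_new()`.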
      
      Fixes: d3256efd ("veth: allow enabling NAPI even without XDP")
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Feb, 2024 1 commit
  3. 23 Feb, 2024 8 commits
    • ps3/gelic: Fix SKB allocation · b0b1210b
      Geoff Levand authored
      Commit 3ce4f9c3 ("net/ps3_gelic_net: Add gelic_descr structures") of
      6.8-rc1 had a copy-and-paste error where the pointer that holds the
      allocated SKB (struct gelic_descr.skb) was set to NULL after the SKB was
      allocated. This resulted in a kernel panic when the SKB pointer was
      accessed.
      
      This fix moves the initialization of the gelic_descr to before the SKB
      is allocated.
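
The ordering problem can be reproduced in miniature. The following is only a sketch: `struct toy_descr`, `toy_pool` and the two `toy_prepare_*` functions are hypothetical and do not reflect the real gelic descriptor layout. A static array stands in for the SKB allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a receive descriptor; not the gelic layout. */
struct toy_descr {
	void *skb;           /* buffer owned by this descriptor */
	unsigned int status;
};

static unsigned char toy_pool[64]; /* pretend allocation */

/* Broken order: the reset runs after the allocation and wipes the pointer. */
void toy_prepare_broken(struct toy_descr *d)
{
	d->skb = toy_pool;
	memset(d, 0, sizeof(*d));
}

/* Fixed order: reset the descriptor first, then record the allocation. */
void toy_prepare_fixed(struct toy_descr *d)
{
	memset(d, 0, sizeof(*d));
	d->skb = toy_pool;
}
```

After `toy_prepare_broken()` the `skb` member is NULL, so any later dereference would fault; after `toy_prepare_fixed()` it still points at the buffer.
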
      Reported-by: sambat goson <sombat3960@gmail.com>
      Fixes: 3ce4f9c3 ("net/ps3_gelic_net: Add gelic_descr structures")
      Signed-off-by: Geoff Levand <geoff@infradead.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dpaa: fman_memac: accept phy-interface-type = "10gbase-r" in the device tree · 734f06db
      Vladimir Oltean authored
      Since commit 5d93cfcf ("net: dpaa: Convert to phylink"), we support
      the "10gbase-r" phy-mode through a driver-based conversion of "xgmii",
      but we still don't actually support it when the device tree specifies
      "10gbase-r" proper.
      
      This is because boards such as LS1046A-RDB do not define pcs-handle-names
      (for whatever reason) in the ethernet@f0000 device tree node, and the
      code enters through this code path:
      
      	err = of_property_match_string(mac_node, "pcs-handle-names", "xfi");
      	// code takes neither branch and falls through
      	if (err >= 0) {
      		(...)
      	} else if (err != -EINVAL && err != -ENODATA) {
      		goto _return_fm_mac_free;
      	}
      
      	(...)
      
      	/* For compatibility, if pcs-handle-names is missing, we assume this
      	 * phy is the first one in pcsphy-handle
      	 */
      	err = of_property_match_string(mac_node, "pcs-handle-names", "sgmii");
      	if (err == -EINVAL || err == -ENODATA)
      		pcs = memac_pcs_create(mac_node, 0); // code takes this branch
      	else if (err < 0)
      		goto _return_fm_mac_free;
      	else
      		pcs = memac_pcs_create(mac_node, err);
      
      	// A default PCS is created and saved in "pcs"
      
      	// This determination fails and mistakenly saves the default PCS in
      	// memac->sgmii_pcs instead of memac->xfi_pcs, because at this
      	// stage, mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER.
      	if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
      		memac->xfi_pcs = pcs;
      	else
      		memac->sgmii_pcs = pcs;
      
      In other words, in the absence of pcs-handle-names, the default
      xfi_pcs assignment logic only works when in the device tree we have
      PHY_INTERFACE_MODE_XGMII.
      
      By reversing the order between the fallback xfi_pcs assignment and the
      "xgmii" overwrite with "10gbase-r", we are able to support both values
      in the device tree, with identical behavior.
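
The effect of the ordering can be sketched in isolation. Everything below is hypothetical (`enum toy_phy_if`, `enum toy_slot`, both `toy_pick_slot_*` functions). In particular, comparing against the 10GBASE-R value after normalization is an assumption about how the reordering is realized, not a quote of the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PHY_INTERFACE_MODE_* values. */
enum toy_phy_if { TOY_SGMII, TOY_XGMII, TOY_10GBASER };

/* Which PCS pointer the default PCS would be stored in. */
enum toy_slot { TOY_SLOT_SGMII, TOY_SLOT_XFI };

/* Old order: pick the slot first, then rewrite "xgmii" to "10gbase-r". */
enum toy_slot toy_pick_slot_old(enum toy_phy_if *phy_if)
{
	enum toy_slot slot =
		(*phy_if == TOY_XGMII) ? TOY_SLOT_XFI : TOY_SLOT_SGMII;

	if (*phy_if == TOY_XGMII)
		*phy_if = TOY_10GBASER;
	return slot;
}

/* Reversed order: normalize first, then pick the slot from the result. */
enum toy_slot toy_pick_slot_reordered(enum toy_phy_if *phy_if)
{
	if (*phy_if == TOY_XGMII)
		*phy_if = TOY_10GBASER;
	return (*phy_if == TOY_10GBASER) ? TOY_SLOT_XFI : TOY_SLOT_SGMII;
}
```

With the old order a device tree value of "10gbase-r" lands in the SGMII slot; with the reversed order both "xgmii" and "10gbase-r" land in the XFI slot and end up with the same `phy_if`.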
      
      Currently, it is impossible to make the s/xgmii/10gbase-r/ device tree
      conversion, because it would break forward compatibility (new device
      tree with old kernel). The only way to modify existing device trees to
      phy-interface-mode = "10gbase-r" is to fix stable kernels to accept this
      value and handle it properly.
      
      One reason why the conversion is desirable is that with pre-phylink
      kernels, the Aquantia PHY driver used to warn about the improper use
      of PHY_INTERFACE_MODE_XGMII [1]. It is best to have a single (latest)
      device tree that works with all supported stable kernel versions.
      
      Note that the blamed commit does not constitute a regression per se.
      Older stable kernels like 6.1 still do not work with "10gbase-r", but
      for a different reason. That is a battle for another time.
      
      [1] https://lore.kernel.org/netdev/20240214-ls1046-dts-use-10gbase-r-v1-1-8c2d68547393@concurrent-rt.com/
      
      Fixes: 5d93cfcf ("net: dpaa: Convert to phylink")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Sean Anderson <sean.anderson@seco.com>
      Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mctp: take ownership of skb in mctp_local_output · 3773d65a
      Jeremy Kerr authored
      Currently, mctp_local_output only takes ownership of skb on success, and
      we may leak an skb if mctp_local_output fails in specific states; the
      skb ownership isn't transferred until the actual output routing occurs.
      
      Instead, make mctp_local_output free the skb on all error paths up to
      the route action, so it always consumes the passed skb.
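
The "always consumes" contract can be illustrated with a small sketch. The names (`struct toy_buf`, `toy_live_bufs`, `toy_route_output`, `toy_local_output`) and the two failure conditions are hypothetical; only the shape of the error handling is meant to mirror the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical buffer type with a live-object counter to expose leaks. */
struct toy_buf { int payload; };
int toy_live_bufs;

struct toy_buf *toy_buf_alloc(void)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (b)
		toy_live_bufs++;
	return b;
}

void toy_buf_free(struct toy_buf *b)
{
	free(b);
	toy_live_bufs--;
}

/* Stand-in for the route action: it owns the buffer from here on. */
int toy_route_output(struct toy_buf *b)
{
	toy_buf_free(b);
	return 0;
}

/* Consumes b on every path: errors free it, success hands it on. */
int toy_local_output(struct toy_buf *b, int have_route, int addr_ok)
{
	int rc;

	if (!addr_ok) {
		rc = -EINVAL;
		goto out_free;
	}
	if (!have_route) {
		rc = -EHOSTUNREACH;
		goto out_free;
	}
	return toy_route_output(b);

out_free:
	toy_buf_free(b);
	return rc;
}
```

A caller can then treat the buffer as gone after the call whatever the return value is, and `toy_live_bufs` returns to zero on both the success and the failure paths.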
      
      Fixes: 833ef3b9 ("mctp: Populate socket implementation")
      Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · e872469c
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-02-20 (ice)
      
      This series contains updates to ice driver only.
      
      Yochai sets parent device to properly reflect connection state between
      source DPLL and output pin.
      
      Arkadiusz fixes additional issues related to DPLL; proper reporting of
      phase_adjust value and preventing use/access of data while resetting.
      
      Amritha resolves ASSERT_RTNL() being triggered on certain reset/rebuild
      flows.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Fix ASSERT_RTNL() warning during certain scenarios
        ice: fix pin phase adjust updates on PF reset
        ice: fix dpll periodic work data updates on PF reset
        ice: fix dpll and dpll_pin data access on PF reset
        ice: fix dpll input pin phase_adjust value updates
        ice: fix connection state of DPLL and out pin
      ====================
      Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240220214444.1039759-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ip_tunnel: prevent perpetual headroom growth · 5ae1e992
      Florian Westphal authored
      syzkaller triggered the following KASAN splat:
      BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
      Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191
      [..]
       kasan_report+0xda/0x110 mm/kasan/report.c:588
       __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
       skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline]
       ___skb_get_hash net/core/flow_dissector.c:1791 [inline]
       __skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856
       skb_get_hash include/linux/skbuff.h:1556 [inline]
       ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748
       ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308
       __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
       netdev_start_xmit include/linux/netdevice.h:4954 [inline]
       xmit_one net/core/dev.c:3548 [inline]
       dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
       __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349
       dev_queue_xmit include/linux/netdevice.h:3134 [inline]
       neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592
       ...
       ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235
       ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
       ..
       iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
       ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831
       ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665
       __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
       netdev_start_xmit include/linux/netdevice.h:4954 [inline]
       xmit_one net/core/dev.c:3548 [inline]
       dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
       ...
      
      The splat occurs because skb->data points past skb->head allocated area.
      This is because neigh layer does:
        __skb_pull(skb, skb_network_offset(skb));
      
      ... but skb_network_offset() returns a negative offset and the __skb_pull()
      argument is unsigned.  IOW, skb->data gets "adjusted" by a huge value.
      
      The negative value is returned because skb->head and skb->data distance is
      more than 64k and skb->network_header (u16) has wrapped around.
      
      The bug is in the ip_tunnel infrastructure, which can cause
      dev->needed_headroom to increment ad infinitum.
      
      The syzkaller reproducer consists of packets getting routed via a gre
      tunnel, and route of gre encapsulated packets pointing at another (ipip)
      tunnel.  The ipip encapsulation finds gre0 as next output device.
      
      This results in the following pattern:
      
      1). The first packet is to be sent out via gre0.
      Route lookup found an output device, ipip0.

      2). ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the
      future output device, rt.dev->needed_headroom (ipip0).

      3). ip output / start_xmit moves the skb on to ipip0, which runs the
      same code path again (xmit recursion).

      4). The routing step for the post-gre0-encap packet finds gre0 as the
      output device to use for the ipip0 encapsulated packet.
      
      tunl0->needed_headroom is then incremented based on the (already bumped)
      gre0 device headroom.
      
      This repeats for every future packet:
      
      gre0->needed_headroom gets inflated because previous packets' ipip0 step
      incremented rt->dev (gre0) headroom, and ipip0 incremented because gre0
      needed_headroom was increased.
      
      For each subsequent packet, gre/ipip0->needed_headroom grows until
      post-expand-head reallocations result in a skb->head/data distance of
      more than 64k.
      
      Once that happens, skb->network_header (u16) wraps around when
      pskb_expand_head tries to make sure that skb_network_offset() is unchanged
      after the headroom expansion/reallocation.
      
      After this skb_network_offset(skb) returns a different (and negative)
      result post headroom expansion.
      
      The next trip to neigh layer (or anything else that would __skb_pull the
      network header) makes skb->data point to a memory location outside
      skb->head area.
      
      v2: Cap the needed_headroom update to an arbitrarily chosen upper limit to
      prevent perpetual increase instead of dropping the headroom increment
      completely.
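
The arithmetic behind the wraparound, and the effect of a cap, can be checked with a small sketch. All `toy_*` names are hypothetical, the per-hop increments of 24 and 20 bytes are illustrative only, and the cap value of 512 is an assumption (the message above says only that the limit is arbitrarily chosen). The sketch also assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

/* network_header is kept as a 16-bit offset from head, so it truncates. */
uint16_t toy_store_network_header(uint32_t offset_from_head)
{
	return (uint16_t)offset_from_head;
}

/* Same idea as skb_network_offset(): header position relative to data. */
int32_t toy_network_offset(uint16_t network_header, uint32_t data_minus_head)
{
	return (int32_t)network_header - (int32_t)data_minus_head;
}

#define TOY_MAX_HEADROOM 512u /* assumed cap, see the lead-in */

/* Headroom update, optionally clamped to the cap. */
void toy_update_headroom(unsigned int *needed, unsigned int headroom, int capped)
{
	if (capped && headroom > TOY_MAX_HEADROOM)
		headroom = TOY_MAX_HEADROOM;
	*needed = headroom;
}

/* Two tunnel devices routing into each other, one update each per packet. */
unsigned int toy_run_loop(unsigned int packets, int capped)
{
	unsigned int gre = 0, ipip = 0, i;

	for (i = 0; i < packets; i++) {
		toy_update_headroom(&gre, ipip + 24, capped);
		toy_update_headroom(&ipip, gre + 20, capped);
	}
	return gre;
}
```

With 70000 bytes between head and data, the stored header offset truncates to 4464, the computed network offset becomes -65536, and passing that to an unsigned length parameter yields 4294901760. Without the cap the simulated headroom passes 64k after a couple of thousand packets; with it, the value stays at or below the limit.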
      
      Reported-and-tested-by: syzbot+bfde3bef047a81b8fde6@syzkaller.appspotmail.com
      Closes: https://groups.google.com/g/syzkaller-bugs/c/fL9G6GtWskY/m/VKk_PR5FBAAJ
      Fixes: 243aad83 ("ip_gre: include route header_len in max_headroom calculation")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240220135606.4939-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: smsc95xx: add support for SYS TEC USB-SPEmodule1 · 45532b21
      Andre Werner authored
      This patch adds support for the SYS TEC USB-SPEmodule1 10Base-T1L
      ethernet device to the existing smsc95xx driver by adding the new
      USB VID/PID pair.
      Signed-off-by: Andre Werner <andre.werner@systec-electronic.com>
      Link: https://lore.kernel.org/r/20240219053413.4732-1-andre.werner@systec-electronic.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: add nla be16/32 types to minlen array · 9a0d1885
      Florian Westphal authored
      BUG: KMSAN: uninit-value in nla_validate_range_unsigned lib/nlattr.c:222 [inline]
      BUG: KMSAN: uninit-value in nla_validate_int_range lib/nlattr.c:336 [inline]
      BUG: KMSAN: uninit-value in validate_nla lib/nlattr.c:575 [inline]
      BUG: KMSAN: uninit-value in __nla_validate_parse+0x2e20/0x45c0 lib/nlattr.c:631
       nla_validate_range_unsigned lib/nlattr.c:222 [inline]
       nla_validate_int_range lib/nlattr.c:336 [inline]
       validate_nla lib/nlattr.c:575 [inline]
      ...
      
      The message in question matches this policy:
      
       [NFTA_TARGET_REV]       = NLA_POLICY_MAX(NLA_BE32, 255),
      
      but because NLA_BE32 size in minlen array is 0, the validation
      code will read past the malformed (too small) attribute.
      
      Note: Other attribute types, e.g. BITFIELD32, SINT and UINT, are also
      missing; those likely should be added too.
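
Why a zero entry in the minimum-length table lets a short attribute through can be shown with a reduced model. The enum, both tables and `toy_len_ok()` are hypothetical; they imitate only the idea of a per-type minimum length lookup, not the real lib/nlattr.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types; the real NLA_* list is longer. */
enum toy_type { TOY_U8, TOY_U16, TOY_U32, TOY_BE16, TOY_BE32, TOY_TYPE_MAX };

/* Before: the big-endian types have no entry, so their minimum is 0. */
const uint8_t toy_minlen_old[TOY_TYPE_MAX] = {
	[TOY_U8]  = sizeof(uint8_t),
	[TOY_U16] = sizeof(uint16_t),
	[TOY_U32] = sizeof(uint32_t),
};

/* After: the big-endian types get the size of their payload. */
const uint8_t toy_minlen_new[TOY_TYPE_MAX] = {
	[TOY_U8]   = sizeof(uint8_t),
	[TOY_U16]  = sizeof(uint16_t),
	[TOY_U32]  = sizeof(uint32_t),
	[TOY_BE16] = sizeof(uint16_t),
	[TOY_BE32] = sizeof(uint32_t),
};

/* Length gate that runs before any range check reads the payload. */
int toy_len_ok(const uint8_t *minlen, enum toy_type type, size_t attrlen)
{
	return attrlen >= minlen[type];
}
```

A 2-byte attribute declared as a 32-bit big-endian value passes the gate with the old table, so a later range check would read 4 bytes from a 2-byte payload; with the new table it is rejected up front.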
      
      Reported-by: syzbot+3f497b07aa3baf2fb4d0@syzkaller.appspotmail.com
      Reported-by: xingwei lee <xrivendell7@gmail.com>
      Closes: https://lore.kernel.org/all/CABOYnLzFYHSnvTyS6zGa-udNX55+izqkOt2sB9WDqUcEGW6n8w@mail.gmail.com/raw
      Fixes: ecaf75ff ("netlink: introduce bigendian integer types")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20240221172740.5092-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter · 661779e1
      Ryosuke Yasuoka authored
      syzbot reported the following uninit-value access issue [1]:
      
      netlink_to_full_skb() creates a new `skb` and copies into it the
      `skb->data` of the skb passed as its first argument. The copy size,
      `len`, is passed to skb_put_data(). This `len` is based on `skb->end`,
      which is a buffer offset rather than a data length: it covers both the
      data and the tailroom. Since the tailroom is not initialized when the
      new `skb` is created, KMSAN detects an uninitialized memory area when
      copying the data.

      This patch resolves the issue by changing `len` from `skb->end` to
      `skb->len`, which is the actual data length.
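
The difference between copying up to the end of the buffer and copying only the data can be sketched as follows. `struct toy_skb`, the poison byte and both helper functions are hypothetical; the poison value merely marks bytes that were never written so that the over-copy becomes countable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 64
#define TOY_POISON   0xAA /* marks bytes that were never written */

/* Hypothetical buffer: `len` bytes of data followed by tailroom up to `end`. */
struct toy_skb {
	unsigned char buf[TOY_BUF_SIZE];
	size_t len; /* bytes of real data */
	size_t end; /* data plus tailroom */
};

/* Poison the whole buffer, then store n bytes (caller keeps n <= 64). */
void toy_skb_fill(struct toy_skb *s, const void *data, size_t n)
{
	memset(s->buf, TOY_POISON, sizeof(s->buf));
	memcpy(s->buf, data, n);
	s->len = n;
	s->end = sizeof(s->buf);
}

/* Copy n bytes out and count how many of them lie beyond the real data. */
size_t toy_copy_and_count_poison(unsigned char *dst,
				 const struct toy_skb *s, size_t n)
{
	size_t i, poison = 0;

	memcpy(dst, s->buf, n);
	for (i = s->len; i < n; i++)
		if (dst[i] == TOY_POISON)
			poison++;
	return poison;
}
```

Copying `end` bytes from a buffer holding 5 bytes of data drags 59 never-written bytes along; copying `len` bytes drags none.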
      
      BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
      BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline]
      BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline]
      BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
      BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline]
      BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
       instrument_copy_to_user include/linux/instrumented.h:114 [inline]
       copy_to_user_iter lib/iov_iter.c:24 [inline]
       iterate_ubuf include/linux/iov_iter.h:29 [inline]
       iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
       iterate_and_advance include/linux/iov_iter.h:271 [inline]
       _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
       copy_to_iter include/linux/uio.h:197 [inline]
       simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532
       __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420
       skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
       skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline]
       packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482
       sock_recvmsg_nosec net/socket.c:1044 [inline]
       sock_recvmsg net/socket.c:1066 [inline]
       sock_read_iter+0x467/0x580 net/socket.c:1136
       call_read_iter include/linux/fs.h:2014 [inline]
       new_sync_read fs/read_write.c:389 [inline]
       vfs_read+0x8f6/0xe00 fs/read_write.c:470
       ksys_read+0x20f/0x4c0 fs/read_write.c:613
       __do_sys_read fs/read_write.c:623 [inline]
       __se_sys_read fs/read_write.c:621 [inline]
       __x64_sys_read+0x93/0xd0 fs/read_write.c:621
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was stored to memory at:
       skb_put_data include/linux/skbuff.h:2622 [inline]
       netlink_to_full_skb net/netlink/af_netlink.c:181 [inline]
       __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline]
       __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325
       netlink_deliver_tap net/netlink/af_netlink.c:338 [inline]
       netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline]
       netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
       netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368
       netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
       ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
       __sys_sendmsg net/socket.c:2667 [inline]
       __do_sys_sendmsg net/socket.c:2676 [inline]
       __se_sys_sendmsg net/socket.c:2674 [inline]
       __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
       free_pages_prepare mm/page_alloc.c:1087 [inline]
       free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347
       free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533
       release_pages+0x23d3/0x2410 mm/swap.c:1042
       free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316
       tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
       tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
       tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300
       tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392
       exit_mmap+0x49e/0xd30 mm/mmap.c:3321
       __mmput+0x13f/0x530 kernel/fork.c:1349
       mmput+0x8a/0xa0 kernel/fork.c:1371
       exit_mm+0x1b8/0x360 kernel/exit.c:567
       do_exit+0xd57/0x4080 kernel/exit.c:858
       do_group_exit+0x2fd/0x390 kernel/exit.c:1021
       __do_sys_exit_group kernel/exit.c:1032 [inline]
       __se_sys_exit_group kernel/exit.c:1030 [inline]
       __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Bytes 3852-3903 of 3904 are uninitialized
      Memory access of size 3904 starts at ffff88812ea1e000
      Data copied to user address 0000000020003280
      
      CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
      
      Fixes: 1853c949 ("netlink, mmap: transform mmap skb into full skb on taps")
      Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1]
      Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 22 Feb, 2024 19 commits
  5. 21 Feb, 2024 11 commits