1. 11 Jul, 2024 8 commits
    • net/sched: Fix UAF when resolving a clash · 26488172
      Chengen Du authored
      KASAN reports the following UAF:
      
       BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
       Read of size 1 at addr ffff888c07603600 by task handler130/6469
      
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x48/0x70
        print_address_description.constprop.0+0x33/0x3d0
        print_report+0xc0/0x2b0
        kasan_report+0xd0/0x120
        __asan_load1+0x6c/0x80
        tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
        tcf_ct_act+0x886/0x1350 [act_ct]
        tcf_action_exec+0xf8/0x1f0
        fl_classify+0x355/0x360 [cls_flower]
        __tcf_classify+0x1fd/0x330
        tcf_classify+0x21c/0x3c0
        sch_handle_ingress.constprop.0+0x2c5/0x500
        __netif_receive_skb_core.constprop.0+0xb25/0x1510
        __netif_receive_skb_list_core+0x220/0x4c0
        netif_receive_skb_list_internal+0x446/0x620
        napi_complete_done+0x157/0x3d0
        gro_cell_poll+0xcf/0x100
        __napi_poll+0x65/0x310
        net_rx_action+0x30c/0x5c0
        __do_softirq+0x14f/0x491
        __irq_exit_rcu+0x82/0xc0
        irq_exit_rcu+0xe/0x20
        common_interrupt+0xa1/0xb0
        </IRQ>
        <TASK>
        asm_common_interrupt+0x27/0x40
      
       Allocated by task 6469:
        kasan_save_stack+0x38/0x70
        kasan_set_track+0x25/0x40
        kasan_save_alloc_info+0x1e/0x40
        __kasan_krealloc+0x133/0x190
        krealloc+0xaa/0x130
        nf_ct_ext_add+0xed/0x230 [nf_conntrack]
        tcf_ct_act+0x1095/0x1350 [act_ct]
        tcf_action_exec+0xf8/0x1f0
        fl_classify+0x355/0x360 [cls_flower]
        __tcf_classify+0x1fd/0x330
        tcf_classify+0x21c/0x3c0
        sch_handle_ingress.constprop.0+0x2c5/0x500
        __netif_receive_skb_core.constprop.0+0xb25/0x1510
        __netif_receive_skb_list_core+0x220/0x4c0
        netif_receive_skb_list_internal+0x446/0x620
        napi_complete_done+0x157/0x3d0
        gro_cell_poll+0xcf/0x100
        __napi_poll+0x65/0x310
        net_rx_action+0x30c/0x5c0
        __do_softirq+0x14f/0x491
      
       Freed by task 6469:
        kasan_save_stack+0x38/0x70
        kasan_set_track+0x25/0x40
        kasan_save_free_info+0x2b/0x60
        ____kasan_slab_free+0x180/0x1f0
        __kasan_slab_free+0x12/0x30
        slab_free_freelist_hook+0xd2/0x1a0
        __kmem_cache_free+0x1a2/0x2f0
        kfree+0x78/0x120
        nf_conntrack_free+0x74/0x130 [nf_conntrack]
        nf_ct_destroy+0xb2/0x140 [nf_conntrack]
        __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack]
        nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack]
        __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack]
        tcf_ct_act+0x12ad/0x1350 [act_ct]
        tcf_action_exec+0xf8/0x1f0
        fl_classify+0x355/0x360 [cls_flower]
        __tcf_classify+0x1fd/0x330
        tcf_classify+0x21c/0x3c0
        sch_handle_ingress.constprop.0+0x2c5/0x500
        __netif_receive_skb_core.constprop.0+0xb25/0x1510
        __netif_receive_skb_list_core+0x220/0x4c0
        netif_receive_skb_list_internal+0x446/0x620
        napi_complete_done+0x157/0x3d0
        gro_cell_poll+0xcf/0x100
        __napi_poll+0x65/0x310
        net_rx_action+0x30c/0x5c0
        __do_softirq+0x14f/0x491
      
      If a clash is resolved, the ct may already have been freed, yet it is
      still passed to tcf_ct_flow_table_process_conn() for further use. Fix
      this by retrieving ct from the skb again after confirming the conntrack.
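      The pattern behind the fix can be illustrated with a toy user-space model.
      The names below (conn, pkt, confirm_conn(), commit_and_offload()) are
      hypothetical stand-ins, not act_ct or nf_conntrack code. The only point is
      that a pointer cached before a call that may free it must be re-read from
      its owner afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for a conntrack entry and an skb. */
struct conn {
	int id;
	int offloaded;
};

struct pkt {
	struct conn *ct; /* entry currently attached to the packet */
};

/* Models clash resolution: if an equivalent entry already exists, the packet
 * is re-attached to it and the packet's own entry is freed. */
void confirm_conn(struct pkt *p, struct conn *existing)
{
	if (existing && existing != p->ct) {
		free(p->ct);
		p->ct = existing;
	}
}

/* Fixed pattern: read p->ct again after confirmation instead of reusing a
 * pointer cached before the call, which may now be dangling. */
struct conn *commit_and_offload(struct pkt *p, struct conn *existing)
{
	confirm_conn(p, existing);
	struct conn *ct = p->ct; /* fresh lookup after a possible clash */
	ct->offloaded = 1;
	return ct;
}
```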
      
      Fixes: 0cc254e5 ("net/sched: act_ct: Offload connections with commit action")
      Co-developed-by: Gerald Yang <gerald.yang@canonical.com>
      Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
      Signed-off-by: Chengen Du <chengen.du@canonical.com>
      Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ks8851: Fix potential TX stall after interface reopen · 7a99afef
      Ronald Wahl authored
      The amount of TX space in the hardware buffer is tracked in the tx_space
      variable. The initial value is currently only set during driver probing.
      
      After the interface is closed and reopened, tx_space still holds the
      last value it had before the close. If that value is smaller than the
      size of the first packet sent after reopening, the queue is stopped.
      The queue is only woken up by a TX interrupt, which never arrives
      because nothing was sent.
      
      This commit moves the initialization of the tx_space variable to the
      ks8851_net_open function, right before the TX queue is started. It also
      queries the value from the hardware instead of using a hard-coded value.
      
      Only the SPI chip variant is affected by this issue because only this
      driver variant actually depends on the tx_space variable in the xmit
      function.
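      A minimal sketch of the reset-on-open idea, assuming a hypothetical
      struct txdev in which hw_free_space stands in for the value the real
      driver reads from the chip. It only shows why a stale tx_space can stop
      the queue for good and why refreshing it in the open path avoids that.
      The numbers in the test are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state; not the real ks8851 structures. */
struct txdev {
	unsigned int tx_space;      /* free TX buffer bytes, as tracked by the driver */
	unsigned int hw_free_space; /* stands in for the value read from hardware */
	bool queue_stopped;
};

/* Fixed open path: refresh tx_space from "hardware" right before the queue starts. */
void dev_open(struct txdev *d)
{
	d->tx_space = d->hw_free_space;
	d->queue_stopped = false;
}

/* xmit: stop the queue when the frame does not fit. Only a TX-done interrupt,
 * which requires an earlier transmission, would wake it again. */
bool dev_xmit(struct txdev *d, unsigned int len)
{
	if (len > d->tx_space) {
		d->queue_stopped = true;
		return false;
	}
	d->tx_space -= len;
	return true;
}
```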
      
      Fixes: 3dc5d445 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://patch.msgid.link/20240709195845.9089-1-rwahl@gmx.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). · 5c0b485a
      Kuniyuki Iwashima authored
      syzkaller triggered the warning [0] in udp_v4_early_demux().
      
      In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount
      of the looked-up sk and use sock_pfree() as skb->destructor, so we check
      SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace
      period.
      
      Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
      into the hash table.  Moreover, the SOCK_RCU_FREE check is done too early
      in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race
      window:
      
        CPU1                                 CPU2
        ----                                 ----
        udp_v4_early_demux()                 udp_lib_get_port()
        |                                    |- hlist_add_head_rcu()
        |- sk = __udp4_lib_demux_lookup()    |
        |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
                                             `- sock_set_flag(sk, SOCK_RCU_FREE)
      
      We had the same bug in TCP and fixed it in commit 871019b2 ("net:
      set SOCK_RCU_FREE before inserting socket into hashtable").
      
      Let's apply the same fix for UDP.
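      The required ordering (flag first, then publish) can be sketched with
      C11 atomics in user space. This model rests on assumptions: a one-slot
      bucket replaces the UDP hash table, and a release store plays the role
      that rcu_assign_pointer()-style publication plays in the kernel. A reader
      that finds the socket through an acquire load is then guaranteed to see
      the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket model; rcu_free stands in for SOCK_RCU_FREE. */
struct sock_model {
	atomic_bool rcu_free;
};

/* One-slot stand-in for a hash bucket (zero-initialized, i.e. empty). */
static _Atomic(struct sock_model *) bucket;

/* Fixed ordering: set the flag first, then publish with release semantics. */
void bind_port(struct sock_model *sk)
{
	atomic_store_explicit(&sk->rcu_free, true, memory_order_relaxed);
	atomic_store_explicit(&bucket, sk, memory_order_release);
}

/* Reader (early-demux-like): true if the socket found may be used without
 * taking a reference, i.e. the flag is visible once the socket is visible. */
bool lookup_is_rcu_safe(void)
{
	struct sock_model *sk = atomic_load_explicit(&bucket, memory_order_acquire);

	if (!sk)
		return false;
	return atomic_load_explicit(&sk->rcu_free, memory_order_relaxed);
}
```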
      
      [0]:
      WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
      Modules linked in:
      CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda330 #13
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
      Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52
      RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c
      RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001
      RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680
      R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e
      FS:  00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      PKRU: 55555554
      Call Trace:
       <TASK>
       ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349
       ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447
       NF_HOOK include/linux/netfilter.h:314 [inline]
       NF_HOOK include/linux/netfilter.h:308 [inline]
       ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569
       __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624
       __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738
       netif_receive_skb_internal net/core/dev.c:5824 [inline]
       netif_receive_skb+0x271/0x300 net/core/dev.c:5884
       tun_rx_batched drivers/net/tun.c:1549 [inline]
       tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002
       tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048
       new_sync_write fs/read_write.c:497 [inline]
       vfs_write+0x76f/0x8d0 fs/read_write.c:590
       ksys_write+0xbf/0x190 fs/read_write.c:643
       __do_sys_write fs/read_write.c:655 [inline]
       __se_sys_write fs/read_write.c:652 [inline]
       __x64_sys_write+0x41/0x50 fs/read_write.c:652
       x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7fc44a68bc1f
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48
      RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f
      RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8
      RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000
       </TASK>
      
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ethtool: netlink: do not return SQI value if link is down · c184cf94
      Oleksij Rempel authored
      Do not attach the SQI value if the link is down. "SQI values are only
      valid if link-up condition is present" per the OpenAlliance
      specification of the 100Base-T1 Interoperability Test suite [1]. The
      same rule would apply to other link types.
      
      [1] https://opensig.org/automotive-ethernet-specifications/#
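      A minimal sketch of the rule, assuming a hypothetical phy_model struct
      rather than the real ethtool reply code: the SQI values are reported only
      when the link is up and are omitted otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state consulted when building the reply. */
struct phy_model {
	bool link_up;
	int sqi;     /* last SQI reading from the PHY */
	int sqi_max;
};

/* Fills *sqi and *sqi_max and returns true only when the values are valid,
 * i.e. the link is up; otherwise leaves them untouched and returns false. */
bool fill_sqi(const struct phy_model *phy, int *sqi, int *sqi_max)
{
	if (!phy->link_up)
		return false;
	*sqi = phy->sqi;
	*sqi_max = phy->sqi_max;
	return true;
}
```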
      
      Fixes: 80660219 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
      Link: https://patch.msgid.link/20240709061943.729381-1-o.rempel@pengutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ppp: reject claimed-as-LCP but actually malformed packets · f2aeb730
      Dmitry Antipov authored
      Since 'ppp_async_encode()' assumes valid LCP packets (with code
      from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that an
      LCP packet has an actual body beyond the PPP_LCP header bytes, and
      otherwise reject data that claims to be LCP but is actually malformed.
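      The length check can be sketched in plain C. The helper name packet_ok()
      and the macro names are hypothetical; the sketch assumes an uncompressed
      2-byte protocol field (0xc021 for LCP) followed by the 4-byte LCP header
      defined in RFC 1661 (code, identifier, 2-byte length). Frames for other
      protocols pass through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PPP_LCP       0xc021 /* LCP protocol number */
#define PPP_PROTO_LEN 2      /* protocol field, assumed uncompressed */
#define LCP_HDR_LEN   4      /* code (1) + identifier (1) + length (2) */

/* A frame that says it is LCP must carry at least a full LCP header after
 * the protocol field; anything else is accepted as-is. */
bool packet_ok(const uint8_t *data, size_t count)
{
	if (count < PPP_PROTO_LEN)
		return false;
	uint16_t proto = (uint16_t)(data[0] << 8 | data[1]);
	return proto != PPP_LCP || count >= PPP_PROTO_LEN + LCP_HDR_LEN;
}
```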
      
      Reported-by: syzbot+ec0723ba9605678b14bf@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: mtk-star-emac: set mac_managed_pm when probing · 8c6790b5
      Jian Hui Lee authored
      The commit below introduced a warning message when the PHY state is
      not one of PHY_HALTED, PHY_READY, or PHY_UP:
      commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      
      mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning
      message during resume, indicate at probe time that PHY suspend/resume
      is managed by the MAC.
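      The interaction can be modelled as a small predicate. This is a
      hypothetical simplification of the resume path (the enum and struct are
      not the phylib definitions): when the MAC driver declares mac_managed_pm,
      the MDIO-bus resume handler leaves the PHY alone, so its state check and
      the resulting warning never come into play.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PHY states; only HALTED, READY and UP are "expected" on resume. */
enum phy_state_model { ST_DOWN, ST_READY, ST_HALTED, ST_UP, ST_RUNNING };

struct phy_pm_model {
	enum phy_state_model state;
	bool mac_managed_pm; /* set by the MAC driver at probe time */
};

/* Returns true when the MDIO-bus resume path would warn about the state. */
bool bus_resume_warns(const struct phy_pm_model *phy)
{
	if (phy->mac_managed_pm)
		return false; /* the MAC driver handles PHY suspend/resume itself */
	return phy->state != ST_HALTED && phy->state != ST_READY &&
	       phy->state != ST_UP;
}
```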
      
      Fixes: 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      Signed-off-by: Jian Hui Lee <jianhui.lee@canonical.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://patch.msgid.link/20240708065210.4178980-1-jianhui.lee@canonical.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • e1000e: fix force smbus during suspend flow · 76a0a3f9
      Vitaly Lifshits authored
      Commit 861e8086 ("e1000e: move force SMBUS from enable ulp function
      to avoid PHY loss issue") resolved a PHY access loss during suspend on
      Meteor Lake consumer platforms, but it affected corporate systems
      incorrectly.
      
      A better fix, working for both consumer and corporate systems, was
      proposed in commit bfd546a5 ("e1000e: move force SMBUS near the end
      of enable_ulp function"). However, it introduced a regression on older
      devices, such as [8086:15B8], [8086:15F9], [8086:15BE].
      
      This patch aims to fix the secondary regression by limiting the scope of
      the changes to Meteor Lake platforms only.
      
      Fixes: bfd546a5 ("e1000e: move force SMBUS near the end of enable_ulp function")
      Reported-by: Todd Brandt <todd.e.brandt@intel.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940
      Reported-by: Dieter Mummenschanz <dmummenschanz@web.de>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936
      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
      Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: avoid too many retransmit packets · 97a90635
      Eric Dumazet authored
      If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
      retracted its window to zero, tcp_retransmit_timer() can
      retransmit a packet every two jiffies (2 ms for HZ=1000),
      for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
      
      The fix is to make sure tcp_rtx_probe0_timed_out() takes
      icsk->icsk_user_timeout into account.
      
      Before the blamed commit, the socket would not time out after
      icsk->icsk_user_timeout, but would use standard exponential
      backoff for the retransmits.
      
      It is also worth noting that before commit e89688e3 ("net: tcp:
      fix unexcepted socket die when snd_wnd is 0"), the issue
      would last 2 minutes instead of 4.
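      The effect of honoring the user timeout can be shown with a simplified
      decision function plus the arithmetic from the description: one
      retransmit every 2 ms (2 jiffies at HZ=1000) for about 4 minutes is
      roughly 240000 / 2 = 120000 packets. probe0_timed_out() is a hypothetical
      millisecond-based model, not the kernel's tcp_rtx_probe0_timed_out(), and
      default_limit_ms is an assumed stand-in for whatever limit applies
      without TCP_USER_TIMEOUT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified zero-window probe timeout decision; all times in milliseconds. */
bool probe0_timed_out(uint32_t rtx_delta_ms, uint32_t user_timeout_ms,
		      uint32_t default_limit_ms)
{
	/* With a user timeout set, a zero window must not extend the deadline. */
	if (user_timeout_ms && rtx_delta_ms > user_timeout_ms)
		return true;
	return rtx_delta_ms > default_limit_ms;
}

/* Packets sent if one retransmit goes out every interval_ms for window_ms. */
uint32_t retransmits_in_window(uint32_t window_ms, uint32_t interval_ms)
{
	return window_ms / interval_ms;
}
```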
      
      Fixes: b701a99e ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 10 Jul, 2024 2 commits
    • net: ethernet: lantiq_etop: fix double free in detach · e1533b63
      Aleksander Jan Bajkowski authored
      The number of the currently released descriptor is never incremented,
      which results in the same skb being released multiple times.
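      This class of bug can be reproduced in a few lines of user-space C. The
      ring size and the per-slot release counter are hypothetical: the buggy
      variant indexes every iteration with the same cursor, while the fixed
      variant uses the loop counter, so each descriptor's buffer is released
      exactly once.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_DESC 4 /* hypothetical ring size */

/* Counts how many times each descriptor's buffer is released. */
struct ring_model {
	int freed[NUM_DESC];
};

/* Buggy teardown: the index never advances, so one slot is released NUM_DESC times. */
void ring_free_all_buggy(struct ring_model *r, size_t cursor)
{
	for (size_t desc = 0; desc < NUM_DESC; desc++)
		r->freed[cursor]++;
}

/* Fixed teardown: index with the loop counter so every slot is released once. */
void ring_free_all(struct ring_model *r)
{
	for (size_t desc = 0; desc < NUM_DESC; desc++)
		r->freed[desc]++;
}
```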
      
      Fixes: 504d4721 ("MIPS: Lantiq: Add ethernet driver")
      Reported-by: Joe Perches <joe@perches.com>
      Closes: https://lore.kernel.org/all/fc1bf93d92bb5b2f99c6c62745507cc22f3a7b2d.camel@perches.com/
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20240708205826.5176-1-olek2@wp.pl
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • i40e: Fix XDP program unloading while removing the driver · 01fc5142
      Michal Kubiak authored
      The commit 6533e558 ("i40e: Fix reset path while removing
      the driver") introduced a new PF state "__I40E_IN_REMOVE" to block
      modifying the XDP program while the driver is being removed.
      Unfortunately, such a change is useful only if the ".ndo_bpf()"
      callback is called outside the rmmod context, because unloading the
      existing XDP program is also part of the driver removal procedure.
      In other words, from the rmmod context the driver is expected to
      unload the XDP program without reporting any errors. Otherwise,
      a kernel warning with a call stack is printed to dmesg.
      
      Example failing scenario:
       1. Load the i40e driver.
       2. Load the XDP program.
       3. Unload the i40e driver (using "rmmod" command).
      
      The example kernel warning log:
      
      [  +0.004646] WARNING: CPU: 94 PID: 10395 at net/core/dev.c:9290 unregister_netdevice_many_notify+0x7a9/0x870
      [...]
      [  +0.010959] RIP: 0010:unregister_netdevice_many_notify+0x7a9/0x870
      [...]
      [  +0.002726] Call Trace:
      [  +0.002457]  <TASK>
      [  +0.002119]  ? __warn+0x80/0x120
      [  +0.003245]  ? unregister_netdevice_many_notify+0x7a9/0x870
      [  +0.005586]  ? report_bug+0x164/0x190
      [  +0.003678]  ? handle_bug+0x3c/0x80
      [  +0.003503]  ? exc_invalid_op+0x17/0x70
      [  +0.003846]  ? asm_exc_invalid_op+0x1a/0x20
      [  +0.004200]  ? unregister_netdevice_many_notify+0x7a9/0x870
      [  +0.005579]  ? unregister_netdevice_many_notify+0x3cc/0x870
      [  +0.005586]  unregister_netdevice_queue+0xf7/0x140
      [  +0.004806]  unregister_netdev+0x1c/0x30
      [  +0.003933]  i40e_vsi_release+0x87/0x2f0 [i40e]
      [  +0.004604]  i40e_remove+0x1a1/0x420 [i40e]
      [  +0.004220]  pci_device_remove+0x3f/0xb0
      [  +0.003943]  device_release_driver_internal+0x19f/0x200
      [  +0.005243]  driver_detach+0x48/0x90
      [  +0.003586]  bus_remove_driver+0x6d/0xf0
      [  +0.003939]  pci_unregister_driver+0x2e/0xb0
      [  +0.004278]  i40e_exit_module+0x10/0x5f0 [i40e]
      [  +0.004570]  __do_sys_delete_module.isra.0+0x197/0x310
      [  +0.005153]  do_syscall_64+0x85/0x170
      [  +0.003684]  ? syscall_exit_to_user_mode+0x69/0x220
      [  +0.004886]  ? do_syscall_64+0x95/0x170
      [  +0.003851]  ? exc_page_fault+0x7e/0x180
      [  +0.003932]  entry_SYSCALL_64_after_hwframe+0x71/0x79
      [  +0.005064] RIP: 0033:0x7f59dc9347cb
      [  +0.003648] Code: 73 01 c3 48 8b 0d 65 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 16 0c 00 f7 d8 64 89 01 48
      [  +0.018753] RSP: 002b:00007ffffac99048 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      [  +0.007577] RAX: ffffffffffffffda RBX: 0000559b9bb2f6e0 RCX: 00007f59dc9347cb
      [  +0.007140] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559b9bb2f748
      [  +0.007146] RBP: 00007ffffac99070 R08: 1999999999999999 R09: 0000000000000000
      [  +0.007133] R10: 00007f59dc9a5ac0 R11: 0000000000000206 R12: 0000000000000000
      [  +0.007141] R13: 00007ffffac992d8 R14: 0000559b9bb2f6e0 R15: 0000000000000000
      [  +0.007151]  </TASK>
      [  +0.002204] ---[ end trace 0000000000000000 ]---
      
      Fix this by checking if the XDP program is being loaded or unloaded.
      Then, block only loading a new program while "__I40E_IN_REMOVE" is set.
      Also, move testing "__I40E_IN_REMOVE" flag to the beginning of XDP_SETUP
      callback to avoid unnecessary operations and checks.
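      The revised check can be sketched as follows. pf_model, xdp_setup() and
      the in_remove flag are hypothetical stand-ins for the i40e PF state and
      the XDP_SETUP path: the removal flag is tested first, and only attaching
      a program is refused while removal is in progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the PF state relevant to XDP setup. */
struct pf_model {
	bool in_remove;   /* stands in for __I40E_IN_REMOVE */
	const void *prog; /* currently attached XDP program, or NULL */
};

/* Refuse only loading a program during removal; detaching
 * (new_prog == NULL) must still succeed from the rmmod path. */
int xdp_setup(struct pf_model *pf, const void *new_prog)
{
	if (new_prog && pf->in_remove)
		return -EINVAL;
	pf->prog = new_prog;
	return 0;
}
```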
      
      Fixes: 6533e558 ("i40e: Fix reset path while removing the driver")
      Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://patch.msgid.link/20240708230750.625986-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 09 Jul, 2024 7 commits
    • net: fix rc7's __skb_datagram_iter() · f1538310
      Hugh Dickins authored
      X would not start in my old 32-bit partition (and the "n"-handling looks
      just as wrong on 64-bit, but for whatever reason did not show up there):
      "n" must be accumulated over all pages before it's added to "offset" and
      compared with "copy", immediately after the skb_frag_foreach_page() loop.
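      The accumulation rule can be illustrated with a user-space copy loop.
      PAGE_SZ and copy_frag() are hypothetical: a tiny page size forces the
      fragment to span several pages, n is summed across all of them, and only
      after the per-page loop is it added to the offset and returned for
      comparison with the requested length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4 /* tiny hypothetical page size to force multi-page fragments */

/* Copies len bytes one "page" at a time, accumulating the bytes copied over
 * all pages before updating *offset; the caller compares the result with len. */
size_t copy_frag(char *dst, const char *src, size_t len, size_t *offset)
{
	size_t n = 0;

	for (size_t done = 0; done < len;) {
		size_t chunk = len - done < PAGE_SZ ? len - done : PAGE_SZ;

		memcpy(dst + done, src + done, chunk);
		n += chunk; /* accumulate over every page */
		done += chunk;
	}
	*offset += n; /* update once, after the per-page loop */
	return n;
}
```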
      
      Fixes: d2d30a37 ("net: allow skb_datagram_iter to be called from any context")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Link: https://patch.msgid.link/fef352e8-b89a-da51-f8ce-04bc39ee6481@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 528269fe
      Paolo Abeni authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-07-09
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 3 non-merge commits during the last 1 day(s) which contain
      a total of 5 files changed, 81 insertions(+), 11 deletions(-).
      
      The main changes are:
      
      1) Fix a use-after-free in a corner case where tcx_entry got released too
         early. Also add BPF test coverage along with the fix, from Daniel Borkmann.
      
      2) Fix a kernel panic on Loongarch in sk_msg_recvmsg() which got triggered
         by running BPF sockmap selftests, from Geliang Tang.
      
      bpf-for-netdev
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        skmsg: Skip zero length skb in sk_msg_recvmsg
        selftests/bpf: Extend tcx tests to cover late tcx_entry release
        bpf: Fix too early release of tcx_entry
      ====================
      
      Link: https://patch.msgid.link/20240709091452.27840-1-daniel@iogearbox.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ks8851: Fix deadlock with the SPI chip variant · 0913ec33
      Ronald Wahl authored
      When SMP is enabled and spinlocks are actually functional, there is
      a deadlock on the 'statelock' spinlock between ks8851_start_xmit_spi
      and ks8851_irq:
      
          watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
          call trace:
            queued_spin_lock_slowpath+0x100/0x284
            do_raw_spin_lock+0x34/0x44
            ks8851_start_xmit_spi+0x30/0xb8
            ks8851_start_xmit+0x14/0x20
            netdev_start_xmit+0x40/0x6c
            dev_hard_start_xmit+0x6c/0xbc
            sch_direct_xmit+0xa4/0x22c
            __qdisc_run+0x138/0x3fc
            qdisc_run+0x24/0x3c
            net_tx_action+0xf8/0x130
            handle_softirqs+0x1ac/0x1f0
            __do_softirq+0x14/0x20
            ____do_softirq+0x10/0x1c
            call_on_irq_stack+0x3c/0x58
            do_softirq_own_stack+0x1c/0x28
            __irq_exit_rcu+0x54/0x9c
            irq_exit_rcu+0x10/0x1c
            el1_interrupt+0x38/0x50
            el1h_64_irq_handler+0x18/0x24
            el1h_64_irq+0x64/0x68
            __netif_schedule+0x6c/0x80
            netif_tx_wake_queue+0x38/0x48
            ks8851_irq+0xb8/0x2c8
            irq_thread_fn+0x2c/0x74
            irq_thread+0x10c/0x1b0
            kthread+0xc8/0xd8
            ret_from_fork+0x10/0x20
      
      This issue was not identified earlier because testing was done on
      a device with SMP disabled, where spinlocks are effectively no-ops.
      
      Use spin_(un)lock_bh for TX-queue-related locking so that softirq
      work is not executed synchronously, which would lead to a deadlock.
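      Why the _bh variants help can be shown with a single-CPU simulation.
      Everything here is a hypothetical model, not kernel code: a flag stands
      in for the non-recursive statelock, and waking the queue raises softirq
      work that runs immediately unless bottom halves are disabled. With plain
      locking the softirq's xmit path re-takes a lock its own CPU already
      holds; with _bh-like behavior the work is deferred until the lock has
      been released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU state. */
struct cpu_model {
	bool statelock_held;  /* stands in for the 'statelock' spinlock */
	int bh_disabled;      /* nesting count of BH-disabled sections */
	bool softirq_pending; /* TX softirq work has been raised */
	bool deadlock;        /* lock re-taken by the CPU already holding it */
	int xmits;            /* completed xmit calls */
};

static void model_lock(struct cpu_model *c)
{
	if (c->statelock_held)
		c->deadlock = true; /* a real spinlock would spin forever here */
	c->statelock_held = true;
}

static void model_unlock(struct cpu_model *c)
{
	c->statelock_held = false;
}

/* Softirq work: the xmit path, which also takes statelock. */
static void run_softirq(struct cpu_model *c)
{
	c->softirq_pending = false;
	model_lock(c);
	c->xmits++;
	model_unlock(c);
}

/* Waking the queue raises a softirq, which runs at once unless BHs are off. */
static void wake_queue(struct cpu_model *c)
{
	c->softirq_pending = true;
	if (c->bh_disabled == 0)
		run_softirq(c);
}

/* IRQ-thread path; use_bh selects spin_lock_bh()-like behavior. */
void irq_handler(struct cpu_model *c, bool use_bh)
{
	if (use_bh)
		c->bh_disabled++;
	model_lock(c);
	wake_queue(c);
	model_unlock(c);
	if (use_bh && --c->bh_disabled == 0 && c->softirq_pending)
		run_softirq(c); /* deferred work runs once BHs are re-enabled */
}
```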
      
      Fixes: 3dc5d445 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240706101337.854474-1-rwahl@gmx.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability() · 442e26af
      Aleksandr Mishin authored
      In rvu_check_rsrc_availability(), in case of an invalid SSOW request,
      incorrect data is printed to the error log: the 'req->sso' value is
      printed instead of 'req->ssow'. This looks like a copy-paste mistake.
      
      Fix this mistake by replacing 'req->sso' with 'req->ssow'.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 746ea742 ("octeontx2-af: Add RVU block LF provisioning support")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240705095317.12640-1-amishin@t-argos.ru
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • bnxt: fix crashes when reducing ring count with active RSS contexts · 0d1b7d6c
      Jakub Kicinski authored
      bnxt doesn't check if a ring is used by RSS contexts when reducing
      ring count. Core performs a similar check for the drivers for
      the main context, but core doesn't know about additional contexts,
      so it can't validate them. bnxt_fill_hw_rss_tbl_p5() uses ring
      id to index bp->rx_ring[], which without the check may end up
      being out of bounds.
      
        BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
        Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
        Call Trace:
        __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
         bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
         __bnxt_setup_vnic_p5+0x12e/0x270
         __bnxt_open_nic+0x2262/0x2f30
         bnxt_open_nic+0x5d/0xf0
         ethnl_set_channels+0x5d4/0xb30
         ethnl_default_set_doit+0x2f1/0x620
      
      Core does track the additional contexts in net-next, so we can
      move this validation out of the driver as a follow up there.
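      The missing validation can be sketched as a scan over every RSS
      context's indirection table. The types and ring_count_ok() are
      hypothetical, not bnxt or ethtool core code; the idea is that a new RX
      ring count is acceptable only if no context references a ring id at or
      above it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RSS context: an indirection table of RX ring ids. */
struct rss_ctx_model {
	const unsigned int *indir;
	size_t indir_size;
};

/* Reject a ring count that would leave any context pointing at a removed
 * ring; otherwise a later rx_ring[id] lookup could go out of bounds. */
bool ring_count_ok(const struct rss_ctx_model *ctxs, size_t n_ctx,
		   unsigned int new_rx_rings)
{
	for (size_t c = 0; c < n_ctx; c++)
		for (size_t i = 0; i < ctxs[c].indir_size; i++)
			if (ctxs[c].indir[i] >= new_rx_rings)
				return false;
	return true;
}
```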
      
      Fixes: b3d0083c ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Link: https://patch.msgid.link/20240705020005.681746-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • skmsg: Skip zero length skb in sk_msg_recvmsg · f0c18025
      Geliang Tang authored
      When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
      platform, the following kernel panic occurs:
      
        [...]
        Oops[#1]:
        CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
        Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
           ... ...
           ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
          ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
         CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
         PRMD: 0000000c (PPLV0 +PIE +PWE)
         EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
         ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
        ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
         BADV: 0000000000000040
         PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
        Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
        Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
        Stack : ...
        Call Trace:
        [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
        [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
        [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
        [<90000000049aae34>] inet_recvmsg+0x54/0x100
        [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
        [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
        [<900000000481e27c>] sys_recvfrom+0x1c/0x40
        [<9000000004c076ec>] do_syscall+0x8c/0xc0
        [<9000000003731da4>] handle_syscall+0xc4/0x160
        Code: ...
        ---[ end trace 0000000000000000 ]---
        Kernel panic - not syncing: Fatal exception
        Kernel relocated by 0x3510000
         .text @ 0x9000000003710000
         .data @ 0x9000000004d70000
         .bss  @ 0x9000000006469400
        ---[ end Kernel panic - not syncing: Fatal exception ]---
        [...]
      
      This crash happens every time the sockmap_skb_verdict_shutdown subtest
      in sockmap_basic is run.
      
      The crash occurs because a NULL pointer is passed to page_address() in
      sk_msg_recvmsg(). Because the implementation depends on the
      architecture, page_address(NULL) triggers a panic on the Loongarch
      platform but not on x86. So this bug stayed hidden on x86 for a while,
      but it is now exposed on Loongarch. The root cause is that a zero length
      skb (skb->len == 0) was put on the queue.
      
      This zero length skb is a TCP FIN packet, which was sent by shutdown(),
      invoked in test_sockmap_skb_verdict_shutdown():
      
      	shutdown(p1, SHUT_WR);
      
      In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
      page is put to this sge (see sg_set_page in sg_set_page), but this empty
      sge is queued into ingress_msg list.
      
      And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by
      sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it
      to kmap_local_page() and to page_address(), then kernel panics.
      
      To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
      if copy is zero, that means it's a zero length skb, skip invoking
      copy_page_to_iter(). We are using the EFAULT return triggered by
      copy_page_to_iter to check for is_fin in tcp_bpf.c.
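      
      A minimal userspace sketch of the skip, under stated assumptions: struct
      fake_sge and recv_from_queue() are hypothetical stand-ins (not the
      kernel's struct scatterlist or sk_msg_recvmsg()), and memcpy() plays the
      role of copy_page_to_iter(). An entry whose copy length is zero is
      skipped before its page pointer is ever used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queued scatterlist entry. A zero-length
 * entry (such as one produced by a TCP FIN) has no page attached. */
struct fake_sge {
	const char *page;  /* NULL when no page was ever attached */
	size_t length;
};

/* Copy up to len bytes from the queued entries into out and return the
 * number of bytes copied. Entries with a zero copy length are skipped,
 * so their (possibly NULL) page pointer is never dereferenced. */
size_t recv_from_queue(const struct fake_sge *q, size_t n,
		       char *out, size_t len)
{
	size_t copied = 0;

	for (size_t i = 0; i < n && copied < len; i++) {
		size_t copy = q[i].length;

		if (copy > len - copied)
			copy = len - copied;
		if (copy == 0)
			continue;  /* zero-length entry: skip the copy */
		assert(q[i].page != NULL);  /* non-empty entries carry a page */
		memcpy(out + copied, q[i].page, copy);
		copied += copy;
	}
	return copied;
}
```

      A queue of "abc", an empty FIN-like entry and "de" yields the five data
      bytes, and a queue holding only the empty entry yields zero bytes
      instead of touching a NULL page.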
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Suggested-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
    • net: phy: microchip: lan87xx: reinit PHY after cable test · 30f747b8
      Oleksij Rempel authored
      Reinitialize the PHY after a cable test; otherwise, the link cannot be
      established on the tested port. This issue is reproducible on LAN9372
      switches with integrated 100BaseT1 PHYs.
      
      Fixes: 78805025 ("net: phy: microchip_t1: add cable test support for lan87xx phy")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://patch.msgid.link/20240705084954.83048-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 08 Jul, 2024 3 commits
    • selftests/bpf: Extend tcx tests to cover late tcx_entry release · 5f1d18de
      Daniel Borkmann authored
      Add a test case which replaces an active ingress qdisc while keeping the
      miniq intact during the transition period to the new clsact qdisc.
      
        # ./vmtest.sh -- ./test_progs -t tc_link
        [...]
        ./test_progs -t tc_link
        [    3.412871] bpf_testmod: loading out-of-tree module taints kernel.
        [    3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
        #332     tc_links_after:OK
        #333     tc_links_append:OK
        #334     tc_links_basic:OK
        #335     tc_links_before:OK
        #336     tc_links_chain_classic:OK
        #337     tc_links_chain_mixed:OK
        #338     tc_links_dev_chain0:OK
        #339     tc_links_dev_cleanup:OK
        #340     tc_links_dev_mixed:OK
        #341     tc_links_ingress:OK
        #342     tc_links_invalid:OK
        #343     tc_links_prepend:OK
        #344     tc_links_replace:OK
        #345     tc_links_revision:OK
        Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://lore.kernel.org/r/20240708133130.11609-2-daniel@iogearbox.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf: Fix too early release of tcx_entry · 1cb6f0ba
      Daniel Borkmann authored
      Pedro Pinto, and later independently Hyunwoo Kim and Wongi Lee, reported
      an issue where the tcx_entry can be released too early, leading to a
      use-after-free (UAF) when an active old-style ingress or clsact qdisc
      with a shared tc block is later replaced by another ingress or clsact
      instance.
      
      One example sequence that triggers the UAF is as follows:
      
        1. A network namespace is created
        2. An ingress qdisc is created. This allocates a tcx_entry, and
           &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the
           same time, a tcf block with index 1 is created.
        3. chain0 is attached to the tcf block. chain0 must be connected to
           the block linked to the ingress qdisc to later reach the function
           tcf_chain0_head_change_cb_del() which triggers the UAF.
        4. Create and graft a clsact qdisc. This causes the ingress qdisc
           created in step 2 to be removed, thus freeing the previously linked
           tcx_entry:
      
           rtnetlink_rcv_msg()
             => tc_modify_qdisc()
               => qdisc_create()
                 => clsact_init() [a]
               => qdisc_graft()
                 => qdisc_destroy()
                   => __qdisc_destroy()
                     => ingress_destroy() [b]
                       => tcx_entry_free()
                         => kfree_rcu() // tcx_entry freed
      
        5. Finally, the network namespace is closed. This registers the
           cleanup_net worker, and during the process of releasing the
           remaining clsact qdisc, it accesses the tcx_entry that was
           already freed in step 4, causing the UAF to occur:
      
           cleanup_net()
             => ops_exit_list()
               => default_device_exit_batch()
                 => unregister_netdevice_many()
                   => unregister_netdevice_many_notify()
                     => dev_shutdown()
                       => qdisc_put()
                         => clsact_destroy() [c]
                           => tcf_block_put_ext()
                             => tcf_chain0_head_change_cb_del()
                               => tcf_chain_head_change_item()
                                 => clsact_chain_head_change()
                                   => mini_qdisc_pair_swap() // UAF
      
      There are also other variants; the gist is to add an ingress (or clsact)
      qdisc with a specific shared block, then replace that qdisc, wait for
      the tcx_entry kfree_rcu() to be executed, and subsequently access the
      currently active qdisc's miniq one way or another.
      
      The correct fix is to turn the miniq_active boolean into a counter. As
      can be observed, the counter transitions from 0->1 at step 2 above, from
      1->2 at step [a] (so that the miniq object remains active during the
      replacement), then from 2->1 at [b], and finally from 1->0 at [c], with
      the eventual release. The counter thus ranges over [0,2], and it does
      not need to be atomic since all accesses to it are protected by the rtnl
      mutex. With this in place, the UAF no longer occurs and the tcx_entry is
      freed at the correct time.
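      
      The counter's lifecycle can be sketched as a small userspace model. This
      is an illustration under assumptions, not the kernel code: struct
      entry_model, miniq_get() and miniq_put() are hypothetical names, and a
      freed flag stands in for tcx_entry_free() so the release point can be
      observed. As in the description, a plain integer suffices because all
      callers are assumed to be serialized.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tcx_entry whose miniq may be used by up to two
 * qdiscs at once (the old one and its replacement). Callers are assumed to
 * be serialized, as the rtnl mutex does in the kernel, so no atomics. */
struct entry_model {
	unsigned int miniq_active;  /* counter; was a boolean before the fix */
	bool freed;                 /* stands in for "tcx_entry was freed" */
};

/* A qdisc starts using the miniq. */
void miniq_get(struct entry_model *e)
{
	assert(!e->freed);
	assert(e->miniq_active < 2);  /* the counter ranges over [0, 2] */
	e->miniq_active++;
}

/* A qdisc stops using the miniq. Returns true if it was the last user,
 * which is the only point where the entry may be released. */
bool miniq_put(struct entry_model *e)
{
	assert(!e->freed && e->miniq_active > 0);
	if (--e->miniq_active == 0) {
		e->freed = true;  /* last user gone: release the entry */
		return true;
	}
	return false;
}
```

      Replaying the steps above: a get at step 2 (0->1), a get at [a] (1->2),
      a put at [b] (2->1, the entry survives) and a put at [c] (1->0, the
      entry is released).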
      
      Fixes: e420bed0 ("bpf: Add fd-based tcx multi-prog infra with link support")
      Reported-by: Pedro Pinto <xten@osec.io>
      Co-developed-by: Pedro Pinto <xten@osec.io>
      Signed-off-by: Pedro Pinto <xten@osec.io>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Hyunwoo Kim <v4bel@theori.io>
      Cc: Wongi Lee <qwerty@theori.io>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://lore.kernel.org/r/20240708133130.11609-1-daniel@iogearbox.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • docs: networking: devlink: capitalise length value · 83c36e7c
      Chris Packham authored
      Correct the example to match the help text from the devlink utility.
      
      Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Jul, 2024 7 commits
    • tcp: fix incorrect undo caused by DSACK of TLP retransmit · 0ec986ed
      Neal Cardwell authored
      Loss recovery undo_retrans bookkeeping had a long-standing bug where a
      DSACK from a spurious TLP retransmit packet could cause an erroneous
      undo of a fast recovery or RTO recovery that repaired a single
      really-lost packet (in a sequence range outside that of the TLP
      retransmit). Basically, because the loss recovery state machine didn't
      account for the fact that it sent a TLP retransmit, the DSACK for the
      TLP retransmit could erroneously be implicitly interpreted as
      corresponding to the normal fast recovery or RTO recovery retransmit
      that plugged a real hole, thus resulting in an improper undo.
      
      For example, consider the following buggy scenario where there is a
      real packet loss but the congestion control response is improperly
      undone because of this bug:
      
      + send packets P1, P2, P3, P4
      + P1 is really lost
      + send TLP retransmit of P4
      + receive SACK for original P2, P3, P4
      + enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
      + receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
      + receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
      
      The fix: when we initialize undo machinery in tcp_init_undo(), if
      there is a TLP retransmit in flight, then increment tp->undo_retrans
      so that we make sure that we receive a DSACK corresponding to the TLP
      retransmit, as well as DSACKs for all later normal retransmits, before
      triggering a loss recovery undo. Note that we also have to move the
      line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
      we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
      it only afterward.
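      
      The bookkeeping difference can be illustrated with a heavily simplified
      model of the scenario above. struct undo_model and its helpers are
      hypothetical; only the undo_retrans counting is modelled, and
      account_tlp selects between the old behaviour and the behaviour
      described for tcp_init_undo() after the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model: only undo_retrans is tracked. */
struct undo_model {
	int undo_retrans;
	bool tlp_retrans_in_flight;
	bool account_tlp;  /* true models the behaviour after the fix */
};

/* Entering recovery. With the fix, an in-flight TLP retransmit counts as
 * one more retransmit whose DSACK must be seen before undoing. */
void init_undo(struct undo_model *m)
{
	m->undo_retrans = 0;
	if (m->account_tlp && m->tlp_retrans_in_flight)
		m->undo_retrans++;
}

/* A normal fast-recovery or RTO retransmit. */
void on_retransmit(struct undo_model *m)
{
	m->undo_retrans++;
}

/* A DSACK arrives; returns true if it brings undo_retrans to zero,
 * i.e. if it would trigger a loss recovery undo. */
bool on_dsack(struct undo_model *m)
{
	return m->undo_retrans > 0 && --m->undo_retrans == 0;
}

/* The scenario from the description: P1 really lost, TLP retransmit of
 * P4 in flight, fast retransmit of P1, then a DSACK for the TLP P4. */
bool scenario_undoes(bool account_tlp)
{
	struct undo_model m = { .tlp_retrans_in_flight = true,
				.account_tlp = account_tlp };

	init_undo(&m);
	on_retransmit(&m);
	return on_dsack(&m);
}
```

      With account_tlp false, the DSACK for the TLP retransmit of P4 drops
      undo_retrans from 1 to 0 and triggers the improper undo. With it true,
      the counter only drops from 2 to 1, so an undo would still require a
      DSACK for the P1 fast retransmit, which never arrives because P1 was
      really lost.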
      
      Also note that the bug dates back to the original 2013 TLP
      implementation, commit 6ba8a3b1 ("tcp: Tail loss probe (TLP)").
      
      However, this patch will only compile and work correctly with kernels
      that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
      commit 76be93fc ("tcp: allow at most one TLP probe per flight").
      So we associate this fix with that later commit.
      
      Fixes: 76be93fc ("tcp: allow at most one TLP probe per flight")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Kevin Yang <yyd@google.com>
      Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'wireguard-fixes-for-6-10-rc7' · 842c361b
      Jakub Kicinski authored
      Jason A. Donenfeld says:
      
      ====================
      wireguard fixes for 6.10-rc7
      
      These are four small fixes for WireGuard, which are all marked for
      stable:
      
      1) A QEMU command line fix to remove deprecated flags.
      
      2) Use of proper unaligned helpers to avoid unaligned memory access on
         some systems, from Helge.
      
      3) Two patches to annotate intentional data races, so KCSAN and syzbot
         don't get upset.
      ====================
      
      Link: https://patch.msgid.link/20240704154517.1572127-1-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: send: annotate intentional data race in checking empty queue · 381a7d45
      Jason A. Donenfeld authored
      KCSAN reports a race in wg_packet_send_keepalive, which is intentional:
      
          BUG: KCSAN: data-race in wg_packet_send_keepalive / wg_packet_send_staged_packets
      
          write to 0xffff88814cd91280 of 8 bytes by task 3194 on cpu 0:
           __skb_queue_head_init include/linux/skbuff.h:2162 [inline]
           skb_queue_splice_init include/linux/skbuff.h:2248 [inline]
           wg_packet_send_staged_packets+0xe5/0xad0 drivers/net/wireguard/send.c:351
           wg_xmit+0x5b8/0x660 drivers/net/wireguard/device.c:218
           __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
           netdev_start_xmit include/linux/netdevice.h:4954 [inline]
           xmit_one net/core/dev.c:3548 [inline]
           dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3564
           __dev_queue_xmit+0xeff/0x1d80 net/core/dev.c:4349
           dev_queue_xmit include/linux/netdevice.h:3134 [inline]
           neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1592
           neigh_output include/net/neighbour.h:542 [inline]
           ip6_finish_output2+0xa66/0xce0 net/ipv6/ip6_output.c:137
           ip6_finish_output+0x1a5/0x490 net/ipv6/ip6_output.c:222
           NF_HOOK_COND include/linux/netfilter.h:303 [inline]
           ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:243
           dst_output include/net/dst.h:451 [inline]
           NF_HOOK include/linux/netfilter.h:314 [inline]
           ndisc_send_skb+0x4a2/0x670 net/ipv6/ndisc.c:509
           ndisc_send_rs+0x3ab/0x3e0 net/ipv6/ndisc.c:719
           addrconf_dad_completed+0x640/0x8e0 net/ipv6/addrconf.c:4295
           addrconf_dad_work+0x891/0xbc0
           process_one_work kernel/workqueue.c:2633 [inline]
           process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
           worker_thread+0x525/0x730 kernel/workqueue.c:2787
           kthread+0x1d7/0x210 kernel/kthread.c:388
           ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
           ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
      
          read to 0xffff88814cd91280 of 8 bytes by task 3202 on cpu 1:
           skb_queue_empty include/linux/skbuff.h:1798 [inline]
           wg_packet_send_keepalive+0x20/0x100 drivers/net/wireguard/send.c:225
           wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
           wg_packet_handshake_receive_worker+0x445/0x5e0 drivers/net/wireguard/receive.c:213
           process_one_work kernel/workqueue.c:2633 [inline]
           process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
           worker_thread+0x525/0x730 kernel/workqueue.c:2787
           kthread+0x1d7/0x210 kernel/kthread.c:388
           ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
           ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
      
          value changed: 0xffff888148fef200 -> 0xffff88814cd91280
      
      Mark this race as intentional by using skb_queue_empty_lockless()
      rather than skb_queue_empty(); the lockless variant uses READ_ONCE()
      internally to annotate the race.
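      
      A userspace approximation of the distinction, for illustration only:
      the READ_ONCE() macro below is a simplified volatile-access version
      (relying on the GNU C __typeof__ extension), not the kernel's
      definition, and struct node is a hypothetical stand-in for struct
      sk_buff_head. The lockless check performs the same comparison as the
      plain one but explicitly marks the racy load.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified approximation of the kernel's READ_ONCE(): a single volatile
 * load that the compiler may not tear, fuse or re-read. GNU C only. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical minimal circular list head: empty when it points to itself. */
struct node {
	struct node *next;
};

/* Plain check: only correct when the caller holds the queue lock. */
bool queue_empty(const struct node *head)
{
	return head->next == head;
}

/* Lockless check: the intentionally racy read is marked with READ_ONCE()
 * so that the compiler and race detectors such as KCSAN know about it. */
bool queue_empty_lockless(const struct node *head)
{
	return READ_ONCE(head->next) == head;
}
```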
      
      Cc: stable@vger.kernel.org
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Link: https://patch.msgid.link/20240704154517.1572127-5-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: queueing: annotate intentional data race in cpu round robin · 2fe3d6d2
      Jason A. Donenfeld authored
      KCSAN reports a race in the CPU round robin function, which, as the
      comment points out, is intentional:
      
          BUG: KCSAN: data-race in wg_packet_send_staged_packets / wg_packet_send_staged_packets
      
          read to 0xffff88811254eb28 of 4 bytes by task 3160 on cpu 1:
           wg_cpumask_next_online drivers/net/wireguard/queueing.h:127 [inline]
           wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline]
           wg_packet_create_data drivers/net/wireguard/send.c:320 [inline]
           wg_packet_send_staged_packets+0x60e/0xac0 drivers/net/wireguard/send.c:388
           wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239
           wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
           wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213
           process_one_work kernel/workqueue.c:3248 [inline]
           process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329
           worker_thread+0x526/0x720 kernel/workqueue.c:3409
           kthread+0x1d1/0x210 kernel/kthread.c:389
           ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
           ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
          write to 0xffff88811254eb28 of 4 bytes by task 3158 on cpu 0:
           wg_cpumask_next_online drivers/net/wireguard/queueing.h:130 [inline]
           wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline]
           wg_packet_create_data drivers/net/wireguard/send.c:320 [inline]
           wg_packet_send_staged_packets+0x6e5/0xac0 drivers/net/wireguard/send.c:388
           wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239
           wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
           wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213
           process_one_work kernel/workqueue.c:3248 [inline]
           process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329
           worker_thread+0x526/0x720 kernel/workqueue.c:3409
           kthread+0x1d1/0x210 kernel/kthread.c:389
           ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
           ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
          value changed: 0xffffffff -> 0x00000000
      
      Mark this race as intentional by using READ/WRITE_ONCE().
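      
      A sketch of the annotated round-robin pick, under assumptions: the
      simplified READ_ONCE()/WRITE_ONCE() macros, the fixed NR_CPUS of 4 and
      the cpu_online[] table are hypothetical stand-ins for the kernel's
      macros and cpu_online_mask, and pick_next_cpu_rr() only models the
      general shape of a shared round-robin cursor.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified volatile-access approximations (GNU C __typeof__). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define NR_CPUS 4

/* Hypothetical online mask: CPU 1 is offline. */
static const bool cpu_online[NR_CPUS] = { true, false, true, true };

/* First online CPU strictly after cpu, or NR_CPUS if there is none. */
static int next_online(int cpu)
{
	for (int c = cpu + 1; c < NR_CPUS; c++)
		if (cpu_online[c])
			return c;
	return NR_CPUS;
}

/* Round-robin pick through a cursor shared by concurrent senders. Two
 * callers may read the same *last_cpu and choose the same CPU; that only
 * skews load spreading slightly, so the race is tolerated and annotated
 * rather than locked. */
int pick_next_cpu_rr(int *last_cpu)
{
	int cpu = next_online(READ_ONCE(*last_cpu));

	if (cpu >= NR_CPUS)
		cpu = next_online(-1);  /* wrap around to the first online CPU */
	WRITE_ONCE(*last_cpu, cpu);
	return cpu;
}
```

      Starting from last_cpu = -1 (0xffffffff), the first call picks CPU 0,
      which is consistent with the value change in the KCSAN report above.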
      
      Cc: stable@vger.kernel.org
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Link: https://patch.msgid.link/20240704154517.1572127-4-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: allowedips: avoid unaligned 64-bit memory accesses · 948f991c
      Helge Deller authored
      On the parisc platform, the kernel emits warnings because swap_endian()
      tries to load a 128-bit IPv6 address from an unaligned memory location:
      
       Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df)
       Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc)
      
      Avoid such unaligned memory accesses by instead using the
      get_unaligned_be64() helper macro.
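      
      An illustrative, portable stand-in for the helper: load_be64_unaligned()
      assembles a big-endian 64-bit value byte by byte, which is safe at any
      alignment, and the hypothetical ipv6_to_host_halves() reads a 128-bit
      address as two halves at src and src + 8. The kernel's
      get_unaligned_be64() is architecture-optimised; this sketch only shows
      the semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise big-endian load: never performs a multi-byte access, so the
 * alignment of p does not matter. */
uint64_t load_be64_unaligned(const void *p)
{
	const uint8_t *b = p;
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Hypothetical helper in the spirit of swap_endian() for a 128-bit
 * address: two big-endian halves at src and src + 8, without ever casting
 * src to a 64-bit pointer. */
void ipv6_to_host_halves(const uint8_t *src, uint64_t out[2])
{
	out[0] = load_be64_unaligned(src);
	out[1] = load_be64_unaligned(src + 8);
}
```

      Passing buf + 1 from a byte buffer deliberately misaligns src; the
      result is the same as for an aligned address.
      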
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      [Jason: replace src[8] in original patch with src+8]
      Cc: stable@vger.kernel.org
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Link: https://patch.msgid.link/20240704154517.1572127-3-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: selftests: use acpi=off instead of -no-acpi for recent QEMU · 2cb489eb
      Jason A. Donenfeld authored
      QEMU 9.0 removed -no-acpi, in favor of machine properties, so update the
      Makefile to use the correct QEMU invocation.
      
      Cc: stable@vger.kernel.org
      Fixes: b83fdcd9 ("wireguard: selftests: use microvm on x86")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Link: https://patch.msgid.link/20240704154517.1572127-2-Jason@zx2c4.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bcmasp: Fix error code in probe() · 0c754d9d
      Dan Carpenter authored
      Return an error code if bcmasp_interface_create() fails.  Don't return
      success.
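      
      The error-handling pattern can be sketched as follows. create_interface(),
      probe() and the fixed array of four interfaces are hypothetical
      stand-ins for bcmasp_interface_create() and the driver's probe(), and
      -ENOMEM is chosen here for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_INTFS 4

struct intf {
	int id;
};

static struct intf intfs[MAX_INTFS];

/* Hypothetical constructor: returns NULL on failure, like
 * bcmasp_interface_create(); fail_at forces a failure at that index. */
static struct intf *create_interface(int i, int fail_at)
{
	if (i == fail_at)
		return NULL;
	intfs[i].id = i;
	return &intfs[i];
}

/* A NULL from the constructor must become a negative errno; falling
 * through with ret == 0 would report success for a failed probe. */
int probe(int n_intfs, int fail_at)
{
	int ret = 0;

	assert(n_intfs <= MAX_INTFS);
	for (int i = 0; i < n_intfs; i++) {
		struct intf *intf = create_interface(i, fail_at);

		if (!intf) {
			ret = -ENOMEM;  /* report the failure */
			break;          /* real code would also unwind here */
		}
	}
	return ret;
}
```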
      
      Fixes: 490cb412 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Reviewed-by: Justin Chen <justin.chen@broadcom.com>
      Link: https://patch.msgid.link/ZoWKBkHH9D1fqV4r@stanley.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 05 Jul, 2024 1 commit
  7. 04 Jul, 2024 12 commits
    • Merge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 033771c0
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, wireless and netfilter.
      
        There's one fix for power management with Intel's e1000e here,
        Thorsten tells us there's another problem that started in v6.9. We're
        trying to wrap that up but I don't think it's blocking.
      
        Current release - new code bugs:
      
         - wifi: mac80211: disable softirqs for queued frame handling
      
         - af_unix: fix uninit-value in __unix_walk_scc(), with the new
           garbage collection algo
      
        Previous releases - regressions:
      
         - Bluetooth:
            - qca: fix BT enable failure for QCA6390 after warm reboot
            - add quirk to ignore reserved PHY bits in LE Extended Adv Report,
              abused by some Broadcom controllers found on Apple machines
      
         - wifi: wilc1000: fix ies_len type in connect path
      
        Previous releases - always broken:
      
         - tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
           avoid premature timeouts
      
         - net: make sure skb_datagram_iter maps fragments page by page, in
           case we somehow get compound highmem mixed in
      
         - eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
           queues are used
      
        Misc:
      
         - MAINTAINERS: Remembering Larry Finger"
      
      * tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        bnxt_en: Fix the resource check condition for RSS contexts
        mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
        inet_diag: Initialize pad field in struct inet_diag_req_v2
        tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
        selftests: make order checking verbose in msg_zerocopy selftest
        selftests: fix OOM in msg_zerocopy selftest
        ice: use proper macro for testing bit
        ice: Reject pin requests with unsupported flags
        ice: Don't process extts if PTP is disabled
        ice: Fix improper extts handling
        selftest: af_unix: Add test case for backtrack after finalising SCC.
        af_unix: Fix uninit-value in __unix_walk_scc()
        bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
        net: rswitch: Avoid use-after-free in rswitch_poll()
        netfilter: nf_tables: unconditionally flush pending work before notifier
        wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
        wifi: iwlwifi: mvm: avoid link lookup in statistics
        wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
        wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
        wifi: wilc1000: fix ies_len type in connect path
        ...
    • Merge tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · d470e9f5
      Linus Torvalds authored
      Pull s390 fixes from Heiko Carstens:
      
       - Fix and add physical to virtual address translations in dasd and
         virtio_ccw drivers. For virtio_ccw this is just a minimal fix.
         More code cleanup will follow.
      
       - Small defconfig updates
      
      * tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/dasd: Fix invalid dereferencing of indirect CCW data pointer
        s390/vfio_ccw: Fix target addresses of TIC CCWs
        s390: Update defconfigs
    • Merge tag 'platform-drivers-x86-v6.10-5' of... · 2d19be09
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fix from Hans de Goede:
      
       - Fix regression in toshiba_acpi introduced in 6.10-rc1
      
      * tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86: toshiba_acpi: Fix quickstart quirk handling
    • Merge tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux · 4d85acef
      Linus Torvalds authored
      Pull Kselftest fix from Mickaël Salaün:
       "Fix Kselftests timeout.
      
        We can't use CLONE_VFORK, since that blocks the parent - and thus the
        timeout handling - until the child exits or execve's.
      
        Go back to using plain fork()"
      
      * tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
        selftests/harness: Fix tests timeout and race condition
    • Merge tag 'mm-hotfixes-stable-2024-07-03-22-23' of... · 8faccfef
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "6 hotfixes, all cc:stable. Some fixes for longstanding nilfs2 issues
        and three unrelated MM fixes"
      
      * tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        nilfs2: fix incorrect inode allocation from reserved inodes
        nilfs2: add missing check for inode numbers on directory entries
        nilfs2: fix inode number range checks
        mm: avoid overflows in dirty throttling logic
        Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
        mm: optimize the redundant loop of mm_update_owner_next()
    • bnxt_en: Fix the resource check condition for RSS contexts · 5d350dc3
      Pavan Chebbi authored
      While creating a new RSS context, bnxt_rfs_capable() currently
      makes a strict check to see if the required VNICs are already
      available.  If the current VNICs are not what is required,
      either too many or not enough, it will call the firmware to
      reserve the exact number required.
      
      There is a bug in the firmware when the driver tries to
      relinquish some reserved VNICs and RSS contexts.  It will
      cause the default VNIC to lose its RSS configuration and
      cause receive packets to be placed incorrectly.
      
      Work around this problem by skipping the resource reduction.
      The driver will not reduce the VNIC and RSS context reservations
      when a context is deleted.  The resources will be available for
      use when new contexts are created later.
      
      Potentially, this workaround can cause us to run out of VNIC
      and RSS contexts if there are a lot of VF functions creating
      and deleting RSS contexts.  In the future, we will conditionally
      disable this workaround when the firmware fix is available.
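      
      A simplified model of the policy change, assuming a single reservation
      counter: struct resv, reserve_fw() and both check functions are
      hypothetical, with reserve_fw() standing in for the firmware
      reservation call. check_exact() re-reserves on any mismatch, whereas
      check_grow_only() never shrinks the reservation.

```c
#include <assert.h>

/* Hypothetical model of a VNIC/RSS context reservation. */
struct resv {
	int reserved;  /* currently reserved with the firmware */
	int fw_calls;  /* how many times the firmware was asked */
};

/* Stand-in for the firmware reservation call. */
static void reserve_fw(struct resv *r, int count)
{
	r->reserved = count;
	r->fw_calls++;
}

/* Before the workaround: any mismatch, too many or too few, leads to an
 * exact re-reservation, which includes giving resources back. */
void check_exact(struct resv *r, int required)
{
	if (r->reserved != required)
		reserve_fw(r, required);
}

/* With the workaround: only grow. Surplus reservations are kept for later
 * contexts, so the buggy relinquish path is never exercised. */
void check_grow_only(struct resv *r, int required)
{
	if (r->reserved < required)
		reserve_fw(r, required);
}
```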
      
      Fixes: 438ba39b ("bnxt_en: Improve RSS context reservation infrastructure")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file · 8ce34dcc
      Aleksandr Mishin authored
      In case of an invalid INI file, mlxsw_linecard_types_init() deallocates
      the memory but does not reset the pointer to NULL, and returns 0. If any
      error occurs after the mlxsw_linecard_types_init() call,
      mlxsw_linecards_init() calls mlxsw_linecard_types_fini(), which
      deallocates the same memory again.
      
      Reset the pointer to NULL after deallocation.
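      
      The free-and-reset pattern, sketched in userspace: struct types_info,
      types_init() and types_fini() are hypothetical stand-ins for the line
      card types data and mlxsw_linecard_types_init()/_fini(), and free()
      stands in for the kernel allocator. Because free(NULL), like
      kfree(NULL), is a no-op, resetting the pointer makes the later fini
      call harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the parsed types data. */
struct types_info {
	int *data;
};

/* On an invalid INI file the buffer is released, and the pointer is reset
 * so that a later fini() sees NULL instead of a dangling pointer. */
int types_init(struct types_info *ti, int ini_valid)
{
	ti->data = malloc(16 * sizeof(*ti->data));
	if (!ti->data)
		return -1;
	if (!ini_valid) {
		free(ti->data);
		ti->data = NULL;  /* the fix: otherwise fini() frees it again */
	}
	return 0;  /* an invalid INI file still returns 0, as described */
}

void types_fini(struct types_info *ti)
{
	free(ti->data);  /* no-op when data is NULL */
	ti->data = NULL;
}
```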
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: b217127e ("mlxsw: core_linecards: Add line card objects and implement provisioning")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · eec5969c
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.10
      
      Hopefully the last fixes for v6.10. Fix a regression in wilc1000
      where bitrate Information Elements longer than 255 bytes were broken.
      A few fixes for mac80211 and iwlwifi as well.
      
      * tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
        wifi: iwlwifi: mvm: avoid link lookup in statistics
        wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
        wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
        wifi: wilc1000: fix ies_len type in connect path
        wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP
      ====================
      
      Link: https://patch.msgid.link/20240704111431.11DEDC3277B@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · e3671971
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following batch contains a one-liner patch to unconditionally flush
      the workqueue containing stale objects to be released; syzbot managed to
      trigger a UaF. Patch from Florian Westphal.
      
      netfilter pull request 24-07-04
      
      * tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: unconditionally flush pending work before notifier
      ====================
      
      Link: https://patch.msgid.link/20240703223304.1455-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • inet_diag: Initialize pad field in struct inet_diag_req_v2 · 61cf1c73
      Shigeru Yoshida authored
      KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
      sockets uses the pad field in struct inet_diag_req_v2 for the
      underlying protocol. This field corresponds to the sdiag_raw_protocol
      field in struct inet_diag_req_raw.
      
      inet_diag_get_exact_compat() converts inet_diag_req to
      inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
      occurs when raw_lookup() accesses the sdiag_raw_protocol field.
      
      Fix this by initializing the pad field in
      inet_diag_get_exact_compat(). Also, apply the same fix in
      inet_diag_dump_compat() to avoid a similar issue in the future.
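      
      A sketch of the conversion, using hypothetical trimmed-down structs
      rather than the real layouts in the inet_diag UAPI header: every field
      of the v2 request, including pad, is written explicitly, so nothing
      uninitialized reaches code that reinterprets pad as the raw protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down request layouts (not the real UAPI structs). */
struct old_req {
	uint8_t family;
	uint32_t states;
};

struct req_v2 {
	uint8_t family;
	uint8_t protocol;
	uint8_t ext;
	uint8_t pad;     /* reinterpreted by raw diag as the raw protocol */
	uint32_t states;
};

/* Compat conversion: each field is assigned, pad included, so no stack
 * garbage can leak into pad. */
struct req_v2 convert_req(const struct old_req *old, uint8_t proto)
{
	struct req_v2 r;

	r.family = old->family;
	r.protocol = proto;
	r.ext = 0;
	r.pad = 0;       /* the fix: previously left uninitialized */
	r.states = old->states;
	return r;
}
```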
      
      [1]
      BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline]
      BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
       raw_lookup net/ipv4/raw_diag.c:49 [inline]
       raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
       raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
       inet_diag_cmd_exact+0x7d9/0x980
       inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
       inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
       sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
       netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
       sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
       netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
       netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
       netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x332/0x3d0 net/socket.c:745
       ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
       ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
       __sys_sendmsg net/socket.c:2668 [inline]
       __do_sys_sendmsg net/socket.c:2677 [inline]
       __se_sys_sendmsg net/socket.c:2675 [inline]
       __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
       x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Uninit was stored to memory at:
       raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71
       raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
       inet_diag_cmd_exact+0x7d9/0x980
       inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
       inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
       sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
       netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
       sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
       netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
       netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
       netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x332/0x3d0 net/socket.c:745
       ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
       ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
       __sys_sendmsg net/socket.c:2668 [inline]
       __do_sys_sendmsg net/socket.c:2677 [inline]
       __se_sys_sendmsg net/socket.c:2675 [inline]
       __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
       x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Local variable req.i created at:
       inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline]
       inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426
       sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
      
      CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d #32
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
      
      Fixes: 432490f9 ("net: ip, diag -- Add diag interface for raw sockets")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO. · 4b74726c
      Kuniyuki Iwashima authored
      When we process segments with TCP-AO, the AO option is not recognized
      in tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1,
      which unconditionally triggers the BPF TCP option parser.
      
      Let's avoid the unnecessary BPF invocation.
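The following simplified C sketch illustrates the idea. It is not the kernel's tcp_parse_options(). The parser flags any option kind it does not recognize as unknown, and it treats TCP-AO (option kind 29, per RFC 5925) as a known kind so that AO segments no longer set saw_unknown. The demo_* names, the struct layout and the small set of handled kinds are assumptions for illustration. Per the Fixes: tag, AO-signed segments are verified on a separate path, so the parser only has to keep AO out of the default branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DEMO_OPT_EOL = 0,
    DEMO_OPT_NOP = 1,
    DEMO_OPT_MSS = 2,
    DEMO_OPT_AO  = 29,   /* TCP Authentication Option kind (RFC 5925) */
};

/* Hypothetical, reduced stand-in for the parsed-options state. */
struct demo_rx_opt {
    bool     saw_unknown;  /* would trigger the BPF TCP option parser */
    bool     saw_ao;
    uint16_t mss;
};

static void demo_parse_options(const uint8_t *p, size_t len,
                               struct demo_rx_opt *rx)
{
    size_t i = 0;

    while (i < len) {
        uint8_t kind = p[i];

        if (kind == DEMO_OPT_EOL)
            return;
        if (kind == DEMO_OPT_NOP) {
            i++;
            continue;
        }
        if (i + 1 >= len)
            return;                      /* truncated option */

        uint8_t olen = p[i + 1];
        if (olen < 2 || i + olen > len)
            return;                      /* malformed length */

        switch (kind) {
        case DEMO_OPT_MSS:
            if (olen == 4)
                rx->mss = (uint16_t)((p[i + 2] << 8) | p[i + 3]);
            break;
        case DEMO_OPT_AO:
            rx->saw_ao = true;           /* known option: not "unknown" */
            break;
        default:
            rx->saw_unknown = true;      /* hand off to the BPF parser */
            break;
        }
        i += olen;
    }
}
```

In this sketch, an AO-signed segment leaves saw_unknown clear. An experimental kind such as 253 still sets it, so BPF option parsing keeps working for options the stack really does not understand.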
      
      Fixes: 0a3a8090 ("net/tcp: Verify inbound TCP-AO signed segments")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'fix-oom-and-order-check-in-msg_zerocopy-selftest' · aa09b7e0
      Jakub Kicinski authored
      Zijian Zhang says:
      
      ====================
      fix OOM and order check in msg_zerocopy selftest
      
      selftests/net/msg_zerocopy.c has a while loop that keeps calling sendmsg
      on a socket with the MSG_ZEROCOPY flag, and it only receives the
      notifications once the socket is no longer writable. Typically, it starts
      receiving after around 30+ sendmsgs. However, since the introduction of
      commit dfa2f048 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender
      is always writable and never gets a chance to receive notifications.
      The selftest always exits with OUT_OF_MEMORY because the memory used by
      opt_skb exceeds net.core.optmem_max. Meanwhile, optmem_max could be set
      to a different value to trigger the OOM on older kernels too.
      
      Thus, we introduce "cfg_notification_limit" to force the sender to
      receive notifications after a given number of sendmsgs.
      
      We also found that when lock debugging is on, notifications may arrive
      out of order. Thus, order-checking output is now gated by cfg_verbose,
      to avoid excessive output in this case.
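The following toy C model of the sender loop shows why the limit bounds memory. It is not the selftest's code. It assumes each zerocopy sendmsg eventually produces one pending completion; the real selftest may see completions coalesced into ranges. The demo_* names are hypothetical, and notification_limit plays the role of cfg_notification_limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of sender-side state. */
struct demo_sender {
    int pending;            /* completions queued, not yet read */
    int max_pending;        /* high-water mark of pending completions */
    int sends_since_drain;  /* sendmsg calls since the last read */
};

/* Model of reading all queued completion notifications. */
static void demo_drain(struct demo_sender *s)
{
    s->pending = 0;
    s->sends_since_drain = 0;
}

/*
 * One loop iteration: "send", then read notifications either when the
 * socket is not writable (the original trigger) or when notification_limit
 * sends have happened since the last read. A limit of 0 disables the
 * forced read in this sketch.
 */
static void demo_step(struct demo_sender *s, bool writable,
                      int notification_limit)
{
    s->pending++;
    s->sends_since_drain++;
    if (s->pending > s->max_pending)
        s->max_pending = s->pending;

    if (!writable ||
        (notification_limit > 0 &&
         s->sends_since_drain >= notification_limit))
        demo_drain(s);
}
```

Suppose the socket is always writable, which is the situation described above. Without a limit, pending completions in this model grow with every send, mirroring the optmem_max exhaustion. With a limit of N, they never exceed N.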
      ====================
      
      Link: https://patch.msgid.link/20240701225349.3395580-1-zijianzhang@bytedance.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>