1. 04 Oct, 2024 8 commits
    • selftests: net: no_forwarding: fix VID for $swp2 in one_bridge_two_pvids() test · 9f49d14e
      Kacper Ludwinski authored
      Currently, the second bridge command overwrites the first one.
      Fix this by adding this VID to the interface behind $swp2.
      
      The one_bridge_two_pvids() test intends to check that there is no
      leakage of traffic between bridge ports which have a single VLAN - the
      PVID VLAN.
      
      Because of a typo, port $swp1 is configured with a PVID twice (second
      command overwrites first), and $swp2 isn't configured at all (and since
      the bridge vlan_default_pvid property is set to 0, this port will not
      have a PVID at all, so it will drop all untagged and priority-tagged
      traffic).
      
      So, instead of testing the configuration that was intended, we are
      testing a different one, where one port has PVID 2 and the other has
      no PVID. This incorrect version of the test should also pass, but is
      ineffective for its purpose, so fix the typo.
      
      This typo has an impact on results of the test,
      potentially leading to wrong conclusions regarding
      the functionality of a network device.
      
      The test results:
      
      TEST: Switch ports in VLAN-aware bridge with different PVIDs:
      	Unicast non-IP untagged   [ OK ]
      	Multicast non-IP untagged   [ OK ]
      	Broadcast non-IP untagged   [ OK ]
      	Unicast IPv4 untagged   [ OK ]
      	Multicast IPv4 untagged   [ OK ]
      	Unicast IPv6 untagged   [ OK ]
      	Multicast IPv6 untagged   [ OK ]
      	Unicast non-IP VID 1   [ OK ]
      	Multicast non-IP VID 1   [ OK ]
      	Broadcast non-IP VID 1   [ OK ]
      	Unicast IPv4 VID 1   [ OK ]
      	Multicast IPv4 VID 1   [ OK ]
      	Unicast IPv6 VID 1   [ OK ]
      	Multicast IPv6 VID 1   [ OK ]
      	Unicast non-IP VID 4094   [ OK ]
      	Multicast non-IP VID 4094   [ OK ]
      	Broadcast non-IP VID 4094   [ OK ]
      	Unicast IPv4 VID 4094   [ OK ]
      	Multicast IPv4 VID 4094   [ OK ]
      	Unicast IPv6 VID 4094   [ OK ]
      	Multicast IPv6 VID 4094   [ OK ]
      
      Fixes: 476a4f05 ("selftests: forwarding: add a no_forwarding.sh test")
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Kacper Ludwinski <kac.ludwinski@icloud.com>
      Link: https://patch.msgid.link/20241002051016.849-1-kac.ludwinski@icloud.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ibmvnic-fix-for-send-scrq-direct' · 500257db
      Jakub Kicinski authored
      Nick Child says:
      
      ====================
      ibmvnic: Fix for send scrq direct
      
      This is a v2 of a patchset (now just patch) which addresses a
      bug in a new feature which is causing major link UP issues with
      certain physical cards.
      
      For a full summary of the issue:
        1. During vnic initialization we get the following values from vnic
           server regarding "Transmit / Receive Descriptor Requirement" (see
            PAPR Table 584. CAPABILITIES Commands):
          - LSO Tx frame = 0x0F , header offsets + L2, L3, L4 headers required
          - CSO Tx frame = 0x0C , header offsets + L2 header required
          - standard frame = 0x0C , header offsets + L2 header required
        2. Assume we are dealing with only "standard frames" from now on (no
           CSO, no LSO)
        3. When using 100G backing device, we don't hand vnic server any header
           information and TX is successful
        4. When using 25G backing device, we don't hand vnic server any header
          information and TX fails and we get "Adapter Error" transport events.
      The obvious issue here is that vnic client should be respecting the 0x0C
      header requirement for standard frames.  But 100G cards will also give
      0x0C despite the fact that we know TX works if we ignore it. That being
      said, we still must respect values given from the managing server. Will
      need to work with them going forward to hopefully get 100G cards to
      return 0x00 for this bitstring so the performance gains of using
      send_subcrq_direct can be continued.
      ====================
      
      Link: https://patch.msgid.link/20241001163200.1802522-1-nnac123@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ibmvnic: Inspect header requirements before using scrq direct · de390657
      Nick Child authored
      Previously, the TX header requirement for standard frames was ignored.
      This requirement is a bitstring sent from the VIOS which maps to the
      type of header information needed during TX. If no header information
      is needed, then send subcrq direct can be used (which can be more
      performant).
      
      This bitstring was previously ignored for standard packets (AKA non LSO,
      non CSO) due to the belief that the bitstring was over-cautionary. It
      turns out that there are some configurations where the backing device
      does need header information for transmission of standard packets. If
      the information is not supplied then this causes continuous "Adapter
      error" transport events. Therefore, this bitstring should be respected
      and observed before considering the use of send subcrq direct.
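
A minimal sketch of the decision described above, not the driver's actual code. The names `tx_hdr_reqs`, `frame_kind`, and `can_use_subcrq_direct` are hypothetical and exist only for illustration; the bitstring values 0x0F and 0x0C are the ones quoted in the cover letter for this series, and 0x00 is assumed to mean that no header information is required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame classes; the real driver tracks these differently. */
enum frame_kind { FRAME_STANDARD, FRAME_CSO, FRAME_LSO };

/* Header-requirement bitstrings as reported by the server (illustrative). */
struct tx_hdr_reqs {
	uint8_t standard;
	uint8_t cso;
	uint8_t lso;
};

/* Return the requirement bitstring that applies to this kind of frame. */
static uint8_t hdr_req_for(const struct tx_hdr_reqs *reqs, enum frame_kind kind)
{
	assert(reqs != NULL);
	switch (kind) {
	case FRAME_LSO:
		return reqs->lso;
	case FRAME_CSO:
		return reqs->cso;
	default:
		return reqs->standard;
	}
}

/* The direct path is only allowed when no header information is required. */
static bool can_use_subcrq_direct(const struct tx_hdr_reqs *reqs,
				  enum frame_kind kind)
{
	return hdr_req_for(reqs, kind) == 0;
}
```

With the values from the cover letter (standard = 0x0C), this sketch always rejects the direct path for standard frames; it would only be taken if the server reported 0x00.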
      
      Fixes: 74839f7a ("ibmvnic: Introduce send sub-crq direct")
      Signed-off-by: Nick Child <nnac123@linux.ibm.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241001163200.1802522-2-nnac123@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'netfilter-br_netfilter-fix-panic-with-metadata_dst-skb' · 69ea1d4a
      Jakub Kicinski authored
      Andy Roulin says:
      
      ====================
      netfilter: br_netfilter: fix panic with metadata_dst skb
      
      There's a kernel panic possible in the br_netfilter module when sending
      untagged traffic via a VxLAN device. Traceback is included below.
      This happens during the check for fragmentation in br_nf_dev_queue_xmit
      if the MTU on the VxLAN device is not big enough.
      
      It is dependent on:
      1) the br_netfilter module being loaded;
      2) net.bridge.bridge-nf-call-iptables set to 1;
      3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
      4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
      
      This case was never supported in the first place, so the first patch drops
      such packets.
      
      A regression selftest is added as part of the second patch.
      
      PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
      [  176.291791] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000110
      [  176.292101] Mem abort info:
      [  176.292184]   ESR = 0x0000000096000004
      [  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  176.292530]   SET = 0, FnV = 0
      [  176.292709]   EA = 0, S1PTW = 0
      [  176.292862]   FSC = 0x04: level 0 translation fault
      [  176.293013] Data abort info:
      [  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      [  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
      [  176.294166] [0000000000000110] pgd=0000000000000000,
      p4d=0000000000000000
      [  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
      br_netfilter bridge stp llc ipv6 crct10dif_ce
      [  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
      6.8.0-rc3-g5b3fbd61 #2
      [  176.296314] Hardware name: linux,dummy-virt (DT)
      [  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
      BTYPE=--)
      [  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
      [  176.297636] sp : ffff800080003630
      [  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
      ffff6828c49ad9f8
      [  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
      00000000000003e8
      [  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
      ffff6828c3b16d28
      [  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
      0000000000000014
      [  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
      0000000095744632
      [  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
      ffffb7e137926a70
      [  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
      0000000000000000
      [  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
      f20e0100bebafeca
      [  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
      0000000000000000
      [  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
      ffff6828c7f918f0
      [  176.300889] Call trace:
      [  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
      [  176.301703]  nf_hook_slow+0x48/0x124
      [  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
      [  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
      [  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
      [  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
      [  176.303359]  nf_hook_slow+0x48/0x124
      [  176.303803]  __br_forward+0xc4/0x194 [bridge]
      [  176.304013]  br_flood+0xd4/0x168 [bridge]
      [  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
      [  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
      [  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
      [  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
      [  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
      [  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
      [  176.306197]  __netif_receive_skb+0x18/0x64
      [  176.306369]  process_backlog+0x80/0x124
      [  176.306540]  __napi_poll+0x38/0x17c
      [  176.306636]  net_rx_action+0x124/0x26c
      [  176.306758]  __do_softirq+0x100/0x26c
      [  176.307051]  ____do_softirq+0x10/0x1c
      [  176.307162]  call_on_irq_stack+0x24/0x4c
      [  176.307289]  do_softirq_own_stack+0x1c/0x2c
      [  176.307396]  do_softirq+0x54/0x6c
      [  176.307485]  __local_bh_enable_ip+0x8c/0x98
      [  176.307637]  __dev_queue_xmit+0x22c/0xd28
      [  176.307775]  neigh_resolve_output+0xf4/0x1a0
      [  176.308018]  ip_finish_output2+0x1c8/0x628
      [  176.308137]  ip_do_fragment+0x5b4/0x658
      [  176.308279]  ip_fragment.constprop.0+0x48/0xec
      [  176.308420]  __ip_finish_output+0xa4/0x254
      [  176.308593]  ip_finish_output+0x34/0x130
      [  176.308814]  ip_output+0x6c/0x108
      [  176.308929]  ip_send_skb+0x50/0xf0
      [  176.309095]  ip_push_pending_frames+0x30/0x54
      [  176.309254]  raw_sendmsg+0x758/0xaec
      [  176.309568]  inet_sendmsg+0x44/0x70
      [  176.309667]  __sys_sendto+0x110/0x178
      [  176.309758]  __arm64_sys_sendto+0x28/0x38
      [  176.309918]  invoke_syscall+0x48/0x110
      [  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
      [  176.310353]  do_el0_svc+0x1c/0x28
      [  176.310434]  el0_svc+0x34/0xb4
      [  176.310551]  el0t_64_sync_handler+0x120/0x12c
      [  176.310690]  el0t_64_sync+0x190/0x194
      [  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
      [  176.315743] ---[ end trace 0000000000000000 ]---
      [  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
      interrupt
      [  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
      [  176.316564] PHYS_OFFSET: 0xffff97d780000000
      [  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
      [  176.317210] Memory Limit: none
      [  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
      Exception in interrupt ]---\
      ====================
      
      Link: https://patch.msgid.link/20241001154400.22787-1-aroulin@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: add regression test for br_netfilter panic · bc4d22b7
      Andy Roulin authored
      Add a new netfilter selftest to test against br_netfilter panics when
      VxLAN single-device is used together with untagged traffic and high MTU.
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Andy Roulin <aroulin@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://patch.msgid.link/20241001154400.22787-3-aroulin@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: br_netfilter: fix panic with metadata_dst skb · f9ff7665
      Andy Roulin authored
      Fix a kernel panic in the br_netfilter module when sending untagged
      traffic via a VxLAN device.
      This happens during the check for fragmentation in br_nf_dev_queue_xmit.
      
      It is dependent on:
      1) the br_netfilter module being loaded;
      2) net.bridge.bridge-nf-call-iptables set to 1;
      3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
      4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
      
      When forwarding the untagged packet to the VxLAN bridge port, before
      the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
      changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
      of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
      
      Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
      for frames that need to be fragmented: frames larger than the VxLAN
      device MTU end up calling br_nf_ip_fragment, which in turn calls
      ip_skb_dst_mtu.
      
      The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
      with valid dst->dev, thus the crash.
      
      This case was never supported in the first place, so drop the packet
      instead.
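
The drop-instead-of-fragment behaviour can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel patch: `fake_dst`, `fake_skb`, `fake_skb_valid_dst`, and `fake_queue_xmit` are hypothetical stand-ins for the kernel structures and helpers named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst entry attached to a packet. */
struct fake_dst {
	bool is_metadata;	/* true for tunnel metadata dsts */
	const char *dev_name;	/* NULL when no device is attached */
};

/* Hypothetical stand-in for a packet buffer. */
struct fake_skb {
	const struct fake_dst *dst;
	unsigned int len;
};

enum xmit_verdict { XMIT_SEND, XMIT_FRAGMENT, XMIT_DROP };

/* A dst is usable for MTU lookups only if it exists and is not metadata. */
static bool fake_skb_valid_dst(const struct fake_skb *skb)
{
	return skb->dst != NULL && !skb->dst->is_metadata;
}

/* Decide what to do with a frame given the egress device MTU. */
static enum xmit_verdict fake_queue_xmit(const struct fake_skb *skb,
					 unsigned int dev_mtu)
{
	assert(skb != NULL);
	if (skb->len <= dev_mtu)
		return XMIT_SEND;
	/* Oversized frame: fragmenting needs a real dst, so drop otherwise. */
	if (!fake_skb_valid_dst(skb))
		return XMIT_DROP;
	return XMIT_FRAGMENT;
}
```

The point of the sketch is the ordering: the validity check happens before any code path that would dereference the device behind the dst.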
      
      PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
      [  176.291791] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000110
      [  176.292101] Mem abort info:
      [  176.292184]   ESR = 0x0000000096000004
      [  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  176.292530]   SET = 0, FnV = 0
      [  176.292709]   EA = 0, S1PTW = 0
      [  176.292862]   FSC = 0x04: level 0 translation fault
      [  176.293013] Data abort info:
      [  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      [  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
      [  176.294166] [0000000000000110] pgd=0000000000000000,
      p4d=0000000000000000
      [  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
      br_netfilter bridge stp llc ipv6 crct10dif_ce
      [  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
      6.8.0-rc3-g5b3fbd61 #2
      [  176.296314] Hardware name: linux,dummy-virt (DT)
      [  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
      BTYPE=--)
      [  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
      [  176.297636] sp : ffff800080003630
      [  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
      ffff6828c49ad9f8
      [  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
      00000000000003e8
      [  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
      ffff6828c3b16d28
      [  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
      0000000000000014
      [  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
      0000000095744632
      [  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
      ffffb7e137926a70
      [  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
      0000000000000000
      [  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
      f20e0100bebafeca
      [  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
      0000000000000000
      [  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
      ffff6828c7f918f0
      [  176.300889] Call trace:
      [  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
      [  176.301703]  nf_hook_slow+0x48/0x124
      [  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
      [  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
      [  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
      [  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
      [  176.303359]  nf_hook_slow+0x48/0x124
      [  176.303803]  __br_forward+0xc4/0x194 [bridge]
      [  176.304013]  br_flood+0xd4/0x168 [bridge]
      [  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
      [  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
      [  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
      [  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
      [  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
      [  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
      [  176.306197]  __netif_receive_skb+0x18/0x64
      [  176.306369]  process_backlog+0x80/0x124
      [  176.306540]  __napi_poll+0x38/0x17c
      [  176.306636]  net_rx_action+0x124/0x26c
      [  176.306758]  __do_softirq+0x100/0x26c
      [  176.307051]  ____do_softirq+0x10/0x1c
      [  176.307162]  call_on_irq_stack+0x24/0x4c
      [  176.307289]  do_softirq_own_stack+0x1c/0x2c
      [  176.307396]  do_softirq+0x54/0x6c
      [  176.307485]  __local_bh_enable_ip+0x8c/0x98
      [  176.307637]  __dev_queue_xmit+0x22c/0xd28
      [  176.307775]  neigh_resolve_output+0xf4/0x1a0
      [  176.308018]  ip_finish_output2+0x1c8/0x628
      [  176.308137]  ip_do_fragment+0x5b4/0x658
      [  176.308279]  ip_fragment.constprop.0+0x48/0xec
      [  176.308420]  __ip_finish_output+0xa4/0x254
      [  176.308593]  ip_finish_output+0x34/0x130
      [  176.308814]  ip_output+0x6c/0x108
      [  176.308929]  ip_send_skb+0x50/0xf0
      [  176.309095]  ip_push_pending_frames+0x30/0x54
      [  176.309254]  raw_sendmsg+0x758/0xaec
      [  176.309568]  inet_sendmsg+0x44/0x70
      [  176.309667]  __sys_sendto+0x110/0x178
      [  176.309758]  __arm64_sys_sendto+0x28/0x38
      [  176.309918]  invoke_syscall+0x48/0x110
      [  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
      [  176.310353]  do_el0_svc+0x1c/0x28
      [  176.310434]  el0_svc+0x34/0xb4
      [  176.310551]  el0t_64_sync_handler+0x120/0x12c
      [  176.310690]  el0t_64_sync+0x190/0x194
      [  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
      [  176.315743] ---[ end trace 0000000000000000 ]---
      [  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
      interrupt
      [  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
      [  176.316564] PHYS_OFFSET: 0xffff97d780000000
      [  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
      [  176.317210] Memory Limit: none
      [  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
      Exception in interrupt ]---\
      
      Fixes: 11538d03 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Andy Roulin <aroulin@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: sja1105: fix reception from VLAN-unaware bridges · 1f9fc48f
      Vladimir Oltean authored
      The blamed commit introduced an unexpected regression in the sja1105
      driver. Packets from VLAN-unaware bridge ports get received correctly,
      but the protocol stack can't seem to decode them properly.
      
      For ds->untag_bridge_pvid users (thus also sja1105), the blamed commit
      did introduce a functional change: dsa_switch_rcv() used to call
      dsa_untag_bridge_pvid(), which looked like this:
      
      	err = br_vlan_get_proto(br, &proto);
      	if (err)
      		return skb;
      
      	/* Move VLAN tag from data to hwaccel */
      	if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) {
      		skb = skb_vlan_untag(skb);
      		if (!skb)
      			return NULL;
      	}
      
      and now it calls dsa_software_vlan_untag() which has just this:
      
      	/* Move VLAN tag from data to hwaccel */
      	if (!skb_vlan_tag_present(skb)) {
      		skb = skb_vlan_untag(skb);
      		if (!skb)
      			return NULL;
      	}
      
      thus lacks any skb->protocol == bridge VLAN protocol check. That check
      is deferred until a later check for skb->vlan_proto (in the hwaccel area).
      
      The new code is problematic because, for VLAN-untagged packets,
      skb_vlan_untag() blindly takes the 4 bytes starting with the EtherType
      and turns them into a hwaccel VLAN tag. This is what breaks the protocol
      stack.
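
To make the failure mode concrete, here is a small user-space sketch, assuming a raw Ethernet frame in a byte buffer. The helpers `untag_blind` and `untag_checked` are hypothetical and are not the kernel's skb_vlan_untag(); they only show what happens when the 4 bytes at the EtherType offset are read as a VLAN tag without checking the TPID first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN	6
#define TPID_8021Q	0x8100
#define ETHERTYPE_OFF	(2 * ETH_ALEN)	/* after destination and source MAC */

struct vlan_tag {
	uint16_t tpid;
	uint16_t tci;
};

/* Read a big-endian 16-bit value from the frame. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Blind variant: always interprets the 4 bytes at the EtherType as a tag. */
static struct vlan_tag untag_blind(const uint8_t *frame)
{
	struct vlan_tag tag = {
		read_be16(frame + ETHERTYPE_OFF),
		read_be16(frame + ETHERTYPE_OFF + 2),
	};

	return tag;
}

/* Checked variant: only report a tag when the TPID matches. */
static bool untag_checked(const uint8_t *frame, uint16_t tpid,
			  struct vlan_tag *out)
{
	assert(frame != NULL && out != NULL);
	if (read_be16(frame + ETHERTYPE_OFF) != tpid)
		return false;
	*out = untag_blind(frame);
	return true;
}
```

For an untagged IPv4 frame, the blind variant returns the EtherType 0x0800 as the "TPID" and the first two bytes of the IP header as the "TCI", which is the kind of corruption described above.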
      
      It would be tempting to "make it work as before" and only call
      skb_vlan_untag() for those packets with the skb->protocol actually
      representing a VLAN.
      
      But the premise of the newly introduced dsa_software_vlan_untag() core
      function is not wrong. Drivers set ds->untag_bridge_pvid or
      ds->untag_vlan_aware_bridge_pvid presumably because they send all
      traffic to the CPU reception path as VLAN-tagged. So why should we spend
      any additional CPU cycles assuming that the packet may be VLAN-untagged?
      And why does the sja1105 driver opt into ds->untag_bridge_pvid if it
      doesn't always deliver packets to the CPU as VLAN-tagged?
      
      The answer to the latter question is indeed more interesting: it doesn't
      need to. This got done in commit 884be12f ("net: dsa: sja1105: add
      support for imprecise RX"), because I thought it would be needed, but I
      didn't realize that it doesn't actually make a difference.
      
      As explained in the commit message of the blamed patch, ds->untag_bridge_pvid
      only makes a difference in the VLAN-untagged receive path of a bridge port.
      However, in that operating mode, tag_sja1105.c makes use of VLAN tags
      with the ETH_P_SJA1105 TPID, and it decodes and consumes these VLAN tags
      as if they were DSA tags (aka tag_8021q operation). Even if commit
      884be12f ("net: dsa: sja1105: add support for imprecise RX") added
      this logic in sja1105_bridge_vlan_add():
      
      	/* Always install bridge VLANs as egress-tagged on the CPU port. */
      	if (dsa_is_cpu_port(ds, port))
      		flags = 0;
      
      that was for _bridge_ VLANs, which are _not_ committed to hardware
      in VLAN-unaware mode (aka the mode where ds->untag_bridge_pvid does
      anything at all). Even prior to that change, the tag_8021q VLANs
      were always installed as egress-tagged on the CPU port, see
      dsa_switch_tag_8021q_vlan_add():
      
      	u16 flags = 0; // egress-tagged, non-PVID
      
      	if (dsa_port_is_user(dp))
      		flags |= BRIDGE_VLAN_INFO_UNTAGGED |
      			 BRIDGE_VLAN_INFO_PVID;
      
      	err = dsa_port_do_tag_8021q_vlan_add(dp, info->vid,
      					     flags);
      	if (err)
      		return err;
      
      Whether the sja1105 driver needs the new flag, ds->untag_vlan_aware_bridge_pvid,
      rather than ds->untag_bridge_pvid, is a separate discussion. To fix the
      current bug in VLAN-unaware bridge mode, I would argue that the sja1105
      driver should not request something it doesn't need, rather than
      complicating the core DSA helper. Whereas before the blamed commit, this
      setting was harmless, now it has caused breakage.
      
      Fixes: 93e4649e ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241001140206.50933-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 096c0fa4
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-09-30 (ice, idpf)
      
      This series contains updates to ice and idpf drivers:
      
      For ice:
      
      Michal corrects setting of dst VSI on LAN filters and adds clearing of
      port VLAN configuration during reset.
      
      Gui-Dong Han corrects failures to decrement refcount in some error
      paths.
      
      Przemek resolves a memory leak in ice_init_tx_topology().
      
      Arkadiusz prevents setting of DPLL_PIN_STATE_SELECTABLE to an improper
      value.
      
      Dave stops clearing of VLAN tracking bit to allow for VLANs to be properly
      restored after reset.
      
      For idpf:
      
      Ahmed sets uninitialized dyn_ctl_intrvl_s value.
      
      Josh corrects use and reporting of mailbox size.
      
      Larysa corrects order of function calls during de-initialization.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        idpf: deinit virtchnl transaction manager after vport and vectors
        idpf: use actual mbx receive payload length
        idpf: fix VF dynamic interrupt ctl register initialization
        ice: fix VLAN replay after reset
        ice: disallow DPLL_PIN_STATE_SELECTABLE for dpll output pins
        ice: fix memleak in ice_init_tx_topology()
        ice: clear port vlan config during reset
        ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count()
        ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins()
        ice: set correct dst VSI in only LAN filters
      ====================
      
      Link: https://patch.msgid.link/20240930223601.3137464-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 03 Oct, 2024 32 commits
    • Documentation: networking/tcp_ao: typo and grammar fixes · 2d7a098b
      Leo Stone authored
      Fix multiple grammatical issues and add a missing period to improve
      readability.
      Signed-off-by: Leo Stone <leocstone@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240929005001.370991-1-leocstone@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'rxrpc-miscellaneous-fixes' · 35f12108
      Jakub Kicinski authored
      David Howells says:
      
      ====================
      rxrpc: Miscellaneous fixes
      
      Here are some miscellaneous fixes for AF_RXRPC:
      
       (1) Fix a race in the I/O thread vs UDP socket setup.
      
       (2) Fix an uninitialised variable.
      ====================
      
      Link: https://patch.msgid.link/20241001132702.3122709-1-dhowells@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • rxrpc: Fix uninitialised variable in rxrpc_send_data() · 7a310f8d
      David Howells authored
      Fix the uninitialised txb variable in rxrpc_send_data() by moving the code
      that loads it above all the jumps to maybe_error, txb being stored back
      into call->tx_pending right before the normal return.
      
      Fixes: b0f571ec ("rxrpc: Fix locking in rxrpc's sendmsg")
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Closes: https://lists.infradead.org/pipermail/linux-afs/2024-October/008896.html
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Link: https://patch.msgid.link/20241001132702.3122709-3-dhowells@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • rxrpc: Fix a race between socket set up and I/O thread creation · bc212465
      David Howells authored
      rxrpc_open_socket() sets up the socket and then sets up the I/O thread
      that will handle it.  This is a problem, however, as there's a gap
      between the two phases in which a packet may come into rxrpc_encap_rcv()
      from the UDP socket, and we oops when trying to wake the not-yet-created
      I/O thread.
      
      As a quick fix, just make rxrpc_encap_rcv() discard the packet if there's
      no I/O thread yet.
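
The quick fix amounts to a guard at the top of the receive hook. A minimal user-space sketch, assuming a hypothetical `fake_local` endpoint structure and `fake_encap_rcv` hook (neither is the real rxrpc code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local endpoint. */
struct fake_local {
	void *io_thread;	/* NULL until the I/O thread has been created */
	unsigned int queued;	/* packets handed to the I/O thread */
	unsigned int dropped;	/* packets discarded during the setup gap */
};

/*
 * Receive hook: discard the packet if the I/O thread does not exist yet,
 * otherwise queue it. Returns true when the packet was queued.
 */
static bool fake_encap_rcv(struct fake_local *local)
{
	assert(local != NULL);
	if (local->io_thread == NULL) {
		local->dropped++;
		return false;
	}
	local->queued++;
	return true;
}
```

Dropping is acceptable here because a packet arriving in the setup gap can be treated like any other lost datagram.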
      
      A better, but more intrusive fix would perhaps be to rearrange things such
      that the socket creation is done by the I/O thread.
      
      Fixes: a275da62 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: yuxuanzhe@outlook.com
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: Simon Horman <horms@kernel.org>
      cc: linux-afs@lists.infradead.org
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241001132702.3122709-2-dhowells@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tcp-3-fixes-for-retrans_stamp-and-undo-logic' · 9af25dd9
      Jakub Kicinski authored
      Neal Cardwell says:
      
      ====================
      tcp: 3 fixes for retrans_stamp and undo logic
      
      Geumhwan Yu <geumhwan.yu@samsung.com> recently reported and diagnosed
      a regression in TCP loss recovery undo logic in the case where a TCP
      connection enters fast recovery, is unable to retransmit anything due to
      TSQ, and then receives an ACK allowing forward progress. The sender should
      be able to undo the spurious loss recovery in this case, but was not doing
      so. The first patch fixes this regression.
      
      Running our suite of packetdrill tests with the first fix, the tests
      highlighted two other small bugs in the way retrans_stamp is updated in
      some rare corner cases. The second two patches fix those other two small
      bugs.
      
      Thanks to Geumhwan Yu for the bug report!
      ====================
      
      Link: https://patch.msgid.link/20241001200517.2756803-1-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Neal Cardwell's avatar
      tcp: fix TFO SYN_RECV to not zero retrans_stamp with retransmits out · 27c80efc
      Neal Cardwell authored
      Fix tcp_rcv_synrecv_state_fastopen() to not zero retrans_stamp
      if retransmits are outstanding.
      
      tcp_fastopen_synack_timer() sets retrans_stamp, so typically we'll
      need to zero retrans_stamp here to prevent spurious
      retransmits_timed_out(). The logic to zero retrans_stamp is from this
      2019 commit:
      
      commit cd736d8b ("tcp: fix retrans timestamp on passive Fast Open")
      
      However, in the corner case where the ACK of our TFO SYNACK carried
      some SACK blocks that caused us to enter TCP_CA_Recovery then that
      non-zero retrans_stamp corresponds to the active fast recovery, and we
      need to leave retrans_stamp with its current non-zero value, for
      correct ETIMEDOUT and undo behavior.
      
      Fixes: cd736d8b ("tcp: fix retrans timestamp on passive Fast Open")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241001200517.2756803-4-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      27c80efc
    • Neal Cardwell's avatar
      tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe · b41b4cbd
      Neal Cardwell authored
      Fix tcp_enter_recovery() so that if there are no retransmits out then
      we zero retrans_stamp when entering fast recovery. This is necessary
      to fix two buggy behaviors.
      
      Currently a non-zero retrans_stamp value can persist across multiple
      back-to-back loss recovery episodes. This is because we generally only
      clear retrans_stamp if we are completely done with loss recoveries,
      and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This
      behavior causes two bugs:
      
      (1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed
      immediately by a new CA_Recovery, the retrans_stamp value can persist
      and can be a time before this new CA_Recovery episode starts. That
      means that timestamp-based undo will be using the wrong retrans_stamp
      (a value that is too old) when comparing incoming TS ecr values to
      retrans_stamp to see if the current fast recovery episode can be
      undone.
      
      (2) If there is a roughly minutes-long sequence of back-to-back fast
      recovery episodes, one after another (e.g. in a shallow-buffered or
      policed bottleneck), where each fast recovery successfully makes
      forward progress and recovers one window of sequence space (but leaves
      at least one retransmit in flight at the end of the recovery),
      followed by several RTOs, then the ETIMEDOUT check may be using the
      wrong retrans_stamp (a value set at the start of the first fast
      recovery in the sequence). This can cause a very premature ETIMEDOUT,
      killing the connection prematurely.
      
      This commit changes the code to zero retrans_stamp when entering fast
      recovery, when this is known to be safe (no retransmits are out in the
      network). That ensures that when starting a fast recovery episode, and
      it is safe to do so, retrans_stamp is set when we send the fast
      retransmit packet. That addresses both bug (1) and bug (2) by ensuring
      that (if no retransmits are out when we start a fast recovery) we use
      the initial fast retransmit of this fast recovery as the time value
      for undo and ETIMEDOUT calculations.
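      
      The rule above can be illustrated with a small user-space model. This
      is only a sketch: struct conn_model and both functions are
      hypothetical names, not kernel code, and the model tracks only a
      retrans_out counter, whereas the kernel may consult more state when
      deciding whether any retransmits are outstanding. It also assumes
      that a timestamp of 0 means "unset".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant connection state. */
struct conn_model {
    uint32_t retrans_out;   /* retransmitted packets still in flight */
    uint32_t retrans_stamp; /* time of first retransmit; 0 means unset */
};

/* Entering fast recovery: restart the clock only when it is safe,
 * that is, when no retransmits are out in the network.
 */
static void model_enter_recovery(struct conn_model *c)
{
    assert(c);
    if (c->retrans_out == 0)
        c->retrans_stamp = 0;
}

/* Sending a retransmit: record the time of the first one only. */
static void model_retransmit(struct conn_model *c, uint32_t now)
{
    assert(c);
    c->retrans_out++;
    if (c->retrans_stamp == 0)
        c->retrans_stamp = now;
}
```

      In this model a stale stamp left over from an earlier episode is
      replaced by the time of the new fast retransmit, while a stamp that
      still has retransmits in flight behind it is left untouched.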
      
      This makes intuitive sense, since the start of a new fast recovery
      episode (in a scenario where no lost packets are out in the network)
      means that the connection has made forward progress since the last RTO
      or fast recovery, and we should thus "restart the clock" used for both
      undo and ETIMEDOUT logic.
      
      Note that if there *are* retransmits out in the network when we start
      fast recovery, there can still be undesirable (1)/(2) issues. For
      example, after this patch we can still have the (1) and (2) problems
      in cases like this:
      
      + round 1: sender sends flight 1
      
      + round 2: sender receives SACKs and enters fast recovery 1,
        retransmits some packets in flight 1 and then sends some new data as
        flight 2
      
      + round 3: sender receives some SACKs for flight 2, notes losses, and
        retransmits some packets to fill the holes in flight 2
      
      + fast recovery has some lost retransmits in flight 1 and continues
        for one or more rounds sending retransmits for flight 1 and flight 2
      
      + fast recovery 1 completes when snd_una reaches high_seq at end of
        flight 1
      
      + there are still holes in the SACK scoreboard in flight 2, so we
        enter fast recovery 2, but some retransmits in the flight 2 sequence
        range are still in flight (retrans_out > 0), so we can't execute the
        new retrans_stamp=0 added here to clear retrans_stamp
      
      It's not yet clear how to fix these remaining (1)/(2) issues in an
      efficient way without breaking undo behavior, given that retrans_stamp
      is currently used for undo and ETIMEDOUT. Perhaps the optimal (but
      expensive) strategy would be to set retrans_stamp to the timestamp of
      the earliest outstanding retransmit when entering fast recovery. But
      at least this commit makes things better.
      
      Note that this does not change the semantics of retrans_stamp; it
      simply makes retrans_stamp accurate in some cases where it was not
      before:
      
      (1) Some loss recovery, followed by an immediate entry into a fast
      recovery, where there are no retransmits out when entering the fast
      recovery.
      
      (2) When a TFO server has a SYNACK retransmit that sets retrans_stamp,
      and then the ACK that completes the 3-way handshake has SACK blocks
      that trigger a fast recovery. In this case when entering fast recovery
      we want to zero out the retrans_stamp from the TFO SYNACK retransmit,
      and set the retrans_stamp based on the timestamp of the fast recovery.
      
      We introduce a tcp_retrans_stamp_cleanup() helper, because this
      two-line sequence already appears in 3 places and is about to appear
      in 2 more as a result of this bug fix patch series. Once this bug fix
      patch series in the net branch makes it into the net-next branch
      we'll update the 3 other call sites to use the new helper.
      
      This is a long-standing issue. The Fixes tag below is chosen to be the
      oldest commit at which the patch will apply cleanly, which is from
      Linux v3.5 in 2012.
      
      Fixes: 1fbc3405 ("tcp: early retransmit: tcp_enter_recovery()")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241001200517.2756803-3-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b41b4cbd
    • Neal Cardwell's avatar
      tcp: fix to allow timestamp undo if no retransmits were sent · e37ab737
      Neal Cardwell authored
      Fix the TCP loss recovery undo logic in tcp_packet_delayed() so that
      it can trigger undo even if TSQ prevents a fast recovery episode from
      reaching tcp_retransmit_skb().
      
      Geumhwan Yu <geumhwan.yu@samsung.com> recently reported that after
      this commit from 2019:
      
      commit bc9f38c8 ("tcp: avoid unconditional congestion window undo
      on SYN retransmit")
      
      ...and before this fix we could have buggy scenarios like the
      following:
      
      + Due to reordering, a TCP connection receives some SACKs and enters a
        spurious fast recovery.
      
      + TSQ prevents all invocations of tcp_retransmit_skb(), because many
        skbs are queued in lower layers of the sending machine's network
        stack; thus tp->retrans_stamp remains 0.
      
      + The connection receives a TCP timestamp ECR value echoing a
        timestamp before the fast recovery, indicating that the fast
        recovery was spurious.
      
      + The connection fails to undo the spurious fast recovery because
        tp->retrans_stamp is 0, and thus tcp_packet_delayed() returns false,
        due to the new logic in the 2019 commit: commit bc9f38c8 ("tcp:
        avoid unconditional congestion window undo on SYN retransmit")
      
      This fix tweaks the logic to be more similar to the
      tcp_packet_delayed() logic before bc9f38c8, except that we take
      care not to be fooled by the FLAG_SYN_ACKED code path zeroing out
      tp->retrans_stamp (the bug noted and fixed by Yuchung in
      bc9f38c8).
      
      Note that this returns the high-level behavior of tcp_packet_delayed()
      to again match the comment for the function, which says: "Nothing was
      retransmitted or returned timestamp is less than timestamp of the
      first retransmission." Note that this comment is in the original
      2005-04-16 Linux git commit, so this is evidently long-standing
      behavior.
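      
      The decision described above can be sketched as a user-space
      predicate. This is a simplified model, not the kernel function:
      struct undo_model, its fields and model_packet_delayed() are
      hypothetical, and representing the FLAG_SYN_ACKED special case as an
      in_syn_sent flag is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the state the check consults. */
struct undo_model {
    uint32_t retrans_stamp; /* time of first retransmit; 0 = none recorded */
    bool     saw_tsecr;     /* the ACK carried a timestamp echo */
    uint32_t rcv_tsecr;     /* the echoed timestamp value */
    bool     in_syn_sent;   /* assumption: zero stamp is untrusted here */
};

/* Wraparound-safe "a is before b" for 32-bit timestamps. */
static bool ts_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static bool model_packet_delayed(const struct undo_model *m)
{
    assert(m);
    /* Echoed timestamp is older than the first retransmission. */
    if (m->retrans_stamp && m->saw_tsecr &&
        ts_before(m->rcv_tsecr, m->retrans_stamp))
        return true;
    /* Nothing was retransmitted, e.g. TSQ blocked every attempt.
     * A zero stamp is ignored where the SYN-ACKed path may have
     * cleared it even though the SYN was retransmitted.
     */
    if (!m->retrans_stamp && !m->in_syn_sent)
        return true;
    return false;
}
```

      With retrans_stamp equal to 0 outside the SYN case the model reports
      "delayed", which is what lets the spurious recovery in the TSQ
      scenario be undone.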
      
      Fixes: bc9f38c8 ("tcp: avoid unconditional congestion window undo on SYN retransmit")
      Reported-by: Geumhwan Yu <geumhwan.yu@samsung.com>
      Diagnosed-by: Geumhwan Yu <geumhwan.yu@samsung.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241001200517.2756803-2-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e37ab737
    • Jakub Kicinski's avatar
      Merge branch 'fix-aqr-pma-capabilities' · ec636707
      Jakub Kicinski authored
      Abhishek Chauhan says:
      
      ====================
      Fix AQR PMA capabilities
      
      Patch 1:-
      AQR115c reports incorrect PMA capabilities which include
      10G/5G and also incorrectly disables capabilities like autoneg
      and 10Mbps support.
      
      AQR115c as per the Marvell databook supports speeds up to 2.5Gbps
      with autonegotiation.
      
      Patch 2:-
      Remove the use of phy_set_max_speed in phy driver as the
      function is mainly used in MAC driver to set the max
      speed.
      
      Instead use get_features to fix up Phy PMA capabilities for
      AQR111, AQR111B0, AQR114C and AQCS109
      ====================
      
      Link: https://patch.msgid.link/20241001224626.2400222-1-quic_abchauha@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ec636707
    • Abhishek Chauhan's avatar
      net: phy: aquantia: remove usage of phy_set_max_speed · 8f61d733
      Abhishek Chauhan authored
      Remove the use of phy_set_max_speed in phy driver as the
      function is mainly used in MAC driver to set the max
      speed.
      
      Instead use get_features to fix up Phy PMA capabilities for
      AQR111, AQR111B0, AQR114C and AQCS109
      
      Fixes: 038ba1dc ("net: phy: aquantia: add AQR111 and AQR111B0 PHY ID")
      Fixes: 0974f1f0 ("net: phy: aquantia: remove false 5G and 10G speed ability for AQCS109")
      Fixes: c278ec64 ("net: phy: aquantia: add support for AQR114C PHY ID")
      Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/
      Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://patch.msgid.link/20241001224626.2400222-3-quic_abchauha@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8f61d733
    • Abhishek Chauhan's avatar
      net: phy: aquantia: AQR115c fix up PMA capabilities · 17cbfcdd
      Abhishek Chauhan authored
      AQR115c reports incorrect PMA capabilities which include
      10G/5G and also incorrectly disables capabilities like autoneg
      and 10Mbps support.
      
      AQR115c as per the Marvell databook supports speeds up to 2.5Gbps
      with autonegotiation.
      
      Fixes: 0ebc581f ("net: phy: aquantia: add support for aqr115c")
      Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/
      Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://patch.msgid.link/20241001224626.2400222-2-quic_abchauha@quicinc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      17cbfcdd
    • Sebastian Andrzej Siewior's avatar
      sfc: Don't invoke xdp_do_flush() from netpoll. · 55e80246
      Sebastian Andrzej Siewior authored
      Yury reported a crash in the sfc driver originating from
      netpoll_send_udp(). The netconsole sends a message and then netpoll
      invokes the driver's NAPI function with a budget of zero. The zero
      budget is meant to let the driver free TX resources that it may have
      used while sending the packet.
      
      In the netpoll case the driver invokes xdp_do_flush() unconditionally,
      leading to a crash because bpf_net_context was never assigned.
      
      Invoke xdp_do_flush() only if budget is not zero.
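      
      The guard can be pictured with a user-space model of a poll callback.
      This is only a sketch: struct poll_model and model_napi_poll() are
      hypothetical, and the counters stand in for the real TX cleanup and
      XDP flush side effects.

```c
#include <assert.h>

/* Hypothetical counters standing in for real driver side effects. */
struct poll_model {
    int rx_pending;  /* packets waiting on the RX ring */
    int tx_reaped;   /* number of TX cleanup passes performed */
    int xdp_flushes; /* number of XDP flushes issued */
};

/* Models a poll callback; returns the amount of RX work done. */
static int model_napi_poll(struct poll_model *p, int budget)
{
    int done;

    assert(p && budget >= 0);

    /* TX cleanup is allowed even with a zero budget. */
    p->tx_reaped++;

    /* RX work is bounded by the budget, so none with budget 0. */
    done = p->rx_pending < budget ? p->rx_pending : budget;
    p->rx_pending -= done;

    /* Flush only when called with a real budget, not from netpoll. */
    if (budget != 0)
        p->xdp_flushes++;

    return done;
}
```

      A zero-budget call therefore still reaps TX completions but never
      reaches the flush.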
      
      Fixes: 401cb7da ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
      Reported-by: Yury Vostrikov <mon@unformed.ru>
      Closes: https://lore.kernel.org/5627f6d1-5491-4462-9d75-bc0612c26a22@app.fastmail.com
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
      Link: https://patch.msgid.link/20241002125837.utOcRo6Y@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      55e80246
    • Ingo van Lil's avatar
      net: phy: dp83869: fix memory corruption when enabling fiber · a842e443
      Ingo van Lil authored
      When configuring the fiber port, the DP83869 PHY driver incorrectly
      calls linkmode_set_bit() with a bit mask (1 << 10) rather than a bit
      number (10). This corrupts some other memory location -- in case of
      arm64 the priv pointer in the same structure.
      
      Since the advertising flags are updated from supported at the end of the
      function the incorrect line isn't needed at all and can be removed.
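      
      To see why passing a mask where a bit number is expected lands far
      outside the bitmap, consider this user-space sketch. The helper names
      are hypothetical and 64-bit words are assumed (as on arm64); unlike
      the checked helper shown here, a plain set-bit primitive would simply
      write to whatever memory sits at the computed word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64u /* assumption: 64-bit bitmap words */

/* Which word of a bitmap a set-bit call taking a bit number touches. */
static size_t bitmap_word_index(unsigned int nr)
{
    return nr / WORD_BITS;
}

/* Bounds-checked set-bit taking a bit *number*; -1 if out of range. */
static int bitmap_set_checked(uint64_t *map, size_t nwords, unsigned int nr)
{
    assert(map);
    if (bitmap_word_index(nr) >= nwords)
        return -1;
    map[bitmap_word_index(nr)] |= UINT64_C(1) << (nr % WORD_BITS);
    return 0;
}
```

      Bit number 10 falls in word 0, whereas the value (1 << 10) = 1024
      interpreted as a bit number falls in word 16, that is, 128 bytes past
      the start of the bitmap.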
      
      Fixes: a29de52b ("net: dp83869: Add ability to advertise Fiber connection")
      Signed-off-by: Ingo van Lil <inguin@gmx.de>
      Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20241002161807.440378-1-inguin@gmx.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a842e443
    • Linus Torvalds's avatar
      Merge tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8c245fe7
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from ieee802154, bluetooth and netfilter.
      
        Current release - regressions:
      
         - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc
      
         - eth: am65-cpsw: fix forever loop in cleanup code
      
        Current release - new code bugs:
      
         - eth: mlx5: HWS, fixed double-free in error flow of creating SQ
      
        Previous releases - regressions:
      
         - core: avoid potential underflow in qdisc_pkt_len_init() with UFO
      
         - core: test for not too small csum_start in virtio_net_hdr_to_skb()
      
         - vrf: revert "vrf: remove unnecessary RCU-bh critical section"
      
         - bluetooth:
             - fix uaf in l2cap_connect
             - fix possible crash on mgmt_index_removed
      
         - dsa: improve shutdown sequence
      
         - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq
      
         - eth: ip_gre: fix drops of small packets in ipgre_xmit
      
        Previous releases - always broken:
      
         - core: fix gso_features_check to check for both
           dev->gso_{ipv4_,}max_size
      
         - core: fix tcp fraglist segmentation after pull from frag_list
      
         - netfilter: nf_tables: prevent nf_skb_duplicated corruption
      
         - sctp: set sk_state back to CLOSED if autobind fails in
           sctp_listen_start
      
         - mac802154: fix potential RCU dereference issue in
           mac802154_scan_worker
      
         - eth: fec: restart PPS after link state change"
      
      * tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
        sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
        dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems
        doc: net: napi: Update documentation for napi_schedule_irqoff
        net/ncsi: Disable the ncsi work before freeing the associated structure
        net: phy: qt2025: Fix warning: unused import DeviceId
        gso: fix udp gso fraglist segmentation after pull from frag_list
        bridge: mcast: Fail MDB get request on empty entry
        vrf: revert "vrf: Remove unnecessary RCU-bh critical section"
        net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code
        net: phy: realtek: Check the index value in led_hw_control_get
        ppp: do not assume bh is held in ppp_channel_bridge_input()
        selftests: rds: move include.sh to TEST_FILES
        net: test for not too small csum_start in virtio_net_hdr_to_skb()
        net: gso: fix tcp fraglist segmentation after pull from frag_list
        ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
        net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check
        net: add more sanity checks to qdisc_pkt_len_init()
        net: avoid potential underflow in qdisc_pkt_len_init() with UFO
        net: ethernet: ti: cpsw_ale: Fix warning on some platforms
        net: microchip: Make FDMA config symbol invisible
        ...
      8c245fe7
    • Linus Torvalds's avatar
      Merge tag 'v6.12-rc1-ksmbd-fixes' of git://git.samba.org/ksmbd · 9c02404b
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
      
       - small cleanup patches leveraging struct size to improve access bounds checking
      
      * tag 'v6.12-rc1-ksmbd-fixes' of git://git.samba.org/ksmbd:
        ksmbd: Use struct_size() to improve smb_direct_rdma_xmit()
        ksmbd: Annotate struct copychunk_ioctl_req with __counted_by_le()
        ksmbd: Use struct_size() to improve get_file_alternate_info()
      9c02404b
    • Linus Torvalds's avatar
      Merge tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 20c2474f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
       "vfs:
      
         - Ensure that iter_folioq_get_pages() advances to the next slot
           otherwise it will end up using the same folio with an out-of-bound
           offset.
      
        iomap:
      
         - Don't unshare delalloc extents which can't be reflinked, and thus
           can't be shared.
      
         - Constrain the file range passed to iomap_file_unshare() directly in
           iomap instead of requiring the callers to do it.
      
        netfs:
      
         - Use folioq_count instead of folioq_nr_slot to prevent an
           uninitialized value warning in netfs_clear_buffer().
      
         - Fix missing wakeup after issuing writes by scheduling the write
           collector only if all the subrequest queues are empty and thus no
           writes are pending.
      
         - Fix two minor documentation bugs"
      
      * tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        iomap: constrain the file range passed to iomap_file_unshare
        iomap: don't bother unsharing delalloc extents
        netfs: Fix missing wakeup after issuing writes
        Documentation: add missing folio_queue entry
        folio_queue: fix documentation
        netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer
        iov_iter: fix advancing slot in iter_folioq_get_pages()
      20c2474f
    • Xin Long's avatar
      sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start · 8beee4d8
      Xin Long authored
      In sctp_listen_start() invoked by sctp_inet_listen(), it should set the
      sk_state back to CLOSED if sctp_autobind() fails due to whatever reason.
      
      Otherwise, next time when calling sctp_inet_listen(), if sctp_sk(sk)->reuse
      is already set via setsockopt(SCTP_REUSE_PORT), sctp_sk(sk)->bind_hash will
      be dereferenced as sk_state is LISTENING, which causes a crash as bind_hash
      is NULL.
      
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        RIP: 0010:sctp_inet_listen+0x7f0/0xa20 net/sctp/socket.c:8617
        Call Trace:
         <TASK>
         __sys_listen_socket net/socket.c:1883 [inline]
         __sys_listen+0x1b7/0x230 net/socket.c:1894
         __do_sys_listen net/socket.c:1902 [inline]
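      
      The failure mode and the fix can be modeled in user space as a tiny
      state machine. This is a sketch under stated assumptions: the enum,
      struct sock_model and model_listen_start() are hypothetical, and the
      autobind_ok parameter stands in for the outcome of the real autobind
      call.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_CLOSED, MODEL_LISTENING };

/* Hypothetical stand-in for the socket state that matters here. */
struct sock_model {
    enum model_state state;
    bool bound; /* true once a port (and bind hash entry) exists */
};

/* Returns 0 on success, -1 if the automatic bind failed. */
static int model_listen_start(struct sock_model *sk, bool autobind_ok)
{
    assert(sk);
    sk->state = MODEL_LISTENING;
    if (!sk->bound) {
        if (!autobind_ok) {
            /* The fix: do not leave a LISTENING but unbound socket. */
            sk->state = MODEL_CLOSED;
            return -1;
        }
        sk->bound = true;
    }
    return 0;
}
```

      Without the reset, a later listen call would see a LISTENING socket
      and assume that the bind state exists.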
      
      Fixes: 5e8f3f70 ("sctp: simplify sctp listening code")
      Reported-by: syzbot+f4e0f821e3a3b7cee51d@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Link: https://patch.msgid.link/a93e655b3c153dc8945d7a812e6d8ab0d52b7aa0.1727729391.git.lucien.xin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8beee4d8
    • Ravikanth Tuniki's avatar
      dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems · c6929644
      Ravikanth Tuniki authored
      Add the missing reg minItems constraint, since per the current binding
      document only the ethernet MAC IO space is a supported configuration.
      
      There is a bug in schema, current examples contain 64-bit
      addressing as well as 32-bit addressing. The schema validation
      does pass incidentally considering one 64-bit reg address as
      two 32-bit reg address entries. If we change axi_ethernet_eth1
      example node reg addressing to 32-bit schema validation reports:
      
      Documentation/devicetree/bindings/net/xlnx,axi-ethernet.example.dtb:
      ethernet@40000000: reg: [[1073741824, 262144]] is too short
      
      To fix it add missing reg minItems constraints and to make things clearer
      stick to 32-bit addressing in examples.
      
      Fixes: cbb1ca6d ("dt-bindings: net: xlnx,axi-ethernet: convert bindings document to yaml")
      Signed-off-by: Ravikanth Tuniki <ravikanth.tuniki@amd.com>
      Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
      Acked-by: Conor Dooley <conor.dooley@microchip.com>
      Link: https://patch.msgid.link/1727723615-2109795-1-git-send-email-radhey.shyam.pandey@amd.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c6929644
    • Sean Anderson's avatar
      doc: net: napi: Update documentation for napi_schedule_irqoff · b63ad06d
      Sean Anderson authored
      Since commit 8380c81d ("net: Treat __napi_schedule_irqoff() as
      __napi_schedule() on PREEMPT_RT"), napi_schedule_irqoff will do the
      right thing if IRQs are threaded. Therefore, there is no need to use
      IRQF_NO_THREAD.
      
      Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://patch.msgid.link/20240930153955.971657-1-sean.anderson@linux.dev
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b63ad06d
    • Paolo Abeni's avatar
      Merge tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 1127c73a
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix incorrect documentation in uapi/linux/netfilter/nf_tables.h
         regarding flowtable hooks, from Phil Sutter.
      
      2) Fix nft_audit.sh selftests with newer nft binaries, due to different
         (valid) audit output, also from Phil.
      
      3) Disable BH when duplicating packets via nf_dup infrastructure,
         otherwise race on nf_skb_duplicated for locally generated traffic.
         From Eric.
      
      4) Missing return in callback of selftest C program, from zhang jiao.
      
      netfilter pull request 24-10-02
      
      * tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: netfilter: Add missing return value
        netfilter: nf_tables: prevent nf_skb_duplicated corruption
        selftests: netfilter: Fix nft_audit.sh for newer nft binaries
        netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED
      ====================
      
      Link: https://patch.msgid.link/20241002202421.1281311-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1127c73a
    • Darrick J. Wong's avatar
      iomap: constrain the file range passed to iomap_file_unshare · a311a08a
      Darrick J. Wong authored
      File contents can only be shared (i.e. reflinked) below EOF, so it makes
      no sense to try to unshare ranges beyond EOF.  Constrain the file range
      parameters here so that we don't have to do that in the callers.
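      
      The constraint amounts to clamping the requested range to EOF. A
      minimal user-space sketch, assuming byte offsets held in int64_t and
      using a hypothetical helper name (this is not the iomap code itself):

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many bytes of [pos, pos + len) lie below EOF (size). */
static int64_t model_unshare_len(int64_t pos, int64_t len, int64_t size)
{
    assert(len >= 0 && size >= 0);

    /* Nothing at or beyond EOF can be shared, so nothing to unshare. */
    if (pos < 0 || pos >= size)
        return 0;

    /* Trim the tail of the range that extends past EOF. */
    return len < size - pos ? len : size - pos;
}
```

      A request for 100 bytes at offset 0 of a 50-byte file is trimmed to
      50 bytes, and a request starting at or past EOF becomes a no-op.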
      
      Fixes: 5f4e5752 ("fs: add iomap_file_dirty")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      a311a08a
    • Darrick J. Wong's avatar
      iomap: don't bother unsharing delalloc extents · f7a4874d
      Darrick J. Wong authored
      If unshare encounters a delalloc reservation in the srcmap, that means
      that the file range isn't shared because delalloc reservations cannot be
      reflinked.  Therefore, don't try to unshare them.
      
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/20241002150040.GB21853@frogsfrogsfrogs
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      f7a4874d
    • Eddie James's avatar
      net/ncsi: Disable the ncsi work before freeing the associated structure · a0ffa68c
      Eddie James authored
      The work function can run after the ncsi device is freed, resulting
      in use-after-free bugs or kernel panic.
      
      Fixes: 2d283bdd ("net/ncsi: Resource management")
      Signed-off-by: Eddie James <eajames@linux.ibm.com>
      Link: https://patch.msgid.link/20240925155523.1017097-1-eajames@linux.ibm.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      a0ffa68c
    • FUJITA Tomonori's avatar
      net: phy: qt2025: Fix warning: unused import DeviceId · fa7dfeae
      FUJITA Tomonori authored
      Fix the following warning when the driver is compiled as built-in:
      
            warning: unused import: `DeviceId`
            --> drivers/net/phy/qt2025.rs:18:5
            |
         18 |     DeviceId, Driver,
            |     ^^^^^^^^
            |
            = note: `#[warn(unused_imports)]` on by default
      
      device_table in module_phy_driver macro is defined only when the
      driver is built as a module. Use phy::DeviceId in the macro instead of
      importing `DeviceId` since `phy` is always used.
      
      Fixes: fd3eaad8 ("net: phy: add Applied Micro QT2025 PHY driver")
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202409190717.i135rfVo-lkp@intel.com/
      Reviewed-by: Alice Ryhl <aliceryhl@google.com>
      Reviewed-by: Trevor Gross <tmgross@umich.edu>
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
      Reviewed-by: Fiona Behrens <me@kloenk.dev>
      Acked-by: Miguel Ojeda <ojeda@kernel.org>
      Link: https://patch.msgid.link/20240926121404.242092-1-fujita.tomonori@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fa7dfeae
    • Willem de Bruijn's avatar
      gso: fix udp gso fraglist segmentation after pull from frag_list · a1e40ac5
      Willem de Bruijn authored
      Detect gso fraglist skbs with corrupted geometry (see below) and
      pass these to skb_segment instead of skb_segment_list, as the first
      can segment them correctly.
      
      Valid SKB_GSO_FRAGLIST skbs
      - consist of two or more segments
      - the head_skb holds the protocol headers plus first gso_size
      - one or more frag_list skbs hold exactly one segment
      - all but the last must be gso_size
      
      Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
      modify these skbs, breaking these invariants.
      
      In extreme cases they pull all data into skb linear. For UDP, this
      causes a NULL ptr deref in __udpv4_gso_segment_list_csum at
      udp_hdr(seg->next)->dest.
      
      Detect invalid geometry due to pull, by checking head_skb size.
      Don't just drop, as this may blackhole a destination. Convert to be
      able to pass to regular skb_segment.
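      
      The invariants listed above can be written down as a user-space
      checker over a flattened list of segment payload lengths. This is an
      illustrative sketch only: the function name and the array view are
      hypothetical, the bound on the last segment is an assumption, and the
      commit itself describes checking just the head_skb size rather than
      walking every segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* seg_len[0] is the payload held by the head skb and seg_len[1..n-1]
 * are the frag_list skbs. Hypothetical view, not a kernel structure.
 */
static bool model_fraglist_geometry_ok(const size_t *seg_len, size_t n,
                                       size_t gso_size)
{
    assert(seg_len);

    /* Must consist of two or more segments. */
    if (n < 2 || gso_size == 0)
        return false;

    /* All but the last segment must be exactly gso_size. */
    for (size_t i = 0; i + 1 < n; i++)
        if (seg_len[i] != gso_size)
            return false;

    /* Assumption: the last segment is non-empty and at most gso_size. */
    return seg_len[n - 1] > 0 && seg_len[n - 1] <= gso_size;
}
```

      A pull that moves data from the frag_list into the head changes
      seg_len[0], which is why inspecting the head alone is enough to
      notice the corruption in this model.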
      
      Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Cc: stable@vger.kernel.org
      Link: https://patch.msgid.link/20241001171752.107580-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a1e40ac5
    • Ido Schimmel's avatar
      bridge: mcast: Fail MDB get request on empty entry · 555f45d2
      Ido Schimmel authored
      When user space deletes a port from an MDB entry, the port is removed
      synchronously. If this was the last port in the entry and the entry is
      not joined by the host itself, then the entry is scheduled for deletion
      via a timer.
      
      The above means that it is possible for the MDB get netlink request to
      retrieve an empty entry which is scheduled for deletion. This is
      problematic as after deleting the last port in an entry, user space
      cannot rely on a non-zero return code from the MDB get request as an
      indication that the port was successfully removed.
      
      Fix by returning an error when the entry's port list is empty and the
      entry is not joined by the host.
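      
      The rule reads naturally as a predicate. The sketch below is a
      user-space model with hypothetical names; the choice of ENOENT as the
      error code is an assumption made for illustration and is not stated
      in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MDB entry. */
struct mdb_entry_model {
    size_t nports;     /* ports currently in the entry */
    bool host_joined;  /* entry joined by the host itself */
};

/* Returns 0 if the entry should be reported, a negative errno if not. */
static int model_mdb_get(const struct mdb_entry_model *e)
{
    /* No entry at all. */
    if (e == NULL)
        return -ENOENT;

    /* Empty entry that is only waiting for its deletion timer. */
    if (e->nports == 0 && !e->host_joined)
        return -ENOENT;

    return 0;
}
```

      With this rule, a get request issued right after the last port was
      deleted fails, so user space can treat the failure as confirmation of
      the removal.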
      
      Fixes: 68b380a3 ("bridge: mcast: Add MDB get support")
      Reported-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
      Closes: https://lore.kernel.org/netdev/c92569919307749f879b9482b0f3e125b7d9d2e3.1726480066.git.jamie.bainbridge@gmail.com/
      Tested-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://patch.msgid.link/20240929123640.558525-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      555f45d2
    • Willem de Bruijn's avatar
      vrf: revert "vrf: Remove unnecessary RCU-bh critical section" · b04c4d9e
      Willem de Bruijn authored
      This reverts commit 504fc6f4.
      
      dev_queue_xmit_nit is expected to be called with BH disabled.
      __dev_queue_xmit has the following:
      
              /* Disable soft irqs for various locks below. Also
               * stops preemption for RCU.
               */
              rcu_read_lock_bh();
      
      VRF must follow this invariant. The referenced commit removed this
      protection, which triggered a lockdep warning:
      
      	================================
      	WARNING: inconsistent lock state
      	6.11.0 #1 Tainted: G        W
      	--------------------------------
      	inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      	btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
      	ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
      	{IN-SOFTIRQ-W} state was registered at:
      	  lock_acquire+0x19a/0x4f0
      	  _raw_spin_lock+0x27/0x40
      	  packet_rcv+0xa33/0x1320
      	  __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
      	  __netif_receive_skb_list_core+0x2c9/0x890
      	  netif_receive_skb_list_internal+0x610/0xcc0
                [...]
      
      	other info that might help us debug this:
      	 Possible unsafe locking scenario:
      
      	       CPU0
      	       ----
      	  lock(rlock-AF_PACKET);
      	  <Interrupt>
      	    lock(rlock-AF_PACKET);
      
      	 *** DEADLOCK ***
      
      	Call Trace:
      	 <TASK>
      	 dump_stack_lvl+0x73/0xa0
      	 mark_lock+0x102e/0x16b0
      	 __lock_acquire+0x9ae/0x6170
      	 lock_acquire+0x19a/0x4f0
      	 _raw_spin_lock+0x27/0x40
      	 tpacket_rcv+0x863/0x3b30
      	 dev_queue_xmit_nit+0x709/0xa40
      	 vrf_finish_direct+0x26e/0x340 [vrf]
      	 vrf_l3_out+0x5f4/0xe80 [vrf]
      	 __ip_local_out+0x51e/0x7a0
                [...]
      
      Fixes: 504fc6f4 ("vrf: Remove unnecessary RCU-bh critical section")
      Link: https://lore.kernel.org/netdev/20240925185216.1990381-1-greearb@candelatech.com/
      Reported-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://patch.msgid.link/20240929061839.1175300-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b04c4d9e
    • Dan Carpenter's avatar
      net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code · 3c97fe4f
      Dan Carpenter authored
      This error handling has a typo.  It should be i++ instead of i--.  In the
      original code the error handling will loop until it crashes.
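
      A minimal user-space sketch of this kind of bug follows. The commit
      message does not show the driver loop, so the loop bounds, NUM_FLOWS,
      the cleaned[] array and cleanup_flows() are hypothetical names chosen
      only to illustrate why the direction of the increment matters.

```c
#include <assert.h>

#define NUM_FLOWS 8	/* hypothetical flow count, for illustration only */

/* Records which flows were cleaned up, so the walk can be inspected. */
static int cleaned[NUM_FLOWS];

/*
 * Clean up flows first..NUM_FLOWS-1 by walking the index upwards and
 * return how many flows were visited.
 *
 * With "i--" in place of "i++", i would run below zero while the
 * "i < NUM_FLOWS" condition stayed true, so the loop would keep
 * indexing out of bounds instead of terminating.
 */
int cleanup_flows(int first)
{
	int i, n = 0;

	assert(first >= 0 && first <= NUM_FLOWS);

	for (i = first; i < NUM_FLOWS; i++) {
		cleaned[i] = 1;
		n++;
	}

	return n;
}
```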
      
      Fixes: da70d184 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
      Reviewed-by: Roger Quadros <rogerq@kernel.org>
      Link: https://patch.msgid.link/8e7960cc-415d-48d7-99ce-f623022ec7b5@stanley.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3c97fe4f
    • Hui Wang's avatar
      net: phy: realtek: Check the index value in led_hw_control_get · c283782f
      Hui Wang authored
      Just like rtl8211f_led_hw_is_supported() and
      rtl8211f_led_hw_control_set(), rtl8211f_led_hw_control_get() also
      needs to check the index value, otherwise the caller is likely to get
      incorrect rules.
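
      The kind of index check meant here can be sketched as follows. This
      is a user-space model and not the driver code: LED_COUNT, the
      led_rules[] table and the simplified signature of
      led_hw_control_get() are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define LED_COUNT 3	/* assumed number of LEDs, for illustration only */

/* Hypothetical per-LED rule bitmaps standing in for register state. */
static const unsigned long led_rules[LED_COUNT] = { 0x1, 0x2, 0x4 };

/*
 * Store the rules for LED "index" in *rules and return 0, or return
 * -EINVAL without touching *rules when the index is out of range.
 */
int led_hw_control_get(unsigned int index, unsigned long *rules)
{
	assert(rules != NULL);

	if (index >= LED_COUNT)
		return -EINVAL;

	*rules = led_rules[index];
	return 0;
}
```

      Without the range check, an out-of-range index would read state
      that does not belong to any LED and hand it back as if it were
      valid.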
      
      Fixes: 17784801 ("net: phy: realtek: Add support for PHY LEDs on RTL8211F")
      Signed-off-by: Hui Wang <hui.wang@canonical.com>
      Reviewed-by: Marek Vasut <marex@denx.de>
      Link: https://patch.msgid.link/20240927114610.1278935-1-hui.wang@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c283782f
    • Eric Dumazet's avatar
      ppp: do not assume bh is held in ppp_channel_bridge_input() · aec72910
      Eric Dumazet authored
      The networking receive path is usually handled from a BH handler.
      However, some protocols need to acquire the socket lock, and
      packets might be stored in the socket backlog if the socket was
      owned by a user process.
      
      In this case, release_sock(), __release_sock(), and sk_backlog_rcv()
      might call the sk->sk_backlog_rcv() handler in process context.
      
      syzbot caught that ppp was not considering this case in
      ppp_channel_bridge_input():
      
      WARNING: inconsistent lock state
      6.11.0-rc7-syzkaller-g5f5673607153 #0 Not tainted
      --------------------------------
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      ksoftirqd/1/24 [HC0[0]:SC1[1]:HE1:SE0] takes:
       ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
       ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline]
       ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304
      {SOFTIRQ-ON-W} state was registered at:
         lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
         __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
         _raw_spin_lock+0x48/0x60 kernel/locking/spinlock.c:154
         spin_lock include/linux/spinlock.h:351 [inline]
         ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline]
         ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304
         pppoe_rcv_core+0xfc/0x314 drivers/net/ppp/pppoe.c:379
         sk_backlog_rcv include/net/sock.h:1111 [inline]
         __release_sock+0x1a8/0x3d8 net/core/sock.c:3004
         release_sock+0x68/0x1b8 net/core/sock.c:3558
         pppoe_sendmsg+0xc8/0x5d8 drivers/net/ppp/pppoe.c:903
         sock_sendmsg_nosec net/socket.c:730 [inline]
         __sock_sendmsg net/socket.c:745 [inline]
         __sys_sendto+0x374/0x4f4 net/socket.c:2204
         __do_sys_sendto net/socket.c:2216 [inline]
         __se_sys_sendto net/socket.c:2212 [inline]
         __arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212
         __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
         invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
         el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
         do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
         el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
         el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
         el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      irq event stamp: 282914
       hardirqs last  enabled at (282914): [<ffff80008b42e30c>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
       hardirqs last  enabled at (282914): [<ffff80008b42e30c>] _raw_spin_unlock_irqrestore+0x38/0x98 kernel/locking/spinlock.c:194
       hardirqs last disabled at (282913): [<ffff80008b42e13c>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
       hardirqs last disabled at (282913): [<ffff80008b42e13c>] _raw_spin_lock_irqsave+0x2c/0x7c kernel/locking/spinlock.c:162
       softirqs last  enabled at (282904): [<ffff8000801f8e88>] softirq_handle_end kernel/softirq.c:400 [inline]
       softirqs last  enabled at (282904): [<ffff8000801f8e88>] handle_softirqs+0xa3c/0xbfc kernel/softirq.c:582
       softirqs last disabled at (282909): [<ffff8000801fbdf8>] run_ksoftirqd+0x70/0x158 kernel/softirq.c:928
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&pch->downl);
        <Interrupt>
          lock(&pch->downl);
      
       *** DEADLOCK ***
      
      1 lock held by ksoftirqd/1/24:
        #0: ffff80008f74dfa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x10/0x4c include/linux/rcupdate.h:325
      
      stack backtrace:
      CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.11.0-rc7-syzkaller-g5f5673607153 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
      Call trace:
        dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:319
        show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:326
        __dump_stack lib/dump_stack.c:93 [inline]
        dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119
        dump_stack+0x1c/0x28 lib/dump_stack.c:128
        print_usage_bug+0x698/0x9ac kernel/locking/lockdep.c:4000
       mark_lock_irq+0x980/0xd2c
        mark_lock+0x258/0x360 kernel/locking/lockdep.c:4677
        __lock_acquire+0xf48/0x779c kernel/locking/lockdep.c:5096
        lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
        __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
        _raw_spin_lock+0x48/0x60 kernel/locking/spinlock.c:154
        spin_lock include/linux/spinlock.h:351 [inline]
        ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline]
        ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304
        ppp_async_process+0x98/0x150 drivers/net/ppp/ppp_async.c:495
        tasklet_action_common+0x318/0x3f4 kernel/softirq.c:785
        tasklet_action+0x68/0x8c kernel/softirq.c:811
        handle_softirqs+0x2e4/0xbfc kernel/softirq.c:554
        run_ksoftirqd+0x70/0x158 kernel/softirq.c:928
        smpboot_thread_fn+0x4b0/0x90c kernel/smpboot.c:164
        kthread+0x288/0x310 kernel/kthread.c:389
        ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
      
      Fixes: 4cf476ce ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls")
      Reported-by: syzbot+bd8d55ee2acd0a71d8ce@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/netdev/66f661e2.050a0220.38ace9.000f.GAE@google.com/T/#u
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Cc: James Chapman <jchapman@katalix.com>
      Link: https://patch.msgid.link/20240927074553.341910-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aec72910
    • Hangbin Liu's avatar
      selftests: rds: move include.sh to TEST_FILES · 8ed7cf66
      Hangbin Liu authored
      The include.sh file is generated for inclusion and should not be executable.
      Otherwise, it will be added to kselftest-list.txt. Additionally, add the
      executable bit for test.py at the same time to ensure proper functionality.
      
      Fixes: 3ade6ce1 ("selftests: rds: add testing infrastructure")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://patch.msgid.link/20240927041349.81216-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8ed7cf66
    • Eric Dumazet's avatar
      net: test for not too small csum_start in virtio_net_hdr_to_skb() · 49d14b54
      Eric Dumazet authored
      syzbot was able to trigger this warning [1], after injecting a
      malicious packet through af_packet, setting skb->csum_start and thus
      the transport header to an incorrect value.
      
      We can at least make sure the transport header is after
      the end of the network header (with an estimated minimal size).
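
      The idea can be sketched in user-space C. This is a model of the
      comparison only, not the virtio_net_hdr_to_skb() code:
      check_csum_start(), its parameters and the 20-byte minimum (the size
      of an IPv4 header without options) are assumptions made for
      illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed minimum: an IPv4 header without options is 20 bytes. */
#define MIN_NET_HDR_LEN 20u

/*
 * Reject a checksum start (and hence transport header offset) that
 * lies before the end of a minimal network header. Both offsets are
 * measured from the same origin, e.g. the start of the packet data.
 */
int check_csum_start(uint32_t net_offset, uint32_t csum_start)
{
	/* Guard the addition below against wrap-around. */
	assert(net_offset <= UINT32_MAX - MIN_NET_HDR_LEN);

	if (csum_start < net_offset + MIN_NET_HDR_LEN)
		return -EINVAL;

	return 0;
}
```

      For a network header starting at offset 14, a checksum start of 10
      (before the network header itself) is rejected, while 34 is the
      first value accepted under the assumed minimum.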
      
      [1]
      [   67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0
      mac=(-1,-1) mac_len=0 net=(16,-6) trans=10
      shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
      csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0)
      hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
      priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0
      encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
      [   67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9
      [   67.877764] sk family=17 type=3 proto=0
      [   67.878279] skb linear:   00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00
      [   67.879128] skb frag:     00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02
      [   67.879877] skb frag:     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.880647] skb frag:     00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00
      [   67.881156] skb frag:     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.881753] skb frag:     00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.882173] skb frag:     00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.882790] skb frag:     00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.883171] skb frag:     00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.883733] skb frag:     00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.884206] skb frag:     00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e
      [   67.884704] skb frag:     000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00
      [   67.885139] skb frag:     000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.885677] skb frag:     000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.886042] skb frag:     000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.886408] skb frag:     000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.887020] skb frag:     000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.887384] skb frag:     00000100: 00 00
      [   67.887878] ------------[ cut here ]------------
      [   67.887908] offset (-6) >= skb_headlen() (14)
      [   67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs
      [   67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011
      [   67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
      [   67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891043] Call Trace:
      [   67.891173]  <TASK>
      [   67.891274] ? __warn (kernel/panic.c:741)
      [   67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219)
      [   67.891348] ? handle_bug (arch/x86/kernel/traps.c:239)
      [   67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
      [   67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
      [   67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1))
      [   67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113)
      [   67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200)
      [   67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13))
      [   67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1))
      [   67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670)
      [   67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227)
      [   67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604)
      [   67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25))
      [   67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1))
      [   67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4))
      [   67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657)
      [   67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3))
      [   67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
      [   67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
      [   67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
      [   67.891734] ? do_sock_setsockopt (net/socket.c:2335)
      [   67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
      [   67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
      [   67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
      [   67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      
      Fixes: 9181d6f8 ("net: add more sanity check in virtio_net_hdr_to_skb()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      49d14b54