1. 12 Jul, 2023 3 commits
  2. 11 Jul, 2023 7 commits
    • Suman Ghosh's avatar
      octeontx2-pf: Add additional check for MCAM rules · 8278ee2a
      Suman Ghosh authored
      Due to hardware limitation, MCAM drop rule with
      ether_type == 802.1Q and vlan_id == 0 is not supported. Hence rejecting
      such rules.
      
      Fixes: dce677da ("octeontx2-pf: Add vlan-etype to ntuple filters")
      Signed-off-by: default avatarSuman Ghosh <sumang@marvell.com>
      Link: https://lore.kernel.org/r/20230710103027.2244139-1-sumang@marvell.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      8278ee2a
    • Lu Hongfei's avatar
      net: dsa: Removed unneeded of_node_put in felix_parse_ports_node · 04499f28
      Lu Hongfei authored
      Remove unnecessary of_node_put from the continue path to prevent
      child node from being released twice, which could avoid resource
      leak or other unexpected issues.
      Signed-off-by: default avatarLu Hongfei <luhongfei@vivo.com>
      Reviewed-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Fixes: de879a01 ("net: dsa: felix: add functionality when not all ports are supported")
      Link: https://lore.kernel.org/r/20230710031859.36784-1-luhongfei@vivo.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      04499f28
    • Paolo Abeni's avatar
      Merge branch 'net-fec-fix-some-issues-of-ndo_xdp_xmit' · c0dbbdf5
      Paolo Abeni authored
      Wei Fang says:
      
      ====================
      net: fec: fix some issues of ndo_xdp_xmit()
      
      We encountered some issues when testing the ndo_xdp_xmit() interface
      of the fec driver on i.MX8MP and i.MX93 platforms. These issues are
      easy to reproduce, and the specific reproduction steps are as follows.
      
      step1: The ethernet port of a board (board A) is connected to the EQOS
      port of i.MX8MP/i.MX93, and the FEC port of i.MX8MP/i.MX93 is connected
      to another ethernet port, such as a switch port.
      
      step2: Board A uses the pktgen_sample03_burst_single_flow.sh to generate
      and send packets to i.MX8MP/i.MX93. The command is shown below.
      ./pktgen_sample03_burst_single_flow.sh -i eth0 -d 192.168.6.8 -m \
      56:bf:0d:68:b0:9e -s 1500
      
      step3: i.MX8MP/i.MX93 use the xdp_redirect bfp program to redirect the
      XDP frames from EQOS port to FEC port. The command is shown below.
      ./xdp_redirect eth1 eth0
      
      After a few moments, the warning or error logs will be printed in the
      console, for more details, please refer to the commit message of each
      patch.
      ====================
      
      Link: https://lore.kernel.org/r/20230706081012.2278063-1-wei.fang@nxp.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      c0dbbdf5
    • Wei Fang's avatar
      net: fec: use netdev_err_once() instead of netdev_err() · 84a10947
      Wei Fang authored
      In the case of heavy XDP traffic to be transmitted, the console
      will print the error log continuously if there are lack of enough
      BDs to accommodate the frames. The log looks like below.
      
      [  160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      
      Not only will this log be replicated and redundant, it will also
      degrade XDP performance. So we use netdev_err_once() instead of
      netdev_err() now.
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      84a10947
    • Wei Fang's avatar
      net: fec: increase the size of tx ring and update tx_wake_threshold · 56b3c6ba
      Wei Fang authored
      When the XDP feature is enabled and with heavy XDP frames to be
      transmitted, there is a considerable probability that available
      tx BDs are insufficient. This will lead to some XDP frames to be
      discarded and the "NOT enough BD for SG!" error log will appear
      in the console (as shown below).
      
      [  160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      
      In the case of heavy XDP traffic, sometimes the speed of recycling
      tx BDs may be slower than the speed of sending XDP frames. There
      may be several specific reasons, such as the interrupt is not
      responsed in time, the efficiency of the NAPI callback function is
      too low due to all the queues (tx queues and rx queues) share the
      same NAPI, and so on.
      
      After trying various methods, I think that increase the size of tx
      BD ring is simple and effective. Maybe the best resolution is that
      allocate NAPI for each queue to improve the efficiency of the NAPI
      callback, but this change is a bit big and I didn't try this method.
      Perheps this method will be implemented in a future patch.
      
      This patch also updates the tx_wake_threshold of tx ring which is
      related to the size of tx ring in the previous logic. Otherwise,
      the tx_wake_threshold will be too high (403 BDs), which is more
      likely to impact the slow path in the case of heavy XDP traffic,
      because XDP path and slow path share the tx BD rings. According
      to Jakub's suggestion, the tx_wake_threshold is at least equal to
      tx_stop_threshold + 2 * MAX_SKB_FRAGS, if a queue of hundreds of
      entries is overflowing, we should be able to apply a hysteresis
      of a few tens of entries.
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      56b3c6ba
    • Wei Fang's avatar
      net: fec: recycle pages for transmitted XDP frames · 20f79739
      Wei Fang authored
      Once the XDP frames have been successfully transmitted through the
      ndo_xdp_xmit() interface, it's the driver responsibility to free
      the frames so that the page_pool can recycle the pages and reuse
      them. However, this action is not implemented in the fec driver.
      This leads to a user-visible problem that the console will print
      the following warning log.
      
      [  157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
      [  217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
      [  278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
      [  338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
      [  399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
      
      Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
      while cleaning the tx BD ring.
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      20f79739
    • Wei Fang's avatar
      net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP · be7ecbe7
      Wei Fang authored
      When a XDP program is installed or uninstalled, fec_restart() will
      be invoked to reset MAC and buffer descriptor rings. It's reasonable
      not to transmit any packet during the process of reset. However, the
      NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by default,
      that is to say, it's possible that the fec_enet_xdp_xmit() will be
      invoked even if the process of reset is not finished. In this case,
      the redirected XDP frames might be dropped and available transmit BDs
      may be incorrectly deemed insufficient. So this patch disable the
      NETDEV_XDP_ACT_NDO_XMIT feature by default and dynamically configure
      this feature when the bpf program is installed or uninstalled.
      
      Fixes: e4ac7cc6 ("net: fec: turn on XDP features")
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      be7ecbe7
  3. 10 Jul, 2023 3 commits
  4. 09 Jul, 2023 2 commits
  5. 08 Jul, 2023 8 commits
    • Eric Dumazet's avatar
      udp6: fix udp6_ehashfn() typo · 51d03e2f
      Eric Dumazet authored
      Amit Klein reported that udp6_ehash_secret was initialized but never used.
      
      Fixes: 1bbdceef ("inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once")
      Reported-by: default avatarAmit Klein <aksecurity@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51d03e2f
    • Kuniyuki Iwashima's avatar
      icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev(). · 2aaa8a15
      Kuniyuki Iwashima authored
      With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that
      has the link-local address as src and dst IP and will be forwarded to
      an external IP in the IPv6 Ext Hdr.
      
      For example, the script below generates a packet whose src IP is the
      link-local address and dst is updated to 11::.
      
        # for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done
        # python3
        >>> from socket import *
        >>> from scapy.all import *
        >>>
        >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456"
        >>>
        >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR)
        >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1)
        >>>
        >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW)
        >>> sk.sendto(bytes(pkt), (DST_ADDR, 0))
      
      For such a packet, we call ip6_route_input() to look up a route for the
      next destination in these three functions depending on the header type.
      
        * ipv6_rthdr_rcv()
        * ipv6_rpl_srh_rcv()
        * ipv6_srh_rcv()
      
      If no route is found, ip6_null_entry is set to skb, and the following
      dst_input(skb) calls ip6_pkt_drop().
      
      Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev
      as the input device is the loopback interface.  Then, we have to check if
      skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref
      for ip6_null_entry.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
       PF: supervisor read access in kernel mode
       PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
      Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
      RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
      RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
      RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
      R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
      R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
      FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <IRQ>
       ip6_pkt_drop (net/ipv6/route.c:4513)
       ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686)
       ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
       ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483)
       __netif_receive_skb_one_core (net/core/dev.c:5455)
       process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895)
       __napi_poll (net/core/dev.c:6460)
       net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660)
       __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
       do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
       </IRQ>
       <TASK>
       __local_bh_enable_ip (kernel/softirq.c:381)
       __dev_queue_xmit (net/core/dev.c:4231)
       ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135)
       rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
       sock_sendmsg (net/socket.c:725 net/socket.c:748)
       __sys_sendto (net/socket.c:2134)
       __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7f9dc751baea
      Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea
      RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003
      RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b
       </TASK>
      Modules linked in:
      CR2: 0000000000000000
       ---[ end trace 0000000000000000 ]---
      RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
      Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
      RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
      RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
      RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
      R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
      R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
      FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
      PKRU: 55555554
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: disabled
      
      Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
      Reported-by: default avatarWang Yufen <wangyufen@huawei.com>
      Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2aaa8a15
    • David S. Miller's avatar
      Merge branch 's390-ism-fixes' · bbffab69
      David S. Miller authored
      Niklas Schnelle says:
      
      ====================
      s390/ism: Fixes to client handling
      
      This is v2 of the patch previously titled "s390/ism: Detangle ISM client
      IRQ and event forwarding". As suggested by Paolo Abeni I split the patch
      up. While doing so I noticed another problem that was fixed by this patch
      concerning the way the workqueues access the client structs. This means the
      second patch turning the workqueues into simple direct calls also fixes
      a problem. Finally I split off a third patch just for fixing
      ism_unregister_client()s error path.
      
      The code after these 3 patches is identical to the result of the v1 patch
      except that I also turned the dev_err() for still registered DMBs into
      a WARN().
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbffab69
    • Niklas Schnelle's avatar
      s390/ism: Do not unregister clients with registered DMBs · 266deeea
      Niklas Schnelle authored
      When ism_unregister_client() is called but the client still has DMBs
      registered it returns -EBUSY and prints an error. This only happens
      after the client has already been unregistered however. This is
      unexpected as the unregister claims to have failed. Furthermore as this
      implies a client bug a WARN() is more appropriate. Thus move the
      deregistration after the check and use WARN().
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: default avatarNiklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      266deeea
    • Niklas Schnelle's avatar
      s390/ism: Fix and simplify add()/remove() callback handling · 76631ffa
      Niklas Schnelle authored
      Previously the clients_lock was protecting the clients array against
      concurrent addition/removal of clients but was also accessed from IRQ
      context. This meant that it had to be a spinlock and that the add() and
      remove() callbacks in which clients need to do allocation and take
      mutexes can't be called under the clients_lock. To work around this these
      callbacks were moved to workqueues. This not only introduced significant
      complexity but is also subtly broken in at least one way.
      
      In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
      communicate the added/removed ISM device to the work function. While
      write access to client[i]->tgt_ism is protected by the clients_lock and
      the code waits that there is no pending add/remove work before and after
      setting clients[i]->tgt_ism this is not enough. The problem is that the
      wait happens based on per ISM device counters. Thus a concurrent
      ism_dev_init()/ism_dev_exit() for a different ISM device may overwrite
      a clients[i]->tgt_ism between unlocking the clients_lock and the
      subsequent wait for the work to finnish.
      
      Thankfully with the clients_lock no longer held in IRQ context it can be
      turned into a mutex which can be held during the calls to add()/remove()
      completely removing the need for the workqueues and the associated
      broken housekeeping including the per ISM device counters and the
      clients[i]->tgt_ism.
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: default avatarNiklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76631ffa
    • Niklas Schnelle's avatar
      s390/ism: Fix locking for forwarding of IRQs and events to clients · 6b5c13b5
      Niklas Schnelle authored
      The clients array references all registered clients and is protected by
      the clients_lock. Besides its use as general list of clients the clients
      array is accessed in ism_handle_irq() to forward ISM device events to
      clients.
      
      While the clients_lock is taken in the IRQ handler when calling
      handle_event() it is however incorrectly not held during the
      client->handle_irq() call and for the preceding clients[] access leaving
      it unprotected against concurrent client (un-)registration.
      
      Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
      and ism_unregister_dmb() are not protected by any lock. This is
      especially problematic as the client ID from the ism->sba_client_arr[]
      is not checked against NO_CLIENT and neither is the client pointer
      checked.
      
      Instead of expanding the use of the clients_lock further add a separate
      array in struct ism_dev which references clients subscribed to the
      device's events and IRQs. This array is protected by ism->lock which is
      already taken in ism_handle_irq() and can be taken outside the IRQ
      handler when adding/removing subscribers or the accessing
      ism->sba_client_arr[]. This also means that the clients_lock is no
      longer taken in IRQ context.
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: default avatarNiklas Schnelle <schnelle@linux.ibm.com>
      Reviewed-by: default avatarAlexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6b5c13b5
    • Paolo Abeni's avatar
      net: prevent skb corruption on frag list segmentation · c329b261
      Paolo Abeni authored
      Ian reported several skb corruptions triggered by rx-gro-list,
      collecting different oops alike:
      
      [   62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
      [   62.631083] #PF: supervisor read access in kernel mode
      [   62.636312] #PF: error_code(0x0000) - not-present page
      [   62.641541] PGD 0 P4D 0
      [   62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [   62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
      [   62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022
      [   62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858
      ./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261
      net/ipv4/udp_offload.c:277)
      [   62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246
      [   62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
      [   62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4
      [   62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9
      [   62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2
      [   62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9
      [   62.728654] FS:  0000000000000000(0000) GS:ffffa127efa40000(0000)
      knlGS:0000000000000000
      [   62.736852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0
      [   62.749948] Call Trace:
      [   62.752498]  <TASK>
      [   62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398)
      [   62.787605] skb_mac_gso_segment (net/core/gro.c:141)
      [   62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2))
      [   62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862
      net/core/dev.c:3659)
      [   62.804695] validate_xmit_skb_list (net/core/dev.c:3710)
      [   62.809158] sch_direct_xmit (net/sched/sch_generic.c:330)
      [   62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210)
      net/netfilter/core.c:626)
      [   62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55)
      [   62.825652] maybe_deliver (net/bridge/br_forward.c:193)
      [   62.829420] br_flood (net/bridge/br_forward.c:233)
      [   62.832758] br_handle_frame_finish (net/bridge/br_input.c:215)
      [   62.837403] br_handle_frame (net/bridge/br_input.c:298
      net/bridge/br_input.c:416)
      [   62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387)
      [   62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570)
      [   62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638
      net/core/dev.c:5727)
      [   62.876795] napi_complete_done (./include/linux/list.h:37
      ./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067)
      [   62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191)
      [   62.893534] __napi_poll (net/core/dev.c:6498)
      [   62.897133] napi_threaded_poll (./include/linux/netpoll.h:89
      net/core/dev.c:6640)
      [   62.905276] kthread (kernel/kthread.c:379)
      [   62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314)
      [   62.917119]  </TASK>
      
      In the critical scenario, rx-gro-list GRO-ed packets are fed, via a
      bridge, both to the local input path and to an egress device (tun).
      
      The segmentation of such packets unsafely writes to the cloned skbs
      with shared heads.
      
      This change addresses the issue by uncloning as needed the
      to-be-segmented skbs.
      Reported-by: default avatarIan Kumlien <ian.kumlien@gmail.com>
      Tested-by: default avatarIan Kumlien <ian.kumlien@gmail.com>
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c329b261
    • Rafał Miłecki's avatar
      net: bgmac: postpone turning IRQs off to avoid SoC hangs · e7731194
      Rafał Miłecki authored
      Turning IRQs off is done by accessing Ethernet controller registers.
      That can't be done until device's clock is enabled. It results in a SoC
      hang otherwise.
      
      This bug remained unnoticed for years as most bootloaders keep all
      Ethernet interfaces turned on. It seems to only affect a niche SoC
      family BCM47189. It has two Ethernet controllers but CFE bootloader uses
      only the first one.
      
      Fixes: 34322615 ("net: bgmac: Mask interrupts during probe")
      Signed-off-by: default avatarRafał Miłecki <rafal@milecki.pl>
      Reviewed-by: default avatarMichal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7731194
  6. 07 Jul, 2023 15 commits
    • Ivan Babrou's avatar
      udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint · 8139dccd
      Ivan Babrou authored
      The tracepoint has existed for 12 years, but it only covered udp
      over the legacy IPv4 protocol. Having it enabled for udp6 removes
      the unnecessary difference in error visibility.
      Signed-off-by: default avatarIvan Babrou <ivan@cloudflare.com>
      Fixes: 296f7ea7 ("udp: add tracepoints for queueing skb to rcvbuf")
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8139dccd
    • Shannon Nelson's avatar
      ionic: remove dead device fail path · 3a7af34f
      Shannon Nelson authored
      Remove the probe error path code that leaves the driver bound
      to the device, but with essentially a dead device.  This was
      useful maybe twice early in the driver's life and no longer
      makes sense to keep.
      
      Fixes: 30a1e6d0 ("ionic: keep ionic dev on lif init fail")
      Signed-off-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a7af34f
    • Nitya Sunkad's avatar
      ionic: remove WARN_ON to prevent panic_on_warn · abfb2a58
      Nitya Sunkad authored
      Remove unnecessary early code development check and the WARN_ON
      that it uses.  The irq alloc and free paths have long been
      cleaned up and this check shouldn't have stuck around so long.
      
      Fixes: 77ceb68e ("ionic: Add notifyq support")
      Signed-off-by: default avatarNitya Sunkad <nitya.sunkad@amd.com>
      Signed-off-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      abfb2a58
    • Sai Krishna's avatar
      octeontx2-af: Move validation of ptp pointer before its usage · 7709fbd4
      Sai Krishna authored
      Moved PTP pointer validation before its use to avoid smatch warning.
      Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
      
      Fixes: 2ef4e45d ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
      Signed-off-by: default avatarNaveen Mamindlapalli <naveenm@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarSai Krishna <saikrishnag@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7709fbd4
    • Ratheesh Kannoth's avatar
      octeontx2-af: Promisc enable/disable through mbox · af42088b
      Ratheesh Kannoth authored
      In legacy silicon, promiscuous mode is only modified
      through CGX mbox messages. In CN10KB silicon, it is modified
      from CGX mbox and NIX. This breaks legacy application
      behaviour. Fix this by removing call from NIX.
      
      Fixes: d6c9784b ("octeontx2-af: Invoke exact match functions if supported")
      Signed-off-by: default avatarRatheesh Kannoth <rkannoth@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: default avatarMichal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af42088b
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · b61aac02
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-05 (igc)
      
      This series contains updates to igc driver only.
      
      Husaini adds check to increment Qbv change error counter only on taprio
      Qbvs. He also removes delay during Tx ring configuration and
      resolves Tx hang that could occur when transmitting on a gate to be
      closed.
      
      Prasad Koya reports ethtool link mode as TP (twisted pair).
      
      Tee Min corrects value for max SDU.
      
      Aravindhan ensures that registers for PPS are always programmed to occur
      in future.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b61aac02
    • Junfeng Guo's avatar
      gve: Set default duplex configuration to full · 0503efea
      Junfeng Guo authored
      Current duplex mode was unset in the driver, resulting in the default
      parameter being set to 0, which corresponds to half duplex. It might
      mislead users to have incorrect expectation about the driver's
      transmission capabilities.
      Set the default duplex configuration to full, as the driver runs in
      full duplex mode at this point.
      
      Fixes: 7e074d5a ("gve: Enable Link Speed Reporting in the driver.")
      Signed-off-by: default avatarJunfeng Guo <junfeng.guo@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0503efea
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 41b9eff0
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-05 (ice)
      
      This series contains updates to ice driver only.
      
      Sridhar fixes incorrect comparison of max Tx rate limit to occur against
      each TC value rather than the aggregate. He also resolves an issue with
      the wrong VSI being used when setting max Tx rate when TCs are enabled.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Fix tx queue rate limit when TCs are configured
        ice: Fix max_rate check while configuring TX rate limits
      ====================
      
      Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      41b9eff0
    • Jakub Kicinski's avatar
      Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4863b57b
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-07-05
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
        net/mlx5: Query hca_cap_2 only when supported
        net/mlx5e: TC, CT: Offload ct clear only once
        net/mlx5e: Check for NOT_READY flag state after locking
        net/mlx5: Register a unique thermal zone per device
        net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
        net/mlx5e: fix memory leak in mlx5e_ptp_open
        net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
        net/mlx5e: fix double free in mlx5e_destroy_flow_table
      ====================
      
      Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4863b57b
    • M A Ramdhan's avatar
      net/sched: cls_fw: Fix improper refcount update leads to use-after-free · 0323bce5
      M A Ramdhan authored
      In the event of a failure in tcf_change_indev(), fw_set_parms() will
      immediately return an error after incrementing or decrementing
      reference counter in tcf_bind_filter().  If attacker can control
      reference counter to zero and make reference freed, leading to
      use after free.
      
      In order to prevent this, move the point of possible failure above the
      point where the TC_FW_CLASSID is handled.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarM A Ramdhan <ramdhan@starlabs.sg>
      Signed-off-by: default avatarM A Ramdhan <ramdhan@starlabs.sg>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0323bce5
    • Quan Zhou's avatar
      wifi: mt76: mt7921e: fix init command fail with enabled device · 525c469e
      Quan Zhou authored
      For some cases as below, we may encounter the unpreditable chip stats
      in driver probe()
      * The system reboot flow do not work properly, such as kernel oops while
        rebooting, and then the driver do not go back to default status at
        this moment.
      * Similar to the flow above. If the device was enabled in BIOS or UEFI,
        the system may switch to Linux without driver fully shutdown.
      
      To avoid the problem, force push the device back to default in probe()
      * mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
      * mt7921_wfsys_reset()    : cleanup chip config before resource init.
      
      Error log
      [59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
      [59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
      [59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
      [59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
      [59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
      [59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
      [59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
      [59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
      [59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
      [59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
      [59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
      [59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59040.923845] mt7921e 0000:02:00.0: hardware init failed
      
      Cc: stable@vger.kernel.org
      Fixes: 5c14a5f9 ("mt76: mt7921: introduce mt7921e support")
      Tested-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Tested-by: default avatarJuan Martinez <juan.martinez@amd.com>
      Co-developed-by: default avatarLeon Yen <leon.yen@mediatek.com>
      Signed-off-by: default avatarLeon Yen <leon.yen@mediatek.com>
      Signed-off-by: default avatarQuan Zhou <quan.zhou@mediatek.com>
      Signed-off-by: default avatarDeren Wu <deren.wu@mediatek.com>
      Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu@mediatek.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      525c469e
    • Jakub Kicinski's avatar
      Merge branch 'fix-dropping-of-oversize-preemptible-frames-with-felix-dsa-driver' · 1ce1a745
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix dropping of oversize preemptible frames with felix DSA driver
      
      It has been reported that preemptible traffic doesn't completely behave
      as expected. Namely, large packets should be able to be squeezed
      (through fragmentation) through taprio time slots smaller than the
      transmission time of the full frame. That does not happen due to logic
      in the driver (for oversize frame dropping with taprio) that was not
      updated in order for this use case to work.
      
      I am not sure whether it qualifies as "net" material, because some
      structural changes are involved, and it is a "never worked" scenario.
      OTOH, this is a complaint coming from users for a v6.4 kernel.
      It's up to maintainers to decide whether this series can be considered;
      I've submitted it as non-RFC in the optimistic case that it will be :)
      
      Demo script illustrating the issue below.
      
      add_taprio()
      {
      	local ifname=$1
      
      	echo "Creating root taprio"
      	tc qdisc replace dev $ifname handle 8001: parent root stab overhead 24 taprio \
      		num_tc 8 \
      		map 0 1 2 3 4 5 6 7 \
      		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      		base-time 0 \
      		sched-entry S 01 1216 \
      		sched-entry S fe 12368 \
      		fp P E E E E E E E \
      		flags 0x2
      }
      
      remove_taprio()
      {
      	local ifname=$1
      
      	echo "Removing taprio"
      	tc qdisc del dev $ifname root
      }
      
      ip netns add ns0
      ip link set eno0 netns ns0 && ip -n ns0 link set eno0 up && ip -n ns0 addr add 192.168.100.1/24 dev eno0
      ip addr add 192.168.100.2/24 dev swp0 && ip link set swp0 up
      ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled on verify-enabled off tx-enabled on
      ethtool --set-mm swp0 pmac-enabled on verify-enabled off tx-enabled on
      add_taprio swp0
      
      ping 192.168.100.1 -s 1000 -c 5 # sent through TC0
      ethtool -I --show-mm swp0 | grep MACMergeFragCountTx # should increase
      
      ip addr flush swp0 && ip link set swp0 down
      remove_taprio swp0
      ethtool --set-mm swp0 pmac-enabled off verify-enabled off tx-enabled off
      ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled off verify-enabled off tx-enabled off
      ip netns del ns0
      ====================
      
      Link: https://lore.kernel.org/r/20230705104422.49025-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1ce1a745
    • Vladimir Oltean's avatar
      net: mscc: ocelot: fix oversize frame dropping for preemptible TCs · c6efb4ae
      Vladimir Oltean authored
      This switch implements Hold/Release in a strange way, with no control
      from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC
      and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly
      based on the schedule.
      
      Namely, when the gate of a preemptible TC is about to close (actually
      QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event),
      the QSYS seems to emit a HOLD request pulse towards the MAC which
      preempts the currently transmitted packet, and further packets are held
      back in the queue system.
      
      This allows large frames to be squeezed through small time slots,
      because HOLD requests initiated by the gate events result in the frame
      being segmented in multiple fragments, the bit time of which is equal to
      the size of the time slot.
      
      It has been reported that the vsc9959_tas_guard_bands_update() logic
      breaks this, because it doesn't take preemptible TCs into account, and
      enables oversized frame dropping when the time slot doesn't allow a full
      MTU to be sent, but it does allow 2*minFragSize to be sent (128B).
      Packets larger than 128B are dropped instead of being sent in multiple
      fragments.
      
      Confusingly, the manual says:
      
      | For guard band, SDU calculation of a traffic class of a port, if
      | preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then
      | QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise
      | QSYS::QMAXSDU_CFG_*.QMAXSDU_* is used.
      
      but this only refers to the static guard band durations, and the
      QMAXSDU_CFG_* registers have dual purpose - the other being oversized
      frame dropping, which takes place irrespective of whether frames are
      preemptible or express.
      
      So, to fix the problem, we need to call vsc9959_tas_guard_bands_update()
      from ocelot_port_update_active_preemptible_tcs(), and modify the guard
      band logic to consider a different (lower) oversize limit for
      preemptible traffic classes.
      
      Fixes: 403ffc2c ("net: mscc: ocelot: add support for preemptible traffic classes")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c6efb4ae
    • Vladimir Oltean's avatar
      net: dsa: felix: make vsc9959_tas_guard_bands_update() visible to ocelot->ops · c6081914
      Vladimir Oltean authored
      In a future change we will need to make
      ocelot_port_update_active_preemptible_tcs() call
      vsc9959_tas_guard_bands_update(), but that is currently not possible,
      since the ocelot switch lib does not have access to functions private to
      the DSA wrapper.
      
      Move the pointer to vsc9959_tas_guard_bands_update() from felix->info
      (which is private to the DSA driver) to ocelot->ops (which is also
      visible to the ocelot switch lib).
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-3-vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c6081914
    • Vladimir Oltean's avatar
      net: mscc: ocelot: extend ocelot->fwd_domain_lock to cover ocelot->tas_lock · 009d30f1
      Vladimir Oltean authored
      In a future commit we will have to call vsc9959_tas_guard_bands_update()
      from ocelot_port_update_active_preemptible_tcs(), and that will be
      impossible due to the AB/BA locking dependencies between
      ocelot->tas_lock and ocelot->fwd_domain_lock.
      
      Just like we did in commit 3ff468ef ("net: mscc: ocelot: remove
      struct ocelot_mm_state :: lock"), the only solution is to expand the
      scope of ocelot->fwd_domain_lock for it to also serialize changes made
      to the Time-Aware Shaper, because those will have to result in a
      recalculation of cut-through TCs, which is something that depends on the
      forwarding domain.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-2-vladimir.oltean@nxp.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      009d30f1
  7. 06 Jul, 2023 2 commits