1. 13 Dec, 2023 9 commits
    • stmmac: dwmac-loongson: drop useless check for compatible fallback · 31fea092
      Krzysztof Kozlowski authored
      Device binds to proper PCI ID (LOONGSON, 0x7a03), already listed in DTS,
      so checking for some other compatible does not make sense.  It cannot be
      bound to unsupported platform.
      
      Drop useless, incorrect (space in between) and undocumented compatible.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: dwmac-loongson: Make sure MDIO is initialized before use · e87d3a13
      Yanteng Si authored
      Generic code will use mdio. If it is not initialized before use,
      the kernel will Oops.
      
      Fixes: 30bba69d ("stmmac: pci: Add dwmac support for Loongson")
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set · f3f32a35
      Salvatore Dipietro authored
      Based on the tcp man page, if TCP_NODELAY is set, it disables Nagle's algorithm
      and packets are sent as soon as possible. However, in the `tcp_push` function,
      where autocorking is evaluated, the `nonagle` value set by TCP_NODELAY is not
      considered, which can trigger unexpected corking of packets and induce delays.
      
      For example, if two packets are generated as part of a server's reply, if the
      first one is not transmitted on the wire quickly enough, the second packet can
      trigger the autocorking in `tcp_push` and be delayed instead of sent as soon as
      possible. It will either wait for additional packets to be coalesced or an ACK
      from the client before transmitting the corked packet. This can interact badly
      if the receiver has tcp delayed acks enabled, introducing 40ms extra delay in
      completion times. It is not always possible to control who has delayed acks
      set, but it is possible to adjust when and how autocorking is triggered.
      Patch prevents autocorking if the TCP_NODELAY flag is set on the socket.
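The decision described above can be sketched as a small userspace model. This is not the kernel code: `struct push_state`, its fields, and both functions are hypothetical names, and the conditions are a simplification of what `tcp_push` evaluates.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state consulted when pushing data. */
struct push_state {
    bool autocorking_enabled;   /* models the tcp_autocorking sysctl */
    bool nodelay;               /* models TCP_NODELAY set on the socket */
    bool packets_in_flight;     /* earlier data has not left the host yet */
    unsigned int skb_len;       /* bytes queued in the current packet */
    unsigned int size_goal;     /* preferred packet size */
};

/* Model of the old behavior: TCP_NODELAY is not considered. */
bool should_autocork_old(const struct push_state *s)
{
    return s->autocorking_enabled && s->packets_in_flight &&
           s->skb_len < s->size_goal;
}

/* Model of the patched behavior: never cork when TCP_NODELAY is set. */
bool should_autocork_new(const struct push_state *s)
{
    return !s->nodelay && should_autocork_old(s);
}
```

In this model, a small second packet on a TCP_NODELAY socket is corked by the old check and sent immediately by the new one.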
      
      Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and
      Apache Tomcat 9.0.83 running the basic servlet below:
      
      import java.io.IOException;
      import java.io.OutputStreamWriter;
      import java.io.PrintWriter;
      import javax.servlet.ServletException;
      import javax.servlet.http.HttpServlet;
      import javax.servlet.http.HttpServletRequest;
      import javax.servlet.http.HttpServletResponse;
      
      public class HelloWorldServlet extends HttpServlet {
          @Override
          protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
              response.setContentType("text/html;charset=utf-8");
              OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
              String s = "a".repeat(3096);
              osw.write(s,0,s.length());
              osw.flush();
          }
      }
      
      Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
      c6i.8xlarge instance. With the current auto-corking behavior and TCP_NODELAY
      set, an additional 40ms of latency is observed at P99.99 and above. With the
      patch applied, no 40ms latencies occur. The patch has also been tested with
      iperf and uperf benchmarks, and no regression was observed.
      
      # No patch with tcp_autocorking=1 and TCP_NODELAY set on all sockets
      ./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.49.177:8080/hello/hello
        ...
       50.000%    0.91ms
       75.000%    1.12ms
       90.000%    1.46ms
       99.000%    1.73ms
       99.900%    1.96ms
       99.990%   43.62ms   <<< 40+ ms extra latency
       99.999%   48.32ms
      100.000%   49.34ms
      
      # With patch
      ./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.49.177:8080/hello/hello
        ...
       50.000%    0.89ms
       75.000%    1.13ms
       90.000%    1.44ms
       99.000%    1.67ms
       99.900%    1.78ms
       99.990%    2.27ms   <<< no 40+ ms extra latency
       99.999%    3.71ms
      100.000%    4.57ms
      
      Fixes: f54b3111 ("tcp: auto corking")
      Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set() · 65c95f78
      Jiri Pirko authored
      User may not pass DPLL_A_PIN_STATE attribute in the pin set operation
      message. Sanitize that by checking if the attr pointer is not null
      and process the passed state attribute value only in that case.
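A minimal sketch of the check described above, assuming a parsed attribute table in which absent attributes are NULL. `struct attr`, `ATTR_PIN_STATE`, and `pin_state_set()` are hypothetical stand-ins, not the dpll netlink API.

```c
#include <stddef.h>
#include <stdint.h>

enum { ATTR_PIN_STATE = 1, ATTR_MAX = 4 };  /* hypothetical attribute indices */

struct attr { uint32_t value; };            /* stand-in for a parsed attribute */

/*
 * Apply the optional state attribute.
 * Returns 1 if a state was applied, 0 if the attribute was absent.
 */
int pin_state_set(struct attr *attrs[], uint32_t *state)
{
    const struct attr *a = attrs[ATTR_PIN_STATE];

    if (!a)             /* attribute not supplied by the user: skip it */
        return 0;
    *state = a->value;  /* safe: the pointer was checked above */
    return 1;
}
```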
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Fixes: 9d71b54b ("dpll: netlink: Add DPLL framework base functions")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Link: https://lore.kernel.org/r/20231211083758.1082853-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ena-driver-xdp-bug-fixes' · 154bb2fa
      Jakub Kicinski authored
      David Arinzon says:
      
      ====================
      ENA driver XDP bug fixes
      
      This patchset contains multiple XDP-related bug fixes
      in the ENA driver.
      ====================
      
      Link: https://lore.kernel.org/r/20231211062801.27891-1-darinzon@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ena: Fix XDP redirection error · 4ab138ca
      David Arinzon authored
      When sending TX packets, the meta descriptor can be all zeroes
      as no meta information is required (as in XDP).
      
      This patch removes the validity check, because otherwise such TX
      packets are dropped when `disable_meta_caching` is enabled.
      
      Fixes: 0e3a3f6d ("net: ena: support new LLQ acceleration mode")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ena: Fix DMA syncing in XDP path when SWIOTLB is on · d7601170
      David Arinzon authored
      This patch fixes two issues:
      
      Issue 1
      -------
      Description
      ```````````
      Current code does not call dma_sync_single_for_cpu() to sync data from
      the device side memory to the CPU side memory before the XDP code path
      uses the CPU side data.
      This causes the XDP code path to read the unset garbage data in the CPU
      side memory, resulting in incorrect handling of the packet by XDP.
      
      Solution
      ````````
      1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to
         use the data in the CPU side memory.
      2. The XDP code verdict can be XDP_PASS, in which case there is a
         fallback to the non-XDP code, which also calls
         dma_sync_single_for_cpu().
         To avoid calling dma_sync_single_for_cpu() twice:
      2.1. Put the dma_sync_single_for_cpu() in the code in such a place where
           it happens before XDP and non-XDP code.
      2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code
           for the first buffer only (rx_copybreak and non-rx_copybreak
           cases), since the new call that was added covers these cases.
           The call to dma_sync_single_for_cpu() for the second buffer and on
           stays because only the first buffer is handled by the newly added
           dma_sync_single_for_cpu(). And there is no need for special
           handling of the second buffer and on for the XDP path since
           currently the driver supports only single buffer packets.
      
      Issue 2
      -------
      Description
      ```````````
      In case the XDP code forwarded the packet (ENA_XDP_FORWARDED),
      ena_unmap_rx_buff_attrs() is called with attrs set to 0.
      This means that before unmapping the buffer, the internal function
      dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on
      the whole buffer (not only on the data part of it).
      This sync is both wasteful (since a sync was already explicitly
      called before) and also causes a bug, which will be explained
      using the below diagram.
      
      The following diagram shows the flow of events causing the bug.
      The order of events is (1)-(4) as shown in the diagram.
      
      CPU side memory area
      
           (3)convert_to_xdp_frame() initializes the
              headroom with xdpf metadata
                            ||
                            \/
                ___________________________________
               |                                   |
       0       |                                   V                       4K
       ---------------------------------------------------------------------
       | xdpf->data      | other xdpf       |   < data >   | tailroom ||...|
       |                 | fields           |              | GARBAGE  ||   |
       ---------------------------------------------------------------------
      
                         /\                        /\
                         ||                        ||
         (4)ena_unmap_rx_buff_attrs() calls     (2)dma_sync_single_for_cpu()
            dma_sync_single_for_cpu() on the       copies data from device
            whole buffer page, overwriting         side to CPU side memory
            the xdpf->data with GARBAGE.           ||
       0                                                                   4K
       ---------------------------------------------------------------------
       | headroom                           |   < data >   | tailroom ||...|
       | GARBAGE                            |              | GARBAGE  ||   |
       ---------------------------------------------------------------------
      
      Device side memory area                      /\
                                                   ||
                                     (1) device writes RX packet data
      
      After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data
      becomes corrupted, and so when it is later accessed in
      ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault,
      crashing the kernel.
      
      Solution
      ````````
      Explicitly tell ena_unmap_rx_buff_attrs() not to call
      dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC
      flag.
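The sequence (1)-(4) in Issue 2 can be reproduced with a small userspace model that uses two byte arrays in place of the device-side and CPU-side memory. Everything here is hypothetical and simplified (buffer sizes, `struct bounce_map`, the `skip_cpu_sync` parameter); it only illustrates why a whole-buffer sync at unmap time clobbers the headroom.

```c
#include <string.h>

#define BUF_SIZE 64     /* stands in for the 4K buffer */
#define HEADROOM 16     /* where the xdpf metadata is written */
#define DATA_LEN 32     /* packet data written by the device */

struct bounce_map {
    unsigned char dev[BUF_SIZE];  /* device-side memory */
    unsigned char cpu[BUF_SIZE];  /* CPU-side memory */
};

/* Model of a sync for CPU on a sub-range: copy device side -> CPU side. */
void sync_for_cpu(struct bounce_map *m, size_t off, size_t len)
{
    memcpy(m->cpu + off, m->dev + off, len);
}

/* Model of unmapping: unless told to skip, the whole buffer is synced. */
void unmap_buf(struct bounce_map *m, int skip_cpu_sync)
{
    if (!skip_cpu_sync)
        sync_for_cpu(m, 0, BUF_SIZE);
}

/* Runs steps (1)-(4); returns 1 if the headroom metadata survived. */
int xdp_forward_model(int skip_cpu_sync)
{
    struct bounce_map m;

    memset(m.dev, 0xAA, sizeof(m.dev));        /* GARBAGE on both sides */
    memset(m.cpu, 0xAA, sizeof(m.cpu));
    memset(m.dev + HEADROOM, 0x11, DATA_LEN);  /* (1) device writes RX data */
    sync_for_cpu(&m, HEADROOM, DATA_LEN);      /* (2) sync only the data part */
    memset(m.cpu, 0x5F, HEADROOM);             /* (3) metadata written to headroom */
    unmap_buf(&m, skip_cpu_sync);              /* (4) unmap the buffer */
    return m.cpu[0] == 0x5F;
}
```

In this model, `xdp_forward_model(0)` loses the metadata, while `xdp_forward_model(1)` keeps it, which mirrors the effect of passing the skip flag.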
      
      Fixes: f7d625ad ("net: ena: Add dynamic recycling mechanism for rx buffers")
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ena: Fix xdp drops handling due to multibuf packets · 505b1a88
      David Arinzon authored
      Current xdp code drops packets larger than ENA_XDP_MAX_MTU.
      This is an incorrect condition, since the problem is not the
      size of the packet but rather the number of buffers it contains.
      
      This commit:
      
      1. Identifies and drops XDP multi-buffer packets at the
         beginning of the function.
      2. Increases the xdp drop statistic when this drop occurs.
      3. Adds a one-time print that such drops are happening to
         give better indication to the user.
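The three steps listed above can be sketched roughly as follows. `struct xdp_stats` and `xdp_drop_if_multibuf()` are hypothetical names for illustration, and the message text is invented; the driver's actual counters and log call are not shown in the commit message.

```c
#include <stdbool.h>
#include <stdio.h>

struct xdp_stats {
    unsigned long drops;  /* number of packets dropped by this check */
    bool warned;          /* whether the one-time message was printed */
};

/* Returns true if the packet must be dropped because it spans several buffers. */
bool xdp_drop_if_multibuf(struct xdp_stats *st, unsigned int num_bufs)
{
    if (num_bufs <= 1)
        return false;               /* single-buffer packet: let XDP handle it */

    st->drops++;                    /* account for the drop */
    if (!st->warned) {              /* print only once */
        st->warned = true;
        fprintf(stderr, "xdp: dropping multi-buffer packets\n");
    }
    return true;
}
```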
      
      Fixes: 838c93dc ("net: ena: implement XDP drop support")
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ena: Destroy correct number of xdp queues upon failure · 41db6f99
      David Arinzon authored
      The ena_setup_and_create_all_xdp_queues() function freed all the
      resources upon failure, after creating only xdp_num_queues queues,
      instead of freeing just the created ones.
      
      With this patch, the only resources that are freed are the ones
      allocated right before the failure occurs.
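A common way to free only what was created is to unwind from the failing index back to the first one. The sketch below is a generic model of that pattern with hypothetical names (`create_all_queues()`, `queue_live`, the `fail_at` test hook); it is not the ENA driver's code.

```c
#include <stdbool.h>

#define MAX_QUEUES 8

bool queue_live[MAX_QUEUES];   /* which model queues currently exist */
int destroy_calls;             /* how many queues the unwind path destroyed */

/* Hypothetical creation step; fails when asked to create index fail_at. */
static int create_queue(int i, int fail_at)
{
    if (i == fail_at)
        return -1;
    queue_live[i] = true;
    return 0;
}

static void destroy_queue(int i)
{
    queue_live[i] = false;
    destroy_calls++;
}

/* Create queues [first, first + count); on failure undo only what was created. */
int create_all_queues(int first, int count, int fail_at)
{
    int i;

    for (i = first; i < first + count; i++) {
        if (create_queue(i, fail_at))
            goto err_unwind;
    }
    return 0;

err_unwind:
    while (--i >= first)        /* walk back over the queues created so far */
        destroy_queue(i);
    return -1;
}
```

Queues outside the range, and queues that were never created, are left untouched by the error path.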
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shahar Itzko <itzko@amazon.com>
      Signed-off-by: David Arinzon <darinzon@amazon.com>
      Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 12 Dec, 2023 4 commits
    • net: Remove acked SYN flag from packet in the transmit queue correctly · f99cd562
      Dong Chenchen authored
      syzkaller report:
      
       kernel BUG at net/core/skbuff.c:3452!
       invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e776-dirty #135
       RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452)
       Call Trace:
       icmp_glue_bits (net/ipv4/icmp.c:357)
       __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165)
       ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341)
       icmp_push_reply (net/ipv4/icmp.c:370)
       __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772)
       ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577)
       __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295)
       ip_output (net/ipv4/ip_output.c:427)
       __ip_queue_xmit (net/ipv4/ip_output.c:535)
       __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
       __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387)
       tcp_retransmit_skb (net/ipv4/tcp_output.c:3404)
       tcp_retransmit_timer (net/ipv4/tcp_timer.c:604)
       tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)
      
      The panic was triggered by TCP simultaneous initiation.
      The initiation process is as follows:
      
            TCP A                                            TCP B
      
        1.  CLOSED                                           CLOSED
      
        2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
      
        3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
      
        4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
      
        5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
      
        // TCP B: does not send a challenge ack, due to the ack limit or packet loss
        // TCP A: close
      	tcp_close
      	   tcp_send_fin
                    if (!tskb && tcp_under_memory_pressure(sk))
                        tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
                 TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;  // set FIN flag
      
        6.  FIN_WAIT_1  --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...
      
        // TCP B: send challenge ack to SYN_FIN_ACK
      
        7.               ... <SEQ=301><ACK=101><CTL=ACK>   <-- SYN-RECEIVED //challenge ack
      
        // TCP A:  <SND.UNA=101>
      
        8.  FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic
      
      	__tcp_retransmit_skb  //skb->len=0
      	    tcp_trim_head
      		len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
      		    __pskb_trim_head
      			skb->data_len -= len // skb->len=-1, wrap around
      	    ... ...
      	    ip_fragment
      		icmp_glue_bits //BUG_ON
      
      If we use tcp_trim_head() to remove acked SYN from packet that contains data
      or other flags, skb->len will be incorrectly decremented. We can remove SYN
      flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
      can fix the problem mentioned above.
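The wrap-around in step 8 comes down to unsigned arithmetic: the acked SYN occupies one sequence number but no payload bytes. The model below uses a hypothetical `struct seg` and two hypothetical trim functions to show the difference between trimming blindly and dropping the acked SYN first; it is a simplification, not the kernel's `tcp_trim_head()`.

```c
#include <stdint.h>

#define FLAG_FIN 0x01
#define FLAG_SYN 0x02
#define FLAG_ACK 0x10

struct seg {
    uint32_t seq;    /* first sequence number covered by the segment */
    uint32_t len;    /* payload bytes (SYN/FIN use sequence space, not bytes) */
    uint8_t  flags;
};

/* Buggy model: removes (snd_una - seq) payload bytes even if only a SYN was acked. */
void trim_head_buggy(struct seg *s, uint32_t snd_una)
{
    uint32_t acked = snd_una - s->seq;

    s->len -= acked;            /* wraps around when len == 0 */
    s->seq = snd_una;
}

/* Fixed model: drop an acked SYN first, then trim only real payload. */
void trim_head_fixed(struct seg *s, uint32_t snd_una)
{
    uint32_t acked;

    if ((s->flags & FLAG_SYN) && (int32_t)(snd_una - s->seq) > 0) {
        s->flags &= (uint8_t)~FLAG_SYN;
        s->seq++;               /* the SYN consumed one sequence number */
    }
    acked = snd_una - s->seq;
    s->len -= acked;
    s->seq = snd_una;
}
```

With the values from step 8 (SEQ=100, no payload, SND.UNA=101), the buggy model leaves `len` at `UINT32_MAX`, while the fixed model leaves it at 0 with the FIN flag intact.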
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Co-developed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • qed: Fix a potential use-after-free in qed_cxt_tables_alloc · b65d52ac
      Dinghao Liu authored
      qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
      free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
      qed_cxt_tables_alloc() accesses the freed pointer on failure
      of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
      which may lead to use-after-free. Fix this issue by setting
      p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
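Setting the pointer to NULL inside the free routine makes that routine idempotent, so a later cleanup path cannot touch freed memory. A minimal userspace sketch of the idea, using a hypothetical reduced `struct cxt_mngr` and a `simulate_failure` test hook that are not part of the qed driver:

```c
#include <stdlib.h>

struct cxt_mngr { int *ilt_shadow; };   /* hypothetical, reduced manager struct */

/* Idempotent free: safe to call again from a later cleanup path. */
void shadow_free(struct cxt_mngr *m)
{
    free(m->ilt_shadow);    /* free(NULL) is a no-op */
    m->ilt_shadow = NULL;   /* later cleanup sees "nothing allocated" */
}

/* Allocation that cleans up after itself on error. */
int shadow_alloc(struct cxt_mngr *m, size_t n, int simulate_failure)
{
    m->ilt_shadow = calloc(n, sizeof(*m->ilt_shadow));
    if (!m->ilt_shadow || simulate_failure) {
        shadow_free(m);
        return -1;
    }
    return 0;
}
```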
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/rose: Fix Use-After-Free in rose_ioctl · 810c38a3
      Hyunwoo Kim authored
      Because rose_ioctl() accesses sk->sk_receive_queue
      without holding a sk->sk_receive_queue.lock, it can
      cause a race with rose_accept().
      A use-after-free for skb occurs with the following flow.
      ```
      rose_ioctl() -> skb_peek()
      rose_accept() -> skb_dequeue() -> kfree_skb()
      ```
      Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue.
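The locking rule behind this fix can be illustrated in userspace: the peeking side reads the packet only while holding the same lock the dequeuing side takes. The sketch uses a pthread mutex as a stand-in for the kernel queue lock, and `struct rx_queue`, `peek_len()`, and `dequeue()` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

struct pkt { struct pkt *next; long len; };

struct rx_queue {
    pthread_mutex_t lock;   /* stand-in for the receive queue lock */
    struct pkt *head;
};

/* Read the length of the first queued packet while holding the queue lock. */
long peek_len(struct rx_queue *q)
{
    long amount = 0;

    pthread_mutex_lock(&q->lock);
    if (q->head)
        amount = q->head->len;  /* nobody can dequeue and free it right now */
    pthread_mutex_unlock(&q->lock);
    return amount;
}

/* Dequeue under the same lock; the caller then owns (and may free) the packet. */
struct pkt *dequeue(struct rx_queue *q)
{
    struct pkt *p;

    pthread_mutex_lock(&q->lock);
    p = q->head;
    if (p)
        q->head = p->next;
    pthread_mutex_unlock(&q->lock);
    return p;
}
```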
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
      Link: https://lore.kernel.org/r/20231209100538.GA407321@v4bel-B760M-AORUS-ELITE-AX
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • atm: Fix Use-After-Free in do_vcc_ioctl · 24e90b9e
      Hyunwoo Kim authored
      Because do_vcc_ioctl() accesses sk->sk_receive_queue
      without holding a sk->sk_receive_queue.lock, it can
      cause a race with vcc_recvmsg().
      A use-after-free for skb occurs with the following flow.
      ```
      do_vcc_ioctl() -> skb_peek()
      vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
      ```
      Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
      Link: https://lore.kernel.org/r/20231209094210.GA403126@v4bel-B760M-AORUS-ELITE-AX
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 11 Dec, 2023 6 commits
    • octeontx2-af: Fix pause frame configuration · e307b5a8
      Hariprasad Kelam authored
      The current implementation's default Pause Forward setting is causing
      unnecessary network traffic. This patch disables Pause Forward to
      address this issue.
      
      Fixes: 1121f6b0 ("octeontx2-af: Priority flow control configuration support")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'octeontx2-fixes' · c3e04142
      David S. Miller authored
      Hariprasad Kelam says:
      
      ====================
      octeontx2: Fix issues with promisc/allmulti mode
      
      When an interface is configured in promisc/all multi mode, low network
      performance is observed. This patch series addresses the issue.
      
      Patch1: Change the promisc/all multi mcam entry action to unicast if
      there are no trusted vfs associated with PF.
      
      Patch2: Configures RSS flow algorithm in promisc/all multi mcam entries
      to address flow distribution issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Update RSS algorithm index · 570ba378
      Hariprasad Kelam authored
      The RSS flow algorithm is not set up correctly for promiscuous or all
      multi MCAM entries. This has an impact on flow distribution.
      
      This patch fixes the issue by updating flow algorithm index in above
      mentioned MCAM entries.
      
      Fixes: 967db352 ("octeontx2-af: add support for multicast/promisc packet replication feature")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-pf: Fix promisc mcam entry action · dbda4368
      Hariprasad Kelam authored
      In the current implementation, the promisc mcam entry action is set
      to multicast even when there are no trusted VFs. The multicast
      action causes the hardware to copy packet data, which reduces
      performance.
      
      This patch fixes this issue by setting the promisc mcam entry action to
      unicast instead of multicast when there are no trusted VFs. The same
      change is made for the 'allmulti' mcam entry action.
      
      Fixes: ffd2f89a ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeon_ep: explicitly test for firmware ready value · 284f7176
      Shinas Rasheed authored
      The firmware ready value is 1, and the function that gets the firmware
      ready status should explicitly test for that value. After driver load
      the value reads as 2, and on unbind it stays 2 until the firmware
      rewrites it back to 0; a value of 2 seen by the driver should be
      regarded as not ready.
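The difference between a truthiness test and an explicit comparison is easy to see in isolation. In the sketch below the value 1 comes from the text; the macro and function names are hypothetical, and the "loose" variant is only an assumption about what a non-explicit check could look like.

```c
#include <stdbool.h>
#include <stdint.h>

#define FW_STATUS_READY 1u   /* value from the commit message; name is hypothetical */

/* Assumed loose check: any non-zero status counts as ready. */
bool fw_ready_loose(uint32_t status)
{
    return status != 0;
}

/* Explicit check: only the documented ready value counts. */
bool fw_ready_strict(uint32_t status)
{
    return status == FW_STATUS_READY;
}
```

A status of 2 passes the loose check but is rejected by the explicit one.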
      
      Fixes: 10c073e4 ("octeon_ep: defer probe if firmware not ready")
      Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table · 125f1c7f
      Vlad Buslov authored
      The referenced change added custom cleanup code to act_ct to delete any
      callbacks registered on the parent block when deleting the
      tcf_ct_flow_table instance. However, the underlying issue is that the
      drivers don't obtain the reference to the tcf_ct_flow_table instance when
      registering callbacks, which means not only that driver callbacks may still
      be on the table when it is deleted, but also that the driver can still hold
      pointers to its internal nf_flowtable and use it concurrently, which
      results in either a warning in netfilter [0] or a use-after-free.
      
      Fix the issue by taking a reference to the underlying struct
      tcf_ct_flow_table instance when registering the callback and release the
      reference when unregistering. Expose new API required for such reference
      counting by adding two new callbacks to nf_flowtable_type and implementing
      them for act_ct flowtable_ct type. This fixes the issue by extending the
      lifetime of nf_flowtable until all users have unregistered.
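The lifetime rule described above (each registered callback holds its own reference) can be modeled with a plain counter. This is a single-threaded sketch with hypothetical names; kernel code would use an atomic reference count and actually free the object on the last put.

```c
#include <stdbool.h>

struct flow_table {
    int refs;        /* reference count (non-atomic in this model) */
    bool released;   /* set when the last reference is dropped */
};

void ft_get(struct flow_table *ft)
{
    ft->refs++;
}

void ft_put(struct flow_table *ft)
{
    if (--ft->refs == 0)
        ft->released = true;    /* real code would free the table here */
}

/* Registering a callback pins the table; unregistering unpins it. */
void cb_register(struct flow_table *ft)   { ft_get(ft); }
void cb_unregister(struct flow_table *ft) { ft_put(ft); }
```

If the owner drops its reference while a callback is still registered, the table stays alive until that callback is unregistered.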
      
      [0]:
      [106170.938634] ------------[ cut here ]------------
      [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core
      [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1
      [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
      [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00
      [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202
      [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48
      [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000
      [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700
      [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58
      [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000
      [106170.951616] FS:  0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000
      [106170.952329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0
      [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [106170.954766] Call Trace:
      [106170.955057]  <TASK>
      [106170.955315]  ? __warn+0x79/0x120
      [106170.955648]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.956172]  ? report_bug+0x17c/0x190
      [106170.956537]  ? handle_bug+0x3c/0x60
      [106170.956891]  ? exc_invalid_op+0x14/0x70
      [106170.957264]  ? asm_exc_invalid_op+0x16/0x20
      [106170.957666]  ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core]
      [106170.958172]  ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core]
      [106170.958788]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.959339]  ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core]
      [106170.959854]  ? mapping_remove+0x154/0x1d0 [mlx5_core]
      [106170.960342]  ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core]
      [106170.960927]  mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core]
      [106170.961441]  mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core]
      [106170.962001]  mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core]
      [106170.962524]  mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core]
      [106170.963034]  mlx5e_flow_put+0x73/0xe0 [mlx5_core]
      [106170.963506]  mlx5e_put_flow_list+0x38/0x70 [mlx5_core]
      [106170.964002]  mlx5e_rep_update_flows+0xec/0x290 [mlx5_core]
      [106170.964525]  mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core]
      [106170.965056]  process_one_work+0x13a/0x2c0
      [106170.965443]  worker_thread+0x2e5/0x3f0
      [106170.965808]  ? rescuer_thread+0x410/0x410
      [106170.966192]  kthread+0xc6/0xf0
      [106170.966515]  ? kthread_complete_and_exit+0x20/0x20
      [106170.966970]  ret_from_fork+0x2d/0x50
      [106170.967332]  ? kthread_complete_and_exit+0x20/0x20
      [106170.967774]  ret_from_fork_asm+0x11/0x20
      [106170.970466]  </TASK>
      [106170.970726] ---[ end trace 0000000000000000 ]---
      
      Fixes: 77ac5e40 ("net/sched: act_ct: remove and free nf_table callbacks")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Dec, 2023 2 commits
    • octeontx2-af: fix a use-after-free in rvu_nix_register_reporters · 28a7cb04
      Zhipeng Lu authored
      The rvu_dl is freed in rvu_nix_health_reporters_destroy(rvu_dl)
      after create_workqueue fails, and after that free, the rvu_dl is
      passed back through the following call chain:
      
      rvu_nix_health_reporters_destroy
        |-> rvu_nix_health_reporters_create
             |-> rvu_health_reporters_create
                   |-> rvu_register_dl (label err_dl_health)
      
      Finally, in the err_dl_health label, rvu_dl is freed again in
      rvu_health_reporters_destroy(rvu) by rvu_nix_health_reporters_destroy.
      The second call of rvu_nix_health_reporters_destroy, however,
      uses rvu_dl->rvu_nix_health_reporter, which was already freed at
      the end of rvu_nix_health_reporters_destroy in the first call.
      
      So this patch prevents the first destroy by immediately returning -ENOMEM
      when create_workqueue fails. In addition, since the failure of
      create_workqueue is the only entry to label err, the label has been
      integrated into the error-handling path of create_workqueue.
      
      Fixes: 5ed66306 ("octeontx2-af: Add devlink health reporters for NIX")
      Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: correct queue selection · 9fc95fe9
      Radu Bulie authored
      The old implementation extracted VLAN TCI info from the payload
      before the VLAN tag has been pushed in the payload.
      
      Another problem was that the VLAN TCI was extracted even if the
      packet did not have VLAN protocol header.
      
      This resulted in invalid VLAN TCI and as a consequence a random
      queue was computed.
      
      This patch fixes the above issues and uses the VLAN TCI from the
      skb if it is present, or the VLAN TCI from the payload if present.
      If no VLAN header is present, queue 0 is selected.
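The selection order described above (tag carried with the packet first, then a tag already in the payload, otherwise queue 0) might look like the following userspace model. `struct pkt`, `select_queue()`, and the priority-to-queue table are hypothetical and chosen only for illustration; the frame offsets follow the standard Ethernet/802.1Q layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100   /* 802.1Q tag protocol identifier */

/* Hypothetical priority-to-queue table for a device with three TX queues. */
static const uint16_t prio_to_queue[8] = { 0, 0, 1, 1, 1, 2, 2, 2 };

struct pkt {
    bool has_offloaded_tag;   /* tag kept in packet metadata, not yet in payload */
    uint16_t offloaded_tci;
    const uint8_t *frame;     /* Ethernet frame bytes */
    size_t frame_len;
};

uint16_t select_queue(const struct pkt *p)
{
    uint16_t tci;

    if (p->has_offloaded_tag) {
        tci = p->offloaded_tci;                 /* tag not yet pushed into payload */
    } else if (p->frame_len >= 16 &&
               ((p->frame[12] << 8) | p->frame[13]) == TPID_8021Q) {
        tci = (uint16_t)((p->frame[14] << 8) | p->frame[15]);
    } else {
        return 0;                               /* no VLAN header: queue 0 */
    }
    return prio_to_queue[tci >> 13];            /* priority is the top 3 TCI bits */
}
```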
      
      Fixes: 52c4a1a8 ("net: fec: add ndo_select_queue to fix TX bandwidth fluctuations")
      Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com>
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 09 Dec, 2023 15 commits
  6. 08 Dec, 2023 4 commits