1. 14 Dec, 2023 10 commits
    • Moshe Shemesh's avatar
      net/mlx5: Fix fw tracer first block check · 4261edf1
      Moshe Shemesh authored
      While handling new traces, to verify it is not the first block being
      written, last_timestamp is checked. But instead of checking it is non
      zero it is verified to be zero. Fix to verify last_timestamp is not
      zero.
      
      Fixes: c71ad41c ("net/mlx5: FW tracer, events handling")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarFeras Daoud <ferasda@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      4261edf1
    • Carolina Jubran's avatar
      net/mlx5e: XDP, Drop fragmented packets larger than MTU size · bcaf109f
      Carolina Jubran authored
      XDP transmits fragmented packets that are larger than MTU size instead of
      dropping those packets. The drop check that checks whether a packet is larger
      than MTU is comparing MTU size against the linear part length only.
      
      Adjust the drop check to compare MTU size against both linear and non-linear
      part lengths to avoid transmitting fragmented packets larger than MTU size.
      
      Fixes: 39a1665d ("net/mlx5e: Implement sending multi buffer XDP frames")
      Signed-off-by: default avatarCarolina Jubran <cjubran@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      bcaf109f
    • Chris Mi's avatar
      net/mlx5e: Decrease num_block_tc when unblock tc offload · be86106f
      Chris Mi authored
      The cited commit increases num_block_tc when unblock tc offload.
      Actually should decrease it.
      
      Fixes: c8e350e6 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev")
      Signed-off-by: default avatarChris Mi <cmi@nvidia.com>
      Reviewed-by: default avatarJianbo Liu <jianbol@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      be86106f
    • Jianbo Liu's avatar
      net/mlx5e: Fix overrun reported by coverity · da75fa54
      Jianbo Liu authored
      Coverity Scan reports the following issue. But it's impossible that
      mlx5_get_dev_index returns 7 for PF, even if the index is calculated
      from PCI FUNC ID. So add the checking to make coverity slience.
      
      CID 610894 (#2 of 2): Out-of-bounds write (OVERRUN)
      Overrunning array esw->fdb_table.offloads.peer_miss_rules of 4 8-byte
      elements at element index 7 (byte offset 63) using index
      mlx5_get_dev_index(peer_dev) (which evaluates to 7).
      
      Fixes: 9bee385a ("net/mlx5: E-switch, refactor FDB miss rule add/remove")
      Signed-off-by: default avatarJianbo Liu <jianbol@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      da75fa54
    • Dinghao Liu's avatar
      net/mlx5e: fix a potential double-free in fs_udp_create_groups · e75efc64
      Dinghao Liu authored
      When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
      fs_udp_create_groups() will free ft->g. However, its caller
      fs_udp_create_table() will free ft->g again through calling
      mlx5e_destroy_flow_table(), which will lead to a double-free.
      Fix this by setting ft->g to NULL in fs_udp_create_groups().
      
      Fixes: 1c80bd68 ("net/mlx5e: Introduce Flow Steering UDP API")
      Signed-off-by: default avatarDinghao Liu <dinghao.liu@zju.edu.cn>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      e75efc64
    • Shifeng Li's avatar
      net/mlx5e: Fix a race in command alloc flow · 8f5100da
      Shifeng Li authored
      Fix a cmd->ent use after free due to a race on command entry.
      Such race occurs when one of the commands releases its last refcount and
      frees its index and entry while another process running command flush
      flow takes refcount to this command entry. The process which handles
      commands flush may see this command as needed to be flushed if the other
      process allocated a ent->idx but didn't set ent to cmd->ent_arr in
      cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
      the spin lock.
      
      [70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
      [70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
      [70013.081968]
      [70013.082028] Workqueue: events aer_isr
      [70013.082053] Call Trace:
      [70013.082067]  dump_stack+0x8b/0xbb
      [70013.082086]  print_address_description+0x6a/0x270
      [70013.082102]  kasan_report+0x179/0x2c0
      [70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
      [70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
      [70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
      [70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
      [70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
      [70013.082409]  pci_device_remove+0xf3/0x280
      [70013.082439]  device_release_driver_internal+0x1c3/0x470
      [70013.082453]  pci_stop_bus_device+0x109/0x160
      [70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
      [70013.082485]  pcie_do_fatal_recovery+0x167/0x550
      [70013.082493]  aer_isr+0x7d2/0x960
      [70013.082543]  process_one_work+0x65f/0x12d0
      [70013.082556]  worker_thread+0x87/0xb50
      [70013.082571]  kthread+0x2e9/0x3a0
      [70013.082592]  ret_from_fork+0x1f/0x40
      
      The logical relationship of this error is as follows:
      
                   aer_recover_work              |          ent->work
      -------------------------------------------+------------------------------
      aer_recover_work_func                      |
      |- pcie_do_recovery                        |
        |- report_error_detected                 |
          |- mlx5_pci_err_detected               |cmd_work_handler
            |- mlx5_enter_error_state            |  |- cmd_alloc_index
              |- enter_error_state               |    |- lock cmd->alloc_lock
                |- mlx5_cmd_flush                |    |- clear_bit
                  |- mlx5_cmd_trigger_completions|    |- unlock cmd->alloc_lock
                    |- lock cmd->alloc_lock      |
                    |- vector = ~dev->cmd.vars.bitmask
                    |- for_each_set_bit          |
                      |- cmd_ent_get(cmd->ent_arr[i]) (UAF)
                    |- unlock cmd->alloc_lock    |  |- cmd->ent_arr[ent->idx]=ent
      
      The cmd->ent_arr[ent->idx] assignment and the bit clearing are not
      protected by the cmd->alloc_lock in cmd_work_handler().
      
      Fixes: 50b2412b ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarShifeng Li <lishifeng@sangfor.com.cn>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      8f5100da
    • Shifeng Li's avatar
      net/mlx5e: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list() · ddb38ddf
      Shifeng Li authored
      Out_sz that the size of out buffer is calculated using query_nic_vport
      _context_in structure when driver query the MAC list. However query_nic
      _vport_context_in structure is smaller than query_nic_vport_context_out.
      When allowed_list_size is greater than 96, calling ether_addr_copy() will
      trigger an slab-out-of-bounds.
      
      [ 1170.055866] BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x481/0x4d0 [mlx5_core]
      [ 1170.055869] Read of size 4 at addr ffff88bdbc57d912 by task kworker/u128:1/461
      [ 1170.055870]
      [ 1170.055932] Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core]
      [ 1170.055936] Call Trace:
      [ 1170.055949]  dump_stack+0x8b/0xbb
      [ 1170.055958]  print_address_description+0x6a/0x270
      [ 1170.055961]  kasan_report+0x179/0x2c0
      [ 1170.056061]  mlx5_query_nic_vport_mac_list+0x481/0x4d0 [mlx5_core]
      [ 1170.056162]  esw_update_vport_addr_list+0x2c5/0xcd0 [mlx5_core]
      [ 1170.056257]  esw_vport_change_handle_locked+0xd08/0x1a20 [mlx5_core]
      [ 1170.056377]  esw_vport_change_handler+0x6b/0x90 [mlx5_core]
      [ 1170.056381]  process_one_work+0x65f/0x12d0
      [ 1170.056383]  worker_thread+0x87/0xb50
      [ 1170.056390]  kthread+0x2e9/0x3a0
      [ 1170.056394]  ret_from_fork+0x1f/0x40
      
      Fixes: e16aea27 ("net/mlx5: Introduce access functions to modify/query vport mac lists")
      Cc: Ding Hui <dinghui@sangfor.com.cn>
      Signed-off-by: default avatarShifeng Li <lishifeng@sangfor.com.cn>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      ddb38ddf
    • Vlad Buslov's avatar
      net/mlx5e: fix double free of encap_header · 8e13cd73
      Vlad Buslov authored
      Cited commit introduced potential double free since encap_header can be
      destroyed twice in some cases - once by error cleanup sequence in
      mlx5e_tc_tun_{create|update}_header_ipv{4|6}(), once by generic
      mlx5e_encap_put() that user calls as a result of getting an error from
      tunnel create|update. At the same time the point where e->encap_header is
      assigned can't be delayed because the function can still return non-error
      code 0 as a result of checking for NUD_VALID flag, which will cause
      neighbor update to dereference NULL encap_header.
      
      Fix the issue by:
      
      - Nulling local encap_header variables in
      mlx5e_tc_tun_{create|update}_header_ipv{4|6}() to make kfree(encap_header)
      call in error cleanup sequence noop after that point.
      
      - Assigning reformat_params.data from e->encap_header instead of local
      variable encap_header that was set to NULL pointer by previous step. Also
      assign reformat_params.size from e->encap_size for uniformity and in order
      to make the code less error-prone in the future.
      
      Fixes: d589e785 ("net/mlx5e: Allow concurrent creation of encap entries")
      Reported-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Reported-by: default avatarCruz Zhao <cruzzhao@linux.alibaba.com>
      Reported-by: default avatarTianchen Ding <dtcccc@linux.alibaba.com>
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      8e13cd73
    • Vlad Buslov's avatar
      Revert "net/mlx5e: fix double free of encap_header" · 5d089684
      Vlad Buslov authored
      This reverts commit 6f9b1a07.
      
      This patch is causing a null ptr issue, the proper fix is in the next
      patch.
      
      Fixes: 6f9b1a07 ("net/mlx5e: fix double free of encap_header")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5d089684
    • Vlad Buslov's avatar
      Revert "net/mlx5e: fix double free of encap_header in update funcs" · 66ca8d4d
      Vlad Buslov authored
      This reverts commit 3a4aa3cb.
      
      This patch is causing a null ptr issue, the proper fix is in the next
      patch.
      
      Fixes: 3a4aa3cb ("net/mlx5e: fix double free of encap_header in update funcs")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      66ca8d4d
  2. 13 Dec, 2023 12 commits
  3. 12 Dec, 2023 4 commits
    • Dong Chenchen's avatar
      net: Remove acked SYN flag from packet in the transmit queue correctly · f99cd562
      Dong Chenchen authored
      syzkaller report:
      
       kernel BUG at net/core/skbuff.c:3452!
       invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e776-dirty #135
       RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452)
       Call Trace:
       icmp_glue_bits (net/ipv4/icmp.c:357)
       __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165)
       ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341)
       icmp_push_reply (net/ipv4/icmp.c:370)
       __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772)
       ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577)
       __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295)
       ip_output (net/ipv4/ip_output.c:427)
       __ip_queue_xmit (net/ipv4/ip_output.c:535)
       __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
       __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387)
       tcp_retransmit_skb (net/ipv4/tcp_output.c:3404)
       tcp_retransmit_timer (net/ipv4/tcp_timer.c:604)
       tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)
      
      The panic issue was trigered by tcp simultaneous initiation.
      The initiation process is as follows:
      
            TCP A                                            TCP B
      
        1.  CLOSED                                           CLOSED
      
        2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
      
        3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
      
        4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
      
        5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
      
        // TCP B: not send challenge ack for ack limit or packet loss
        // TCP A: close
      	tcp_close
      	   tcp_send_fin
                    if (!tskb && tcp_under_memory_pressure(sk))
                        tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
                 TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;  // set FIN flag
      
        6.  FIN_WAIT_1  --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...
      
        // TCP B: send challenge ack to SYN_FIN_ACK
      
        7.               ... <SEQ=301><ACK=101><CTL=ACK>   <-- SYN-RECEIVED //challenge ack
      
        // TCP A:  <SND.UNA=101>
      
        8.  FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic
      
      	__tcp_retransmit_skb  //skb->len=0
      	    tcp_trim_head
      		len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
      		    __pskb_trim_head
      			skb->data_len -= len // skb->len=-1, wrap around
      	    ... ...
      	    ip_fragment
      		icmp_glue_bits //BUG_ON
      
      If we use tcp_trim_head() to remove acked SYN from packet that contains data
      or other flags, skb->len will be incorrectly decremented. We can remove SYN
      flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
      can fix the problem mentioned above.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Co-developed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDong Chenchen <dongchenchen2@huawei.com>
      Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f99cd562
    • Dinghao Liu's avatar
      qed: Fix a potential use-after-free in qed_cxt_tables_alloc · b65d52ac
      Dinghao Liu authored
      qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
      free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
      qed_cxt_tables_alloc() accesses the freed pointer on failure
      of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
      which may lead to use-after-free. Fix this issue by setting
      p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Reviewed-by: default avatarPrzemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: default avatarDinghao Liu <dinghao.liu@zju.edu.cn>
      Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cnSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b65d52ac
    • Hyunwoo Kim's avatar
      net/rose: Fix Use-After-Free in rose_ioctl · 810c38a3
      Hyunwoo Kim authored
      Because rose_ioctl() accesses sk->sk_receive_queue
      without holding a sk->sk_receive_queue.lock, it can
      cause a race with rose_accept().
      A use-after-free for skb occurs with the following flow.
      ```
      rose_ioctl() -> skb_peek()
      rose_accept() -> skb_dequeue() -> kfree_skb()
      ```
      Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarHyunwoo Kim <v4bel@theori.io>
      Link: https://lore.kernel.org/r/20231209100538.GA407321@v4bel-B760M-AORUS-ELITE-AXSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      810c38a3
    • Hyunwoo Kim's avatar
      atm: Fix Use-After-Free in do_vcc_ioctl · 24e90b9e
      Hyunwoo Kim authored
      Because do_vcc_ioctl() accesses sk->sk_receive_queue
      without holding a sk->sk_receive_queue.lock, it can
      cause a race with vcc_recvmsg().
      A use-after-free for skb occurs with the following flow.
      ```
      do_vcc_ioctl() -> skb_peek()
      vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
      ```
      Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarHyunwoo Kim <v4bel@theori.io>
      Link: https://lore.kernel.org/r/20231209094210.GA403126@v4bel-B760M-AORUS-ELITE-AXSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      24e90b9e
  4. 11 Dec, 2023 6 commits
    • Hariprasad Kelam's avatar
      octeontx2-af: Fix pause frame configuration · e307b5a8
      Hariprasad Kelam authored
      The current implementation's default Pause Forward setting is causing
      unnecessary network traffic. This patch disables Pause Forward to
      address this issue.
      
      Fixes: 1121f6b0 ("octeontx2-af: Priority flow control configuration support")
      Signed-off-by: default avatarHariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: default avatarSunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e307b5a8
    • David S. Miller's avatar
      Merge branch 'octeontx2-fixes' · c3e04142
      David S. Miller authored
      Hariprasad Kelam says:
      
      ====================
      octeontx2: Fix issues with promisc/allmulti mode
      
      When interface is configured in promisc/all multi mode, low network
      performance observed. This series patches address the same.
      
      Patch1: Change the promisc/all multi mcam entry action to unicast if
      there are no trusted vfs associated with PF.
      
      Patch2: Configures RSS flow algorithm in promisc/all multi mcam entries
      to address flow distribution issues.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3e04142
    • Hariprasad Kelam's avatar
      octeontx2-af: Update RSS algorithm index · 570ba378
      Hariprasad Kelam authored
      The RSS flow algorithm is not set up correctly for promiscuous or all
      multi MCAM entries. This has an impact on flow distribution.
      
      This patch fixes the issue by updating flow algorithm index in above
      mentioned MCAM entries.
      
      Fixes: 967db352 ("octeontx2-af: add support for multicast/promisc packet replication feature")
      Signed-off-by: default avatarHariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: default avatarSunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      570ba378
    • Hariprasad Kelam's avatar
      octeontx2-pf: Fix promisc mcam entry action · dbda4368
      Hariprasad Kelam authored
      Current implementation is such that, promisc mcam entry action
      is set as multicast even when there are no trusted VFs. multicast
      action causes the hardware to copy packet data, which reduces
      the performance.
      
      This patch fixes this issue by setting the promisc mcam entry action to
      unicast instead of multicast when there are no trusted VFs. The same
      change is made for the 'allmulti' mcam entry action.
      
      Fixes: ffd2f89a ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.")
      Signed-off-by: default avatarHariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: default avatarSunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbda4368
    • Shinas Rasheed's avatar
      octeon_ep: explicitly test for firmware ready value · 284f7176
      Shinas Rasheed authored
      The firmware ready value is 1, and get firmware ready status
      function should explicitly test for that value. The firmware
      ready value read will be 2 after driver load, and on unbind
      till firmware rewrites the firmware ready back to 0, the value
      seen by driver will be 2, which should be regarded as not ready.
      
      Fixes: 10c073e4 ("octeon_ep: defer probe if firmware not ready")
      Signed-off-by: default avatarShinas Rasheed <srasheed@marvell.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      284f7176
    • Vlad Buslov's avatar
      net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table · 125f1c7f
      Vlad Buslov authored
      The referenced change added custom cleanup code to act_ct to delete any
      callbacks registered on the parent block when deleting the
      tcf_ct_flow_table instance. However, the underlying issue is that the
      drivers don't obtain the reference to the tcf_ct_flow_table instance when
      registering callbacks which means that not only driver callbacks may still
      be on the table when deleting it but also that the driver can still have
      pointers to its internal nf_flowtable and can use it concurrently which
      results either warning in netfilter[0] or use-after-free.
      
      Fix the issue by taking a reference to the underlying struct
      tcf_ct_flow_table instance when registering the callback and release the
      reference when unregistering. Expose new API required for such reference
      counting by adding two new callbacks to nf_flowtable_type and implementing
      them for act_ct flowtable_ct type. This fixes the issue by extending the
      lifetime of nf_flowtable until all users have unregistered.
      
      [0]:
      [106170.938634] ------------[ cut here ]------------
      [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm
      ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis
      try overlay mlx5_core
      [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1
      [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
      [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00
      [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202
      [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48
      [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000
      [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700
      [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58
      [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000
      [106170.951616] FS:  0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000
      [106170.952329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0
      [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [106170.954766] Call Trace:
      [106170.955057]  <TASK>
      [106170.955315]  ? __warn+0x79/0x120
      [106170.955648]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.956172]  ? report_bug+0x17c/0x190
      [106170.956537]  ? handle_bug+0x3c/0x60
      [106170.956891]  ? exc_invalid_op+0x14/0x70
      [106170.957264]  ? asm_exc_invalid_op+0x16/0x20
      [106170.957666]  ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core]
      [106170.958172]  ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core]
      [106170.958788]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.959339]  ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core]
      [106170.959854]  ? mapping_remove+0x154/0x1d0 [mlx5_core]
      [106170.960342]  ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core]
      [106170.960927]  mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core]
      [106170.961441]  mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core]
      [106170.962001]  mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core]
      [106170.962524]  mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core]
      [106170.963034]  mlx5e_flow_put+0x73/0xe0 [mlx5_core]
      [106170.963506]  mlx5e_put_flow_list+0x38/0x70 [mlx5_core]
      [106170.964002]  mlx5e_rep_update_flows+0xec/0x290 [mlx5_core]
      [106170.964525]  mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core]
      [106170.965056]  process_one_work+0x13a/0x2c0
      [106170.965443]  worker_thread+0x2e5/0x3f0
      [106170.965808]  ? rescuer_thread+0x410/0x410
      [106170.966192]  kthread+0xc6/0xf0
      [106170.966515]  ? kthread_complete_and_exit+0x20/0x20
      [106170.966970]  ret_from_fork+0x2d/0x50
      [106170.967332]  ? kthread_complete_and_exit+0x20/0x20
      [106170.967774]  ret_from_fork_asm+0x11/0x20
      [106170.970466]  </TASK>
      [106170.970726] ---[ end trace 0000000000000000 ]---
      
      Fixes: 77ac5e40 ("net/sched: act_ct: remove and free nf_table callbacks")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      125f1c7f
  5. 10 Dec, 2023 2 commits
    • Zhipeng Lu's avatar
      octeontx2-af: fix a use-after-free in rvu_nix_register_reporters · 28a7cb04
      Zhipeng Lu authored
      The rvu_dl will be freed in rvu_nix_health_reporters_destroy(rvu_dl)
      after the create_workqueue fails, and after that free, the rvu_dl will
      be translate back through the following call chain:
      
      rvu_nix_health_reporters_destroy
        |-> rvu_nix_health_reporters_create
             |-> rvu_health_reporters_create
                   |-> rvu_register_dl (label err_dl_health)
      
      Finally. in the err_dl_health label, rvu_dl being freed again in
      rvu_health_reporters_destroy(rvu) by rvu_nix_health_reporters_destroy.
      In the second calls of rvu_nix_health_reporters_destroy, however,
      it uses rvu_dl->rvu_nix_health_reporter, which is already freed at
      the end of rvu_nix_health_reporters_destroy in the first call.
      
      So this patch prevents the first destroy by instantly returning -ENONMEN
      when create_workqueue fails. In addition, since the failure of
      create_workqueue is the only entrence of label err, it has been
      integrated into the error-handling path of create_workqueue.
      
      Fixes: 5ed66306 ("octeontx2-af: Add devlink health reporters for NIX")
      Signed-off-by: default avatarZhipeng Lu <alexious@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28a7cb04
    • Radu Bulie's avatar
      net: fec: correct queue selection · 9fc95fe9
      Radu Bulie authored
      The old implementation extracted VLAN TCI info from the payload
      before the VLAN tag has been pushed in the payload.
      
      Another problem was that the VLAN TCI was extracted even if the
      packet did not have VLAN protocol header.
      
      This resulted in invalid VLAN TCI and as a consequence a random
      queue was computed.
      
      This patch fixes the above issues and use the VLAN TCI from the
      skb if it is present or VLAN TCI from payload if present. If no
      VLAN header is present queue 0 is selected.
      
      Fixes: 52c4a1a8 ("net: fec: add ndo_select_queue to fix TX bandwidth fluctuations")
      Signed-off-by: default avatarRadu Bulie <radu-andrei.bulie@nxp.com>
      Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9fc95fe9
  6. 09 Dec, 2023 6 commits