1. 03 May, 2023 7 commits
  2. 01 May, 2023 11 commits
    • David S. Miller's avatar
      Merge branch 'rxrpc-timeout-fixes' · fb7cba61
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Timeout handling fixes
      
      Here are three patches to fix timeouts handling in AF_RXRPC:
      
       (1) The hard call timeout should be interpreted in seconds, not
           milliseconds.
      
       (2) Allow a waiting call to be aborted (thereby cancelling the call) in
           the case a signal interrupts sendmsg() and leaves it hanging until it
           is granted a channel on a connection.
      
       (3) Kernel-generated calls get the timer started on them even if they're
           still waiting to be attached to a connection.  If the timer expires
           before the wait is complete and a conn is attached, an oops will
           occur.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb7cba61
    • David Howells's avatar
      rxrpc: Fix timeout of a call that hasn't yet been granted a channel · db099c62
      David Howells authored
      afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may
      get stalled in the background waiting for a connection to become
      available); it then calls rxrpc_kernel_set_max_life() to set the timeouts -
      but that starts the call timer so the call timer might then expire before
      we get a connection assigned - leading to the following oops if the call
      stalled:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000000
      	...
      	CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701
      	RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157
      	...
      	Call Trace:
      	 <TASK>
      	 rxrpc_send_ACK+0x50/0x13b
      	 rxrpc_input_call_event+0x16a/0x67d
      	 rxrpc_io_thread+0x1b6/0x45f
      	 ? _raw_spin_unlock_irqrestore+0x1f/0x35
      	 ? rxrpc_input_packet+0x519/0x519
      	 kthread+0xe7/0xef
      	 ? kthread_complete_and_exit+0x1b/0x1b
      	 ret_from_fork+0x22/0x30
      
      Fix this by noting the timeouts in struct rxrpc_call when the call is
      created.  The timer will be started when the first packet is transmitted.
      
      It shouldn't be possible to trigger this directly from userspace through
      AF_RXRPC as sendmsg() will return EBUSY if the call is in the
      waiting-for-conn state if it dropped out of the wait due to a signal.
      
      Fixes: 9d35d880 ("rxrpc: Move client call connection to the I/O thread")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: "David S. Miller" <davem@davemloft.net>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Jakub Kicinski <kuba@kernel.org>
      cc: Paolo Abeni <pabeni@redhat.com>
      cc: linux-afs@lists.infradead.org
      cc: netdev@vger.kernel.org
      cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db099c62
    • David Howells's avatar
      rxrpc: Make it so that a waiting process can be aborted · 0eb362d2
      David Howells authored
      When sendmsg() creates an rxrpc call, it queues it to wait for a connection
      and channel to be assigned and then waits before it can start shovelling
      data as the encrypted DATA packet content includes a summary of the
      connection parameters.
      
      However, sendmsg() may get interrupted before a connection gets assigned
      and further sendmsg() calls will fail with EBUSY until an assignment is
      made.
      
      Fix this so that the call can at least be aborted without failing on
      EBUSY.  We have to be careful here as sendmsg() mustn't be allowed to start
      the call timer if the call doesn't yet have a connection assigned as an
      oops may follow shortly thereafter.
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: "David S. Miller" <davem@davemloft.net>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Jakub Kicinski <kuba@kernel.org>
      cc: Paolo Abeni <pabeni@redhat.com>
      cc: linux-afs@lists.infradead.org
      cc: netdev@vger.kernel.org
      cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0eb362d2
    • David Howells's avatar
      rxrpc: Fix hard call timeout units · 0d098d83
      David Howells authored
      The hard call timeout is specified in the RXRPC_SET_CALL_TIMEOUT cmsg in
      seconds, so fix the point at which sendmsg() applies it to the call to
      convert to jiffies from seconds, not milliseconds.
      
      Fixes: a158bdd3 ("rxrpc: Fix timeout of a call that hasn't yet been granted a channel")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: "David S. Miller" <davem@davemloft.net>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Jakub Kicinski <kuba@kernel.org>
      cc: Paolo Abeni <pabeni@redhat.com>
      cc: linux-afs@lists.infradead.org
      cc: netdev@vger.kernel.org
      cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d098d83
    • Tom Rix's avatar
      net: atlantic: Define aq_pm_ops conditionally on CONFIG_PM · 4f163bf8
      Tom Rix authored
      For s390, gcc with W=1 reports
      drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:458:32: error:
        'aq_pm_ops' defined but not used [-Werror=unused-const-variable=]
        458 | static const struct dev_pm_ops aq_pm_ops = {
            |                                ^~~~~~~~~
      
      The only use of aq_pm_ops is conditional on CONFIG_PM.
      The definition of aq_pm_ops and its functions should also
      be conditional on CONFIG_PM.
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f163bf8
    • Andy Moreton's avatar
      sfc: Fix module EEPROM reporting for QSFP modules · 281900a9
      Andy Moreton authored
      The sfc driver does not report QSFP module EEPROM contents correctly
      as only the first page is fetched from hardware.
      
      Commit 0e1a2a3e ("ethtool: Add SFF-8436 and SFF-8636 max EEPROM
      length definitions") added ETH_MODULE_SFF_8436_MAX_LEN for the overall
      size of the EEPROM info, so use that to report the full EEPROM contents.
      
      Fixes: 9b17010d ("sfc: Add ethtool -m support for QSFP modules")
      Signed-off-by: default avatarAndy Moreton <andy.moreton@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      281900a9
    • David S. Miller's avatar
      Merge branch 'r8152-fixes' · f858e2fd
      David S. Miller authored
      Hayes Wang says:
      
      ====================
      r8152: fix 2.5G devices
      
      v3:
      For patch #2, modify the comment.
      
      v2:
      For patch #1, Remove inline for fc_pause_on_auto() and fc_pause_off_auto(),
      and update the commit message.
      
      For patch #2, define the magic value for OCP register 0xa424.
      
      v1:
      These patches are used to fix some issues of RTL8156.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f858e2fd
    • Hayes Wang's avatar
      r8152: move setting r8153b_rx_agg_chg_indicate() · cce8334f
      Hayes Wang authored
      Move setting r8153b_rx_agg_chg_indicate() for 2.5G devices. The
      r8153b_rx_agg_chg_indicate() has to be called after enabling tx/rx.
      Otherwise, the coalescing settings are useless.
      
      Fixes: 195aae32 ("r8152: support new chips")
      Signed-off-by: default avatarHayes Wang <hayeswang@realtek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cce8334f
    • Hayes Wang's avatar
      r8152: fix the poor throughput for 2.5G devices · 61b0ad6f
      Hayes Wang authored
      Fix the poor throughput for 2.5G devices, when changing the speed from
      auto mode to force mode. This patch is used to notify the MAC when the
      mode is changed.
      
      Fixes: 195aae32 ("r8152: support new chips")
      Signed-off-by: default avatarHayes Wang <hayeswang@realtek.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61b0ad6f
    • Hayes Wang's avatar
      r8152: fix flow control issue of RTL8156A · 8ceda6d5
      Hayes Wang authored
      The feature of flow control becomes abnormal, if the device sends a
      pause frame and the tx/rx is disabled before sending a release frame. It
      causes the lost of packets.
      
      Set PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY to zeros before disabling the
      tx/rx. And, toggle FC_PATCH_TASK before enabling tx/rx to reset the flow
      control patch and timer. Then, the hardware could clear the state and
      the flow control becomes normal after enabling tx/rx.
      
      Besides, remove inline for fc_pause_on_auto() and fc_pause_off_auto().
      
      Fixes: 195aae32 ("r8152: support new chips")
      Signed-off-by: default avatarHayes Wang <hayeswang@realtek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ceda6d5
    • Victor Nogueira's avatar
      net/sched: act_mirred: Add carrier check · 526f28bd
      Victor Nogueira authored
      There are cases where the device is adminstratively UP, but operationally
      down. For example, we have a physical device (Nvidia ConnectX-6 Dx, 25Gbps)
      who's cable was pulled out, here is its ip link output:
      
      5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
          link/ether b8:ce:f6:4b:68:35 brd ff:ff:ff:ff:ff:ff
          altname enp179s0f1np1
      
      As you can see, it's administratively UP but operationally down.
      In this case, sending a packet to this port caused a nasty kernel hang (so
      nasty that we were unable to capture it). Aborting a transmit based on
      operational status (in addition to administrative status) fixes the issue.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      v1->v2: Add fixes tag
      v2->v3: Remove blank line between tags + add change log, suggested by Leon
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      526f28bd
  3. 28 Apr, 2023 7 commits
    • Angelo Dureghello's avatar
      net: dsa: mv88e6xxx: add mv88e6321 rsvd2cpu · 66863178
      Angelo Dureghello authored
      Add rsvd2cpu capability for mv88e6321 model, to allow proper bpdu
      processing.
      Signed-off-by: default avatarAngelo Dureghello <angelo.dureghello@timesys.com>
      Fixes: 51c901a7 ("net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU")
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66863178
    • Antoine Tenart's avatar
      net: ipv6: fix skb hash for some RST packets · dc6456e9
      Antoine Tenart authored
      The skb hash comes from sk->sk_txhash when using TCP, except for some
      IPv6 RST packets. This is because in tcp_v6_send_reset when not in
      TIME_WAIT the hash is taken from sk->sk_hash, while it should come from
      sk->sk_txhash as those two hashes are not computed the same way.
      
      Packetdrill script to test the above,
      
         0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
        +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
        +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
      
        +0 > (flowlabel 0x1) S 0:0(0) <...>
      
        // Wrong ack seq, trigger a rst.
        +0 < S. 0:0(0) ack 0 win 4000
      
        // Check the flowlabel matches prior one from SYN.
        +0 > (flowlabel 0x1) R 0:0(0) <...>
      
      Fixes: 9258b8b1 ("ipv6: tcp: send consistent autoflowlabel in RST packets")
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc6456e9
    • Andrea Mayer's avatar
      selftests: srv6: make srv6_end_dt46_l3vpn_test more robust · 46ef24c6
      Andrea Mayer authored
      On some distributions, the rp_filter is automatically set (=1) by
      default on a netdev basis (also on VRFs).
      In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using
      the table associated with the VRF bound to that tunnel. During lookup
      operations, the rp_filter can lead to packet loss when activated on the
      VRF.
      Therefore, we chose to make this selftest more robust by explicitly
      disabling the rp_filter during tests (as it is automatically set by some
      Linux distributions).
      
      Fixes: 03a0b567 ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
      Reported-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarAndrea Mayer <andrea.mayer@uniroma2.it>
      Tested-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46ef24c6
    • Cong Wang's avatar
      sit: update dev->needed_headroom in ipip6_tunnel_bind_dev() · c88f8d5c
      Cong Wang authored
      When a tunnel device is bound with the underlying device, its
      dev->needed_headroom needs to be updated properly. IPv4 tunnels
      already do the same in ip_tunnel_bind_dev(). Otherwise we may
      not have enough header room for skb, especially after commit
      b17f709a ("gue: TX support for using remote checksum offload option").
      
      Fixes: 32b8a8e5 ("sit: add IPv4 over IPv4 support")
      Reported-by: default avatarPalash Oswal <oswalpalash@gmail.com>
      Link: https://lore.kernel.org/netdev/CAGyP=7fDcSPKu6nttbGwt7RXzE3uyYxLjCSE97J64pRxJP8jPA@mail.gmail.com/
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarCong Wang <cong.wang@bytedance.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c88f8d5c
    • Vlad Buslov's avatar
      net/sched: cls_api: remove block_cb from driver_list before freeing · da94a778
      Vlad Buslov authored
      Error handler of tcf_block_bind() frees the whole bo->cb_list on error.
      However, by that time the flow_block_cb instances are already in the driver
      list because driver ndo_setup_tc() callback is called before that up the
      call chain in tcf_block_offload_cmd(). This leaves dangling pointers to
      freed objects in the list and causes use-after-free[0]. Fix it by also
      removing flow_block_cb instances from driver_list before deallocating them.
      
      [0]:
      [  279.868433] ==================================================================
      [  279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0
      [  279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963
      
      [  279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4
      [  279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  279.876295] Call Trace:
      [  279.876882]  <TASK>
      [  279.877413]  dump_stack_lvl+0x33/0x50
      [  279.878198]  print_report+0xc2/0x610
      [  279.878987]  ? flow_block_cb_setup_simple+0x631/0x7c0
      [  279.879994]  kasan_report+0xae/0xe0
      [  279.880750]  ? flow_block_cb_setup_simple+0x631/0x7c0
      [  279.881744]  ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core]
      [  279.883047]  flow_block_cb_setup_simple+0x631/0x7c0
      [  279.884027]  tcf_block_offload_cmd.isra.0+0x189/0x2d0
      [  279.885037]  ? tcf_block_setup+0x6b0/0x6b0
      [  279.885901]  ? mutex_lock+0x7d/0xd0
      [  279.886669]  ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0
      [  279.887844]  ? ingress_init+0x1c0/0x1c0 [sch_ingress]
      [  279.888846]  tcf_block_get_ext+0x61c/0x1200
      [  279.889711]  ingress_init+0x112/0x1c0 [sch_ingress]
      [  279.890682]  ? clsact_init+0x2b0/0x2b0 [sch_ingress]
      [  279.891701]  qdisc_create+0x401/0xea0
      [  279.892485]  ? qdisc_tree_reduce_backlog+0x470/0x470
      [  279.893473]  tc_modify_qdisc+0x6f7/0x16d0
      [  279.894344]  ? tc_get_qdisc+0xac0/0xac0
      [  279.895213]  ? mutex_lock+0x7d/0xd0
      [  279.896005]  ? __mutex_lock_slowpath+0x10/0x10
      [  279.896910]  rtnetlink_rcv_msg+0x5fe/0x9d0
      [  279.897770]  ? rtnl_calcit.isra.0+0x2b0/0x2b0
      [  279.898672]  ? __sys_sendmsg+0xb5/0x140
      [  279.899494]  ? do_syscall_64+0x3d/0x90
      [  279.900302]  ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  279.901337]  ? kasan_save_stack+0x2e/0x40
      [  279.902177]  ? kasan_save_stack+0x1e/0x40
      [  279.903058]  ? kasan_set_track+0x21/0x30
      [  279.903913]  ? kasan_save_free_info+0x2a/0x40
      [  279.904836]  ? ____kasan_slab_free+0x11a/0x1b0
      [  279.905741]  ? kmem_cache_free+0x179/0x400
      [  279.906599]  netlink_rcv_skb+0x12c/0x360
      [  279.907450]  ? rtnl_calcit.isra.0+0x2b0/0x2b0
      [  279.908360]  ? netlink_ack+0x1550/0x1550
      [  279.909192]  ? rhashtable_walk_peek+0x170/0x170
      [  279.910135]  ? kmem_cache_alloc_node+0x1af/0x390
      [  279.911086]  ? _copy_from_iter+0x3d6/0xc70
      [  279.912031]  netlink_unicast+0x553/0x790
      [  279.912864]  ? netlink_attachskb+0x6a0/0x6a0
      [  279.913763]  ? netlink_recvmsg+0x416/0xb50
      [  279.914627]  netlink_sendmsg+0x7a1/0xcb0
      [  279.915473]  ? netlink_unicast+0x790/0x790
      [  279.916334]  ? iovec_from_user.part.0+0x4d/0x220
      [  279.917293]  ? netlink_unicast+0x790/0x790
      [  279.918159]  sock_sendmsg+0xc5/0x190
      [  279.918938]  ____sys_sendmsg+0x535/0x6b0
      [  279.919813]  ? import_iovec+0x7/0x10
      [  279.920601]  ? kernel_sendmsg+0x30/0x30
      [  279.921423]  ? __copy_msghdr+0x3c0/0x3c0
      [  279.922254]  ? import_iovec+0x7/0x10
      [  279.923041]  ___sys_sendmsg+0xeb/0x170
      [  279.923854]  ? copy_msghdr_from_user+0x110/0x110
      [  279.924797]  ? ___sys_recvmsg+0xd9/0x130
      [  279.925630]  ? __perf_event_task_sched_in+0x183/0x470
      [  279.926656]  ? ___sys_sendmsg+0x170/0x170
      [  279.927529]  ? ctx_sched_in+0x530/0x530
      [  279.928369]  ? update_curr+0x283/0x4f0
      [  279.929185]  ? perf_event_update_userpage+0x570/0x570
      [  279.930201]  ? __fget_light+0x57/0x520
      [  279.931023]  ? __switch_to+0x53d/0xe70
      [  279.931846]  ? sockfd_lookup_light+0x1a/0x140
      [  279.932761]  __sys_sendmsg+0xb5/0x140
      [  279.933560]  ? __sys_sendmsg_sock+0x20/0x20
      [  279.934436]  ? fpregs_assert_state_consistent+0x1d/0xa0
      [  279.935490]  do_syscall_64+0x3d/0x90
      [  279.936300]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  279.937311] RIP: 0033:0x7f21c814f887
      [  279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887
      [  279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003
      [  279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
      [  279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001
      [  279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
      [  279.949690]  </TASK>
      
      [  279.950706] Allocated by task 2960:
      [  279.951471]  kasan_save_stack+0x1e/0x40
      [  279.952338]  kasan_set_track+0x21/0x30
      [  279.953165]  __kasan_kmalloc+0x77/0x90
      [  279.954006]  flow_block_cb_setup_simple+0x3dd/0x7c0
      [  279.955001]  tcf_block_offload_cmd.isra.0+0x189/0x2d0
      [  279.956020]  tcf_block_get_ext+0x61c/0x1200
      [  279.956881]  ingress_init+0x112/0x1c0 [sch_ingress]
      [  279.957873]  qdisc_create+0x401/0xea0
      [  279.958656]  tc_modify_qdisc+0x6f7/0x16d0
      [  279.959506]  rtnetlink_rcv_msg+0x5fe/0x9d0
      [  279.960392]  netlink_rcv_skb+0x12c/0x360
      [  279.961216]  netlink_unicast+0x553/0x790
      [  279.962044]  netlink_sendmsg+0x7a1/0xcb0
      [  279.962906]  sock_sendmsg+0xc5/0x190
      [  279.963702]  ____sys_sendmsg+0x535/0x6b0
      [  279.964534]  ___sys_sendmsg+0xeb/0x170
      [  279.965343]  __sys_sendmsg+0xb5/0x140
      [  279.966132]  do_syscall_64+0x3d/0x90
      [  279.966908]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  279.968407] Freed by task 2960:
      [  279.969114]  kasan_save_stack+0x1e/0x40
      [  279.969929]  kasan_set_track+0x21/0x30
      [  279.970729]  kasan_save_free_info+0x2a/0x40
      [  279.971603]  ____kasan_slab_free+0x11a/0x1b0
      [  279.972483]  __kmem_cache_free+0x14d/0x280
      [  279.973337]  tcf_block_setup+0x29d/0x6b0
      [  279.974173]  tcf_block_offload_cmd.isra.0+0x226/0x2d0
      [  279.975186]  tcf_block_get_ext+0x61c/0x1200
      [  279.976080]  ingress_init+0x112/0x1c0 [sch_ingress]
      [  279.977065]  qdisc_create+0x401/0xea0
      [  279.977857]  tc_modify_qdisc+0x6f7/0x16d0
      [  279.978695]  rtnetlink_rcv_msg+0x5fe/0x9d0
      [  279.979562]  netlink_rcv_skb+0x12c/0x360
      [  279.980388]  netlink_unicast+0x553/0x790
      [  279.981214]  netlink_sendmsg+0x7a1/0xcb0
      [  279.982043]  sock_sendmsg+0xc5/0x190
      [  279.982827]  ____sys_sendmsg+0x535/0x6b0
      [  279.983703]  ___sys_sendmsg+0xeb/0x170
      [  279.984510]  __sys_sendmsg+0xb5/0x140
      [  279.985298]  do_syscall_64+0x3d/0x90
      [  279.986076]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  279.987532] The buggy address belongs to the object at ffff888147e2bf00
                      which belongs to the cache kmalloc-192 of size 192
      [  279.989747] The buggy address is located 32 bytes inside of
                      freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0)
      
      [  279.992367] The buggy address belongs to the physical page:
      [  279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a
      [  279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      [  279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2)
      [  279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001
      [  279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
      [  280.000894] page dumped because: kasan: bad access detected
      
      [  280.002386] Memory state around the buggy address:
      [  280.003338]  ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  280.004781]  ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [  280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  280.007700]                                ^
      [  280.008592]  ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [  280.010035]  ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  280.011564] ==================================================================
      
      Fixes: 59094b1e ("net: sched: use flow block API")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      da94a778
    • Eric Dumazet's avatar
      tcp: fix skb_copy_ubufs() vs BIG TCP · 7e692df3
      Eric Dumazet authored
      David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy
      using hugepages, and skb length bigger than ~68 KB.
      
      skb_copy_ubufs() assumed it could copy all payload using up to
      MAX_SKB_FRAGS order-0 pages.
      
      This assumption broke when BIG TCP was able to put up to 512 KB per skb.
      
      We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45
      and limit gso_max_size to 180000.
      
      A solution is to use higher order pages if needed.
      
      v2: add missing __GFP_COMP, or we leak memory.
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Reported-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/netdev/c70000f6-baa4-4a05-46d0-4b3e0dc1ccc8@gmail.com/T/Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Coco Li <lixiaoyan@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e692df3
    • Cosmo Chou's avatar
      net/ncsi: clear Tx enable mode when handling a Config required AEN · 6f75cd16
      Cosmo Chou authored
      ncsi_channel_is_tx() determines whether a given channel should be
      used for Tx or not. However, when reconfiguring the channel by
      handling a Configuration Required AEN, there is a misjudgment that
      the channel Tx has already been enabled, which results in the Enable
      Channel Network Tx command not being sent.
      
      Clear the channel Tx enable flag before reconfiguring the channel to
      avoid the misjudgment.
      
      Fixes: 8d951a75 ("net/ncsi: Configure multi-package, multi-channel modes with failover")
      Signed-off-by: default avatarCosmo Chou <chou.cosmo@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f75cd16
  4. 27 Apr, 2023 14 commits
    • Paolo Abeni's avatar
      Merge branch 'macsec-fixes-for-cn10kb' · 075cafff
      Paolo Abeni authored
      Geetha sowjanya says:
      
      ====================
      Macsec fixes for CN10KB
      
      This patch set has fixes for the issues encountered while
      testing macsec on CN10KB silicon. Below is the description
      of patches:
      
      Patch 1: For each LMAC two MCSX_MCS_TOP_SLAVE_CHANNEL_CFG registers exist
      	 in CN10KB. Bypass has to be disabled in two registers.
      
      Patch 2: Add workaround for errata w.r.t accessing TCAM DATA and MASK registers.
      
      Patch 3: Fixes the parser configuration to allow PTP traffic.
      
      Patch 4: Addresses the IP vector and block level interrupt mask changes.
      
      Patch 5: Fix NULL pointer crashes when rebooting
      
      Patch 6: Since MCS is global block shared by all LMACS the TCAM match
      	 must include macsec DMAC also to distinguish each macsec interface
      
      Patch 7: Before freeing MCS hardware resource to AF clear the stats also.
      
      Patch 8: Stats which share single counter in hardware are tracked in software.
      	 This tracking was based on wrong secy mode params.
      	 Use correct secy mode params
      
      Patch 9: When updating secy mode params, PN number was also reset to
      	 initial values. Hence do not write to PN value register when
      	 updating secy.
      ====================
      
      Link: https://lore.kernel.org/r/20230426062528.20575-1-gakula@marvell.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      075cafff
    • Subbaraya Sundeep's avatar
      octeontx2-pf: mcs: Do not reset PN while updating secy · 3c99bace
      Subbaraya Sundeep authored
      After creating SecYs, SCs and SAs a SecY can be modified
      to change attributes like validation mode, protect frames
      mode etc. During this SecY update, packet number is reset to
      initial user given value by mistake. Hence do not reset
      PN when updating SecY parameters.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      3c99bace
    • Subbaraya Sundeep's avatar
      octeontx2-pf: mcs: Fix shared counters logic · 9bdfe610
      Subbaraya Sundeep authored
      Macsec stats like InPktsLate and InPktsDelayed share
      same counter in hardware. If SecY replay_protect is true
      then counter represents InPktsLate otherwise InPktsDelayed.
      This mode change was tracked based on protect_frames
      instead of replay_protect mistakenly. Similarly InPktsUnchecked
      and InPktsOk share same counter and mode change was tracked
      based on validate_check instead of validate_disabled.
      This patch fixes those problems.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      9bdfe610
    • Subbaraya Sundeep's avatar
      octeontx2-pf: mcs: Clear stats before freeing resource · 815debbb
      Subbaraya Sundeep authored
      When freeing MCS hardware resources like SecY, SC and
      SA the corresponding stats needs to be cleared. Otherwise
      previous stats are shown in newly created macsec interfaces.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      815debbb
    • Subbaraya Sundeep's avatar
      octeontx2-pf: mcs: Match macsec ethertype along with DMAC · 57d00d43
      Subbaraya Sundeep authored
      On CN10KB silicon a single hardware macsec block is
      present and offloads macsec operations for all the
      ethernet LMACs. TCAM match with macsec ethertype 0x88e5
      alone at RX side is not sufficient to distinguish all the
      macsec interfaces created on top of netdevs. Hence append
      the DMAC of the macsec interface too. Otherwise the first
      created macsec interface only receives all the macsec traffic.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      57d00d43
    • Subbaraya Sundeep's avatar
      octeontx2-pf: mcs: Fix NULL pointer dereferences · 699af748
      Subbaraya Sundeep authored
      When system is rebooted after creating macsec interface
      below NULL pointer dereference crashes occurred. This
      patch fixes those crashes by using correct order of teardown
      
      [ 3324.406942] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [ 3324.415726] Mem abort info:
      [ 3324.418510]   ESR = 0x96000006
      [ 3324.421557]   EC = 0x25: DABT (current EL), IL = 32 bits
      [ 3324.426865]   SET = 0, FnV = 0
      [ 3324.429913]   EA = 0, S1PTW = 0
      [ 3324.433047] Data abort info:
      [ 3324.435921]   ISV = 0, ISS = 0x00000006
      [ 3324.439748]   CM = 0, WnR = 0
      ....
      [ 3324.575915] Call trace:
      [ 3324.578353]  cn10k_mdo_del_secy+0x24/0x180
      [ 3324.582440]  macsec_common_dellink+0xec/0x120
      [ 3324.586788]  macsec_notify+0x17c/0x1c0
      [ 3324.590529]  raw_notifier_call_chain+0x50/0x70
      [ 3324.594965]  call_netdevice_notifiers_info+0x34/0x7c
      [ 3324.599921]  rollback_registered_many+0x354/0x5bc
      [ 3324.604616]  unregister_netdevice_queue+0x88/0x10c
      [ 3324.609399]  unregister_netdev+0x20/0x30
      [ 3324.613313]  otx2_remove+0x8c/0x310
      [ 3324.616794]  pci_device_shutdown+0x30/0x70
      [ 3324.620882]  device_shutdown+0x11c/0x204
      
      [  966.664930] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  966.673712] Mem abort info:
      [  966.676497]   ESR = 0x96000006
      [  966.679543]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  966.684848]   SET = 0, FnV = 0
      [  966.687895]   EA = 0, S1PTW = 0
      [  966.691028] Data abort info:
      [  966.693900]   ISV = 0, ISS = 0x00000006
      [  966.697729]   CM = 0, WnR = 0
      [  966.833467] Call trace:
      [  966.835904]  cn10k_mdo_stop+0x20/0xa0
      [  966.839557]  macsec_dev_stop+0xe8/0x11c
      [  966.843384]  __dev_close_many+0xbc/0x140
      [  966.847298]  dev_close_many+0x84/0x120
      [  966.851039]  rollback_registered_many+0x114/0x5bc
      [  966.855735]  unregister_netdevice_many.part.0+0x14/0xa0
      [  966.860952]  unregister_netdevice_many+0x18/0x24
      [  966.865560]  macsec_notify+0x1ac/0x1c0
      [  966.869303]  raw_notifier_call_chain+0x50/0x70
      [  966.873738]  call_netdevice_notifiers_info+0x34/0x7c
      [  966.878694]  rollback_registered_many+0x354/0x5bc
      [  966.883390]  unregister_netdevice_queue+0x88/0x10c
      [  966.888173]  unregister_netdev+0x20/0x30
      [  966.892090]  otx2_remove+0x8c/0x310
      [  966.895571]  pci_device_shutdown+0x30/0x70
      [  966.899660]  device_shutdown+0x11c/0x204
      [  966.903574]  __do_sys_reboot+0x208/0x290
      [  966.907487]  __arm64_sys_reboot+0x20/0x30
      [  966.911489]  el0_svc_handler+0x80/0x1c0
      [  966.915316]  el0_svc+0x8/0x180
      [  966.918362] Code: f9400000 f9400a64 91220014 f94b3403 (f9400060)
      [  966.924448] ---[ end trace 341778e799c3d8d7 ]---
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      699af748
    • Geetha sowjanya's avatar
      octeontx2-af: mcs: Fix MCS block interrupt · b8aebeaa
      Geetha sowjanya authored
      On CN10KB, MCS IP vector number, BBE and PAB interrupt mask
      got changed to support more block level interrupts.
      To address this changes, this patch fixes the bbe and pab
      interrupt handlers.
      
      Fixes: 6c635f78 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts")
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      b8aebeaa
    • Geetha sowjanya's avatar
      octeontx2-af: mcs: Config parser to skip 8B header · 65cdc2b6
      Geetha sowjanya authored
      When ptp timestamp is enabled in RPM, RPM will append 8B
      timestamp header for all RX traffic. MCS need to skip these
      8 bytes header while parsing the packet header, so that
      correct tcam key is created for lookup.
      This patch fixes the mcs parser configuration to skip this
      8B header for ptp packets.
      
      Fixes: ca7f49ff ("octeontx2-af: cn10k: Introduce driver for macsec block.")
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      65cdc2b6
    • Subbaraya Sundeep's avatar
      octeontx2-af: mcs: Write TCAM_DATA and TCAM_MASK registers at once · b5161219
      Subbaraya Sundeep authored
      As per hardware errata on CN10KB, all the four TCAM_DATA
      and TCAM_MASK registers has to be written at once otherwise
      write to individual registers will fail. Hence write to all
      TCAM_DATA registers and then to all TCAM_MASK registers.
      
      Fixes: cfc14181 ("octeontx2-af: cn10k: mcs: Manage the MCS block hardware resources")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: default avatarSunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      b5161219
    • Geetha sowjanya's avatar
      octeonxt2-af: mcs: Fix per port bypass config · c222b292
      Geetha sowjanya authored
      For each lmac port, MCS has two MCS_TOP_SLAVE_CHANNEL_CONFIGX
      registers. For CN10KB both register need to be configured for the
      port level mcs bypass to work. This patch also sets bitmap
      of flowid/secy entry reserved for default bypass so that these
      entries can be shown in debugfs.
      
      Fixes: bd69476e ("octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic")
      Signed-off-by: default avatarGeetha sowjanya <gakula@marvell.com>
      Signed-off-by: default avatarSunil Goutham <sgoutham@marvell.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      c222b292
    • John Hickey's avatar
      ixgbe: Fix panic during XDP_TX with > 64 CPUs · c23ae509
      John Hickey authored
      Commit 4fe81585 ("ixgbe: let the xdpdrv work with more than 64 cpus")
      adds support to allow XDP programs to run on systems with more than
      64 CPUs by locking the XDP TX rings and indexing them using cpu % 64
      (IXGBE_MAX_XDP_QS).
      
      Upon trying this out patch on a system with more than 64 cores,
      the kernel paniced with an array-index-out-of-bounds at the return in
      ixgbe_determine_xdp_ring in ixgbe.h, which means ixgbe_determine_xdp_q_idx
      was just returning the cpu instead of cpu % IXGBE_MAX_XDP_QS.  An example
      splat:
      
       ==========================================================================
       UBSAN: array-index-out-of-bounds in
       /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26
       index 65 is out of range for type 'ixgbe_ring *[64]'
       ==========================================================================
       BUG: kernel NULL pointer dereference, address: 0000000000000058
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP NOPTI
       CPU: 65 PID: 408 Comm: ksoftirqd/65
       Tainted: G          IOE     5.15.0-48-generic #54~20.04.1-Ubuntu
       Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020
       RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe]
       Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9
       00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7
       47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0
       RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282
       RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000
       RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000
       RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001
       R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000
       R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c
       FS:  0000000000000000(0000) GS:ffff92b9dfc00000(0000)
       knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        <TASK>
        ixgbe_poll+0x103e/0x1280 [ixgbe]
        ? sched_clock_cpu+0x12/0xe0
        __napi_poll+0x30/0x160
        net_rx_action+0x11c/0x270
        __do_softirq+0xda/0x2ee
        run_ksoftirqd+0x2f/0x50
        smpboot_thread_fn+0xb7/0x150
        ? sort_range+0x30/0x30
        kthread+0x127/0x150
        ? set_kthread_struct+0x50/0x50
        ret_from_fork+0x1f/0x30
        </TASK>
      
      I think this is how it happens:
      
      Upon loading the first XDP program on a system with more than 64 CPUs,
      ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup.  However,
      immediately after this, the rings are reconfigured by ixgbe_setup_tc.
      ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls
      ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop.
      ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if
      it is non-zero.  Commenting out the decrement in ixgbe_free_q_vector
      stopped my system from panicing.
      
      I suspect to make the original patch work, I would need to load an XDP
      program and then replace it in order to get ixgbe_xdp_locking_key back
      above 0 since ixgbe_setup_tc is only called when transitioning between
      XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is
      incremented every time ixgbe_xdp_setup is called.
      
      Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this
      becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems
      with more than 64 CPUs.
      
      Since ixgbe_xdp_locking_key only protects the XDP_TX path and is tied
      to the number of CPUs present, there is no reason to disable it upon
      unloading an XDP program.  To avoid confusion, I have moved enabling
      ixgbe_xdp_locking_key into ixgbe_sw_init, which is part of the probe path.
      
      Fixes: 4fe81585 ("ixgbe: let the xdpdrv work with more than 64 cpus")
      Signed-off-by: default avatarJohn Hickey <jjh@daedalian.us>
      Reviewed-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230425170308.2522429-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      c23ae509
    • Pedro Tammela's avatar
      net/sched: act_pedit: free pedit keys on bail from offset check · 1b483d9f
      Pedro Tammela authored
      Ido Schimmel reports a memleak on a syzkaller instance:
         BUG: memory leak
         unreferenced object 0xffff88803d45e400 (size 1024):
           comm "syz-executor292", pid 563, jiffies 4295025223 (age 51.781s)
           hex dump (first 32 bytes):
             28 bd 70 00 fb db df 25 02 00 14 1f ff 02 00 02  (.p....%........
             00 32 00 00 1f 00 00 00 ac 14 14 3e 08 00 07 00  .2.........>....
           backtrace:
             [<ffffffff81bd0f2c>] kmemleak_alloc_recursive include/linux/kmemleak.h:42 [inline]
             [<ffffffff81bd0f2c>] slab_post_alloc_hook mm/slab.h:772 [inline]
             [<ffffffff81bd0f2c>] slab_alloc_node mm/slub.c:3452 [inline]
             [<ffffffff81bd0f2c>] __kmem_cache_alloc_node+0x25c/0x320 mm/slub.c:3491
             [<ffffffff81a865d9>] __do_kmalloc_node mm/slab_common.c:966 [inline]
             [<ffffffff81a865d9>] __kmalloc+0x59/0x1a0 mm/slab_common.c:980
             [<ffffffff83aa85c3>] kmalloc include/linux/slab.h:584 [inline]
             [<ffffffff83aa85c3>] tcf_pedit_init+0x793/0x1ae0 net/sched/act_pedit.c:245
             [<ffffffff83a90623>] tcf_action_init_1+0x453/0x6e0 net/sched/act_api.c:1394
             [<ffffffff83a90e58>] tcf_action_init+0x5a8/0x950 net/sched/act_api.c:1459
             [<ffffffff83a96258>] tcf_action_add+0x118/0x4e0 net/sched/act_api.c:1985
             [<ffffffff83a96997>] tc_ctl_action+0x377/0x490 net/sched/act_api.c:2044
             [<ffffffff83920a8d>] rtnetlink_rcv_msg+0x46d/0xd70 net/core/rtnetlink.c:6395
             [<ffffffff83b24305>] netlink_rcv_skb+0x185/0x490 net/netlink/af_netlink.c:2575
             [<ffffffff83901806>] rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6413
             [<ffffffff83b21cae>] netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
             [<ffffffff83b21cae>] netlink_unicast+0x5be/0x8a0 net/netlink/af_netlink.c:1365
             [<ffffffff83b2293f>] netlink_sendmsg+0x9af/0xed0 net/netlink/af_netlink.c:1942
             [<ffffffff8380c39f>] sock_sendmsg_nosec net/socket.c:724 [inline]
             [<ffffffff8380c39f>] sock_sendmsg net/socket.c:747 [inline]
             [<ffffffff8380c39f>] ____sys_sendmsg+0x3ef/0xaa0 net/socket.c:2503
             [<ffffffff838156d2>] ___sys_sendmsg+0x122/0x1c0 net/socket.c:2557
             [<ffffffff8381594f>] __sys_sendmsg+0x11f/0x200 net/socket.c:2586
             [<ffffffff83815ab0>] __do_sys_sendmsg net/socket.c:2595 [inline]
             [<ffffffff83815ab0>] __se_sys_sendmsg net/socket.c:2593 [inline]
             [<ffffffff83815ab0>] __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2593
      
      The recently added static offset check missed a free to the key buffer when
      bailing out on error.
      
      Fixes: e1201bc7 ("net/sched: act_pedit: check static offsets a priori")
      Reported-by: default avatarIdo Schimmel <idosch@idosch.org>
      Signed-off-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Tested-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20230425144725.669262-1-pctammela@mojatatu.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      1b483d9f
    • Ivan Vecera's avatar
      net/sched: flower: Fix wrong handle assignment during filter change · 32eff6ba
      Ivan Vecera authored
      Commit 08a0063d ("net/sched: flower: Move filter handle initialization
      earlier") moved filter handle initialization but an assignment of
      the handle to fnew->handle is done regardless of fold value. This is wrong
      because if fold != NULL (so fold->handle == handle) no new handle is
      allocated and passed handle is assigned to fnew->handle. Then if any
      subsequent action in fl_change() fails then the handle value is
      removed from IDR that is incorrect as we will have still valid old filter
      instance with handle that is not present in IDR.
      Fix this issue by moving the assignment so it is done only when passed
      fold == NULL.
      
      Prior the patch:
      [root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
      Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
      exit: 123
      exit: 0
      RTNETLINK answers: Invalid argument
      We have an error talking to the kernel
      Command failed tmp/replace_6:1885
      
      All test results:
      
      1..1
      not ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
              Command exited with 123, expected 0
      RTNETLINK answers: Invalid argument
      We have an error talking to the kernel
      Command failed tmp/replace_6:1885
      
      After the patch:
      [root@machine tc-testing]# ./tdc.py -d enp1s0f0np0 -e 14be
      Test 14be: Concurrently replace same range of 100k flower filters from 10 tc instances
      
      All test results:
      
      1..1
      ok 1 14be - Concurrently replace same range of 100k flower filters from 10 tc instances
      
      Fixes: 08a0063d ("net/sched: flower: Move filter handle initialization earlier")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230425140604.169881-1-ivecera@redhat.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      32eff6ba
    • David Howells's avatar
      rxrpc: Fix potential data race in rxrpc_wait_to_be_connected() · 2b5fdc0f
      David Howells authored
      Inside the loop in rxrpc_wait_to_be_connected() it checks call->error to
      see if it should exit the loop without first checking the call state.  This
      is probably safe as if call->error is set, the call is dead anyway, but we
      should probably wait for the call state to have been set to completion
      first, lest it cause surprise on the way out.
      
      Fix this by only accessing call->error if the call is complete.  We don't
      actually need to access the error inside the loop as we'll do that after.
      
      This caused the following report:
      
          BUG: KCSAN: data-race in rxrpc_send_data / rxrpc_set_call_completion
      
          write to 0xffff888159cf3c50 of 4 bytes by task 25673 on cpu 1:
           rxrpc_set_call_completion+0x71/0x1c0 net/rxrpc/call_state.c:22
           rxrpc_send_data_packet+0xba9/0x1650 net/rxrpc/output.c:479
           rxrpc_transmit_one+0x1e/0x130 net/rxrpc/output.c:714
           rxrpc_decant_prepared_tx net/rxrpc/call_event.c:326 [inline]
           rxrpc_transmit_some_data+0x496/0x600 net/rxrpc/call_event.c:350
           rxrpc_input_call_event+0x564/0x1220 net/rxrpc/call_event.c:464
           rxrpc_io_thread+0x307/0x1d80 net/rxrpc/io_thread.c:461
           kthread+0x1ac/0x1e0 kernel/kthread.c:376
           ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
          read to 0xffff888159cf3c50 of 4 bytes by task 25672 on cpu 0:
           rxrpc_send_data+0x29e/0x1950 net/rxrpc/sendmsg.c:296
           rxrpc_do_sendmsg+0xb7a/0xc20 net/rxrpc/sendmsg.c:726
           rxrpc_sendmsg+0x413/0x520 net/rxrpc/af_rxrpc.c:565
           sock_sendmsg_nosec net/socket.c:724 [inline]
           sock_sendmsg net/socket.c:747 [inline]
           ____sys_sendmsg+0x375/0x4c0 net/socket.c:2501
           ___sys_sendmsg net/socket.c:2555 [inline]
           __sys_sendmmsg+0x263/0x500 net/socket.c:2641
           __do_sys_sendmmsg net/socket.c:2670 [inline]
           __se_sys_sendmmsg net/socket.c:2667 [inline]
           __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2667
           do_syscall_x64 arch/x86/entry/common.c:50 [inline]
           do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
           entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
          value changed: 0x00000000 -> 0xffffffea
      
      Fixes: 9d35d880 ("rxrpc: Move client call connection to the I/O thread")
      Reported-by: syzbot+ebc945fdb4acd72cba78@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/r/000000000000e7c6d205fa10a3cd@google.com/Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: Dmitry Vyukov <dvyukov@google.com>
      cc: "David S. Miller" <davem@davemloft.net>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Jakub Kicinski <kuba@kernel.org>
      cc: Paolo Abeni <pabeni@redhat.com>
      cc: linux-afs@lists.infradead.org
      cc: linux-fsdevel@vger.kernel.org
      cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/r/508133.1682427395@warthog.procyon.org.ukSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2b5fdc0f
  5. 26 Apr, 2023 1 commit
    • Linus Torvalds's avatar
      Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 6e98b09d
      Linus Torvalds authored
      Pull networking updates from Paolo Abeni:
       "Core:
      
         - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
           default value allows for better BIG TCP performances
      
         - Reduce compound page head access for zero-copy data transfers
      
         - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
           possible
      
         - Threaded NAPI improvements, adding defer skb free support and
           unneeded softirq avoidance
      
         - Address dst_entry reference count scalability issues, via false
           sharing avoidance and optimize refcount tracking
      
         - Add lockless accesses annotation to sk_err[_soft]
      
         - Optimize again the skb struct layout
      
         - Extends the skb drop reasons to make it usable by multiple
           subsystems
      
         - Better const qualifier awareness for socket casts
      
        BPF:
      
         - Add skb and XDP typed dynptrs which allow BPF programs for more
           ergonomic and less brittle iteration through data and
           variable-sized accesses
      
         - Add a new BPF netfilter program type and minimal support to hook
           BPF programs to netfilter hooks such as prerouting or forward
      
         - Add more precise memory usage reporting for all BPF map types
      
         - Adds support for using {FOU,GUE} encap with an ipip device
           operating in collect_md mode and add a set of BPF kfuncs for
           controlling encap params
      
         - Allow BPF programs to detect at load time whether a particular
           kfunc exists or not, and also add support for this in light
           skeleton
      
         - Bigger batch of BPF verifier improvements to prepare for upcoming
           BPF open-coded iterators allowing for less restrictive looping
           capabilities
      
         - Rework RCU enforcement in the verifier, add kptr_rcu and enforce
           BPF programs to NULL-check before passing such pointers into kfunc
      
         - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
           in local storage maps
      
         - Enable RCU semantics for task BPF kptrs and allow referenced kptr
           tasks to be stored in BPF maps
      
         - Add support for refcounted local kptrs to the verifier for allowing
           shared ownership, useful for adding a node to both the BPF list and
           rbtree
      
         - Add BPF verifier support for ST instructions in
           convert_ctx_access() which will help new -mcpu=v4 clang flag to
           start emitting them
      
         - Add ARM32 USDT support to libbpf
      
         - Improve bpftool's visual program dump which produces the control
           flow graph in a DOT format by adding C source inline annotations
      
        Protocols:
      
         - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
           indicates the provenance of the IP address
      
         - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
      
         - Add the handshake upcall mechanism, allowing the user-space to
           implement generic TLS handshake on kernel's behalf
      
         - Bridge: support per-{Port, VLAN} neighbor suppression, increasing
           resilience to nodes failures
      
         - SCTP: add support for Fair Capacity and Weighted Fair Queueing
           schedulers
      
         - MPTCP: delay first subflow allocation up to its first usage. This
           will allow for later better LSM interaction
      
         - xfrm: Remove inner/outer modes from input/output path. These are
           not needed anymore
      
         - WiFi:
            - reduced neighbor report (RNR) handling for AP mode
            - HW timestamping support
            - support for randomized auth/deauth TA for PASN privacy
            - per-link debugfs for multi-link
            - TC offload support for mac80211 drivers
            - mac80211 mesh fast-xmit and fast-rx support
            - enable Wi-Fi 7 (EHT) mesh support
      
        Netfilter:
      
         - Add nf_tables 'brouting' support, to force a packet to be routed
           instead of being bridged
      
         - Update bridge netfilter and ovs conntrack helpers to handle IPv6
           Jumbo packets properly, i.e. fetch the packet length from
           hop-by-hop extension header. This is needed for BIT TCP support
      
         - The iptables 32bit compat interface isn't compiled in by default
           anymore
      
         - Move ip(6)tables builtin icmp matches to the udptcp one. This has
           the advantage that icmp/icmpv6 match doesn't load the
           iptables/ip6tables modules anymore when iptables-nft is used
      
         - Extended netlink error report for netdevice in flowtables and
           netdev/chains. Allow for incrementally add/delete devices to netdev
           basechain. Allow to create netdev chain without device
      
        Driver API:
      
         - Remove redundant Device Control Error Reporting Enable, as PCI core
           has already error reporting enabled at enumeration time
      
         - Move Multicast DB netlink handlers to core, allowing devices other
           then bridge to use them
      
         - Allow the page_pool to directly recycle the pages from safely
           localized NAPI
      
         - Implement lockless TX queue stop/wake combo macros, allowing for
           further code de-duplication and sanitization
      
         - Add YNL support for user headers and struct attrs
      
         - Add partial YNL specification for devlink
      
         - Add partial YNL specification for ethtool
      
         - Add tc-mqprio and tc-taprio support for preemptible traffic classes
      
         - Add tx push buf len param to ethtool, specifies the maximum number
           of bytes of a transmitted packet a driver can push directly to the
           underlying device
      
         - Add basic LED support for switch/phy
      
         - Add NAPI documentation, stop relaying on external links
      
         - Convert dsa_master_ioctl() to netdev notifier. This is a
           preparatory work to make the hardware timestamping layer selectable
           by user space
      
         - Add transceiver support and improve the error messages for CAN-FD
           controllers
      
        New hardware / drivers:
      
         - Ethernet:
            - AMD/Pensando core device support
            - MediaTek MT7981 SoC
            - MediaTek MT7988 SoC
            - Broadcom BCM53134 embedded switch
            - Texas Instruments CPSW9G ethernet switch
            - Qualcomm EMAC3 DWMAC ethernet
            - StarFive JH7110 SoC
            - NXP CBTX ethernet PHY
      
         - WiFi:
            - Apple M1 Pro/Max devices
            - RealTek rtl8710bu/rtl8188gu
            - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
      
         - Bluetooth:
            - Realtek RTL8821CS, RTL8851B, RTL8852BS
            - Mediatek MT7663, MT7922
            - NXP w8997
            - Actions Semi ATS2851
            - QTI WCN6855
            - Marvell 88W8997
      
         - Can:
            - STMicroelectronics bxcan stm32f429
      
        Drivers:
      
         - Ethernet NICs:
            - Intel (1G, icg):
               - add tracking and reporting of QBV config errors
               - add support for configuring max SDU for each Tx queue
            - Intel (100G, ice):
               - refactor mailbox overflow detection to support Scalable IOV
               - GNSS interface optimization
            - Intel (i40e):
               - support XDP multi-buffer
            - nVidia/Mellanox:
               - add the support for linux bridge multicast offload
               - enable TC offload for egress and engress MACVLAN over bond
               - add support for VxLAN GBP encap/decap flows offload
               - extend packet offload to fully support libreswan
               - support tunnel mode in mlx5 IPsec packet offload
               - extend XDP multi-buffer support
               - support MACsec VLAN offload
               - add support for dynamic msix vectors allocation
               - drop RX page_cache and fully use page_pool
               - implement thermal zone to report NIC temperature
            - Netronome/Corigine:
               - add support for multi-zone conntrack offload
            - Solarflare/Xilinx:
               - support offloading TC VLAN push/pop actions to the MAE
               - support TC decap rules
               - support unicast PTP
      
         - Other NICs:
            - Broadcom (bnxt): enforce software based freq adjustments only on
              shared PHC NIC
            - RealTek (r8169): refactor to addess ASPM issues during NAPI poll
            - Micrel (lan8841): add support for PTP_PF_PEROUT
            - Cadence (macb): enable PTP unicast
            - Engleder (tsnep): add XDP socket zero-copy support
            - virtio-net: implement exact header length guest feature
            - veth: add page_pool support for page recycling
            - vxlan: add MDB data path support
            - gve: add XDP support for GQI-QPL format
            - geneve: accept every ethertype
            - macvlan: allow some packets to bypass broadcast queue
            - mana: add support for jumbo frame
      
         - Ethernet high-speed switches:
            - Microchip (sparx5): Add support for TC flower templates
      
         - Ethernet embedded switches:
            - Broadcom (b54):
               - configure 6318 and 63268 RGMII ports
            - Marvell (mv88e6xxx):
               - faster C45 bus scan
            - Microchip:
               - lan966x:
                  - add support for IS1 VCAP
                  - better TX/RX from/to CPU performances
               - ksz9477: add ETS Qdisc support
               - ksz8: enhance static MAC table operations and error handling
               - sama7g5: add PTP capability
            - NXP (ocelot):
               - add support for external ports
               - add support for preemptible traffic classes
            - Texas Instruments:
               - add CPSWxG SGMII support for J7200 and J721E
      
         - Intel WiFi (iwlwifi):
            - preparation for Wi-Fi 7 EHT and multi-link support
            - EHT (Wi-Fi 7) sniffer support
            - hardware timestamping support for some devices/firwmares
            - TX beacon protection on newer hardware
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - MU-MIMO parameters support
            - ack signal support for management packets
      
         - RealTek WiFi (rtw88):
            - SDIO bus support
            - better support for some SDIO devices (e.g. MAC address from
              efuse)
      
         - RealTek WiFi (rtw89):
            - HW scan support for 8852b
            - better support for 6 GHz scanning
            - support for various newer firmware APIs
            - framework firmware backwards compatibility
      
         - MediaTek WiFi (mt76):
            - P2P support
            - mesh A-MSDU support
            - EHT (Wi-Fi 7) support
            - coredump support"
      
      * tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
        net: phy: hide the PHYLIB_LEDS knob
        net: phy: marvell-88x2222: remove unnecessary (void*) conversions
        tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
        net: amd: Fix link leak when verifying config failed
        net: phy: marvell: Fix inconsistent indenting in led_blink_set
        lan966x: Don't use xdp_frame when action is XDP_TX
        tsnep: Add XDP socket zero-copy TX support
        tsnep: Add XDP socket zero-copy RX support
        tsnep: Move skb receive action to separate function
        tsnep: Add functions for queue enable/disable
        tsnep: Rework TX/RX queue initialization
        tsnep: Replace modulo operation with mask
        net: phy: dp83867: Add led_brightness_set support
        net: phy: Fix reading LED reg property
        drivers: nfc: nfcsim: remove return value check of `dev_dir`
        net: phy: dp83867: Remove unnecessary (void*) conversions
        net: ethtool: coalesce: try to make user settings stick twice
        net: mana: Check if netdev/napi_alloc_frag returns single page
        net: mana: Rename mana_refill_rxoob and remove some empty lines
        net: veth: add page_pool stats
        ...
      6e98b09d