1. 20 Mar, 2024 7 commits
    • octeontx2-pf: Use default max_active works instead of one · 7558ce0d
      Subbaraya Sundeep authored
      Using only one execution context for the workqueue that handles PF and
      VF mailbox communication is incorrect, since multiple work items are
      queued simultaneously by all the VFs and by PF link UP messages.
      Hence use the default number of execution contexts by passing zero
      as max_active to alloc_workqueue(). With this fix in place,
      also modify UP messages to wait until completion.
      
      Fixes: d424b6c0 ("octeontx2-pf: Enable SRIOV and added VF mbox handling")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-pf: Wait till detach_resources msg is complete · cbf2f249
      Subbaraya Sundeep authored
      During VF driver remove, a message is sent to the PF to detach VF
      resources, but the VF does not wait until the message is
      complete. Also, mailbox interrupts need to be turned off
      only after the detach resource message is complete. This patch
      fixes both problems.
      
      Fixes: 05fcc9e0 ("octeontx2-pf: Attach NIX and NPA block LFs")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2: Detect the mbox up or down message via register · a88e0f93
      Subbaraya Sundeep authored
      A single interrupt line is used to receive up notifications
      and down reply messages from AF to PF (and similarly from PF to its VF).
      The PF acts as a bridge: it forwards VF messages to the AF and sends
      responses back from the AF to the VF. When an async event such as a link
      event arrives as an up message while the PF is in the middle of
      forwarding a VF message, mailbox errors occur because the PF state
      machine is corrupted.
      Since the VF is a separate driver, or the VF driver can be in a VM, it is
      not possible to serialize from the start of communication at the VF.
      Hence, to differentiate between message types at the PF, this patch makes
      the sender set the mbox data register to distinct values for up and down
      messages. The sender also checks that the previous interrupt has been
      received before triggering the current one, by waiting for the mailbox
      data register to become zero.
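
The handshake described above can be sketched as a small userspace model in C. This is only an illustration under stated assumptions: the marker values MBOX_DOWN_MSG and MBOX_UP_MSG, the struct, and both function names are hypothetical, and the model assumes the receiver is the side that clears the register. The driver's real constants and register accessors are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker values; the commit only requires them to be distinct
 * and different from the idle value zero. */
enum { MBOX_IDLE = 0, MBOX_DOWN_MSG = 1, MBOX_UP_MSG = 2 };

/* Userspace stand-in for the mailbox data register shared by both sides. */
struct mbox_model {
    uint64_t data_reg;
};

/* Sender side: do not raise a new interrupt until the previous one has been
 * consumed, i.e. until the data register reads zero again. */
static bool mbox_try_send(struct mbox_model *m, uint64_t type)
{
    assert(type == MBOX_DOWN_MSG || type == MBOX_UP_MSG);
    if (m->data_reg != MBOX_IDLE)
        return false;       /* previous message still pending; caller retries */
    m->data_reg = type;     /* a real driver would trigger the interrupt next */
    return true;
}

/* Receiver side: read the message type, then clear the register so the
 * sender is allowed to post the next message (assumed behaviour). */
static uint64_t mbox_handle_irq(struct mbox_model *m)
{
    uint64_t type = m->data_reg;

    m->data_reg = MBOX_IDLE;
    return type;
}
```

In this model a second mbox_try_send() fails until mbox_handle_irq() has run, which is the property that keeps an up message from landing in the middle of a forwarded down message.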
      
      Fixes: 5a6d7c9d ("octeontx2-pf: Mailbox communication with AF")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'ipsec-2024-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 94e3ca2f
      Jakub Kicinski authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2024-03-19
      
      1) Fix possible page_pool leak triggered by esp_output.
         From Dragos Tatulea.
      
      2) Fix UDP encapsulation in software GSO path.
         From Leon Romanovsky.
      
      * tag 'ipsec-2024-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
        xfrm: Allow UDP encapsulation only in offload modes
        net: esp: fix bad handling of pages from page_pool
      ====================
      
      Link: https://lore.kernel.org/r/20240319110151.409825-1-steffen.klassert@secunet.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • devlink: fix port new reply cmd type · 78a2f5e6
      Jiri Pirko authored
      Due to a copy-and-paste error, the port new reply fills cmd with the
      wrong value, unlike any other existing port command replies and
      notifications.
      
      Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.
      
      Skimmed through devlink userspace implementations, none of them cares
      about this cmd value.
      Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
      Closes: https://lore.kernel.org/all/ZfZcDxGV3tSy4qsV@cy-server/
      Fixes: cd76dcd6 ("devlink: Support add and delete devlink port")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/20240318091908.2736542-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: Clear req->syncookie in reqsk_alloc(). · 956c0d61
      Kuniyuki Iwashima authored
      syzkaller reported a read of uninit req->syncookie. [0]
      
      Originally, req->syncookie was used only in tcp_conn_request()
      to indicate if we need to encode SYN cookie in SYN+ACK, so the
      field remains uninitialised in other places.
      
      The commit 695751e3 ("bpf: tcp: Handle BPF SYN Cookie in
      cookie_v[46]_check().") added another meaning in ACK path;
      req->syncookie is set true if SYN cookie is validated by BPF
      kfunc.
      
      After the change, cookie_v[46]_check() always reads req->syncookie,
      but it is not initialised in the normal SYN cookie case, as reported
      by KMSAN.
      
      Let's make sure we always initialise req->syncookie in reqsk_alloc().
      
      [0]:
      BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0
       net/ipv4/syncookies.c:477
       cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477
       tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
       tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
       tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
       ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
       ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:314 [inline]
       ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:460 [inline]
       ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
       NF_HOOK include/linux/netfilter.h:314 [inline]
       ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
       __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
       __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
       process_backlog+0x480/0x8b0 net/core/dev.c:5981
       __napi_poll+0xe7/0x980 net/core/dev.c:6632
       napi_poll net/core/dev.c:6701 [inline]
       net_rx_action+0x89d/0x1820 net/core/dev.c:6813
       __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
       do_softirq+0x9a/0x100 kernel/softirq.c:455
       __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
       local_bh_enable include/linux/bottom_half.h:33 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
       __dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362
       dev_queue_xmit include/linux/netdevice.h:3091 [inline]
       neigh_hh_output include/net/neighbour.h:526 [inline]
       neigh_output include/net/neighbour.h:540 [inline]
       ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235
       __ip_finish_output+0x287/0x810
       ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323
       NF_HOOK_COND include/linux/netfilter.h:303 [inline]
       ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433
       dst_output include/net/dst.h:450 [inline]
       ip_local_out net/ipv4/ip_output.c:129 [inline]
       __ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535
       ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549
       __tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462
       tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline]
       tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792
       __tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline]
       tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578
       tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716
       inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923
       __sys_shutdown_sock net/socket.c:2425 [inline]
       __sys_shutdown net/socket.c:2437 [inline]
       __do_sys_shutdown net/socket.c:2445 [inline]
       __se_sys_shutdown+0x2a4/0x440 net/socket.c:2443
       __x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443
       do_syscall_64+0xd5/0x1f0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Uninit was stored to memory at:
       reqsk_alloc include/net/request_sock.h:148 [inline]
       inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978
       cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328
       cookie_tcp_check net/ipv4/syncookies.c:388 [inline]
       cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420
       tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
       tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
       tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
       ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
       ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:314 [inline]
       ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:460 [inline]
       ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
       NF_HOOK include/linux/netfilter.h:314 [inline]
       ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
       __netif_receive_skb_one_core net/core/dev.c:5538 [inline]
       __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
       process_backlog+0x480/0x8b0 net/core/dev.c:5981
       __napi_poll+0xe7/0x980 net/core/dev.c:6632
       napi_poll net/core/dev.c:6701 [inline]
       net_rx_action+0x89d/0x1820 net/core/dev.c:6813
       __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
      
      Uninit was created at:
       __alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592
       __alloc_pages_node include/linux/gfp.h:238 [inline]
       alloc_pages_node include/linux/gfp.h:261 [inline]
       alloc_slab_page mm/slub.c:2175 [inline]
       allocate_slab mm/slub.c:2338 [inline]
       new_slab+0x2de/0x1400 mm/slub.c:2391
       ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
       __slab_alloc mm/slub.c:3610 [inline]
       __slab_alloc_node mm/slub.c:3663 [inline]
       slab_alloc_node mm/slub.c:3835 [inline]
       kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
       reqsk_alloc include/net/request_sock.h:131 [inline]
       inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978
       tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135
       tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716
       tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655
       tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929
       tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
       ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
       ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:314 [inline]
       ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
       ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639
       ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674
       __netif_receive_skb_list_ptype net/core/dev.c:5581 [inline]
       __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629
       __netif_receive_skb_list net/core/dev.c:5681 [inline]
       netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773
       gro_normal_list include/net/gro.h:438 [inline]
       napi_complete_done+0x425/0x880 net/core/dev.c:6113
       virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline]
       virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211
       __napi_poll+0xe7/0x980 net/core/dev.c:6632
       napi_poll net/core/dev.c:6701 [inline]
       net_rx_action+0x89d/0x1820 net/core/dev.c:6813
       __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
      
      CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted 6.8.0-syzkaller-05562-g61387b8d #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
      
      Fixes: 695751e3 ("bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Closes: https://lore.kernel.org/bpf/CANn89iKdN9c+C_2JAUbc+VY3DDQjAQukMtiBbormAmAk9CdvQA@mail.gmail.com/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://lore.kernel.org/r/20240315224710.55209-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/bnx2x: Prevent access to a freed page in page_pool · d27e2da9
      Thinh Tran authored
      Fix race condition leading to system crash during EEH error handling
      
      During EEH error recovery, the bnx2x driver's transmit timeout logic
      could cause a race condition when handling reset tasks. The
      bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
      which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
      SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
      overlap with the EEH driver's attempt to reset the device using
      bnx2x_io_slot_reset(), which also tries to free SGEs. This race
      condition can result in system crashes due to accessing freed memory
      locations in bnx2x_free_rx_sge()
      
      static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
                                           struct bnx2x_fastpath *fp, u16 index)
      {
              struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
              struct page *page = sw_buf->page;
      ....
      where sw_buf was set to NULL after the call to dma_unmap_page()
      by the preceding thread.
      
          EEH: Beginning: 'slot_reset'
          PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
          bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
          bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
          bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
          Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
          BUG: Kernel NULL pointer dereference on read at 0x00000000
          Faulting instruction address: 0xc0080000025065fc
          Oops: Kernel access of bad area, sig: 11 [#1]
          .....
          Call Trace:
          [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
          [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
          [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
          [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
          [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
          [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
          [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
      
      To solve this issue, we need to verify page pool allocations before
      freeing.
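
The idea of verifying an allocation before freeing it can be sketched in userspace C as follows. This is a minimal illustration and not the driver's code: struct rx_slot and both functions are hypothetical stand-ins, and malloc()/free() replace the page and DMA handling. The guard only makes a repeated teardown harmless; it does not by itself serialize two concurrent callers, which would still need locking outside this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one rx ring slot; not the driver's sw_rx_page. */
struct rx_slot {
    void *page;
};

/* Free one slot only if it still owns a page. Returns 1 if memory was
 * released and 0 if the slot had already been torn down. */
static int rx_slot_free(struct rx_slot *slot)
{
    assert(slot != NULL);
    if (slot->page == NULL)
        return 0;           /* already released by another teardown path */
    free(slot->page);
    slot->page = NULL;      /* mark as released so a second call is a no-op */
    return 1;
}

/* Free a range of slots, tolerating a ring that was never allocated.
 * Returns how many slots actually released memory. */
static int rx_ring_free_range(struct rx_slot *ring, size_t count)
{
    int freed = 0;

    if (ring == NULL)
        return 0;
    for (size_t i = 0; i < count; i++)
        freed += rx_slot_free(&ring[i]);
    return freed;
}
```

Calling rx_ring_free_range() a second time on the same ring frees nothing, which mirrors the situation where the reset task and the EEH slot reset both try to release the same buffers.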
      
      Fixes: 4cace675 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")
      Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240315205535.1321-1-thinhtr@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 19 Mar, 2024 14 commits
  3. 18 Mar, 2024 9 commits
    • Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets" · 35c3e279
      Abhishek Chauhan authored
      This reverts commit 885c36e5.
      
      The reverted patch broke the bpf selftest test_tc_dtime because the
      uapi field __sk_buff->tstamp_type depends on skb->mono_delivery_time, which
      no longer necessarily means mono after that change, as the bit was re-used
      for userspace timestamps as well to avoid a tstamp reset in the forwarding
      path. To solve this, we need to keep mono_delivery_time as is,
      introduce another bit called user_delivery_time, and fall back to the
      initial proposal of setting the user_delivery_time bit based on the
      sk_clockid set from userspace.
      
      Fixes: 885c36e5 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets")
      Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/
      Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: prevent possible incorrect XTAL frequency selection · f490c492
      Arınç ÜNAL authored
      On MT7530, the HT_XTAL_FSEL field of the HWTRAP register stores a 2-bit
      value that represents the frequency of the crystal oscillator connected to
      the switch IC. The field is populated by the state of the ESW_P4_LED_0 and
      ESW_P3_LED_0 pins, which is done right after reset is deasserted.
      
        ESW_P4_LED_0    ESW_P3_LED_0    Frequency
        -----------------------------------------
        0               0               Reserved
        0               1               20MHz
        1               0               40MHz
        1               1               25MHz
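
The strap table above can be written as a small C decoder. This is a sketch for illustration: the function name is hypothetical, and it assumes ESW_P4_LED_0 is the high bit and ESW_P3_LED_0 the low bit of the 2-bit value, which matches the 0b11 = 25MHz and 0b10 = 40MHz encodings quoted later in this message.

```c
#include <assert.h>

/* Decode the 2-bit strap value sampled from the two LED pins.
 * Assumption: bit 1 is ESW_P4_LED_0 and bit 0 is ESW_P3_LED_0.
 * Returns the crystal frequency in MHz, or 0 for the reserved encoding. */
static unsigned int xtal_mhz_from_strap(unsigned int p4_led0,
                                        unsigned int p3_led0)
{
    assert(p4_led0 <= 1 && p3_led0 <= 1);
    switch ((p4_led0 << 1) | p3_led0) {
    case 1:
        return 20;
    case 2:
        return 40;
    case 3:
        return 25;
    default:
        return 0;   /* 0b00 is reserved */
    }
}
```

On a board strapped for 40MHz (pins 1, 0), a link that drives ESW_P3_LED_0 high at sampling time turns the value into (1, 1), which decodes as 25MHz. That is the failure this commit guards against.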
      
      On MT7531, the XTAL25 bit of the STRAP register stores this. The LAN0LED0
      pin is used to populate the bit. 25MHz when the pin is high, 40MHz when
      it's low.
      
      These pins are also used with LEDs, therefore, their state can be set to
      something other than the bootstrapping configuration. For example, a link
      may be established on port 3 before the DSA subdriver takes control of the
      switch which would set ESW_P3_LED_0 to high.
      
      Currently, mt7530_setup() and mt7531_setup() specify a delay of 1000 - 1100
      usec between reset assertion and deassertion. Some switch ICs in real
      life conditions cannot always have these pins set back to the bootstrapping
      configuration before reset deassertion in this amount of delay. This causes
      wrong crystal frequency to be selected which puts the switch in a
      nonfunctional state after reset deassertion.
      
      The tests below are conducted on an MT7530 with a 40MHz crystal oscillator
      by Justin Swartz.
      
      With a cable from an active peer connected to port 3 before reset, an
      incorrect crystal frequency (0b11 = 25MHz) is selected:
      
                            [1]                  [3]     [5]
                            :                    :       :
                    _____________________________         __________________
      ESW_P4_LED_0                               |_______|
                    _____________________________
      ESW_P3_LED_0                               |__________________________
      
                             :                  : :     :
                             :                  : [4]...:
                             :                  :
                             [2]................:
      
      [1] Reset is asserted.
      [2] Period of 1000 - 1100 usec.
      [3] Reset is deasserted.
      [4] Period of 315 usec. HWTRAP register is populated with incorrect
          XTAL frequency.
      [5] Signals reflect the bootstrapped configuration.
      
      Increase the delay between reset_control_assert() and
      reset_control_deassert(), and gpiod_set_value_cansleep(priv->reset, 0) and
      gpiod_set_value_cansleep(priv->reset, 1) to 5000 - 5100 usec. This amount
      ensures a higher possibility that the switch IC will have these pins back
      to the bootstrapping configuration before reset deassertion.
      
      With a cable from an active peer connected to port 3 before reset, the
      correct crystal frequency (0b10 = 40MHz) is selected:
      
                            [1]        [2-1]     [3]     [5]
                            :          :         :       :
                    _____________________________         __________________
      ESW_P4_LED_0                               |_______|
                    ___________________           _______
      ESW_P3_LED_0                     |_________|       |__________________
      
                             :          :       : :     :
                             :          [2-2]...: [4]...:
                             [2]................:
      
      [1] Reset is asserted.
      [2] Period of 5000 - 5100 usec.
      [2-1] ESW_P3_LED_0 goes low.
      [2-2] Remaining period of 5000 - 5100 usec.
      [3] Reset is deasserted.
      [4] Period of 310 usec. HWTRAP register is populated with bootstrapped
          XTAL frequency.
      [5] Signals reflect the bootstrapped configuration.
      
      ESW_P3_LED_0 low period before reset deassertion:
      
                    5000 usec
                  - 5100 usec
          TEST     RESET HOLD
             #         (usec)
        ---------------------
             1           5410
             2           5440
             3           4375
             4           5490
             5           5475
             6           4335
             7           4370
             8           5435
             9           4205
            10           4335
            11           3750
            12           3170
            13           4395
            14           4375
            15           3515
            16           4335
            17           4220
            18           4175
            19           4175
            20           4350
      
           Min           3170
           Max           5490
      
        Median       4342.500
           Avg       4466.500
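
The summary rows of the table can be recomputed from the 20 measurements with a short C helper. The array copies the values listed above; the struct and function names are hypothetical and exist only for this check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The 20 measured ESW_P3_LED_0 low periods in usec, copied from the table. */
static const int low_period_usec[20] = {
    5410, 5440, 4375, 5490, 5475, 4335, 4370, 5435, 4205, 4335,
    3750, 3170, 4395, 4375, 3515, 4335, 4220, 4175, 4175, 4350,
};

struct summary {
    int min;
    int max;
    double median;
    double mean;
};

/* Three-way comparison for qsort() that avoids integer overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Compute min, max, median and mean of up to 64 samples. */
static struct summary summarize(const int *v, size_t n)
{
    int sorted[64];
    long sum = 0;
    struct summary s;

    assert(n > 0 && n <= 64);
    memcpy(sorted, v, n * sizeof(sorted[0]));
    qsort(sorted, n, sizeof(sorted[0]), cmp_int);
    for (size_t i = 0; i < n; i++)
        sum += sorted[i];

    s.min = sorted[0];
    s.max = sorted[n - 1];
    /* For an even count the median is the mean of the two middle samples. */
    s.median = (n % 2) ? sorted[n / 2]
                       : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    s.mean = (double)sum / (double)n;
    return s;
}
```

Running summarize(low_period_usec, 20) reproduces the table: min 3170, max 5490, median 4342.5 and average 4466.5 usec.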
      
      Revert commit 2920dd92 ("net: dsa: mt7530: disable LEDs before reset").
      Changing the state of pins via reset assertion is simpler and more
      efficient than doing so by setting the LED controller off.
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Fixes: c288575f ("net: dsa: mt7530: Add the support of MT7531 switch")
      Co-developed-by: Justin Swartz <justin.swartz@risingedge.co.za>
      Signed-off-by: Justin Swartz <justin.swartz@risingedge.co.za>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'veth-xdp-gro' · ba77f6e2
      David S. Miller authored
      Ignat Korchagin says:
      
      ====================
      net: veth: ability to toggle GRO and XDP independently
      
      It is rather confusing that GRO is automatically enabled, when an XDP program
      is attached to a veth interface. Moreover, it is not possible to disable GRO
      on a veth, if an XDP program is attached (which might be desirable in some use
      cases).
      
      Make GRO and XDP independent for a veth interface.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: veth: test the ability to independently manipulate GRO and XDP · ba5a6476
      Ignat Korchagin authored
      We should be able to independently flip either XDP or GRO states and toggling
      one should not affect the other.
      
      Also adjust other tests that implicitly expected GRO to be
      automatically enabled.
      Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: veth: do not manipulate GRO when using XDP · d7db7775
      Ignat Korchagin authored
      Commit d3256efd ("veth: allow enabling NAPI even without XDP") tried to fix
      the fact that GRO was not possible without XDP, because veth did not use NAPI
      without XDP. However, it also introduced the behaviour that GRO is always
      enabled, when XDP is enabled.
      
      While it might be desired for most cases, it is confusing for the user at best
      as the GRO flag suddenly changes, when an XDP program is attached. It also
      introduces some complexities in state management as was partially addressed in
      commit fe9f8013 ("net: veth: clear GRO when clearing XDP even when down").
      
      But the biggest problem is that it is not possible to disable GRO at all, when
      an XDP program is attached, which might be needed for some use cases.
      
      Fix this by not touching the GRO flag on XDP enable/disable as the code already
      supports switching to NAPI if either GRO or XDP is requested.
      
      Link: https://lore.kernel.org/lkml/20240311124015.38106-1-ignat@cloudflare.com/
      Fixes: d3256efd ("veth: allow enabling NAPI even without XDP")
      Fixes: fe9f8013 ("net: veth: clear GRO when clearing XDP even when down")
      Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: Allow UDP encapsulation only in offload modes · 773bb766
      Leon Romanovsky authored
      A missing check of x->encap led to a situation where GSO packets
      were created with UDP encapsulation.

      As a solution, restore the encap check for non-offloaded SAs.
      
      Fixes: 983a73da ("xfrm: Pass UDP encapsulation in TX packet offload")
      Closes: https://lore.kernel.org/all/a650221ae500f0c7cf496c61c96c1b103dcb6f67.camel@redhat.com
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • net: esp: fix bad handling of pages from page_pool · c3198822
      Dragos Tatulea authored
      When the skb is reorganized during esp_output (!esp->inline), the pages
      coming from the original skb fragments are supposed to be released back
      to the system through put_page. But if the skb fragment pages are
      originating from a page_pool, calling put_page on them will trigger a
      page_pool leak which will eventually result in a crash.
      
      This leak can be easily observed when using CONFIG_DEBUG_VM and doing
      ipsec + gre (non offloaded) forwarding:
      
        BUG: Bad page state in process ksoftirqd/16  pfn:1451b6
        page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6
        flags: 0x200000000000000(node=0|zone=2)
        page_type: 0xffffffff()
        raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000
        raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000
        page dumped because: page_pool leak
        Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core]
        CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ #22
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         <TASK>
         dump_stack_lvl+0x36/0x50
         bad_page+0x70/0xf0
         free_unref_page_prepare+0x27a/0x460
         free_unref_page+0x38/0x120
         esp_ssg_unref.isra.0+0x15f/0x200
         esp_output_tail+0x66d/0x780
         esp_xmit+0x2c5/0x360
         validate_xmit_xfrm+0x313/0x370
         ? validate_xmit_skb+0x1d/0x330
         validate_xmit_skb_list+0x4c/0x70
         sch_direct_xmit+0x23e/0x350
         __dev_queue_xmit+0x337/0xba0
         ? nf_hook_slow+0x3f/0xd0
         ip_finish_output2+0x25e/0x580
         iptunnel_xmit+0x19b/0x240
         ip_tunnel_xmit+0x5fb/0xb60
         ipgre_xmit+0x14d/0x280 [ip_gre]
         dev_hard_start_xmit+0xc3/0x1c0
         __dev_queue_xmit+0x208/0xba0
         ? nf_hook_slow+0x3f/0xd0
         ip_finish_output2+0x1ca/0x580
         ip_sublist_rcv_finish+0x32/0x40
         ip_sublist_rcv+0x1b2/0x1f0
         ? ip_rcv_finish_core.constprop.0+0x460/0x460
         ip_list_rcv+0x103/0x130
         __netif_receive_skb_list_core+0x181/0x1e0
         netif_receive_skb_list_internal+0x1b3/0x2c0
         napi_gro_receive+0xc8/0x200
         gro_cell_poll+0x52/0x90
         __napi_poll+0x25/0x1a0
         net_rx_action+0x28e/0x300
         __do_softirq+0xc3/0x276
         ? sort_range+0x20/0x20
         run_ksoftirqd+0x1e/0x30
         smpboot_thread_fn+0xa6/0x130
         kthread+0xcd/0x100
         ? kthread_complete_and_exit+0x20/0x20
         ret_from_fork+0x31/0x50
         ? kthread_complete_and_exit+0x20/0x20
         ret_from_fork_asm+0x11/0x20
         </TASK>
      
      The suggested fix is to introduce a new wrapper (skb_page_unref) that
      covers page refcounting for page_pool pages as well.
      
      Cc: stable@vger.kernel.org
      Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
      Reported-and-tested-by: Anatoli N.Chechelnickiy <Anatoli.Chechelnickiy@m.interpipe.biz>
      Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
      Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Mina Almasry <almasrymina@google.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • packet: annotate data-races around ignore_outgoing · 6ebfad33
      Eric Dumazet authored
      ignore_outgoing is read locklessly from dev_queue_xmit_nit()
      and packet_getsockopt().
      
      Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
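
The annotation pattern can be sketched in userspace C as follows. The two macros are simplified stand-ins built on a volatile access, not the kernel's own definitions, and they rely on the __typeof__ extension of GCC and Clang. The struct and the two helpers are hypothetical and only model a flag that one context writes while another reads it without a lock.

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel macros: the volatile access
 * makes the compiler emit exactly one load or store for the marked access.
 * The kernel's real definitions are more elaborate. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical miniature of a socket carrying a lockless one-byte flag. */
struct pkt_sock_model {
    unsigned char ignore_outgoing;
};

/* Writer side, as a setsockopt-style path would do it. */
static void model_set_ignore_outgoing(struct pkt_sock_model *po, int on)
{
    WRITE_ONCE(po->ignore_outgoing, !!on);
}

/* Reader side, as a transmit tap would do it without taking a lock. */
static int model_should_skip(const struct pkt_sock_model *po)
{
    return READ_ONCE(po->ignore_outgoing);
}
```

The annotations do not add ordering or mutual exclusion. They mark the race as intentional and prevent the compiler from tearing or repeating the access.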
      
      syzbot reported:
      
      BUG: KCSAN: data-race in dev_queue_xmit_nit / packet_setsockopt
      
      write to 0xffff888107804542 of 1 bytes by task 22618 on cpu 0:
       packet_setsockopt+0xd83/0xfd0 net/packet/af_packet.c:4003
       do_sock_setsockopt net/socket.c:2311 [inline]
       __sys_setsockopt+0x1d8/0x250 net/socket.c:2334
       __do_sys_setsockopt net/socket.c:2343 [inline]
       __se_sys_setsockopt net/socket.c:2340 [inline]
       __x64_sys_setsockopt+0x66/0x80 net/socket.c:2340
       do_syscall_64+0xd3/0x1d0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      read to 0xffff888107804542 of 1 bytes by task 27 on cpu 1:
       dev_queue_xmit_nit+0x82/0x620 net/core/dev.c:2248
       xmit_one net/core/dev.c:3527 [inline]
       dev_hard_start_xmit+0xcc/0x3f0 net/core/dev.c:3547
       __dev_queue_xmit+0xf24/0x1dd0 net/core/dev.c:4335
       dev_queue_xmit include/linux/netdevice.h:3091 [inline]
       batadv_send_skb_packet+0x264/0x300 net/batman-adv/send.c:108
       batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127
       batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:392 [inline]
       batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:420 [inline]
       batadv_iv_send_outstanding_bat_ogm_packet+0x3f0/0x4b0 net/batman-adv/bat_iv_ogm.c:1700
       process_one_work kernel/workqueue.c:3254 [inline]
       process_scheduled_works+0x465/0x990 kernel/workqueue.c:3335
       worker_thread+0x526/0x730 kernel/workqueue.c:3416
       kthread+0x1d1/0x210 kernel/kthread.c:388
       ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 27 Comm: kworker/u8:1 Tainted: G        W          6.8.0-syzkaller-08073-g480e035f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
      Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
      
      Fixes: fa788d98 ("packet: add sockopt to ignore outgoing packets")
      Reported-by: syzbot+c669c1136495a2e7c31f@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/netdev/CANn89i+Z7MfbkBLOv=p7KZ7=K1rKHO4P1OL5LYDCtBiyqsa9oQ@mail.gmail.com/T/#t
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: wan: fsl_qmc_hdlc: Fix module compilation · badc9e33
      Herve Codina authored
      The fsl_qmc_hdlc driver does not compile as a module:
        error: ‘qmc_hdlc_driver’ undeclared here (not in a function);
          405 | MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
              |                         ^~~~~~~~~~~~~~~
      
      Fix the typo.
      
      Fixes: b40f00ecd463 ("net: wan: Add support for QMC HDLC")
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Closes: https://lore.kernel.org/linux-kernel/87ttl93f7i.fsf@mail.lhotse/
      Signed-off-by: Herve Codina <herve.codina@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 Mar, 2024 2 commits
  5. 14 Mar, 2024 8 commits