1. 03 Sep, 2024 16 commits
    • ice: remove ICE_CFG_BUSY locking from AF_XDP code · 7e3b407c
      Larysa Zaremba authored
      Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing,
      because ICE_CFG_BUSY is a state flag that is supposed to be set in the PF
      state, not in the VSI one. Therefore, it does not protect the queue pair
      from, e.g., a reset.
      
      Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena().
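
      To make the point concrete, here is a minimal userspace model (not driver
      code). All names (pf_model, vsi_model, CFG_BUSY_BIT, qp_try_lock(),
      reset_may_proceed()) are hypothetical, and the model assumes a reset path
      that only consults the PF state bitmap, as the commit message implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit index standing in for ICE_CFG_BUSY. */
enum { CFG_BUSY_BIT = 0 };

struct pf_model {
	unsigned long state;	/* PF state bitmap: where the flag belongs */
};

struct vsi_model {
	unsigned long state;	/* VSI state bitmap: where it was being set */
	struct pf_model *pf;
};

/* Non-atomic test-and-set; this single-threaded model only shows which
 * bitmap is touched, not real mutual exclusion. */
static bool model_test_and_set(unsigned long *map, int bit)
{
	bool was_set = (*map >> bit) & 1UL;

	*map |= 1UL << bit;
	return was_set;
}

/* The pattern being removed: "lock" the queue pair via the VSI bitmap. */
static bool qp_try_lock(struct vsi_model *vsi)
{
	return !model_test_and_set(&vsi->state, CFG_BUSY_BIT);
}

/* A reset path that only honours the PF-level busy flag. */
static bool reset_may_proceed(const struct pf_model *pf)
{
	return !((pf->state >> CFG_BUSY_BIT) & 1UL);
}
```

      Even while qp_try_lock() reports the VSI-level bit as taken,
      reset_may_proceed() still returns true, which is the gap described above.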
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset · d8c40b9d
      Larysa Zaremba authored
      Consider the following scenario:
      
      .ndo_bpf()              | ice_prepare_for_reset()               |
      ________________________|_______________________________________|
      rtnl_lock()             |                                       |
      ice_down()              |                                       |
                              | test_bit(ICE_VSI_DOWN) - true         |
                              | ice_dis_vsi() returns                 |
      ice_up()                |                                       |
                              | proceeds to rebuild a running VSI     |
      
      .ndo_bpf() is not the only rtnl-locked callback that toggles the interface
      to apply new configuration. Another example is .set_channels().
      
      To avoid the race condition above, act only after reading ICE_VSI_DOWN
      under rtnl_lock.
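
      The check-then-act pattern can be sketched in userspace, assuming a
      pthread mutex as a stand-in for rtnl_lock() and a plain bool for the
      ICE_VSI_DOWN bit. The function and struct names are hypothetical; the
      point is only that the DOWN test and the action taken on it sit in one
      critical section, so the transient ice_down()/ice_up() state of an
      rtnl-locked callback is never observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the kernel's RTNL mutex (rtnl_lock()/rtnl_unlock()). */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct vsi_model {
	bool down;	/* models the ICE_VSI_DOWN state bit */
};

/* An rtnl-locked callback such as .ndo_bpf() or .set_channels(): it
 * brings the interface down and up again inside one critical section. */
static void model_reconfigure(struct vsi_model *vsi)
{
	pthread_mutex_lock(&model_rtnl);
	vsi->down = true;	/* ice_down() */
	/* ... apply the new configuration ... */
	vsi->down = false;	/* ice_up() */
	pthread_mutex_unlock(&model_rtnl);
}

/* Reset preparation: the DOWN check and the action that depends on it
 * happen under the same lock. Returns true if this call closed the VSI. */
static bool model_prepare_for_reset(struct vsi_model *vsi)
{
	bool closed_here = false;

	pthread_mutex_lock(&model_rtnl);
	if (!vsi->down) {
		vsi->down = true;	/* ice_dis_vsi() closing the VSI */
		closed_here = true;
	}
	pthread_mutex_unlock(&model_rtnl);
	return closed_here;
}
```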
      
      Fixes: 0f9d5027 ("ice: Refactor VSI allocation, deletion and rebuild flow")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: check for XDP rings instead of bpf program when unconfiguring · f50c6876
      Larysa Zaremba authored
      If a VSI rebuild is pending, .ndo_bpf() can attach or detach the XDP
      program on the VSI without applying the new ring configuration. When
      unconfiguring the VSI, we can therefore encounter a state in which there
      is an XDP program but no XDP rings to destroy, or one in which there are
      XDP rings that need to be destroyed but no XDP program to indicate their
      presence.

      When unconfiguring, rely on the presence of XDP rings rather than on the
      XDP program, as the rings better represent the current state that has to
      be destroyed.
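
      A hedged sketch of the idea, with hypothetical types: teardown is keyed
      on whether XDP rings exist rather than on whether a program pointer is
      set. In the real driver this bookkeeping lives in struct ice_vsi; the
      fields below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct vsi_model {
	const void *xdp_prog;	/* may be swapped without a ring reconfig */
	void **xdp_rings;	/* XDP Tx rings that actually exist, or NULL */
	int num_xdp_rings;
};

/* Hypothetical helper: ask "what exists?" rather than "what is attached?" */
static bool vsi_has_xdp_rings(const struct vsi_model *vsi)
{
	return vsi->xdp_rings != NULL;
}

static void vsi_unconfigure_xdp(struct vsi_model *vsi)
{
	/* Keying this on vsi->xdp_prog would leak rings when the program was
	 * detached early, or try to free rings that were never allocated
	 * when it was attached early. */
	if (!vsi_has_xdp_rings(vsi))
		return;

	for (int i = 0; i < vsi->num_xdp_rings; i++)
		free(vsi->xdp_rings[i]);
	free(vsi->xdp_rings);
	vsi->xdp_rings = NULL;
	vsi->num_xdp_rings = 0;
}
```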

      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: protect XDP configuration with a mutex · 2504b840
      Larysa Zaremba authored
      The main threat to data consistency in ice_xdp() is a possible asynchronous
      PF reset. It can be triggered by a user or by the TX timeout handler.
      
      XDP setup and PF reset code access the same resources in the following
      sections:
      * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
      * ice_vsi_rebuild() for the PF VSI - not protected
      * ice_vsi_open() - already rtnl-locked
      
      With an unfortunate timing, such accesses can result in a crash such as the
      one below:
      
      [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
      [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
      [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
      [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
      [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
      [ +0.394718] ice 0000:b1:00.0: PTP reset successful
      [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
      [ +0.000045] #PF: supervisor read access in kernel mode
      [ +0.000023] #PF: error_code(0x0000) - not-present page
      [ +0.000023] PGD 0 P4D 0
      [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
      [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
      [ +0.000036] Workqueue: ice ice_service_task [ice]
      [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
      [...]
      [ +0.000013] Call Trace:
      [ +0.000016] <TASK>
      [ +0.000014] ? __die+0x1f/0x70
      [ +0.000029] ? page_fault_oops+0x171/0x4f0
      [ +0.000029] ? schedule+0x3b/0xd0
      [ +0.000027] ? exc_page_fault+0x7b/0x180
      [ +0.000022] ? asm_exc_page_fault+0x22/0x30
      [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
      [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
      [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
      [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
      [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
      [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
      [ +0.000145] ice_rebuild+0x18c/0x840 [ice]
      [ +0.000145] ? delay_tsc+0x4a/0xc0
      [ +0.000022] ? delay_tsc+0x92/0xc0
      [ +0.000020] ice_do_reset+0x140/0x180 [ice]
      [ +0.000886] ice_service_task+0x404/0x1030 [ice]
      [ +0.000824] process_one_work+0x171/0x340
      [ +0.000685] worker_thread+0x277/0x3a0
      [ +0.000675] ? preempt_count_add+0x6a/0xa0
      [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
      [ +0.000679] ? __pfx_worker_thread+0x10/0x10
      [ +0.000653] kthread+0xf0/0x120
      [ +0.000635] ? __pfx_kthread+0x10/0x10
      [ +0.000616] ret_from_fork+0x2d/0x50
      [ +0.000612] ? __pfx_kthread+0x10/0x10
      [ +0.000604] ret_from_fork_asm+0x1b/0x30
      [ +0.000604] </TASK>
      
      The previous way of handling this, returning -EBUSY, is not viable,
      particularly when destroying an AF_XDP socket, because the kernel proceeds
      with the removal anyway.

      There is plenty of code between those calls, and there is no need to
      create a large critical section that covers all of them, just as there is
      no need to protect ice_vsi_rebuild() with rtnl_lock().
      
      Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
      
      Leaving unprotected sections in between would result in two states that
      have to be considered:
      1. when the VSI is closed, but not yet rebuilt
      2. when the VSI is already rebuilt, but not yet open

      The latter case is actually already handled through the !netif_running()
      case; we just need to adjust the flag checking a little. The former one is
      not as trivial: between ice_vsi_close() and ice_vsi_rebuild(), a lot of
      hardware interaction happens, which can make adding or deleting rings exit
      with an error. Luckily, a VSI rebuild is pending and can apply the new
      configuration for us in a managed fashion.
      
      Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
      indicate that ice_xdp() can just hot-swap the program.
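
      The resulting scheme can be modelled roughly as follows. This is a
      userspace sketch under assumptions: a pthread mutex plays the role of
      xdp_state_lock, a bool stands in for ICE_VSI_REBUILD_PENDING, and the
      points where the flag is set and cleared are simplified. While a rebuild
      is pending, the setup path only swaps the program pointer and leaves the
      ring configuration to the rebuild.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct vsi_model {
	pthread_mutex_t xdp_state_lock;	/* models the new xdp_state_lock */
	bool rebuild_pending;		/* models ICE_VSI_REBUILD_PENDING */
	const void *xdp_prog;
	bool xdp_rings_ready;		/* rings match the current program */
};

static void vsi_model_init(struct vsi_model *vsi)
{
	pthread_mutex_init(&vsi->xdp_state_lock, NULL);
	vsi->rebuild_pending = false;
	vsi->xdp_prog = NULL;
	vsi->xdp_rings_ready = false;
}

/* Reset side, step 1: the VSI has been closed and awaits rebuild. */
static void model_mark_rebuild_pending(struct vsi_model *vsi)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->rebuild_pending = true;
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}

/* Reset side, step 2: rebuild applies whatever program is attached now. */
static void model_vsi_rebuild(struct vsi_model *vsi)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->xdp_rings_ready = (vsi->xdp_prog != NULL);
	vsi->rebuild_pending = false;
	pthread_mutex_unlock(&vsi->xdp_state_lock);
}

/* ice_xdp()-like setup: only hot-swap the program while a rebuild is
 * pending; otherwise reconfigure the rings immediately. */
static int model_xdp_setup(struct vsi_model *vsi, const void *prog)
{
	pthread_mutex_lock(&vsi->xdp_state_lock);
	vsi->xdp_prog = prog;
	if (!vsi->rebuild_pending)
		vsi->xdp_rings_ready = (prog != NULL);
	pthread_mutex_unlock(&vsi->xdp_state_lock);
	return 0;
}
```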
      
      Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
      consistent by deconfiguring VSI when coalesce allocation fails.
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Fixes: efc2214b ("ice: Add support for XDP")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: move netif_queue_set_napi to rtnl-protected sections · 2a5dc090
      Larysa Zaremba authored
      Currently, netif_queue_set_napi() is called from ice_vsi_rebuild(), which
      is not rtnl-locked when called from the reset path. This creates the need
      to take the rtnl_lock just for a single function and complicates the
      synchronization with .ndo_bpf. At the same time, there is no actual need
      to fill in the napi-to-queue information at this exact point.

      Fill in the napi-to-queue information when opening the VSI and clear it
      when the VSI is being closed. Those routines are already rtnl-locked.

      Also, rewrite the napi-to-queue assignment in a way that prevents the
      inclusion of XDP queues, as that leads to out-of-bounds writes, such as
      the one below.
      
      [  +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
      [  +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
      [  +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
      [  +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
      [  +0.000003] Call Trace:
      [  +0.000003]  <TASK>
      [  +0.000002]  dump_stack_lvl+0x60/0x80
      [  +0.000007]  print_report+0xce/0x630
      [  +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [  +0.000007]  ? __virt_addr_valid+0x1c9/0x2c0
      [  +0.000005]  ? netif_queue_set_napi+0x1c2/0x1e0
      [  +0.000003]  kasan_report+0xe9/0x120
      [  +0.000004]  ? netif_queue_set_napi+0x1c2/0x1e0
      [  +0.000004]  netif_queue_set_napi+0x1c2/0x1e0
      [  +0.000005]  ice_vsi_close+0x161/0x670 [ice]
      [  +0.000114]  ice_dis_vsi+0x22f/0x270 [ice]
      [  +0.000095]  ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
      [  +0.000086]  ice_prepare_for_reset+0x299/0x750 [ice]
      [  +0.000087]  pci_dev_save_and_disable+0x82/0xd0
      [  +0.000006]  pci_reset_function+0x12d/0x230
      [  +0.000004]  reset_store+0xa0/0x100
      [  +0.000006]  ? __pfx_reset_store+0x10/0x10
      [  +0.000002]  ? __pfx_mutex_lock+0x10/0x10
      [  +0.000004]  ? __check_object_size+0x4c1/0x640
      [  +0.000007]  kernfs_fop_write_iter+0x30b/0x4a0
      [  +0.000006]  vfs_write+0x5d6/0xdf0
      [  +0.000005]  ? fd_install+0x180/0x350
      [  +0.000005]  ? __pfx_vfs_write+0x10/0xA10
      [  +0.000004]  ? do_fcntl+0x52c/0xcd0
      [  +0.000004]  ? kasan_save_track+0x13/0x60
      [  +0.000003]  ? kasan_save_free_info+0x37/0x60
      [  +0.000006]  ksys_write+0xfa/0x1d0
      [  +0.000003]  ? __pfx_ksys_write+0x10/0x10
      [  +0.000002]  ? __x64_sys_fcntl+0x121/0x180
      [  +0.000004]  ? _raw_spin_lock+0x87/0xe0
      [  +0.000005]  do_syscall_64+0x80/0x170
      [  +0.000007]  ? _raw_spin_lock+0x87/0xe0
      [  +0.000004]  ? __pfx__raw_spin_lock+0x10/0x10
      [  +0.000003]  ? file_close_fd_locked+0x167/0x230
      [  +0.000005]  ? syscall_exit_to_user_mode+0x7d/0x220
      [  +0.000005]  ? do_syscall_64+0x8c/0x170
      [  +0.000004]  ? do_syscall_64+0x8c/0x170
      [  +0.000003]  ? do_syscall_64+0x8c/0x170
      [  +0.000003]  ? fput+0x1a/0x2c0
      [  +0.000004]  ? filp_close+0x19/0x30
      [  +0.000004]  ? do_dup2+0x25a/0x4c0
      [  +0.000004]  ? __x64_sys_dup2+0x6e/0x2e0
      [  +0.000002]  ? syscall_exit_to_user_mode+0x7d/0x220
      [  +0.000004]  ? do_syscall_64+0x8c/0x170
      [  +0.000003]  ? __count_memcg_events+0x113/0x380
      [  +0.000005]  ? handle_mm_fault+0x136/0x820
      [  +0.000005]  ? do_user_addr_fault+0x444/0xa80
      [  +0.000004]  ? clear_bhb_loop+0x25/0x80
      [  +0.000004]  ? clear_bhb_loop+0x25/0x80
      [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [  +0.000005] RIP: 0033:0x7f2033593154
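
      The shape of the fix can be sketched with hypothetical types: only the
      regular Tx queues, which have a slot in the netdev's queue table, are
      mapped to NAPI instances on open and cleared on close, while the XDP
      rings stored after them are skipped. The assert in the setter marks the
      bound that the out-of-bounds write above violated; none of these names
      are the driver's.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_RINGS 8	/* hypothetical ring array size */

struct napi_model { int id; };

/* Netdev-side table: one slot per netdev Tx queue, nothing for XDP rings. */
struct netdev_model {
	unsigned int num_tx_queues;
	struct napi_model *txq_napi[MODEL_MAX_RINGS];
};

struct vsi_model {
	unsigned int num_txq;		/* regular (netdev) Tx queues */
	unsigned int num_xdp_txq;	/* extra XDP Tx rings, driver-internal */
	struct napi_model *ring_napi[MODEL_MAX_RINGS];	/* txq first, then XDP */
	struct netdev_model *netdev;
};

/* netif_queue_set_napi()-like setter with an explicit bounds check. */
static void model_set_queue_napi(struct netdev_model *nd, unsigned int q,
				 struct napi_model *napi)
{
	assert(q < nd->num_tx_queues);
	nd->txq_napi[q] = napi;
}

/* On open: map only netdev queues; XDP rings are deliberately skipped. */
static void vsi_set_napi_queues(struct vsi_model *vsi)
{
	for (unsigned int q = 0; q < vsi->num_txq; q++)
		model_set_queue_napi(vsi->netdev, q, vsi->ring_napi[q]);
}

/* On close: clear the same mapping. */
static void vsi_clear_napi_queues(struct vsi_model *vsi)
{
	for (unsigned int q = 0; q < vsi->num_txq; q++)
		model_set_queue_napi(vsi->netdev, q, NULL);
}
```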
      
      Fixes: 080b0c8d ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
      Fixes: 91fdbce7 ("ice: Add support in the driver for associating queue with napi")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge branch 'ptp-ocp-fix-serial-port-information-export' · cfd433ce
      Paolo Abeni authored
      Vadim Fedorenko says:
      
      ====================
      ptp: ocp: fix serial port information export
      
      Starting with v6.8, the serial port subsystem changed the hierarchy of
      devices, and symlinks no longer work. Previous discussion made it clear
      that the idea of symlinks for tty devices was wrong by design [1].
      This series implements additional attributes to expose the information
      and removes the symlinks for tty devices.
      
      [1] https://lore.kernel.org/netdev/2024060503-subsonic-pupil-bbee@gregkh/
      
      v6 -> v7:
      - fix issues with applying patches
      v5 -> v6:
      - split conversion to array to separate patch per Jiri's feedback
      - move changelog to cover letter
      v4 -> v5:
      - remove unused variable in ptp_ocp_tty_show
      v3 -> v4:
      - re-organize info printing to use ptp_ocp_tty_port_name()
      - keep uintptr_t to be consistent with other code
      v2 -> v3:
      - replace serial ports definitions with array and enum for index
      - replace pointer math with direct array access
      - nit in documentation spelling
      v1 -> v2:
      - add Documentation/ABI changes
      ====================
      
      Link: https://patch.msgid.link/20240829183603.1156671-1-vadfed@meta.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • docs: ABI: update OCP TimeCard sysfs entries · 40bec579
      Vadim Fedorenko authored
      Update documentation according to the changes in the driver.
      
      A new attribute group, tty, is exposed, and ttyGNSS, ttyGNSS2, ttyMAC and
      ttyNMEA are moved into this group. Also, these attributes are no longer
      links to the devices but simple text files containing the names of the
      tty devices.

      Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ptp: ocp: adjust sysfs entries to expose tty information · 82ace0c8
      Vadim Fedorenko authored
      Implement an additional attribute group to expose serial port information.
      The Fixes tag points to the commit that introduced the change in the
      serial port subsystem and made it impossible to use symlinks.
      
      Fixes: b286f4e8 ("serial: core: Move tty and serdev to be children of serial core port device")
      Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ptp: ocp: convert serial ports to array · d7875b4b
      Vadim Fedorenko authored
      Simplify the serial port management code by using an array of ports and
      helpers to get the name of a port. This change is needed to make the next
      patch simpler.
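
      A rough sketch of the array-plus-enum approach follows. The enum, structs
      and helpers are hypothetical; only the attribute names ttyGNSS, ttyGNSS2,
      ttyMAC and ttyNMEA come from the documentation update in this series.
      Port lookups become direct indexing, and a show-style helper prints the
      tty name. The "ttyS" prefix is an assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical index enum for the four serial ports. */
enum port_index { PORT_GNSS, PORT_GNSS2, PORT_MAC, PORT_NMEA, PORT_COUNT };

struct serial_port_model {
	int line;	/* tty line number, or -1 if the port is absent */
};

struct timecard_model {
	struct serial_port_model port[PORT_COUNT];
};

static const char *const port_attr_name[PORT_COUNT] = {
	[PORT_GNSS]  = "ttyGNSS",
	[PORT_GNSS2] = "ttyGNSS2",
	[PORT_MAC]   = "ttyMAC",
	[PORT_NMEA]  = "ttyNMEA",
};

/* Name helper: direct array access instead of pointer math. */
static const char *port_attr(enum port_index idx)
{
	return port_attr_name[idx];
}

/* sysfs-show-style formatter: writes the tty name, e.g. "ttyS2\n".
 * Returns the number of bytes written, or -1 if the port is absent. */
static int port_tty_show(const struct timecard_model *tc,
			 enum port_index idx, char *buf, size_t len)
{
	if (tc->port[idx].line < 0)
		return -1;
	return snprintf(buf, len, "ttyS%d\n", tc->port[idx].line);
}
```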

      Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: phy: Fix missing of_node_put() for leds · 2560db6e
      Jinjie Ruan authored
      A successful call to of_get_child_by_name() increments the refcount of
      the leds node, so of_node_put() must be called to drop that reference.
      Fix it.
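
      The get/put pairing the fix restores can be illustrated with a toy
      refcounted node. The node type and model_* helpers are hypothetical
      stand-ins for struct device_node, of_get_child_by_name() and
      of_node_put(); only the balancing rule is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct device_node with an explicit refcount. */
struct node_model {
	const char *name;
	int refcount;
	struct node_model *children;
	size_t nr_children;
};

/* Like of_get_child_by_name(): on success the child's refcount is raised. */
static struct node_model *model_get_child_by_name(struct node_model *parent,
						  const char *name)
{
	for (size_t i = 0; i < parent->nr_children; i++) {
		if (strcmp(parent->children[i].name, name) == 0) {
			parent->children[i].refcount++;
			return &parent->children[i];
		}
	}
	return NULL;
}

/* Like of_node_put(): drops one reference; NULL is a no-op. */
static void model_node_put(struct node_model *np)
{
	if (np)
		np->refcount--;
}

/* The fixed shape: every successful lookup is paired with a put. */
static size_t model_count_leds(struct node_model *phy)
{
	struct node_model *leds = model_get_child_by_name(phy, "leds");
	size_t n;

	if (!leds)
		return 0;
	n = leds->nr_children;	/* ... register each LED here ... */
	model_node_put(leds);
	return n;
}
```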
      
      Fixes: 01e5b728 ("net: phy: Add a binding for PHY LEDs")
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20240830022025.610844-1-ruanjinjie@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-ethernet-ti-am65-cpsw-fix-xdp-implementation' · c2eb0626
      Paolo Abeni authored
      Roger Quadros says:
      
      ====================
      net: ethernet: ti: am65-cpsw: Fix XDP implementation
      
      The XDP implementation on am65-cpsw driver is broken in many ways
      and this series fixes it.
      
      Below are the current issues that are being fixed:
      
      1)  The following XDP_DROP test from [1] stalls the interface after
          250 packets.
          ~# xdp-bench drop -m native eth0
          This is because new RX requests are never queued. Fix that.

      2)  The below XDP_TX test from [1] fails with a warning:
          [  499.947381] XDP_WARN: xdp_update_frame_from_buff(line:277): Driver BUG: missing reserved tailroom
          ~# xdp-bench tx -m native eth0
          Fix that by using PAGE_SIZE during xdp_init_buff().

      3)  In the XDP_REDIRECT case, only 1 packet was processed in rx_poll.
          Fix it to process up to budget packets.
          ~# ./xdp-bench redirect -m native eth0 eth0

      4)  If the number of TX queues is set to 1, we get a NULL pointer
          dereference during XDP_TX.
          ~# ethtool -L eth0 tx 1
          ~# ./xdp-trafficgen udp -A <ipv6-src> -a <ipv6-dst> eth0 -t 2
          Transmitting on eth0 (ifindex 2)
          [  241.135257] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030

      5)  Net statistics are broken for XDP_TX and XDP_REDIRECT.
      
      [1] xdp-tools suite https://github.com/xdp-project/xdp-tools

      Signed-off-by: Roger Quadros <rogerq@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Julien Panis <jpanis@baylibre.com>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      ---
      ====================

      Link: https://patch.msgid.link/20240829-am65-cpsw-xdp-v1-0-ff3c81054a5e@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpsw: Fix RX statistics for XDP_TX and XDP_REDIRECT · 624d3291
      Roger Quadros authored
      We are not using ndev->stats for rx_packets and rx_bytes anymore.
      Instead, we use per-CPU stats, which are collated in
      am65_cpsw_nuss_ndo_get_stats().

      Fix the RX statistics for the XDP_TX and XDP_REDIRECT cases.
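
      A simplified model of per-CPU counters that are summed on demand follows.
      The names are hypothetical, and real kernel code would allocate the
      counters with the per-CPU allocator and typically guard 64-bit reads with
      u64_stats_sync, both omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count */

struct rx_stats_model {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

struct priv_model {
	struct rx_stats_model percpu[MODEL_NR_CPUS];	/* one slot per CPU */
};

/* Called on the receiving CPU for every frame XDP_TX/XDP_REDIRECT consumes. */
static void model_count_rx(struct priv_model *p, int cpu, uint32_t len)
{
	p->percpu[cpu].rx_packets++;
	p->percpu[cpu].rx_bytes += len;
}

/* get_stats()-style collation: sum the per-CPU counters. */
static struct rx_stats_model model_collate_rx(const struct priv_model *p)
{
	struct rx_stats_model total = { 0, 0 };

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		total.rx_packets += p->percpu[cpu].rx_packets;
		total.rx_bytes += p->percpu[cpu].rx_bytes;
	}
	return total;
}
```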
      
      Fixes: 8acacc40 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
      Signed-off-by: Roger Quadros <rogerq@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Julien Panis <jpanis@baylibre.com>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpsw: Fix NULL dereference on XDP_TX · 0a50c352
      Roger Quadros authored
      If the number of TX queues is set to 1, we get a NULL pointer
      dereference during XDP_TX.
      
      ~# ethtool -L eth0 tx 1
      ~# ./xdp-trafficgen udp -A <ipv6-src> -a <ipv6-dst> eth0 -t 2
      Transmitting on eth0 (ifindex 2)
      [  241.135257] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
      
      Fix this by using the actual number of TX queues instead of the maximum
      number of TX queues when picking the TX channel in am65_cpsw_ndo_xdp_xmit().
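
      A minimal sketch, assuming the TX channel is chosen by taking the CPU id
      modulo a queue count (the structs and the helper are hypothetical): with
      only one queue configured, every CPU must map to channel 0, which
      indexing by the real queue count guarantees.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_TX_QUEUES 8	/* hypothetical hardware maximum */

struct tx_chn_model { int id; };

struct ndev_model {
	unsigned int real_num_tx_queues;	/* what "ethtool -L eth0 tx N" set */
	struct tx_chn_model *tx_chns[MODEL_MAX_TX_QUEUES];	/* first N set up */
};

/* Map the current CPU onto a configured channel. Using
 * MODEL_MAX_TX_QUEUES as the modulus could return an unset (NULL) slot. */
static struct tx_chn_model *model_pick_tx_chn(const struct ndev_model *nd,
					      unsigned int cpu)
{
	return nd->tx_chns[cpu % nd->real_num_tx_queues];
}
```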
      
      Fixes: 8acacc40 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
      Signed-off-by: Roger Quadros <rogerq@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Julien Panis <jpanis@baylibre.com>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpsw: fix XDP_DROP, XDP_TX and XDP_REDIRECT · 5e24db55
      Roger Quadros authored
      The following XDP_DROP test from [1] stalls the interface after
      250 packets.
      ~# xdp-bench drop -m native eth0
      This is because new RX requests are never queued. Fix that.

      The below XDP_TX test from [1] fails with a warning:
      [  499.947381] XDP_WARN: xdp_update_frame_from_buff(line:277): Driver BUG: missing reserved tailroom
      ~# xdp-bench tx -m native eth0
      Fix that by using PAGE_SIZE during xdp_init_buff().

      In the XDP_REDIRECT case, only 1 packet was processed in rx_poll.
      Fix it to process up to budget packets.
      
      Fix all XDP error cases to call trace_xdp_exception() and drop the packet
      in am65_cpsw_run_xdp().
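
      The three behaviours described above (refill on every consumed frame,
      process up to the NAPI budget, drop on error verdicts) can be combined in
      a small userspace model. The verdict enum and queue struct are
      hypothetical; in the driver the error cases would additionally call
      trace_xdp_exception(), as stated above.

```c
#include <assert.h>

/* Hypothetical verdict set mirroring XDP actions plus an error case. */
enum model_verdict { M_PASS, M_TX, M_REDIRECT, M_DROP, M_ABORTED, M_TX_FAILED };

struct rx_queue_model {
	const enum model_verdict *frames;	/* verdicts of pending frames */
	int nr_frames;
	int next;
	int dropped;	/* frames dropped, including error cases */
	int refilled;	/* fresh RX buffers handed back to hardware */
};

/* NAPI-style poll: handle up to 'budget' frames (not just one), drop on
 * XDP_DROP and error verdicts, and refill an RX buffer for every consumed
 * frame so the ring cannot run dry and stall. Returns the work done. */
static int model_rx_poll(struct rx_queue_model *q, int budget)
{
	int done = 0;

	while (done < budget && q->next < q->nr_frames) {
		switch (q->frames[q->next++]) {
		case M_PASS:
		case M_TX:
		case M_REDIRECT:
			break;
		default:	/* XDP_DROP or an error verdict */
			q->dropped++;
			break;
		}
		q->refilled++;
		done++;
	}
	return done;
}
```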
      
      [1] xdp-tools suite https://github.com/xdp-project/xdp-tools
      
      Fixes: 8acacc40 ("net: ethernet: ti: am65-cpsw: Add minimal XDP support")
      Signed-off-by: Roger Quadros <rogerq@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Julien Panis <jpanis@baylibre.com>
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'for-net-2024-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 5517ae24
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - qca: If memdump doesn't work, re-enable IBS
       - MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT
       - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE"
       - MGMT: Ignore keys being loaded with invalid type
      
      * tag 'for-net-2024-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: MGMT: Ignore keys being loaded with invalid type
        Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE"
        Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT
        Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_once
        Bluetooth: qca: If memdump doesn't work, re-enable IBS
      ====================
      
      Link: https://patch.msgid.link/20240830220300.1316772-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'linux-can-fixes-for-6.11-20240830' of... · 646f4968
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.11-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2024-08-30
      
      The first patch is by Kuniyuki Iwashima for the CAN BCM protocol and
      adds a missing proc entry removal when a device is unregistered.

      Simon Horman fixes the cleanup in the error path of the m_can driver's
      open function.
      
      Markus Schneider-Pargmann contributes 7 fixes for the m_can driver,
      all related to the recently added IRQ coalescing support.
      
      The next 2 patches are by me, target the mcp251xfd driver and fix ring
      and coalescing configuration problems when switching from CAN-CC to
      CAN-FD mode.
      
      Simon Arlott's patch fixes a possible deadlock in the mcp251x driver.
      
      The last patch is by Martin Jocic for the kvaser_pciefd driver and
      fixes a problem with lost IRQs under high load, which results in
      starvation.
      
      * tag 'linux-can-fixes-for-6.11-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: kvaser_pciefd: Use a single write when releasing RX buffers
        can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open
        can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration
        can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode
        can: m_can: Limit coalescing to peripheral instances
        can: m_can: Reset cached active_interrupts on start
        can: m_can: disable_all_interrupts, not clear active_interrupts
        can: m_can: Do not cancel timer from within timer
        can: m_can: Remove m_can_rx_peripheral indirection
        can: m_can: Remove coalesing disable in isr during suspend
        can: m_can: Reset coalescing during suspend/resume
        can: m_can: Release irq on error in m_can_open
        can: bcm: Remove proc entry when dev is unregistered.
      ====================
      
      Link: https://patch.msgid.link/20240830215914.1610393-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 02 Sep, 2024 2 commits
  3. 01 Sep, 2024 3 commits
  4. 30 Aug, 2024 9 commits
    • Bluetooth: MGMT: Ignore keys being loaded with invalid type · 1e9683c9
      Luiz Augusto von Dentz authored
      Due to commit 59b047bc, there could be keys stored with the wrong
      address type, so this attempts to detect them and ignore them instead
      of failing to load all keys.
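
      The "skip instead of fail" behaviour can be sketched as follows. The enum
      values, the key struct and the validity rule (only LE address types
      accepted) are assumptions made for illustration, not the kernel's actual
      checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address-type values for this sketch. */
enum model_addr_type { ADDR_BREDR, ADDR_LE_PUBLIC, ADDR_LE_RANDOM };

struct model_ltk {
	enum model_addr_type type;
	unsigned char val[16];
};

/* Assumption: only LE address types are valid for these keys. */
static int model_ltk_type_valid(enum model_addr_type t)
{
	return t == ADDR_LE_PUBLIC || t == ADDR_LE_RANDOM;
}

/* Load what is valid and skip the rest, instead of rejecting the whole
 * batch with "Invalid Parameters". Returns the number of keys stored. */
static size_t model_load_ltks(const struct model_ltk *keys, size_t n,
			      struct model_ltk *store)
{
	size_t loaded = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_ltk_type_valid(keys[i].type))
			continue;	/* ignored rather than failing the load */
		store[loaded++] = keys[i];
	}
	return loaded;
}
```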
      
      Cc: stable@vger.kernel.org
      Link: https://github.com/bluez/bluez/issues/875
      Fixes: 59b047bc ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE" · 532f8bcd
      Luiz Augusto von Dentz authored
      This reverts commit 59b047bc, which
      breaks compatibility with commands like:
      
      bluetoothd[46328]: @ MGMT Command: Load.. (0x0013) plen 74  {0x0001} [hci0]
              Keys: 2
              BR/EDR Address: C0:DC:DA:A5:E5:47 (Samsung Electronics Co.,Ltd)
              Key type: Authenticated key from P-256 (0x03)
              Central: 0x00
              Encryption size: 16
              Diversifier[2]: 0000
              Randomizer[8]: 0000000000000000
              Key[16]: 6ed96089bd9765be2f2c971b0b95f624
              LE Address: D7:2A:DE:1E:73:A2 (Static)
              Key type: Unauthenticated key from P-256 (0x02)
              Central: 0x00
              Encryption size: 16
              Diversifier[2]: 0000
              Randomizer[8]: 0000000000000000
              Key[16]: 87dd2546ededda380ffcdc0a8faa4597
      @ MGMT Event: Command Status (0x0002) plen 3                {0x0001} [hci0]
            Load Long Term Keys (0x0013)
              Status: Invalid Parameters (0x0d)
      
      Cc: stable@vger.kernel.org
      Link: https://github.com/bluez/bluez/issues/875
      Fixes: 59b047bc ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT · 227a0cdf
      Luiz Augusto von Dentz authored
      MGMT_OP_DISCONNECT can be called before mgmt_device_connected has been
      called, which causes the connection procedure to be aborted. In that
      case, mgmt_device_disconnected shall still respond with a command
      complete to MGMT_OP_DISCONNECT, but must not emit
      MGMT_EV_DEVICE_DISCONNECTED, since MGMT_EV_DEVICE_CONNECTED was never
      sent.

      To fix this, MGMT_OP_DISCONNECT is changed to work like other commands
      that use hci_cmd_sync_queue: it uses hci_conn_abort to disconnect and
      returns the result. For hci_conn_abort to be usable from the
      hci_cmd_sync context, it now uses hci_cmd_sync_run_once.
      
      Link: https://github.com/bluez/bluez/issues/932
      Fixes: 12d4a3b2 ("Bluetooth: Move check for MGMT_CONNECTED flag into mgmt.c")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_once · c898f6d7
      Luiz Augusto von Dentz authored
      This introduces hci_cmd_sync_run/hci_cmd_sync_run_once, which act like
      hci_cmd_sync_queue/hci_cmd_sync_queue_once but run immediately when
      already in the hdev->cmd_sync_work context.
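
      A userspace sketch of the run-or-queue decision, with hypothetical types.
      The flag on_cmd_sync_work stands in for the "already on
      hdev->cmd_sync_work" check, and returning the callback's result directly
      in that case is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef int (*model_sync_fn)(void *data);

#define MODEL_QUEUE_LEN 8

struct model_hdev {
	bool on_cmd_sync_work;	/* true while running on cmd_sync_work */
	model_sync_fn queued[MODEL_QUEUE_LEN];
	void *queued_data[MODEL_QUEUE_LEN];
	int nr_queued;
};

/* Deferred path, like hci_cmd_sync_queue(): just record the work. */
static int model_cmd_sync_queue(struct model_hdev *h, model_sync_fn fn,
				void *data)
{
	if (h->nr_queued >= MODEL_QUEUE_LEN)
		return -ENOMEM;
	h->queued[h->nr_queued] = fn;
	h->queued_data[h->nr_queued] = data;
	h->nr_queued++;
	return 0;
}

/* Like hci_cmd_sync_run(): call directly when already on the work
 * context, otherwise fall back to queueing. */
static int model_cmd_sync_run(struct model_hdev *h, model_sync_fn fn,
			      void *data)
{
	if (h->on_cmd_sync_work)
		return fn(data);
	return model_cmd_sync_queue(h, fn, data);
}

/* Example work item: increments an int counter, returns the new value. */
static int model_bump(void *data)
{
	int *counter = data;

	return ++*counter;
}
```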

      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: qca: If memdump doesn't work, re-enable IBS · 8ae22de9
      Douglas Anderson authored
      On systems in the field, we are seeing this sometimes in the kernel logs:
        Bluetooth: qca_controller_memdump() hci0: hci_devcd_init Return:-95
      
      This means that _something_ decided that it wanted to get a memdump
      but then hci_devcd_init() returned -EOPNOTSUPP (AKA -95).
      
      The cleanup code in qca_controller_memdump() when we get back an error
      from hci_devcd_init() undoes most things but forgets to clear
      QCA_IBS_DISABLED. One side effect of this is that, during the next
      suspend, qca_suspend() will always get a timeout.
      
      Let's fix it so that we clear the bit.
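
      The error-path rule can be shown with a small model: every state change
      made before the failing call is unwound, including the IBS flag. The
      struct and helpers are hypothetical; only the -EOPNOTSUPP return mirrors
      the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_qca {
	bool ibs_disabled;	/* models the QCA_IBS_DISABLED bit */
	bool memdump_active;
};

/* Stand-in for hci_devcd_init() failing where it is not supported. */
static int model_devcd_init(bool supported)
{
	return supported ? 0 : -EOPNOTSUPP;
}

static int model_controller_memdump(struct model_qca *qca, bool devcd_supported)
{
	int ret;

	qca->ibs_disabled = true;	/* IBS is paused while collecting */
	qca->memdump_active = true;

	ret = model_devcd_init(devcd_supported);
	if (ret) {
		/* Unwind everything set above, including the IBS bit. */
		qca->memdump_active = false;
		qca->ibs_disabled = false;
		return ret;
	}
	return 0;
}
```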
      
      Fixes: 06d3fdfc ("Bluetooth: hci_qca: Add qcom devcoredump support")
      Reviewed-by: Guenter Roeck <groeck@chromium.org>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • can: kvaser_pciefd: Use a single write when releasing RX buffers · dd885d90
      Martin Jocic authored
      Kvaser's PCIe cards uses the KCAN FPGA IP block which has dual 4K
      buffers for incoming messages shared by all (currently up to eight)
      channels. While the driver processes messages in one buffer, new
      incoming messages are stored in the other and so on.
      
      The design of KCAN is such that a buffer must be fully read and then
      released. Releasing a buffer makes the FPGA switch buffers. If the
      other buffer contains at least one incoming message, the FPGA will also
      instantly issue a new interrupt; if not, the interrupt will be issued
      after receiving the first new message.
      
      With IRQx interrupts, it takes a little time for the interrupt to
      happen, enough for any previous ISR call to do its business and
      return, but MSI interrupts are much faster, so this time is reduced to
      almost nothing.
      
      So with MSI, releasing the buffer HAS to be the very last action of
      the ISR before returning; otherwise the new interrupt might be
      "masked" by the kernel because the previous ISR call hasn't returned.
      And since the interrupts are edge-triggered, we cannot lose one, or the
      ping-pong reading process will stop.
      
      This is why this patch modifies the driver to use a single write to
      the SRB_CMD register before returning.
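The release-last ordering can be sketched with a mock register. The `SKETCH_SRB_CMD_RDB*` bits, `struct mock_reg` and `isr_sketch()` are hypothetical; the commit message does not show the real SRB_CMD bit layout or the driver's ISR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command bits; not the real KCAN SRB_CMD layout. */
#define SKETCH_SRB_CMD_RDB0 (1u << 0) /* release receive buffer 0 */
#define SKETCH_SRB_CMD_RDB1 (1u << 1) /* release receive buffer 1 */

/* Mock MMIO register that records the last value and the write count. */
struct mock_reg {
	uint32_t value;
	unsigned int writes;
};

static void mock_writel(struct mock_reg *reg, uint32_t val)
{
	reg->value = val;
	reg->writes++;
}

/* ISR sketch: drain each full buffer, accumulate the release bits, and
 * issue exactly one register write as the last action before returning. */
static void isr_sketch(struct mock_reg *srb_cmd, bool buf0_full, bool buf1_full)
{
	uint32_t cmd = 0;

	if (buf0_full) {
		/* ... read every message in buffer 0 ... */
		cmd |= SKETCH_SRB_CMD_RDB0;
	}
	if (buf1_full) {
		/* ... read every message in buffer 1 ... */
		cmd |= SKETCH_SRB_CMD_RDB1;
	}

	/* Single write: the FPGA may raise the next MSI right after this. */
	if (cmd)
		mock_writel(srb_cmd, cmd);
}
```

Accumulating the bits and writing once means no work remains in the ISR after the FPGA is told to switch buffers.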
      Signed-off-by: Martin Jocic <martin.jocic@kvaser.com>
      Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Link: https://patch.msgid.link/20240830153113.2081440-1-martin.jocic@kvaser.com
      Fixes: 26ad340e ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      dd885d90
    • Cong Wang's avatar
      tcp_bpf: fix return value of tcp_bpf_sendmsg() · fe1910f9
      Cong Wang authored
      When we cork messages in psock->cork, the last message that triggers
      the flush will result in sending an sk_msg larger than the current
      message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
      negative, at least in the following case:
      
              case __SK_DROP:
              default:
                      sk_msg_free_partial(sk, msg, tosend);
                      sk_msg_apply_bytes(psock, tosend);
                      *copied -= (tosend + delta); // <==== HERE
                      return -EACCES;
      
      With a suitable value of 'copied', this can lead to the following BUG
      (found by syzbot). A negative 'copied' should not be used as a return
      value here.
      
        ------------[ cut here ]------------
        kernel BUG at net/socket.c:733!
        Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b4328 #0
        Hardware name: linux,dummy-virt (DT)
        pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
        pc : sock_sendmsg_nosec net/socket.c:733 [inline]
        pc : sock_sendmsg_nosec net/socket.c:728 [inline]
        pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
        lr : sock_sendmsg_nosec net/socket.c:730 [inline]
        lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
        sp : ffff800088ea3b30
        x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
        x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
        x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
        x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
        x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
        x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
        x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
        x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
        x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
        x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
        Call trace:
         sock_sendmsg_nosec net/socket.c:733 [inline]
         __sock_sendmsg+0x5c/0x60 net/socket.c:745
         ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
         ___sys_sendmsg+0xac/0x100 net/socket.c:2651
         __sys_sendmsg+0x84/0xe0 net/socket.c:2680
         __do_sys_sendmsg net/socket.c:2689 [inline]
         __se_sys_sendmsg net/socket.c:2687 [inline]
         __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
         __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
         invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
         el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
         do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
         el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
         el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
         el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
        Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
        ---[ end trace 0000000000000000 ]---
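A simplified model of how 'copied' goes negative on the drop path, plus a guard that returns the error instead. `sendmsg_return_value()` is hypothetical and collapses the real send loop into a single step; it illustrates the idea rather than reproducing the upstream diff.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: 'copied' starts as the size of the caller's
 * message, then the whole corked sk_msg ('tosend' bytes, which may be
 * larger) plus 'delta' is subtracted, as in the quoted snippet. */
static int sendmsg_return_value(int msg_size, int tosend, int delta, int err)
{
	int copied = msg_size;

	copied -= (tosend + delta); /* negative when tosend > msg_size */

	/* Report a byte count only when it is positive; otherwise return
	 * the error instead of a bogus negative length. */
	return copied > 0 ? copied : err;
}
```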
      
      Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
      Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/20240821030744.320934-1-xiyou.wangcong@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fe1910f9
    • Jeongjun Park's avatar
      net/smc: prevent NULL pointer dereference in txopt_get · 98d4435e
      Jeongjun Park authored
      Since smc_inet6_prot does not initialize ipv6_pinfo_offset, inet6_create()
      copies an incorrect address value, sk + 0 (offset), to inet_sk(sk)->pinet6.
      
      In addition, since inet_sk(sk)->pinet6 and smc_sk(sk)->clcsock
      effectively occupy the same address, when smc_create_clcsk() stores the
      newly created clcsock in smc_sk(sk)->clcsock, inet_sk(sk)->pinet6 is
      overwritten with the clcsock pointer. This causes a NULL pointer
      dereference and various other memory corruptions.
      
      To solve this problem, initialize ipv6_pinfo_offset, add an smc6_sock
      structure, and add ipv6_pinfo as its second member, after the embedded
      smc_sock structure.
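The offset arithmetic behind the bug can be sketched with stand-in types. All `sketch_*` types are hypothetical simplifications of the kernel structures, and `sketch_pinet6()` only models the "socket base plus ipv6_pinfo_offset" computation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Heavily simplified, hypothetical stand-ins for the kernel types. */
struct sketch_sock { int family; };
struct sketch_ipv6_pinfo { int hop_limit; };

struct sketch_smc_sock {
	struct sketch_sock sk; /* generic socket must come first */
	void *clcsock;
};

/* Wrapper in the spirit of the fix: IPv6 info lives after smc_sock. */
struct sketch_smc6_sock {
	struct sketch_smc_sock smc;
	struct sketch_ipv6_pinfo inet6;
};

/* Models deriving pinet6 from the protocol's ipv6_pinfo_offset:
 * the socket's base address plus the offset. */
static void *sketch_pinet6(struct sketch_sock *sk, size_t ipv6_pinfo_offset)
{
	return (char *)sk + ipv6_pinfo_offset;
}
```

With the offset left at 0, the computed pointer is just the socket itself; with `offsetof(struct sketch_smc6_sock, inet6)` it lands on the embedded IPv6 info.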
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Fixes: d25a92cc ("net/smc: Introduce IPPROTO_SMC")
      Signed-off-by: Jeongjun Park <aha310510@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98d4435e
    • Jakub Kicinski's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 1bb3c548
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-08-28 (igb, ice)
      
      This series contains updates to igb and ice drivers.
      
      Daiwei Li restores writing the TSICR (TimeSync Interrupt Cause)
      register on 82580 devices to work around a hardware issue for igb.
      
      Dawid detaches the netdev device during reset to prevent ethtool
      accesses during reset from causing NULL pointer dereferences on ice.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Add netif_device_attach/detach into PF reset flow
        igb: Fix not clearing TimeSync interrupts for 82580
      ====================
      
      Link: https://patch.msgid.link/20240828225444.645154-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1bb3c548
  5. 29 Aug, 2024 10 commits
    • Jakub Kicinski's avatar
      MAINTAINERS: exclude bluetooth and wireless DT bindings from netdev ML · b57d643a
      Jakub Kicinski authored
      We exclude wireless drivers from the netdev@ traffic, to delegate
      it to linux-wireless@, and avoid overwhelming netdev@.
      Bluetooth drivers are implicitly excluded because they live under
      drivers/bluetooth, not drivers/net.
      
      In both cases, DT bindings sit under Documentation/devicetree/bindings/net/
      and aren't excluded. So if a patch series touches DT bindings,
      netdev@ ends up getting CCed, and these are usually fairly boring
      series.
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240828175821.2960423-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b57d643a
    • Linus Torvalds's avatar
      Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 0dd5dd63
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth, wireless and netfilter.
      
        No known outstanding regressions.
      
        Current release - regressions:
      
         - wifi: iwlwifi: fix hibernation
      
         - eth: ionic: prevent tx_timeout due to frequent doorbell ringing
      
        Previous releases - regressions:
      
         - sched: fix sch_fq incorrect behavior for small weights
      
         - wifi:
            - iwlwifi: take the mutex before running link selection
            - wfx: repair open network AP mode
      
         - netfilter: restore IP sanity checks for netdev/egress
      
         - tcp: fix forever orphan socket caused by tcp_abort
      
         - mptcp: close subflow when receiving TCP+FIN
      
         - bluetooth: fix random crash seen while removing btnxpuart driver
      
        Previous releases - always broken:
      
         - mptcp: more fixes for the in-kernel PM
      
         - eth: bonding: change ipsec_lock from spin lock to mutex
      
         - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
      
        Misc:
      
         - documentation: drop special comment style for net code"
      
      * tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
        nfc: pn533: Add poll mod list filling check
        mailmap: update entry for Sriram Yagnaraman
        selftests: mptcp: join: check re-re-adding ID 0 signal
        mptcp: pm: ADD_ADDR 0 is not a new address
        selftests: mptcp: join: validate event numbers
        mptcp: avoid duplicated SUB_CLOSED events
        selftests: mptcp: join: check re-re-adding ID 0 endp
        mptcp: pm: fix ID 0 endp usage after multiple re-creations
        mptcp: pm: do not remove already closed subflows
        selftests: mptcp: join: no extra msg if no counter
        selftests: mptcp: join: check re-adding init endp with != id
        mptcp: pm: reset MPC endp ID when re-added
        mptcp: pm: skip connecting to already established sf
        mptcp: pm: send ACK on an active subflow
        selftests: mptcp: join: check removing ID 0 endpoint
        mptcp: pm: fix RM_ADDR ID for the initial subflow
        mptcp: pm: reuse ID 0 after delete and re-add
        net: busy-poll: use ktime_get_ns() instead of local_clock()
        sctp: fix association labeling in the duplicate COOKIE-ECHO case
        mptcp: pr_debug: add missing \n at the end
        ...
      0dd5dd63
    • Aleksandr Mishin's avatar
      nfc: pn533: Add poll mod list filling check · febccb39
      Aleksandr Mishin authored
      If im_protocols is 1 and tm_protocols is 0, this combination
      passes the check 'if (!im_protocols && !tm_protocols)' in
      nfc_start_poll(). But after the pn533_poll_create_mod_list() call in
      pn533_start_poll(), the poll mod list remains empty and
      dev->poll_mod_count remains 0, which leads to a division by zero.
      
      Normally no im protocol has the value 1 in the mask, so this combination
      is not expected by the driver. But these protocol values actually come
      from userspace via the Netlink interface (NFC_CMD_START_POLL operation).
      So a broken or malicious program may pass a message containing a "bad"
      combination of protocol parameter values so that dev->poll_mod_count
      is not incremented inside pn533_poll_create_mod_list(), thus leading
      to a division by zero.
      
      The call trace looks like this:
      nfc_genl_start_poll()
        nfc_start_poll()
          ->start_poll()
          pn533_start_poll()
      
      Add poll mod list filling check.
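A sketch of the filling check, assuming hypothetical protocol bits in which bit 0 maps to no modulation (mirroring the im_protocols == 1 case). The `sketch_*` names, the -EINVAL return and the modulo used to pick a starting index are illustrative assumptions, not the pn533 driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical protocol bits; bit 0 deliberately adds no modulation. */
#define SKETCH_PROTO_A (1u << 1)
#define SKETCH_PROTO_B (1u << 2)

struct sketch_poll_dev {
	unsigned int poll_mod_count;
	unsigned int poll_mod_curr;
};

static void sketch_create_mod_list(struct sketch_poll_dev *dev,
				   uint32_t im_protocols, uint32_t tm_protocols)
{
	dev->poll_mod_count = 0;
	if (im_protocols & SKETCH_PROTO_A)
		dev->poll_mod_count++;
	if (im_protocols & SKETCH_PROTO_B)
		dev->poll_mod_count++;
	if (tm_protocols)
		dev->poll_mod_count++;
}

static int sketch_start_poll(struct sketch_poll_dev *dev, uint32_t im_protocols,
			     uint32_t tm_protocols, uint32_t rand_mod)
{
	/* The generic check only rejects the all-zero combination. */
	if (!im_protocols && !tm_protocols)
		return -EINVAL;

	sketch_create_mod_list(dev, im_protocols, tm_protocols);

	/* Filling check: an empty list would make the modulo below
	 * divide by zero. */
	if (!dev->poll_mod_count)
		return -EINVAL;

	dev->poll_mod_curr = rand_mod % dev->poll_mod_count;
	return 0;
}
```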
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: dfccd0f5 ("NFC: pn533: Add some polling entropy")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://patch.msgid.link/20240827084822.18785-1-amishin@t-argos.ru
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      febccb39
    • Paolo Abeni's avatar
      Merge tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 0240bceb
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 sets NFT_PKTINFO_L4PROTO for UDP packets with less than 4 bytes
      of payload from netdev/egress by subtracting skb_network_offset() when
      validating the IPv4 packet length; otherwise 'meta l4proto udp' never
      matches.
      
      Patch #2 subtracts skb_network_offset() when validating the IPv6 packet
      length for netdev/egress.
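Why the network offset matters can be shown with a simplified length check. This is an illustrative model, not the nf_tables code: both functions are hypothetical, and the packet sizes in the usage note were chosen only for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative IPv4 length check. skb_len: buffer length; net_off:
 * offset of the IP header in the buffer (e.g. 14 bytes of Ethernet
 * header when the link-layer header is still present); tot_len: IP
 * total length; ihl_bytes: IP header length in bytes. */
static bool sketch_ipv4_len_ok(unsigned int skb_len, unsigned int net_off,
			       unsigned int tot_len, unsigned int ihl_bytes)
{
	if (skb_len < net_off)
		return false;
	/* Compare against the bytes that follow the network header. */
	if (skb_len - net_off < tot_len)
		return false;
	/* The transport offset is also relative to the IP header. */
	if (tot_len < ihl_bytes)
		return false;
	return true;
}

/* Deliberately broken variant: the transport offset still includes the
 * link-layer header, so short packets are wrongly rejected. */
static bool sketch_ipv4_len_ok_broken(unsigned int skb_len, unsigned int net_off,
				      unsigned int tot_len, unsigned int ihl_bytes)
{
	unsigned int thoff = net_off + ihl_bytes;

	if (skb_len < tot_len)
		return false;
	if (tot_len < thoff)
		return false;
	return true;
}
```

For example, a UDP packet with a 3-byte payload (total length 31) behind a 14-byte Ethernet header (buffer length 45) passes the first check but is rejected by the broken variant, while both agree when the network offset is 0.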
      
      netfilter pull request 24-08-28
      
      * tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
        netfilter: nf_tables: restore IP sanity checks for netdev/egress
      ====================
      
      Link: https://patch.msgid.link/20240828214708.619261-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0240bceb
    • Sriram Yagnaraman's avatar
    • Paolo Abeni's avatar
      Merge branch 'mptcp-more-fixes-for-the-in-kernel-pm' · b666a651
      Paolo Abeni authored
      Matthieu Baerts says:
      
      ====================
      mptcp: more fixes for the in-kernel PM
      
      Here is a new batch of fixes for the MPTCP in-kernel path-manager:
      
      Patch 1 ensures the address ID is set to 0 when the path-manager sends
      an ADD_ADDR for the address of the initial subflow. The same fix is
      applied when a new subflow is created re-using this special address. A
      fix for v6.0.
      
      Patch 2 is similar, but for the case where an endpoint is removed: if
      this endpoint was used for the initial address, it is important to send
      a RM_ADDR with this ID set to 0, and look for existing subflows with the
      ID set to 0. A fix for v6.0 as well.
      
      Patch 3 validates the two previous patches.
      
      Patch 4 makes the PM select an "active" path to send an address
      notification in an ACK, instead of taking the first path in the list. A
      fix for v5.11.
      
      Patch 5 fixes skipping the establishment of a new subflow if a previous
      subflow using the same pair of addresses is being closed. A fix for
      v5.13.
      
      Patch 6 resets the ID linked to the initial subflow when the linked
      endpoint is re-added, possibly with a different ID. A fix for v6.0.
      
      Patch 7 validates the three previous patches.
      
      Patch 8 is a small fix for the MPTCP Join selftest, when being used with
      older subflows not supporting all MIB counters. A fix for a commit
      introduced in v6.4, but backported up to v5.10.
      
      Patch 9 prevents the PM from trying to close the initial subflow multiple
      times and incrementing counters while nothing happened. A fix for v5.10.
      
      Patch 10 stops incrementing local_addr_used and add_addr_accepted
      counters when dealing with the address ID 0, because these counters do
      not take the initial subflow into account, and are then not
      decremented when the linked addresses are removed. A fix for v6.0.
      
      Patch 11 validates the previous patch.
      
      Patch 12 prevents the PM from sending multiple SUB_CLOSED events for the
      initial subflow. A fix for v5.12.
      
      Patch 13 validates the previous patch.
      
      Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
      in order to re-create the initial subflow if it has been closed, even if
      the limit for *new* addresses -- not taking into account the address of
      the initial subflow -- has been reached. A fix for v5.10.
      
      Patch 15 validates the previous patch.
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      ---
      Matthieu Baerts (NGI0) (15):
            mptcp: pm: reuse ID 0 after delete and re-add
            mptcp: pm: fix RM_ADDR ID for the initial subflow
            selftests: mptcp: join: check removing ID 0 endpoint
            mptcp: pm: send ACK on an active subflow
            mptcp: pm: skip connecting to already established sf
            mptcp: pm: reset MPC endp ID when re-added
            selftests: mptcp: join: check re-adding init endp with != id
            selftests: mptcp: join: no extra msg if no counter
            mptcp: pm: do not remove already closed subflows
            mptcp: pm: fix ID 0 endp usage after multiple re-creations
            selftests: mptcp: join: check re-re-adding ID 0 endp
            mptcp: avoid duplicated SUB_CLOSED events
            selftests: mptcp: join: validate event numbers
            mptcp: pm: ADD_ADDR 0 is not a new address
            selftests: mptcp: join: check re-re-adding ID 0 signal
      
       net/mptcp/pm.c                                  |   4 +-
       net/mptcp/pm_netlink.c                          |  87 ++++++++++----
       net/mptcp/protocol.c                            |   6 +
       net/mptcp/protocol.h                            |   5 +-
       tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++----
       tools/testing/selftests/net/mptcp/mptcp_lib.sh  |   4 +
       6 files changed, 209 insertions(+), 50 deletions(-)
      ---
      base-commit: 3a0504d5
      change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817
      
      Best regards,
      ====================
      
      Link: https://patch.msgid.link/20240828-net-mptcp-more-pm-fix-v2-0-7f11b283fff7@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b666a651
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: check re-re-adding ID 0 signal · f18fa2ab
      Matthieu Baerts (NGI0) authored
      This test extends "delete re-add signal" to validate the previous
      commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
      is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
      client should still be able to re-create this subflow, even if the
      add_addr_accepted limit has been reached as this special address is not
      considered as a new address.
      
      The 'Fixes' tag below is the same as the one from the previous commit:
      this patch is not fixing anything wrong in the selftests, but it
      validates the previous fix for an issue introduced by this commit ID.
      
      Fixes: d0876b22 ("mptcp: add the incoming RM_ADDR support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f18fa2ab
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: ADD_ADDR 0 is not a new address · 57f86203
      Matthieu Baerts (NGI0) authored
      The ADD_ADDR 0 with the address from the initial subflow should not be
      considered as a new address: this is not something new. If the host
      receives it, it simply means that the address is available again.
      
      When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
      it as new, as it does not increment the 'add_addr_accepted' counter. But
      'accept_addr' might not be set if the limit has already been reached:
      this check can be bypassed in this case. Before doing so, though, it is
      important to check that this ADD_ADDR for the ID 0 is for the same
      address as the initial subflow. If not, it is not something that should
      happen, and the ADD_ADDR can be ignored.
      
      Note that if an ADD_ADDR is received while there is already a subflow
      opened using the same address, this ADD_ADDR is ignored as well. It
      means that if multiple ADD_ADDR for ID 0 are received, there will not be
      any duplicated subflows created by the client.
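The acceptance rule can be sketched as follows. `struct sketch_pm` and `sketch_pm_accept_add_addr()` are hypothetical simplifications of the in-kernel path-manager state, with IPv4 addresses reduced to 32-bit integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified path-manager state. */
struct sketch_pm {
	uint32_t initial_addr;           /* address of the initial subflow */
	unsigned int add_addr_accepted;
	unsigned int add_addr_accept_max;
};

/* Returns true if a subflow should be (re)created for this ADD_ADDR. */
static bool sketch_pm_accept_add_addr(struct sketch_pm *pm, uint8_t id,
				      uint32_t addr, bool subflow_exists)
{
	/* An ADD_ADDR matching an already-open subflow is ignored, so
	 * repeated ADD_ADDR 0 never creates duplicate subflows. */
	if (subflow_exists)
		return false;

	if (id == 0) {
		/* ID 0 must refer to the initial subflow's address. */
		if (addr != pm->initial_addr)
			return false;
		/* Not a new address: bypass the limit and don't count it. */
		return true;
	}

	if (pm->add_addr_accepted >= pm->add_addr_accept_max)
		return false;
	pm->add_addr_accepted++;
	return true;
}
```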
      
      Fixes: d0876b22 ("mptcp: add the incoming RM_ADDR support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      57f86203
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: validate event numbers · 20ccc7c5
      Matthieu Baerts (NGI0) authored
      This test extends "delete and re-add" and "delete re-add signal" to
      validate the previous commit: the number of MPTCP events is checked to
      make sure there are no duplicated or unexpected ones.
      
      A new helper has been introduced to easily check these events. The
      missing events have been added to the lib.
      
      The 'Fixes' tag below is the same as the one from the previous commit:
      this patch is not fixing anything wrong in the selftests, but it
      validates the previous fix for an issue introduced by this commit ID.
      
      Fixes: b911c97c ("mptcp: add netlink event support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      20ccc7c5
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: avoid duplicated SUB_CLOSED events · d82809b6
      Matthieu Baerts (NGI0) authored
      The initial subflow might have already been closed, but still be in
      the connection list. When the worker is instructed to close the
      subflows that have been marked as closed, it might then try to close
      the initial subflow again.
      
      A consequence of that is that the SUB_CLOSED event can be seen twice:
      
        # ip mptcp endpoint
        1.1.1.1 id 1 subflow dev eth0
        2.2.2.2 id 2 subflow dev eth1
      
        # ip mptcp monitor &
        [         CREATED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
        [     ESTABLISHED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
        [  SF_ESTABLISHED] remid=0 locid=2 saddr4=2.2.2.2 daddr4=9.9.9.9
      
        # ip mptcp endpoint delete id 1
        [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
        [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
      
      The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the
      second one from __mptcp_close_subflow().
      
      To avoid doing the post-closed processing twice, the subflow is now
      marked as closed the first time.
      
      Note that it is not enough to check if we are dealing with the first
      subflow and check its sk_state: the subflow might have been reset or
      closed before calling mptcp_close_ssk().
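A minimal sketch of the "process once" pattern, assuming a hypothetical `struct sketch_subflow` with a dedicated flag. As noted above, the socket state alone is not a reliable signal, which is why the sketch tracks closure in its own field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subflow model: 'closed' records that post-close
 * processing has already run for this subflow. */
struct sketch_subflow {
	bool closed;
	unsigned int sub_closed_events;
};

static void sketch_close_subflow(struct sketch_subflow *sf)
{
	/* Post-close processing, including the SUB_CLOSED event, runs only
	 * once, no matter how many code paths request it. */
	if (sf->closed)
		return;
	sf->closed = true;
	sf->sub_closed_events++;
}
```

Calling `sketch_close_subflow()` from both the RM_ADDR path and the worker path would then emit a single event.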
      
      Fixes: b911c97c ("mptcp: add netlink event support")
      Cc: stable@vger.kernel.org
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      d82809b6