Commits · 45fd01f2fbf1119d083931b095ad6d0f13443d0e · Kirill Smelkov / linux

21 Apr, 2023 40 commits

net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macs · 45fd01f2

Leon Romanovsky authored Apr 20, 2023

ARP discovery code has same logic for RX and TX flows, but with
different source and destination fields. Instead of duplicating
same code in mlx5e_ipsec_init_macs, let's refactor.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45fd01f2

net/mlx5e: Properly release work data structure · 94edec44

Leon Romanovsky authored Apr 20, 2023

There are some flows in which work structure is not allocated at all
and it is needed to be checked prior release of data structure.

 general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
 CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: events xfrm_state_gc_task
 RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core]
 Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89
 RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206
 RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000
 RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050
 RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000
 R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000
 R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118
 FS:  0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:

  ___xfrm_state_destroy+0x3c8/0x5e0
  xfrm_state_gc_task+0xf6/0x140
  ? ___xfrm_state_destroy+0x5e0/0x5e0
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? spin_bug+0x1d0/0x1d0
  worker_thread+0x59d/0xec0
  ? __kthread_parkme+0xd9/0x1d0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30

 Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core]
 ---[ end trace 0000000000000000 ]---

Fixes: 4562116f ("net/mlx5e: Generalize IPsec work structs")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

94edec44

net/mlx5e: Compare all fields in IPv6 address · 3198ae7d

Leon Romanovsky authored Apr 20, 2023

Fix size argument in memcmp to compare whole IPv6 address.

Fixes: b3beba1f ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3198ae7d

net/mlx5e: Don't overwrite extack message returned from IPsec SA validator · 697b3518

Leon Romanovsky authored Apr 20, 2023

Addition of new err_xfrm label caused to error messages be overwritten.
Fix it by using proper NL_SET_ERR_MSG_WEAK_MOD macro together with change
in a default message.

Fixes: aa8bd0c9 ("net/mlx5e: Support IPsec acquire default SA")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

697b3518

net/mlx5e: Fix FW error while setting IPsec policy block action · e239e31a

Leon Romanovsky authored Apr 20, 2023

When trying to set IPsec policy block action the following error is
generated:

 mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
	status bad parameter(0x3), syndrome (0x8708c3), err(-22)

This error means that drop action is not allowed when modify action is
set, so update the code to skip modify header for XFRM_POLICY_BLOCK action.

Fixes: 67212396 ("net/mlx5e: Skip IPsec encryption for TX path without matching policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e239e31a

net: stmmac:fix system hang when setting up tag_8021q VLAN for DSA ports · 35226750

Yan Wang authored Apr 19, 2023

The system hang because of dsa_tag_8021q_port_setup()->
				stmmac_vlan_rx_add_vid().

I found in stmmac_drv_probe() that cailing pm_runtime_put()
disabled the clock.

First, when the kernel is compiled with CONFIG_PM=y,The stmmac's
resume/suspend is active.

Secondly,stmmac as DSA master,the dsa_tag_8021q_port_setup() function
will callback stmmac_vlan_rx_add_vid when DSA dirver starts. However,
The system is hanged for the stmmac_vlan_rx_add_vid() accesses its
registers after stmmac's clock is closed.

I would suggest adding the pm_runtime_resume_and_get() to the
stmmac_vlan_rx_add_vid().This guarantees that resuming clock output
while in use.

Fixes: b3dcb312 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Yan Wang <rk.code@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35226750

Merge branch 'pds_core' · d8bb3824

David S. Miller authored Apr 21, 2023

Shannon Nelson says:

====================
pds_core driver

Summary:
--------
This patchset implements a new driver for use with the AMD/Pensando
Distributed Services Card (DSC), intended to provide core configuration
services through the auxiliary_bus and through a couple of EXPORTed
functions for use initially in VFio and vDPA feature specific drivers.

To keep this patchset to a manageable size, the pds_vdpa and pds_vfio
drivers have been split out into their own patchsets to be reviewed
separately.

Detail:
-------
AMD/Pensando is making available a new set of devices for supporting vDPA,
VFio, and potentially other features in the Distributed Services Card
(DSC).  These features are implemented through a PF that serves as a Core
device for controlling and configuring its VF devices.  These VF devices
have separate drivers that use the auxiliary_bus to work through the Core
device as the control path.

Currently, the DSC supports standard ethernet operations using the
ionic driver.  This is not replaced by the Core-based devices - these
new devices are in addition to the existing Ethernet device.  Typical DSC
configurations will include both PDS devices and Ionic Eth devices.
However, there is a potential future path for ethernet services to come
through this device as well.

The Core device is a new PCI PF/VF device managed by a new driver
'pds_core'.  The PF device has access to an admin queue for configuring
the services used by the VFs, and sets up auxiliary_bus devices for each
vDPA VF for communicating with the drivers for the vDPA devices.  The VFs
may be for VFio or vDPA, and other services in the future; these VF types
are selected as part of the DSC internal FW configurations, which is out
of the scope of this patchset.

When the vDPA support set is enabled in the core PF through its devlink
param, auxiliary_bus devices are created for each VF that supports the
feature.  The vDPA driver then connects to and uses this auxiliary_device
to do control path configuration through the PF device.  This can then be
used with the vdpa kernel module to provide devices for virtio_vdpa kernel
module for host interfaces, or vhost_vdpa kernel module for interfaces
exported into your favorite VM.

A cheap ASCII diagram of a vDPA instance looks something like this:

                                ,----------.
                                |   vdpa   |
                                '----------'
                                  |     ||
                                 ctl   data
                                  |     ||
                          .----------.  ||
                          | pds_vdpa |  ||
                          '----------'  ||
                               |        ||
                       pds_core.vDPA.1  ||
                               |        ||
                    .---------------.   ||
                    |   pds_core    |   ||
                    '---------------'   ||
                        ||         ||   ||
                      09:00.0      09:00.1
        == PCI ============================================
                        ||            ||
                   .----------.   .----------.
            ,------|    PF    |---|    VF    |-------,
            |      '----------'   '----------'       |
            |                  DSC                   |
            |                                        |
            ------------------------------------------

Changes:
  v11:
 - change strncpy to strscpy
Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202304181137.WaZTYyAa-lkp@intel.com/

  v10:
Link: https://lore.kernel.org/netdev/20230418003228.28234-1-shannon.nelson@amd.com/
 - remove CONFIG_DEBUG_FS guard static inline stuff
 - remove unnecessary 0 and null initializations
 - verify in driver load that PDS_CORE_DRV_NAME matches KBUILD_MODNAME
 - remove debugfs irqs_show(), redundant with /proc
 - return -ENOMEM if intr_info = kcalloc() fails
 - move the status code enum into pds_core_if.h as part of API definition
 - fix up one place in pdsc_devcmd_wait() we're using the status codes where we could use the errno
 - remove redundant calls to flush_workqueue()
 - grab config_lock before testing state bits in pdsc_fw_reporter_diagnose()
 - change pdsc_color_match() to return bool
 - remove useless VIF setup loop and just setup vDPA services for now
 - remove pf pointer from struct padev and have clients use pci_physfn()
 - drop use of "vf" in auxdev.c function names, make more generic
 - remove last of client ops struct and simply export the functions
 - drop drivers@pensando.io from MAINTAINERS and add new include dir
 - include dynamic_debug.h in adminq.c to protect dynamic_hex_dump()
 - fixed fw_slot type from u8 to int for handling error returns
 - fixed comment spelling
 - changed void arg in pdsc_adminq_post() to struct pdsc *

  v9:
Link: https://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/
 - change pdsc field name id to uid to clarify the unique id used for aux device
 - remove unnecessary pf->state and other checks in aux device creation
 - hardcode fw slotnames for devlink info, don't use strings from FW
 - handle errors from PDS_CORE_CMD_INIT devcmd call
 - tighten up health thread use of config_lock
 - remove pdsc_queue_health_check() layer over queuing health check
 - start pds_core.rst file in first patch, add to it incrementally
 - give more user interaction info in commit messages
 - removed a few more extraneous includes

  v8:
Link: https://lore.kernel.org/netdev/20230330234628.14627-1-shannon.nelson@amd.com/
 - fixed deadlock problem, use devl_health_reporter_destroy() when devlink is locked
 - don't clear client_id until after auxiliary_device_uninit()

  v7:
Link: https://lore.kernel.org/netdev/20230330192313.62018-1-shannon.nelson@amd.com/
 - use explicit devlink locking and devl_* APIs
 - move some of devlink setup logic into probe and remove
 - use debugfs_create_u{type}() for state and queue head and tail
 - add include for linux/vmalloc.h
Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202303260420.Tgq0qobF-lkp@intel.com/

  v6:
Link: https://lore.kernel.org/netdev/20230324190243.27722-1-shannon.nelson@amd.com/
 - removed version.h include noticed by kernel test robot's version check
Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202303230742.pX3ply0t-lkp@intel.com/
 - fixed up the more egregious checkpatch line length complaints
 - make sure pdsc_auxbus_dev_register() checks padev pointer errcode

  v5:
Link: https://lore.kernel.org/netdev/20230322185626.38758-1-shannon.nelson@amd.com/
 - added devlink health reporter for FW issues
 - removed asic_type, asic_rev, serial_num, fw_version from debugfs as
   they are available through other means
 - trimed OS info in pdsc_identify(), we don't need to send that much info to the FW
 - removed reg/unreg from auxbus client API, they are now in the core when VF
   is started
 - removed need for pdsc definition in client by simplifying the padev to only carry
   struct pci_dev pointers rather than full struct pdsc to the pf and vf
 - removed the unused pdsc argument in pdsc_notify()
 - moved include/linux/pds/pds_core.h to driver/../pds_core/core.h
 - restored a few pds_core_if.h interface values and structs that are shared
   with FW source
 - moved final config_lock unlock to before tear down of timer and workqueue
   to be sure there are no deadlocks while waiting for any stragglers
 - changed use of PAGE_SIZE to local PDS_PAGE_SIZE to keep with FW layout needs
   without regard to kernel PAGE_SIZE configuration
 - removed the redundant *adminqcq argument from pdsc_adminq_post()

  v4:
Link: https://lore.kernel.org/netdev/20230308051310.12544-1-shannon.nelson@amd.com/
 - reworked to attach to both Core PF and vDPA VF PCI devices
 - now creates auxiliary_device as part of each VF PCI probe, removes them on PCI remove
 - auxiliary devices now use simple unique id rather than PCI address for identifier
 - replaced home-grown event publishing with kernel-based notifier service
 - dropped live_migration parameter, not needed when not creating aux device for it
 - replaced devm_* functions with traditional interfaces
 - added MAINTAINERS entry
 - removed lingering traces of set/get_vf attribute adminq commands
 - trimmed some include lists
 - cleaned a kernel test robot complaint about a stray unused variable
        Link: https://lore.kernel.org/oe-kbuild-all/202302181049.yeUQMeWY-lkp@intel.com/

  v3:
Link: https://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/
 - changed names from "pensando" to "amd" and updated copyright strings
 - dropped the DEVLINK_PARAM_GENERIC_ID_FW_BANK for future development
 - changed the auxiliary device creation to be triggered by the
   PCI bus event BOUND_DRIVER, and torn down at UNBIND_DRIVER in order
   to properly handle users using the sysfs bind/unbind functions
 - dropped some noisy log messages
 - rebased to current net-next

  RFC to v2:
Link: https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
 - added separate devlink param patches for DEVLINK_PARAM_GENERIC_ID_ENABLE_MIGRATION
   and DEVLINK_PARAM_GENERIC_ID_FW_BANK, and dropped the driver specific implementations
 - updated descriptions for the new devlink parameters
 - dropped netdev support
 - dropped vDPA patches, will followup later
 - separated fw update and fw bank select into their own patches

  RFC:
Link: https://lore.kernel.org/netdev/20221118225656.48309-1-snelson@pensando.io/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d8bb3824

pds_core: Kconfig and pds_core.rst · ddbcb220

Shannon Nelson authored Apr 19, 2023

Remaining documentation and Kconfig hook for building the driver.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddbcb220

pds_core: publish events to the clients · d24c2827

Shannon Nelson authored Apr 19, 2023

When the Core device gets an event from the device, or notices
the device FW to be up or down, it needs to send those events
on to the clients that have an event handler.  Add the code to
pass along the events to the clients.

The entry points pdsc_register_notify() and pdsc_unregister_notify()
are EXPORTed for other drivers that want to listen for these events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d24c2827

pds_core: add the aux client API · 10659034

Shannon Nelson authored Apr 19, 2023

Add the client API operations for running adminq commands.
The core registers the client with the FW, then the client
has a context for requesting adminq services.  We expect
to add additional operations for other clients, including
requesting additional private adminqs and IRQs, but don't have
the need yet.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10659034

pds_core: devlink params for enabling VIF support · 40ced894

Shannon Nelson authored Apr 19, 2023

Add the devlink parameter switches so the user can enable
the features supported by the VFs.  The only feature supported
at the moment is vDPA.

Example:
    devlink dev param set pci/0000:2b:00.0 \
	    name enable_vnet cmode runtime value true
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

40ced894

pds_core: add auxiliary_bus devices · 4569cce4

Shannon Nelson authored Apr 19, 2023

An auxiliary_bus device is created for each vDPA type VF at VF
probe and destroyed at VF remove.  The aux device name comes
from the driver name + VIF type + the unique id assigned at PCI
probe.  The VFs are always removed on PF remove, so there should
be no issues with VFs trying to access missing PF structures.

The auxiliary_device names will look like "pds_core.vDPA.nn"
where 'nn' is the VF's uid.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4569cce4

pds_core: add initial VF device handling · f53d9311

Shannon Nelson authored Apr 19, 2023

This is the initial VF PCI driver framework for the new
pds_vdpa VF device, which will work in conjunction with an
auxiliary_bus client of the pds_core driver.  This does the
very basics of registering for the new VF device, setting
up debugfs entries, and registering with devlink.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f53d9311

pds_core: set up the VIF definitions and defaults · 65e0185a

Shannon Nelson authored Apr 19, 2023

The Virtual Interfaces (VIFs) supported by the DSC's
configuration (vDPA, Eth, RDMA, etc) are reported in the
dev_ident struct and made visible in debugfs.  At this point
only vDPA is supported in this driver so we only setup
devices for that feature.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

65e0185a

pds_core: add FW update feature to devlink · 49ce92fb

Shannon Nelson authored Apr 19, 2023

Add in the support for doing firmware updates.  Of the two
main banks available, a and b, this updates the one not in
use and then selects it for the next boot.

Example:
    devlink dev flash pci/0000:b2:00.0 \
	    file pensando/dsc_fw_1.63.0-22.tar
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

49ce92fb

pds_core: Add adminq processing and commands · 01ba61b5

Shannon Nelson authored Apr 19, 2023

Add the service routines for submitting and processing
the adminq messages and for handling notifyq events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01ba61b5

pds_core: set up device and adminq · 45d76f49

Shannon Nelson authored Apr 19, 2023

Set up the basic adminq and notifyq queue structures.  These are
used mostly by the client drivers for feature configuration.
These are essentially the same adminq and notifyq as in the
ionic driver.

Part of this includes querying for device identity and FW
information, so we can make that available to devlink dev info.

  $ devlink dev info pci/0000:b5:00.0
  pci/0000:b5:00.0:
    driver pds_core
    serial_number FLM18420073
    versions:
        fixed:
          asic.id 0x0
          asic.rev 0x0
        running:
          fw 1.51.0-73
        stored:
          fw.goldfw 1.15.9-C-22
          fw.mainfwa 1.60.0-73
          fw.mainfwb 1.60.0-57
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

45d76f49

pds_core: add devlink health facilities · 25b450c0

Shannon Nelson authored Apr 19, 2023

Add devlink health reporting on top of our fw watchdog.

Example:
  # devlink health show pci/0000:2b:00.0 reporter fw
  pci/0000:2b:00.0:
    reporter fw
      state healthy error 0 recover 0
  # devlink health diagnose pci/0000:2b:00.0 reporter fw
   Status: healthy State: 1 Generation: 0 Recoveries: 0
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

25b450c0

pds_core: health timer and workqueue · c2dbb090

Shannon Nelson authored Apr 19, 2023

Add in the periodic health check and the related workqueue,
as well as the handlers for when a FW reset is seen.

The firmware is polled every 5 seconds to be sure that it is
still alive and that the FW generation didn't change.

The alive check looks to see that the PCI bus is still readable
and the fw_status still has the RUNNING bit on.  If not alive,
the driver stops activity and tears things down.  When the FW
recovers and the alive check again succeeds, the driver sets
back up for activity.

The generation check looks at the fw_generation to see if it
has changed, which can happen if the FW crashed and recovered
or was updated in between health checks.  If changed, the
driver counts that as though the alive test failed and forces
the fw_down/fw_up cycle.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2dbb090

pds_core: add devcmd device interfaces · 523847df

Shannon Nelson authored Apr 19, 2023

The devcmd interface is the basic connection to the device through the
PCI BAR for low level identification and command services.  This does
the early device initialization and finds the identity data, and adds
devcmd routines to be used by later driver bits.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

523847df

pds_core: initial framework for pds_core PF driver · 55435ea7

Shannon Nelson authored Apr 19, 2023

This is the initial PCI driver framework for the new pds_core device
driver and its family of devices.  This does the very basics of
registering for the new PF PCI device 1dd8:100c, setting up debugfs
entries, and registering with devlink.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

55435ea7

Merge branch 'bridge-neigh-suppression' · 25c800b2

David S. Miller authored Apr 21, 2023

Ido Schimmel says:

====================
bridge: Add per-{Port, VLAN} neighbor suppression

Background
==========

In order to minimize the flooding of ARP and ND messages in the VXLAN
network, EVPN includes provisions [1] that allow participating VTEPs to
suppress such messages in case they know the MAC-IP binding and can
reply on behalf of the remote host. In Linux, the above is implemented
in the bridge driver using a per-port option called "neigh_suppress"
that was added in kernel version 4.15 [2].

Motivation
==========

Some applications use ARP messages as keepalives between the application
nodes in the network. This works perfectly well when two nodes are
connected to the same VTEP. When a node goes down it will stop
responding to ARP requests and the other node will notice it
immediately.

However, when the two nodes are connected to different VTEPs and
neighbor suppression is enabled, the local VTEP will reply to ARP
requests even after the remote node went down, until certain timers
expire and the EVPN control plane decides to withdraw the MAC/IP
Advertisement route for the address. Therefore, some users would like to
be able to disable neighbor suppression on VLANs where such applications
reside and keep it enabled on the rest.

Implementation
==============

The proposed solution is to allow user space to control neighbor
suppression on a per-{Port, VLAN} basis, in a similar fashion to other
per-port options that gained per-{Port, VLAN} counterparts such as
"mcast_router". This allows users to benefit from the operational
simplicity and scalability associated with shared VXLAN devices (i.e.,
external / collect-metadata mode), while still allowing for per-VLAN/VNI
neighbor suppression control.

The user interface is extended with a new "neigh_vlan_suppress" bridge
port option that allows user space to enable per-{Port, VLAN} neighbor
suppression on the bridge port. When enabled, the existing
"neigh_suppress" option has no effect and neighbor suppression is
controlled using a new "neigh_suppress" VLAN option. Example usage:

 # bridge link set dev vxlan0 neigh_vlan_suppress on
 # bridge vlan add vid 10 dev vxlan0
 # bridge vlan set vid 10 dev vxlan0 neigh_suppress on

Testing
=======

Tested using existing bridge selftests. Added a dedicated selftest in
the last patch.

Patchset overview
=================

Patches #1-#5 are preparations.

Patch #6 adds per-{Port, VLAN} neighbor suppression support to the
bridge's data path.

Patches #7-#8 add the required netlink attributes to enable the feature.

Patch #9 adds a selftest.

iproute2 patches can be found here [3].

Changelog
=========

Since RFC [4]:

No changes.

[1] https://www.rfc-editor.org/rfc/rfc7432#section-10
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a42317785c898c0ed46db45a33b0cc71b671bf29
[3] https://github.com/idosch/iproute2/tree/submit/neigh_suppress_v1
[4] https://lore.kernel.org/netdev/20230413095830.2182382-1-idosch@nvidia.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

25c800b2

selftests: net: Add bridge neighbor suppression test · 7648ac72

Ido Schimmel authored Apr 19, 2023

Add test cases for bridge neighbor suppression, testing both per-port
and per-{Port, VLAN} neighbor suppression with both ARP and NS packets.

Example truncated output:

 # ./test_bridge_neigh_suppress.sh
 [...]
 Tests passed: 148
 Tests failed:   0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7648ac72

bridge: Allow setting per-{Port, VLAN} neighbor suppression state · 160656d7

Ido Schimmel authored Apr 19, 2023

Add a new bridge port attribute that allows user space to enable
per-{Port, VLAN} neighbor suppression. Example:

 # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
 false
 # bridge link set dev swp1 neigh_vlan_suppress on
 # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
 true
 # bridge link set dev swp1 neigh_vlan_suppress off
 # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
 false
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

160656d7

bridge: vlan: Allow setting VLAN neighbor suppression state · 83f6d600

Ido Schimmel authored Apr 19, 2023

Add a new VLAN attribute that allows user space to set the neighbor
suppression state of the port VLAN. Example:

 # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
 false
 # bridge vlan set vid 10 dev swp1 neigh_suppress on
 # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
 true
 # bridge vlan set vid 10 dev swp1 neigh_suppress off
 # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
 false

 # bridge vlan set vid 10 dev br0 neigh_suppress on
 Error: bridge: Can't set neigh_suppress for non-port vlans.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

83f6d600

bridge: Add per-{Port, VLAN} neighbor suppression data path support · 412614b1

Ido Schimmel authored Apr 19, 2023

When the bridge is not VLAN-aware (i.e., VLAN ID is 0), determine if
neighbor suppression is enabled on a given bridge port solely based on
the existing 'BR_NEIGH_SUPPRESS' flag.

Otherwise, if the bridge is VLAN-aware, first check if per-{Port, VLAN}
neighbor suppression is enabled on the given bridge port using the
'BR_NEIGH_VLAN_SUPPRESS' flag. If so, look up the VLAN and check whether
it has neighbor suppression enabled based on the per-VLAN
'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED' flag.

If the bridge is VLAN-aware, but the bridge port does not have
per-{Port, VLAN} neighbor suppression enabled, then fallback to
determine neighbor suppression based on the 'BR_NEIGH_SUPPRESS' flag.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

412614b1

bridge: Encapsulate data path neighbor suppression logic · 3aca683e

Ido Schimmel authored Apr 19, 2023

Currently, there are various places in the bridge data path that check
whether neighbor suppression is enabled on a given bridge port.

As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate
this logic in a function and pass the VLAN ID of the packet as an
argument.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3aca683e

bridge: Take per-{Port, VLAN} neighbor suppression into account · 6be42ed0

Ido Schimmel authored Apr 19, 2023

The bridge driver gates the neighbor suppression code behind an internal
per-bridge flag called 'BROPT_NEIGH_SUPPRESS_ENABLED'. The flag is set
when at least one bridge port has neighbor suppression enabled.

As a preparation for per-{Port, VLAN} neighbor suppression, make sure
the global flag is also set if per-{Port, VLAN} neighbor suppression is
enabled. That is, when the 'BR_NEIGH_VLAN_SUPPRESS' flag is set on at
least one bridge port.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6be42ed0

bridge: Add internal flags for per-{Port, VLAN} neighbor suppression · a714e3ec

Ido Schimmel authored Apr 19, 2023

Add two internal flags that will be used to enable / disable per-{Port,
VLAN} neighbor suppression:

1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that
per-{Port, VLAN} neighbor suppression is enabled on the bridge port.
When set, 'BR_NEIGH_SUPPRESS' has no effect.

2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate
that neighbor suppression is enabled on the given VLAN.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a714e3ec

bridge: Pass VLAN ID to br_flood() · e408336a

Ido Schimmel authored Apr 19, 2023

Subsequent patches are going to add per-{Port, VLAN} neighbor
suppression, which will require br_flood() to potentially suppress ARP /
NS packets on a per-{Port, VLAN} basis.

As a preparation, pass the VLAN ID of the packet as another argument to
br_flood().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e408336a

bridge: Reorder neighbor suppression check when flooding · 013a7ce8

Ido Schimmel authored Apr 19, 2023

The bridge does not flood ARP / NS packets for which a reply was sent to
bridge ports that have neighbor suppression enabled.

Subsequent patches are going to add per-{Port, VLAN} neighbor
suppression, which is going to make it more expensive to check whether
neighbor suppression is enabled since a VLAN lookup will be required.

Therefore, instead of unnecessarily performing this lookup for every
packet, only perform it for ARP / NS packets for which a reply was sent.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

013a7ce8

Merge branch 'macsec-vlan' · 1cf3fe1c

David S. Miller authored Apr 21, 2023

Emeel Hakim says:

====================
Support MACsec VLAN

This patch series introduces support for hardware (HW) offload MACsec
devices with VLAN configuration. The patches address both scenarios
where the VLAN header is both the inner and outer header for MACsec.

The changes include:

1. Adding MACsec offload operation for VLAN.
2. Considering VLAN when accessing MACsec net device.
3. Currently offloading MACsec when it's configured over VLAN with
current MACsec TX steering rules would wrongly insert the MACsec sec tag
after inserting the VLAN header. This resulted in an ETHERNET | SECTAG |
VLAN packet when ETHERNET | VLAN | SECTAG is configured. The patche
handles this issue when configuring steering rules.
4. Adding MACsec rx_handler change support in case of a marked skb and a
mismatch on the dst MAC address.

Please review these changes and let me know if you have any feedback or
concerns.

Updates since v1:
- Consult vlan_features when adding NETIF_F_HW_MACSEC.
- Allow grep for the functions.
- Add helper function to get the macsec operation to allow the compiler
  to make some choice.

Updates since v2:
- Don't use macros to allow direct navigattion from mdo functions to its
  implementation.
- Make the vlan_get_macsec_ops argument a const.
- Check if the specific mdo function is available before calling it.
- Enable NETIF_F_HW_MACSEC by default when the lower device has it enabled
  and in case the lower device currently has NETIF_F_HW_MACSEC but disabled
  let the new vlan device also have it disabled.

Updates since v3:
- Split patch ("vlan: Add MACsec offload operations for VLAN interface")
  to prevent mixing generic vlan code changes with driver changes.
- Add mdo_open, stop and stats to support drivers which have those.
- Don't fail if macsec offload operations are available but a specific
  function is not, to support drivers which does not implement all
  macsec offload operations.
- Don't call find_rx_sc twice in the same loop, instead save the result
  in a parameter and re-use it.
- Completely remove _BUILD_VLAN_MACSEC_MDO macro, to prevent returning
  from a macro.
- Reorder the functions inside struct macsec_ops to match the struct
  decleration.

 Updates since v4:
 - Change subject line of ("macsec: Add MACsec rx_handler change support") and adapt commit message.
 - Don't separate the new check in patch ("macsec: Add MACsec rx_handler change support")
   from the previous if/else if.
 - Drop"_found" from the parameter naming "rx_sc_found" and move the definition to
   the relevant block.
 - Remove "{}" since not needed around a single line.

 Updates since v5:
 - Consider promiscuous mode case.

 Updates since v6:
 - Use IS_ENABLED instead of checking for ifdef.
 - Don't add inline keywork in c files, let the compiler make its own decisions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1cf3fe1c

macsec: Don't rely solely on the dst MAC address to identify destination MACsec device · 7661351a

Emeel Hakim authored Apr 19, 2023

Offloading device drivers will mark offloaded MACsec SKBs with the
corresponding SCI in the skb_metadata_dst so the macsec rx handler will
know to which interface to divert those skbs, in case of a marked skb
and a mismatch on the dst MAC address, divert the skb to the macsec
net_device where the macsec rx_handler will be called to consider cases
where relying solely on the dst MAC address is insufficient.

One such instance is when using MACsec with a VLAN as an inner
header, where the packet structure is ETHERNET | SECTAG | VLAN.
In such a scenario, the dst MAC address in the ethernet header
will correspond to the VLAN MAC address, resulting in a mismatch.
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7661351a

net/mlx5: Consider VLAN interface in MACsec TX steering rules · 765f974c

Emeel Hakim authored Apr 19, 2023

Offloading MACsec when its configured over VLAN with current MACsec
TX steering rules will wrongly insert MACsec sec tag after inserting
the VLAN header leading to a ETHERNET | SECTAG | VLAN packet when
ETHERNET | VLAN | SECTAG is configured.

The above issue is due to adding the SECTAG by HW which is a later
stage compared to the VLAN header insertion stage.

Detect such a case and adjust TX steering rules to insert the
SECTAG in the correct place by using reformat_param_0 field in
the packet reformat to indicate the offset of SECTAG from end of
the MAC header to account for VLANs in granularity of 4Bytes.
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

765f974c

net/mlx5: Support MACsec over VLAN · 4bba492b

Emeel Hakim authored Apr 19, 2023

MACsec device may have a VLAN device on top of it.
Detect MACsec state correctly under this condition,
and return the correct net device accordingly.
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bba492b

net/mlx5: Enable MACsec offload feature for VLAN interface · 339ccec8

Emeel Hakim authored Apr 19, 2023

Enable MACsec offload feature over VLAN by adding NETIF_F_HW_MACSEC
to the device vlan_features.
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

339ccec8

vlan: Add MACsec offload operations for VLAN interface · abff3e5e

Emeel Hakim authored Apr 19, 2023

Add support for MACsec offload operations for VLAN driver
to allow offloading MACsec when VLAN's real device supports
Macsec offload by forwarding the offload request to it.
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abff3e5e

Merge branch 'sctp-nested-flex-arrays' · e2598dbd

David S. Miller authored Apr 21, 2023

Xin Long says:

====================
sctp: fix a plenty of flexible-array-nested warnings

Paolo noticed a compile warning in SCTP,

../net/sctp/stream_sched_fc.c: note: in included file (through ../include/net/sctp/sctp.h):
../include/net/sctp/structs.h:335:41: warning: array of flexible structures

But not only this, there are actually quite a lot of such warnings in
some SCTP structs. This patchset fixes most of warnings by deleting
these nested flexible array members.

After this patchset, there are still some warnings left:

  # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
  ./include/net/sctp/structs.h:1145:41: warning: nested flexible array
  ./include/uapi/linux/sctp.h:641:34: warning: nested flexible array
  ./include/uapi/linux/sctp.h:643:34: warning: nested flexible array
  ./include/uapi/linux/sctp.h:644:33: warning: nested flexible array
  ./include/uapi/linux/sctp.h:650:40: warning: nested flexible array
  ./include/uapi/linux/sctp.h:653:39: warning: nested flexible array

the 1st is caused by __data[] in struct ip_options, not in SCTP;
the others are in uapi, and we should not touch them.

Note that instead of completely deleting it, we just leave it as a
comment in the struct, signalling to the reader that we do expect
such variable parameters over there, as Marcelo suggested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e2598dbd

sctp: delete the nested flexible array payload · dbda0fba

Xin Long authored Apr 19, 2023

This patch deletes the flexible-array payload[] from the structure
sctp_datahdr to avoid some sparse warnings:

  # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
  net/sctp/socket.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
  ./include/linux/sctp.h:230:29: warning: nested flexible array

This member is not even used anywhere.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbda0fba

sctp: delete the nested flexible array hmac · 2ab399a9

Xin Long authored Apr 19, 2023

This patch deletes the flexible-array hmac[] from the structure
sctp_authhdr to avoid some sparse warnings:

  # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
  net/sctp/auth.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
  ./include/linux/sctp.h:735:29: warning: nested flexible array
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ab399a9