1. 21 Apr, 2023 40 commits
    • net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macs · 45fd01f2
      Leon Romanovsky authored
      ARP discovery code has the same logic for RX and TX flows, but with
      different source and destination fields. Instead of duplicating the
      same code in mlx5e_ipsec_init_macs, let's refactor.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Properly release work data structure · 94edec44
      Leon Romanovsky authored
      There are some flows in which the work structure is not allocated at
      all, so it must be checked before the data structure is released.
      
       general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN
       KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
       CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Workqueue: events xfrm_state_gc_task
       RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core]
       Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89
       RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206
       RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000
       RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050
       RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000
       R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000
       R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118
       FS:  0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
      
        ___xfrm_state_destroy+0x3c8/0x5e0
        xfrm_state_gc_task+0xf6/0x140
        ? ___xfrm_state_destroy+0x5e0/0x5e0
        process_one_work+0x7c2/0x1340
        ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
        ? pwq_dec_nr_in_flight+0x230/0x230
        ? spin_bug+0x1d0/0x1d0
        worker_thread+0x59d/0xec0
        ? __kthread_parkme+0xd9/0x1d0
        ? process_one_work+0x1340/0x1340
        kthread+0x28f/0x330
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
      
       Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core]
       ---[ end trace 0000000000000000 ]---
      
      Fixes: 4562116f ("net/mlx5e: Generalize IPsec work structs")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
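The fix described in this commit comes down to checking an optional pointer before releasing what it points to. A minimal user-space sketch of that pattern follows; `struct work_ctx`, `struct sa_entry` and both functions are hypothetical stand-ins, not the mlx5e code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the driver's real structures. */
struct work_ctx {
	void *data;
};

struct sa_entry {
	struct work_ctx *work; /* NULL in flows that never allocate it */
};

/* Allocate the optional work structure; returns 0 on success, -1 on failure. */
int sa_entry_alloc_work(struct sa_entry *entry, size_t data_len)
{
	assert(entry);
	entry->work = calloc(1, sizeof(*entry->work));
	if (!entry->work)
		return -1;
	entry->work->data = calloc(1, data_len);
	if (!entry->work->data) {
		free(entry->work);
		entry->work = NULL;
		return -1;
	}
	return 0;
}

/* Release the work structure, tolerating entries that never had one. */
void sa_entry_release_work(struct sa_entry *entry)
{
	assert(entry);
	if (!entry->work)
		return; /* nothing was allocated, so nothing to dereference */
	free(entry->work->data);
	free(entry->work);
	entry->work = NULL;
}
```

Without the early return, the release path would read `entry->work->data` through a NULL pointer, which is the kind of access the KASAN report above flags.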
    • net/mlx5e: Compare all fields in IPv6 address · 3198ae7d
      Leon Romanovsky authored
      Fix size argument in memcmp to compare whole IPv6 address.
      
      Fixes: b3beba1f ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
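To illustrate why the memcmp size argument matters, here is a small sketch that assumes the IPv6 address is held as four 32-bit words. `struct addr6` and both helpers are hypothetical; the commit message does not show the driver's actual expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an IPv6 address held as four 32-bit words. */
struct addr6 {
	uint32_t word[4];
};

/* Too-short comparison: only the first 4 of the 16 bytes are checked. */
bool addr6_equal_short(const struct addr6 *a, const struct addr6 *b)
{
	assert(a && b);
	return memcmp(a->word, b->word, sizeof(a->word[0])) == 0;
}

/* Whole-address comparison: sizeof(a->word) covers all 16 bytes. */
bool addr6_equal(const struct addr6 *a, const struct addr6 *b)
{
	assert(a && b);
	return memcmp(a->word, b->word, sizeof(a->word)) == 0;
}
```

Two addresses that differ only in their last word compare equal with the short variant and unequal with the full one.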
    • net/mlx5e: Don't overwrite extack message returned from IPsec SA validator · 697b3518
      Leon Romanovsky authored
      The addition of the new err_xfrm label caused error messages to be
      overwritten. Fix it by using the proper NL_SET_ERR_MSG_WEAK_MOD macro
      together with a change to the default message.
      
      Fixes: aa8bd0c9 ("net/mlx5e: Support IPsec acquire default SA")
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
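The difference between an unconditional and a "weak" error-message setter can be sketched in user space as below. `struct fake_extack` and both functions are hypothetical models of the idea, not the kernel's netlink macros: the weak variant only records a message when none has been set yet, so a specific message from an earlier validator survives a later generic one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's extack object. */
struct fake_extack {
	const char *msg;
};

/* Unconditional set: replaces whatever an earlier validator reported. */
void set_err_msg(struct fake_extack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* "Weak" set: only fills in a message when none has been recorded yet. */
void set_err_msg_weak(struct fake_extack *extack, const char *msg)
{
	if (extack && !extack->msg)
		extack->msg = msg;
}
```

Using the weak setter on a shared error label keeps the validator's message intact while still providing a default when the validator set nothing.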
    • net/mlx5e: Fix FW error while setting IPsec policy block action · e239e31a
      Leon Romanovsky authored
      When trying to set IPsec policy block action the following error is
      generated:
      
       mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
      	status bad parameter(0x3), syndrome (0x8708c3), err(-22)
      
      This error means that drop action is not allowed when modify action is
      set, so update the code to skip modify header for XFRM_POLICY_BLOCK action.
      
      Fixes: 67212396 ("net/mlx5e: Skip IPsec encryption for TX path without matching policy")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: fix system hang when setting up tag_8021q VLAN for DSA ports · 35226750
      Yan Wang authored
      The system hangs in dsa_tag_8021q_port_setup() ->
      				stmmac_vlan_rx_add_vid().
      
      I found that calling pm_runtime_put() in stmmac_drv_probe()
      disables the clock.
      
      First, when the kernel is compiled with CONFIG_PM=y, stmmac's
      resume/suspend handling is active.
      
      Second, with stmmac as the DSA master, dsa_tag_8021q_port_setup()
      calls back into stmmac_vlan_rx_add_vid() when the DSA driver starts.
      The system then hangs because stmmac_vlan_rx_add_vid() accesses its
      registers after stmmac's clock has been disabled.
      
      I would suggest adding pm_runtime_resume_and_get() to
      stmmac_vlan_rx_add_vid(). This guarantees that the clock is running
      while the registers are in use.
      
      Fixes: b3dcb312 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()")
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Yan Wang <rk.code@outlook.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'pds_core' · d8bb3824
      David S. Miller authored
      Shannon Nelson says:
      
      ====================
      pds_core driver
      
      Summary:
      --------
      This patchset implements a new driver for use with the AMD/Pensando
      Distributed Services Card (DSC), intended to provide core configuration
      services through the auxiliary_bus and through a couple of EXPORTed
      functions for use initially in VFio and vDPA feature specific drivers.
      
      To keep this patchset to a manageable size, the pds_vdpa and pds_vfio
      drivers have been split out into their own patchsets to be reviewed
      separately.
      
      Detail:
      -------
      AMD/Pensando is making available a new set of devices for supporting vDPA,
      VFio, and potentially other features in the Distributed Services Card
      (DSC).  These features are implemented through a PF that serves as a Core
      device for controlling and configuring its VF devices.  These VF devices
      have separate drivers that use the auxiliary_bus to work through the Core
      device as the control path.
      
      Currently, the DSC supports standard ethernet operations using the
      ionic driver.  This is not replaced by the Core-based devices - these
      new devices are in addition to the existing Ethernet device.  Typical DSC
      configurations will include both PDS devices and Ionic Eth devices.
      However, there is a potential future path for ethernet services to come
      through this device as well.
      
      The Core device is a new PCI PF/VF device managed by a new driver
      'pds_core'.  The PF device has access to an admin queue for configuring
      the services used by the VFs, and sets up auxiliary_bus devices for each
      vDPA VF for communicating with the drivers for the vDPA devices.  The VFs
      may be for VFio or vDPA, and other services in the future; these VF types
      are selected as part of the DSC internal FW configurations, which is out
      of the scope of this patchset.
      
      When the vDPA support set is enabled in the core PF through its devlink
      param, auxiliary_bus devices are created for each VF that supports the
      feature.  The vDPA driver then connects to and uses this auxiliary_device
      to do control path configuration through the PF device.  This can then be
      used with the vdpa kernel module to provide devices for virtio_vdpa kernel
      module for host interfaces, or vhost_vdpa kernel module for interfaces
      exported into your favorite VM.
      
      A cheap ASCII diagram of a vDPA instance looks something like this:
      
                                      ,----------.
                                      |   vdpa   |
                                      '----------'
                                        |     ||
                                       ctl   data
                                        |     ||
                                .----------.  ||
                                | pds_vdpa |  ||
                                '----------'  ||
                                     |        ||
                             pds_core.vDPA.1  ||
                                     |        ||
                          .---------------.   ||
                          |   pds_core    |   ||
                          '---------------'   ||
                              ||         ||   ||
                            09:00.0      09:00.1
              == PCI ============================================
                              ||            ||
                         .----------.   .----------.
                  ,------|    PF    |---|    VF    |-------,
                  |      '----------'   '----------'       |
                  |                  DSC                   |
                  |                                        |
                  ------------------------------------------
      
      Changes:
        v11:
       - change strncpy to strscpy
         Reported-by: kernel test robot <lkp@intel.com>
         Link: https://lore.kernel.org/oe-kbuild-all/202304181137.WaZTYyAa-lkp@intel.com/
      
        v10:
      Link: https://lore.kernel.org/netdev/20230418003228.28234-1-shannon.nelson@amd.com/
       - remove CONFIG_DEBUG_FS guard static inline stuff
       - remove unnecessary 0 and null initializations
       - verify in driver load that PDS_CORE_DRV_NAME matches KBUILD_MODNAME
       - remove debugfs irqs_show(), redundant with /proc
       - return -ENOMEM if intr_info = kcalloc() fails
       - move the status code enum into pds_core_if.h as part of API definition
       - fix up one place in pdsc_devcmd_wait() we're using the status codes where we could use the errno
       - remove redundant calls to flush_workqueue()
       - grab config_lock before testing state bits in pdsc_fw_reporter_diagnose()
       - change pdsc_color_match() to return bool
       - remove useless VIF setup loop and just setup vDPA services for now
       - remove pf pointer from struct padev and have clients use pci_physfn()
       - drop use of "vf" in auxdev.c function names, make more generic
       - remove last of client ops struct and simply export the functions
       - drop drivers@pensando.io from MAINTAINERS and add new include dir
       - include dynamic_debug.h in adminq.c to protect dynamic_hex_dump()
       - fixed fw_slot type from u8 to int for handling error returns
       - fixed comment spelling
       - changed void arg in pdsc_adminq_post() to struct pdsc *
      
        v9:
      Link: https://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/
       - change pdsc field name id to uid to clarify the unique id used for aux device
       - remove unnecessary pf->state and other checks in aux device creation
       - hardcode fw slotnames for devlink info, don't use strings from FW
       - handle errors from PDS_CORE_CMD_INIT devcmd call
       - tighten up health thread use of config_lock
       - remove pdsc_queue_health_check() layer over queuing health check
       - start pds_core.rst file in first patch, add to it incrementally
       - give more user interaction info in commit messages
       - removed a few more extraneous includes
      
        v8:
      Link: https://lore.kernel.org/netdev/20230330234628.14627-1-shannon.nelson@amd.com/
       - fixed deadlock problem, use devl_health_reporter_destroy() when devlink is locked
       - don't clear client_id until after auxiliary_device_uninit()
      
        v7:
      Link: https://lore.kernel.org/netdev/20230330192313.62018-1-shannon.nelson@amd.com/
       - use explicit devlink locking and devl_* APIs
       - move some of devlink setup logic into probe and remove
       - use debugfs_create_u{type}() for state and queue head and tail
       - add include for linux/vmalloc.h
         Reported-by: kernel test robot <lkp@intel.com>
         Link: https://lore.kernel.org/oe-kbuild-all/202303260420.Tgq0qobF-lkp@intel.com/
      
        v6:
      Link: https://lore.kernel.org/netdev/20230324190243.27722-1-shannon.nelson@amd.com/
       - removed version.h include noticed by kernel test robot's version check
         Reported-by: kernel test robot <lkp@intel.com>
         Link: https://lore.kernel.org/oe-kbuild-all/202303230742.pX3ply0t-lkp@intel.com/
       - fixed up the more egregious checkpatch line length complaints
       - make sure pdsc_auxbus_dev_register() checks padev pointer errcode
      
        v5:
      Link: https://lore.kernel.org/netdev/20230322185626.38758-1-shannon.nelson@amd.com/
       - added devlink health reporter for FW issues
       - removed asic_type, asic_rev, serial_num, fw_version from debugfs as
         they are available through other means
       - trimmed OS info in pdsc_identify(), we don't need to send that much info to the FW
       - removed reg/unreg from auxbus client API, they are now in the core when VF
         is started
       - removed need for pdsc definition in client by simplifying the padev to only carry
         struct pci_dev pointers rather than full struct pdsc to the pf and vf
       - removed the unused pdsc argument in pdsc_notify()
       - moved include/linux/pds/pds_core.h to driver/../pds_core/core.h
       - restored a few pds_core_if.h interface values and structs that are shared
         with FW source
       - moved final config_lock unlock to before tear down of timer and workqueue
         to be sure there are no deadlocks while waiting for any stragglers
       - changed use of PAGE_SIZE to local PDS_PAGE_SIZE to keep with FW layout needs
         without regard to kernel PAGE_SIZE configuration
       - removed the redundant *adminqcq argument from pdsc_adminq_post()
      
        v4:
      Link: https://lore.kernel.org/netdev/20230308051310.12544-1-shannon.nelson@amd.com/
       - reworked to attach to both Core PF and vDPA VF PCI devices
       - now creates auxiliary_device as part of each VF PCI probe, removes them on PCI remove
       - auxiliary devices now use simple unique id rather than PCI address for identifier
       - replaced home-grown event publishing with kernel-based notifier service
       - dropped live_migration parameter, not needed when not creating aux device for it
       - replaced devm_* functions with traditional interfaces
       - added MAINTAINERS entry
       - removed lingering traces of set/get_vf attribute adminq commands
       - trimmed some include lists
       - cleaned a kernel test robot complaint about a stray unused variable
              Link: https://lore.kernel.org/oe-kbuild-all/202302181049.yeUQMeWY-lkp@intel.com/
      
        v3:
      Link: https://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/
       - changed names from "pensando" to "amd" and updated copyright strings
       - dropped the DEVLINK_PARAM_GENERIC_ID_FW_BANK for future development
       - changed the auxiliary device creation to be triggered by the
         PCI bus event BOUND_DRIVER, and torn down at UNBIND_DRIVER in order
         to properly handle users using the sysfs bind/unbind functions
       - dropped some noisy log messages
       - rebased to current net-next
      
        RFC to v2:
      Link: https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
       - added separate devlink param patches for DEVLINK_PARAM_GENERIC_ID_ENABLE_MIGRATION
         and DEVLINK_PARAM_GENERIC_ID_FW_BANK, and dropped the driver specific implementations
       - updated descriptions for the new devlink parameters
       - dropped netdev support
       - dropped vDPA patches, will followup later
       - separated fw update and fw bank select into their own patches
      
        RFC:
      Link: https://lore.kernel.org/netdev/20221118225656.48309-1-snelson@pensando.io/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: Kconfig and pds_core.rst · ddbcb220
      Shannon Nelson authored
      Remaining documentation and Kconfig hook for building the driver.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: publish events to the clients · d24c2827
      Shannon Nelson authored
      When the Core device gets an event from the device, or notices
      the device FW to be up or down, it needs to send those events
      on to the clients that have an event handler.  Add the code to
      pass along the events to the clients.
      
      The entry points pdsc_register_notify() and pdsc_unregister_notify()
      are EXPORTed for other drivers that want to listen for these events.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: add the aux client API · 10659034
      Shannon Nelson authored
      Add the client API operations for running adminq commands.
      The core registers the client with the FW, then the client
      has a context for requesting adminq services.  We expect
      to add additional operations for other clients, including
      requesting additional private adminqs and IRQs, but don't have
      the need yet.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: devlink params for enabling VIF support · 40ced894
      Shannon Nelson authored
      Add the devlink parameter switches so the user can enable
      the features supported by the VFs.  The only feature supported
      at the moment is vDPA.
      
      Example:
          devlink dev param set pci/0000:2b:00.0 \
      	    name enable_vnet cmode runtime value true
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: add auxiliary_bus devices · 4569cce4
      Shannon Nelson authored
      An auxiliary_bus device is created for each vDPA type VF at VF
      probe and destroyed at VF remove.  The aux device name comes
      from the driver name + VIF type + the unique id assigned at PCI
      probe.  The VFs are always removed on PF remove, so there should
      be no issues with VFs trying to access missing PF structures.
      
      The auxiliary_device names will look like "pds_core.vDPA.nn"
      where 'nn' is the VF's uid.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
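The naming scheme described in this commit (driver name + VIF type + unique id, giving "pds_core.vDPA.nn") can be sketched as a plain string format. `format_aux_name()` is a hypothetical helper for illustration only; in the kernel the auxiliary bus assembles the final device name itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<driver>.<vif>.<uid>" into buf.
 * Returns 0 on success, -1 if the name did not fit or formatting failed.
 */
int format_aux_name(char *buf, size_t len, const char *drv,
		    const char *vif, unsigned int uid)
{
	int n;

	assert(buf && drv && vif);
	n = snprintf(buf, len, "%s.%s.%u", drv, vif, uid);
	if (n < 0 || (size_t)n >= len)
		return -1; /* truncated or failed */
	return 0;
}
```

Calling it with "pds_core", "vDPA" and uid 1 yields the same "pds_core.vDPA.1" label that appears in the ASCII diagram of the cover letter.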
    • pds_core: add initial VF device handling · f53d9311
      Shannon Nelson authored
      This is the initial VF PCI driver framework for the new
      pds_vdpa VF device, which will work in conjunction with an
      auxiliary_bus client of the pds_core driver.  This does the
      very basics of registering for the new VF device, setting
      up debugfs entries, and registering with devlink.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: set up the VIF definitions and defaults · 65e0185a
      Shannon Nelson authored
      The Virtual Interfaces (VIFs) supported by the DSC's
      configuration (vDPA, Eth, RDMA, etc) are reported in the
      dev_ident struct and made visible in debugfs.  At this point
      only vDPA is supported in this driver, so we only set up
      devices for that feature.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: add FW update feature to devlink · 49ce92fb
      Shannon Nelson authored
      Add in the support for doing firmware updates.  Of the two
      main banks available, a and b, this updates the one not in
      use and then selects it for the next boot.
      
      Example:
          devlink dev flash pci/0000:b2:00.0 \
      	    file pensando/dsc_fw_1.63.0-22.tar
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: Add adminq processing and commands · 01ba61b5
      Shannon Nelson authored
      Add the service routines for submitting and processing
      the adminq messages and for handling notifyq events.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: set up device and adminq · 45d76f49
      Shannon Nelson authored
      Set up the basic adminq and notifyq queue structures.  These are
      used mostly by the client drivers for feature configuration.
      These are essentially the same adminq and notifyq as in the
      ionic driver.
      
      Part of this includes querying for device identity and FW
      information, so we can make that available to devlink dev info.
      
        $ devlink dev info pci/0000:b5:00.0
        pci/0000:b5:00.0:
          driver pds_core
          serial_number FLM18420073
          versions:
              fixed:
                asic.id 0x0
                asic.rev 0x0
              running:
                fw 1.51.0-73
              stored:
                fw.goldfw 1.15.9-C-22
                fw.mainfwa 1.60.0-73
                fw.mainfwb 1.60.0-57
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: add devlink health facilities · 25b450c0
      Shannon Nelson authored
      Add devlink health reporting on top of our fw watchdog.
      
      Example:
        # devlink health show pci/0000:2b:00.0 reporter fw
        pci/0000:2b:00.0:
          reporter fw
            state healthy error 0 recover 0
        # devlink health diagnose pci/0000:2b:00.0 reporter fw
         Status: healthy State: 1 Generation: 0 Recoveries: 0
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: health timer and workqueue · c2dbb090
      Shannon Nelson authored
      Add in the periodic health check and the related workqueue,
      as well as the handlers for when a FW reset is seen.
      
      The firmware is polled every 5 seconds to be sure that it is
      still alive and that the FW generation didn't change.
      
      The alive check looks to see that the PCI bus is still readable
      and the fw_status still has the RUNNING bit on.  If not alive,
      the driver stops activity and tears things down.  When the FW
      recovers and the alive check again succeeds, the driver sets
      back up for activity.
      
      The generation check looks at the fw_generation to see if it
      has changed, which can happen if the FW crashed and recovered
      or was updated in between health checks.  If changed, the
      driver counts that as though the alive test failed and forces
      the fw_down/fw_up cycle.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
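The alive check and generation check described in this commit can be modelled as a small decision function. This is a sketch under stated assumptions: the bit value of RUNNING, the all-ones "unreadable" status, and every name below are hypothetical, and the real driver's register layout is not shown in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; the real register definitions are not shown here. */
#define FW_STATUS_RUNNING    0x01u
#define FW_STATUS_UNREADABLE 0xffu /* assumed all-ones read when PCI is gone */

enum health_action {
	HEALTH_OK,       /* nothing to do */
	HEALTH_FW_DOWN,  /* stop activity and tear things down */
	HEALTH_FW_UP,    /* FW is back: set up for activity again */
	HEALTH_FW_CYCLE, /* generation changed: force fw_down then fw_up */
};

struct health_state {
	bool fw_up;         /* the driver's current view of the FW */
	uint8_t generation; /* last FW generation seen */
};

/* One periodic health check; updates the state and reports what to do. */
enum health_action health_check(struct health_state *st, uint8_t fw_status,
				uint8_t fw_generation)
{
	bool alive;

	assert(st);
	alive = fw_status != FW_STATUS_UNREADABLE &&
		(fw_status & FW_STATUS_RUNNING);

	if (!alive) {
		if (!st->fw_up)
			return HEALTH_OK; /* already torn down */
		st->fw_up = false;
		return HEALTH_FW_DOWN;
	}
	if (!st->fw_up) {
		st->fw_up = true;
		st->generation = fw_generation;
		return HEALTH_FW_UP;
	}
	if (fw_generation != st->generation) {
		/* Treated as though the alive test failed. */
		st->generation = fw_generation;
		return HEALTH_FW_CYCLE;
	}
	return HEALTH_OK;
}
```

A caller would run this from the periodic timer (every 5 seconds per the commit message) and dispatch the returned action to the workqueue.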
    • pds_core: add devcmd device interfaces · 523847df
      Shannon Nelson authored
      The devcmd interface is the basic connection to the device through the
      PCI BAR for low level identification and command services.  This does
      the early device initialization and finds the identity data, and adds
      devcmd routines to be used by later driver bits.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: initial framework for pds_core PF driver · 55435ea7
      Shannon Nelson authored
      This is the initial PCI driver framework for the new pds_core device
      driver and its family of devices.  This does the very basics of
      registering for the new PF PCI device 1dd8:100c, setting up debugfs
      entries, and registering with devlink.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-neigh-suppression' · 25c800b2
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      bridge: Add per-{Port, VLAN} neighbor suppression
      
      Background
      ==========
      
      In order to minimize the flooding of ARP and ND messages in the VXLAN
      network, EVPN includes provisions [1] that allow participating VTEPs to
      suppress such messages in case they know the MAC-IP binding and can
      reply on behalf of the remote host. In Linux, the above is implemented
      in the bridge driver using a per-port option called "neigh_suppress"
      that was added in kernel version 4.15 [2].
      
      Motivation
      ==========
      
      Some applications use ARP messages as keepalives between the application
      nodes in the network. This works perfectly well when two nodes are
      connected to the same VTEP. When a node goes down it will stop
      responding to ARP requests and the other node will notice it
      immediately.
      
      However, when the two nodes are connected to different VTEPs and
      neighbor suppression is enabled, the local VTEP will reply to ARP
      requests even after the remote node went down, until certain timers
      expire and the EVPN control plane decides to withdraw the MAC/IP
      Advertisement route for the address. Therefore, some users would like to
      be able to disable neighbor suppression on VLANs where such applications
      reside and keep it enabled on the rest.
      
      Implementation
      ==============
      
      The proposed solution is to allow user space to control neighbor
      suppression on a per-{Port, VLAN} basis, in a similar fashion to other
      per-port options that gained per-{Port, VLAN} counterparts such as
      "mcast_router". This allows users to benefit from the operational
      simplicity and scalability associated with shared VXLAN devices (i.e.,
      external / collect-metadata mode), while still allowing for per-VLAN/VNI
      neighbor suppression control.
      
      The user interface is extended with a new "neigh_vlan_suppress" bridge
      port option that allows user space to enable per-{Port, VLAN} neighbor
      suppression on the bridge port. When enabled, the existing
      "neigh_suppress" option has no effect and neighbor suppression is
      controlled using a new "neigh_suppress" VLAN option. Example usage:
      
       # bridge link set dev vxlan0 neigh_vlan_suppress on
       # bridge vlan add vid 10 dev vxlan0
       # bridge vlan set vid 10 dev vxlan0 neigh_suppress on
      
      Testing
      =======
      
      Tested using existing bridge selftests. Added a dedicated selftest in
      the last patch.
      
      Patchset overview
      =================
      
      Patches #1-#5 are preparations.
      
      Patch #6 adds per-{Port, VLAN} neighbor suppression support to the
      bridge's data path.
      
      Patches #7-#8 add the required netlink attributes to enable the feature.
      
      Patch #9 adds a selftest.
      
      iproute2 patches can be found here [3].
      
      Changelog
      =========
      
      Since RFC [4]:
      
      No changes.
      
      [1] https://www.rfc-editor.org/rfc/rfc7432#section-10
      [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a42317785c898c0ed46db45a33b0cc71b671bf29
      [3] https://github.com/idosch/iproute2/tree/submit/neigh_suppress_v1
      [4] https://lore.kernel.org/netdev/20230413095830.2182382-1-idosch@nvidia.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
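The precedence rule from the cover letter (with "neigh_vlan_suppress" on, the per-port "neigh_suppress" option has no effect and the per-VLAN option decides) can be written as a small predicate. The structures and function below are hypothetical models, not the bridge driver's data path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port options. */
struct port_cfg {
	bool neigh_suppress;      /* existing per-port option */
	bool neigh_vlan_suppress; /* new: switch to per-{Port, VLAN} control */
};

/* Hypothetical per-VLAN options on that port. */
struct vlan_cfg {
	bool neigh_suppress;
};

/* Decide whether ARP/ND suppression applies to a packet on this port/VLAN. */
bool neigh_suppress_enabled(const struct port_cfg *port,
			    const struct vlan_cfg *vlan)
{
	assert(port);
	if (port->neigh_vlan_suppress)
		return vlan != NULL && vlan->neigh_suppress;
	return port->neigh_suppress;
}
```

Treating a missing VLAN entry as "not suppressed" is an assumption of this sketch.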
    • selftests: net: Add bridge neighbor suppression test · 7648ac72
      Ido Schimmel authored
      Add test cases for bridge neighbor suppression, testing both per-port
      and per-{Port, VLAN} neighbor suppression with both ARP and NS packets.
      
      Example truncated output:
      
       # ./test_bridge_neigh_suppress.sh
       [...]
       Tests passed: 148
       Tests failed:   0
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7648ac72
    • bridge: Allow setting per-{Port, VLAN} neighbor suppression state · 160656d7
      Ido Schimmel authored
      Add a new bridge port attribute that allows user space to enable
      per-{Port, VLAN} neighbor suppression. Example:
      
       # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
       false
       # bridge link set dev swp1 neigh_vlan_suppress on
       # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
       true
       # bridge link set dev swp1 neigh_vlan_suppress off
       # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
       false
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      160656d7
    • bridge: vlan: Allow setting VLAN neighbor suppression state · 83f6d600
      Ido Schimmel authored
      Add a new VLAN attribute that allows user space to set the neighbor
      suppression state of the port VLAN. Example:
      
       # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
       false
       # bridge vlan set vid 10 dev swp1 neigh_suppress on
       # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
       true
       # bridge vlan set vid 10 dev swp1 neigh_suppress off
       # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
       false
      
       # bridge vlan set vid 10 dev br0 neigh_suppress on
       Error: bridge: Can't set neigh_suppress for non-port vlans.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83f6d600
    • bridge: Add per-{Port, VLAN} neighbor suppression data path support · 412614b1
      Ido Schimmel authored
      When the bridge is not VLAN-aware (i.e., VLAN ID is 0), determine if
      neighbor suppression is enabled on a given bridge port solely based on
      the existing 'BR_NEIGH_SUPPRESS' flag.
      
      Otherwise, if the bridge is VLAN-aware, first check if per-{Port, VLAN}
      neighbor suppression is enabled on the given bridge port using the
      'BR_NEIGH_VLAN_SUPPRESS' flag. If so, look up the VLAN and check whether
      it has neighbor suppression enabled based on the per-VLAN
      'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED' flag.
      
      If the bridge is VLAN-aware, but the bridge port does not have
      per-{Port, VLAN} neighbor suppression enabled, then fall back to
      determining neighbor suppression based on the 'BR_NEIGH_SUPPRESS' flag.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      412614b1
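      The decision order described in 412614b1 can be sketched in plain C as
      follows. This is a minimal user-space illustration, not the bridge's
      actual code: the flag names come from the commit messages, but the bit
      values, the sketch_* types and the linear VLAN lookup are hypothetical
      stand-ins. Treating a VLAN that is not configured on the port as "not
      suppressed" is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names follow the commit messages; the bit values are illustrative. */
#define BR_NEIGH_SUPPRESS                (1u << 0)
#define BR_NEIGH_VLAN_SUPPRESS           (1u << 1)
#define BR_VLFLAG_NEIGH_SUPPRESS_ENABLED (1u << 0)

/* Hypothetical, simplified stand-ins for the bridge's VLAN and port objects. */
struct sketch_vlan {
	uint16_t vid;
	unsigned int flags;
};

struct sketch_port {
	unsigned int flags;
	const struct sketch_vlan *vlans;
	size_t n_vlans;
};

/* Linear search; stands in for whatever VLAN lookup the bridge performs. */
static const struct sketch_vlan *
sketch_vlan_find(const struct sketch_port *p, uint16_t vid)
{
	for (size_t i = 0; i < p->n_vlans; i++)
		if (p->vlans[i].vid == vid)
			return &p->vlans[i];
	return NULL;
}

static bool sketch_neigh_suppress_enabled(const struct sketch_port *p,
					  uint16_t vid)
{
	/* VLAN ID 0: the bridge is not VLAN-aware, only the port flag counts. */
	if (vid == 0)
		return (p->flags & BR_NEIGH_SUPPRESS) != 0;

	/* Per-{Port, VLAN} mode: the per-port flag has no effect. */
	if (p->flags & BR_NEIGH_VLAN_SUPPRESS) {
		const struct sketch_vlan *v = sketch_vlan_find(p, vid);

		/* Assumption: an unknown VLAN is treated as not suppressed. */
		return v && (v->flags & BR_VLFLAG_NEIGH_SUPPRESS_ENABLED) != 0;
	}

	/* VLAN-aware bridge without per-VLAN mode: fall back to the port flag. */
	return (p->flags & BR_NEIGH_SUPPRESS) != 0;
}
```

      Note that in this sketch the VLAN lookup only happens when
      'BR_NEIGH_VLAN_SUPPRESS' is set, so ports using the existing per-port
      option never pay for it.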
    • bridge: Encapsulate data path neighbor suppression logic · 3aca683e
      Ido Schimmel authored
      Currently, there are various places in the bridge data path that check
      whether neighbor suppression is enabled on a given bridge port.
      
      As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate
      this logic in a function and pass the VLAN ID of the packet as an
      argument.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3aca683e
    • bridge: Take per-{Port, VLAN} neighbor suppression into account · 6be42ed0
      Ido Schimmel authored
      The bridge driver gates the neighbor suppression code behind an internal
      per-bridge flag called 'BROPT_NEIGH_SUPPRESS_ENABLED'. The flag is set
      when at least one bridge port has neighbor suppression enabled.
      
      As a preparation for per-{Port, VLAN} neighbor suppression, make sure
      the global flag is also set if per-{Port, VLAN} neighbor suppression is
      enabled. That is, when the 'BR_NEIGH_VLAN_SUPPRESS' flag is set on at
      least one bridge port.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6be42ed0
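      The condition for the per-bridge gate in 6be42ed0 can be illustrated
      with a small C sketch. The flag names are taken from the commit message;
      the bit values and the helper name are hypothetical, and representing the
      ports as a plain array of flag words is a simplification for
      illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the commit message; the bit values are illustrative. */
#define BR_NEIGH_SUPPRESS      (1u << 0)
#define BR_NEIGH_VLAN_SUPPRESS (1u << 1)

/*
 * Returns the value the per-bridge gate should have: true when at least
 * one port has either the per-port or the per-{Port, VLAN} flag set.
 */
static bool sketch_bridge_neigh_suppress_gate(const unsigned int *port_flags,
					      size_t n_ports)
{
	for (size_t i = 0; i < n_ports; i++)
		if (port_flags[i] & (BR_NEIGH_SUPPRESS | BR_NEIGH_VLAN_SUPPRESS))
			return true;
	return false;
}
```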
    • bridge: Add internal flags for per-{Port, VLAN} neighbor suppression · a714e3ec
      Ido Schimmel authored
      Add two internal flags that will be used to enable / disable per-{Port,
      VLAN} neighbor suppression:
      
      1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that
      per-{Port, VLAN} neighbor suppression is enabled on the bridge port.
      When set, 'BR_NEIGH_SUPPRESS' has no effect.
      
      2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate
      that neighbor suppression is enabled on the given VLAN.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a714e3ec
    • bridge: Pass VLAN ID to br_flood() · e408336a
      Ido Schimmel authored
      Subsequent patches are going to add per-{Port, VLAN} neighbor
      suppression, which will require br_flood() to potentially suppress ARP /
      NS packets on a per-{Port, VLAN} basis.
      
      As a preparation, pass the VLAN ID of the packet as another argument to
      br_flood().
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e408336a
    • bridge: Reorder neighbor suppression check when flooding · 013a7ce8
      Ido Schimmel authored
      The bridge does not flood ARP / NS packets to bridge ports that have
      neighbor suppression enabled if a reply was already sent for them.
      
      Subsequent patches are going to add per-{Port, VLAN} neighbor
      suppression, which is going to make it more expensive to check whether
      neighbor suppression is enabled since a VLAN lookup will be required.
      
      Therefore, instead of unnecessarily performing this lookup for every
      packet, only perform it for ARP / NS packets for which a reply was sent.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      013a7ce8
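      The effect of the reordering in 013a7ce8 can be shown with a short C
      sketch that relies on the short-circuit evaluation of "&&". All names
      here are hypothetical; the counter exists only to make visible how often
      the (potentially expensive) suppression lookup runs.

```c
#include <stdbool.h>

/* Counts how often the expensive suppression check is evaluated. */
static unsigned int sketch_lookups;

/* Stand-in for the suppression check, which may need a VLAN lookup. */
static bool sketch_suppress_lookup(bool port_suppresses)
{
	sketch_lookups++;
	return port_suppresses;
}

/*
 * Decide whether to skip flooding a packet to a port. The cheap test
 * (was a reply already sent for this ARP / NS packet?) comes first, so
 * the lookup is only performed for packets that can be suppressed.
 */
static bool sketch_skip_flood(bool reply_sent, bool port_suppresses)
{
	return reply_sent && sketch_suppress_lookup(port_suppresses);
}
```

      With the operands in the opposite order, the lookup would run for every
      flooded packet, which is the cost the commit avoids.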
    • Merge branch 'macsec-vlan' · 1cf3fe1c
      David S. Miller authored
      Emeel Hakim says:
      
      ====================
      Support MACsec VLAN
      
      This patch series introduces support for hardware (HW) offload MACsec
      devices with VLAN configuration. The patches address both scenarios:
      the VLAN header as the inner header and as the outer header for MACsec.
      
      The changes include:
      
      1. Adding MACsec offload operation for VLAN.
      2. Considering VLAN when accessing MACsec net device.
      3. Currently, offloading MACsec when it is configured over VLAN with the
      current MACsec TX steering rules wrongly inserts the MACsec sec tag
      after the VLAN header has been inserted. This results in an ETHERNET |
      SECTAG | VLAN packet when ETHERNET | VLAN | SECTAG is configured. The
      patch handles this issue when configuring steering rules.
      4. Adding MACsec rx_handler change support in case of a marked skb and a
      mismatch on the dst MAC address.
      
      Please review these changes and let me know if you have any feedback or
      concerns.
      
      Updates since v1:
      - Consult vlan_features when adding NETIF_F_HW_MACSEC.
      - Allow grep for the functions.
      - Add helper function to get the macsec operation to allow the compiler
        to make some choice.
      
      Updates since v2:
      - Don't use macros to allow direct navigation from mdo functions to their
        implementations.
      - Make the vlan_get_macsec_ops argument a const.
      - Check if the specific mdo function is available before calling it.
      - Enable NETIF_F_HW_MACSEC by default when the lower device has it enabled.
        If the lower device supports NETIF_F_HW_MACSEC but has it disabled,
        let the new vlan device also have it disabled.
      
      Updates since v3:
      - Split patch ("vlan: Add MACsec offload operations for VLAN interface")
        to prevent mixing generic vlan code changes with driver changes.
      - Add mdo_open, stop and stats to support drivers which have those.
      - Don't fail if macsec offload operations are available but a specific
        function is not, to support drivers which do not implement all
        macsec offload operations.
      - Don't call find_rx_sc twice in the same loop, instead save the result
        in a parameter and re-use it.
      - Completely remove _BUILD_VLAN_MACSEC_MDO macro, to prevent returning
        from a macro.
      - Reorder the functions inside struct macsec_ops to match the struct
        declaration.
      
      Updates since v4:
      - Change subject line of ("macsec: Add MACsec rx_handler change support")
        and adapt commit message.
      - Don't separate the new check in patch ("macsec: Add MACsec rx_handler
        change support") from the previous if/else if.
      - Drop "_found" from the parameter naming "rx_sc_found" and move the
        definition to the relevant block.
      - Remove "{}" since not needed around a single line.
      
      Updates since v5:
      - Consider promiscuous mode case.
      
      Updates since v6:
      - Use IS_ENABLED instead of checking for ifdef.
      - Don't add the inline keyword in C files, let the compiler make its own
        decisions.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cf3fe1c
    • macsec: Don't rely solely on the dst MAC address to identify destination MACsec device · 7661351a
      Emeel Hakim authored
      Offloading device drivers mark offloaded MACsec SKBs with the
      corresponding SCI in the skb_metadata_dst so that the macsec rx handler
      knows to which interface to divert those skbs. In case of a marked skb
      and a mismatch on the dst MAC address, divert the skb to the macsec
      net_device, where the macsec rx_handler will be called. This covers cases
      where relying solely on the dst MAC address is insufficient.
      
      One such instance is when using MACsec with a VLAN as an inner
      header, where the packet structure is ETHERNET | SECTAG | VLAN.
      In such a scenario, the dst MAC address in the ethernet header
      will correspond to the VLAN MAC address, resulting in a mismatch.
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7661351a
    • net/mlx5: Consider VLAN interface in MACsec TX steering rules · 765f974c
      Emeel Hakim authored
      Offloading MACsec when it is configured over VLAN with the current MACsec
      TX steering rules will wrongly insert the MACsec sec tag after the VLAN
      header has been inserted, leading to an ETHERNET | SECTAG | VLAN packet
      when ETHERNET | VLAN | SECTAG is configured.
      
      The issue arises because the HW adds the SECTAG at a later stage than
      the VLAN header insertion stage.
      
      Detect such a case and adjust the TX steering rules to insert the
      SECTAG in the correct place by using the reformat_param_0 field in
      the packet reformat to indicate the offset of the SECTAG from the end of
      the MAC header, accounting for VLANs in a granularity of 4 bytes.
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      765f974c
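      The offset arithmetic mentioned in 765f974c can be sketched as below.
      This is only an illustration under stated assumptions: it assumes the
      offset is expressed in units of 4 bytes, one unit per VLAN tag that
      precedes the SECTAG, and all names are hypothetical. The only fixed fact
      used is that a single 802.1Q VLAN tag is 4 bytes long. How the driver
      actually encodes reformat_param_0 is not shown in the commit message.

```c
#include <stdint.h>

/* One 802.1Q tag is 4 bytes: 2 bytes TPID + 2 bytes TCI. */
#define SKETCH_VLAN_TAG_LEN 4u

/*
 * Assumed encoding: the offset is counted in 4-byte units, so each VLAN
 * tag in front of the SECTAG contributes exactly one unit.
 */
static uint32_t sketch_sectag_offset_units(unsigned int n_vlan_tags)
{
	return n_vlan_tags;
}

/* The same offset expressed in bytes from the end of the MAC header. */
static uint32_t sketch_sectag_offset_bytes(unsigned int n_vlan_tags)
{
	return sketch_sectag_offset_units(n_vlan_tags) * SKETCH_VLAN_TAG_LEN;
}
```

      With no VLAN the offset is zero and the SECTAG directly follows the MAC
      header; with one VLAN tag it moves back by 4 bytes, which yields the
      ETHERNET | VLAN | SECTAG order.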
    • net/mlx5: Support MACsec over VLAN · 4bba492b
      Emeel Hakim authored
      A MACsec device may have a VLAN device on top of it.
      Detect the MACsec state correctly under this condition,
      and return the correct net device accordingly.
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4bba492b
    • net/mlx5: Enable MACsec offload feature for VLAN interface · 339ccec8
      Emeel Hakim authored
      Enable MACsec offload feature over VLAN by adding NETIF_F_HW_MACSEC
      to the device vlan_features.
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      339ccec8
    • vlan: Add MACsec offload operations for VLAN interface · abff3e5e
      Emeel Hakim authored
      Add MACsec offload operations to the VLAN driver so that MACsec can
      be offloaded when the VLAN's real device supports MACsec offload,
      by forwarding the offload request to it.
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      abff3e5e
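      The forwarding idea in abff3e5e, combined with the cover letter's note
      about not failing when a specific function is missing, can be sketched
      in user-space C as follows. The sketch_* types and functions are
      hypothetical simplifications (only one operation is modelled), and
      returning -EOPNOTSUPP when the real device has no offload operations at
      all is an assumption made for this illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical context; counts how often the lower device was called. */
struct sketch_macsec_ctx {
	int calls;
};

/* Hypothetical, reduced operations table with a single callback. */
struct sketch_macsec_ops {
	int (*mdo_add_secy)(struct sketch_macsec_ctx *ctx);
};

struct sketch_dev {
	const struct sketch_macsec_ops *macsec_ops; /* NULL: no offload */
	const struct sketch_dev *real_dev;          /* lower device of a VLAN */
};

/* Example callback a lower (real) device might provide. */
static int sketch_lower_add_secy(struct sketch_macsec_ctx *ctx)
{
	ctx->calls++;
	return 0;
}

/* VLAN-side operation: forward the request to the real device. */
static int sketch_vlan_mdo_add_secy(const struct sketch_dev *vlan,
				    struct sketch_macsec_ctx *ctx)
{
	const struct sketch_macsec_ops *ops =
		vlan->real_dev ? vlan->real_dev->macsec_ops : NULL;

	/* Assumption: no offload operations at all is reported as an error. */
	if (!ops)
		return -EOPNOTSUPP;

	/* Operations exist but this one is not implemented: do not fail. */
	if (!ops->mdo_add_secy)
		return 0;

	return ops->mdo_add_secy(ctx);
}
```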
    • Merge branch 'sctp-nested-flex-arrays' · e2598dbd
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: fix plenty of flexible-array-nested warnings
      
      Paolo noticed a compile warning in SCTP,
      
      ../net/sctp/stream_sched_fc.c: note: in included file (through ../include/net/sctp/sctp.h):
      ../include/net/sctp/structs.h:335:41: warning: array of flexible structures
      
      But not only this, there are actually quite a lot of such warnings in
      some SCTP structs. This patchset fixes most of the warnings by deleting
      these nested flexible array members.
      
      After this patchset, there are still some warnings left:
      
        # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
        ./include/net/sctp/structs.h:1145:41: warning: nested flexible array
        ./include/uapi/linux/sctp.h:641:34: warning: nested flexible array
        ./include/uapi/linux/sctp.h:643:34: warning: nested flexible array
        ./include/uapi/linux/sctp.h:644:33: warning: nested flexible array
        ./include/uapi/linux/sctp.h:650:40: warning: nested flexible array
        ./include/uapi/linux/sctp.h:653:39: warning: nested flexible array
      
      the 1st is caused by __data[] in struct ip_options, not in SCTP;
      the others are in uapi, and we should not touch them.
      
      Note that instead of completely deleting it, we just leave it as a
      comment in the struct, signalling to the reader that we do expect
      such variable parameters over there, as Marcelo suggested.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e2598dbd
    • sctp: delete the nested flexible array payload · dbda0fba
      Xin Long authored
      This patch deletes the flexible-array payload[] from the structure
      sctp_datahdr to avoid some sparse warnings:
      
        # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
        net/sctp/socket.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
        ./include/linux/sctp.h:230:29: warning: nested flexible array
      
      This member is not even used anywhere.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dbda0fba
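      The pattern behind dbda0fba (and the cover letter's note about leaving
      the member as a comment) can be illustrated with a small C sketch. The
      struct layouts and names below are hypothetical and only loosely
      modelled on a chunk header; the accessor is an illustration of how
      trailing data can still be reached without a flexible array member, not
      something the commit adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header; the flexible array is now a comment. */
struct sketch_datahdr {
	uint32_t tsn;
	uint16_t stream;
	uint16_t ssn;
	uint32_t ppid;
	/* uint8_t payload[]; */
};

/*
 * Embedding the header in another struct no longer nests a flexible
 * array, which is what the sparse warning complained about.
 */
struct sketch_data_chunk {
	uint32_t chunk_hdr;
	struct sketch_datahdr data_hdr;
};

/* Variable-length data, if any, starts right after the fixed header. */
static const uint8_t *sketch_datahdr_payload(const struct sketch_datahdr *hdr)
{
	return (const uint8_t *)(hdr + 1);
}
```

      Because the header keeps its fixed-size fields, its size and the offsets
      of those fields are unchanged by dropping the flexible array member.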
    • sctp: delete the nested flexible array hmac · 2ab399a9
      Xin Long authored
      This patch deletes the flexible-array hmac[] from the structure
      sctp_authhdr to avoid some sparse warnings:
      
        # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
        net/sctp/auth.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
        ./include/linux/sctp.h:735:29: warning: nested flexible array
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ab399a9