1. 16 Dec, 2021 17 commits
  2. 15 Dec, 2021 23 commits
    • netfilter: conntrack: Remove useless assignment statements · 284ca764
      luo penghao authored
      The value assigned to old_size here is never used afterwards.
      
      The clang_analyzer complains as follows:
      
      Value stored to 'old_size' is never read
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net/mlx5: Introduce log_max_current_uc_list_wr_supported bit · 685b1afd
      Shay Drory authored
      A downstream patch will use this bit to determine whether the device
      supports changing max_uc_list.
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • ice: use modern kernel API for kick · 9c99d099
      Jesse Brandeburg authored
      The kernel gained a new interface that lets drivers combine the tail
      bump (doorbell) and BQL updates. Attempt to use this new interface.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: tighter control over VSI_DOWN state · 21c6e36b
      Jesse Brandeburg authored
      The driver had comments to the effect of: This flag should be set before
      calling this function. While reviewing code it was found that there were
      several violations of this policy, which could introduce hard to find
      bugs or races.
      
      Fix the violations of the "VSI DOWN state must be set before calling
      ice_down" policy and enforce the state check in code with a WARN_ON.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: use prefetch methods · cc14db11
      Jesse Brandeburg authored
      The kernel provides some prefetch mechanisms to speed up commonly
      cold cache line accesses during receive processing. Since these are
      software structures it helps to have these strategically placed
      prefetches.
      
      Be careful to call BQL prefetch complete only for non-XDP queues.
      Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: update to newer kernel API · 1c96c168
      Jesse Brandeburg authored
      Use the netif_tx_* API from netdevice.h which has simpler parameters.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: support immediate firmware activation via devlink reload · 399e27db
      Jacob Keller authored
      The ice hardware contains an embedded chip with firmware which can be
      updated using devlink flash. The firmware which runs on this chip is
      referred to as the Embedded Management Processor firmware (EMP
      firmware).
      
      Activating the new firmware image currently requires that the system be
      rebooted. This is not ideal as rebooting the system can cause unwanted
      downtime.
      
      In practical terms, activating the firmware does not always require a
      full system reboot. In many cases it is possible to activate the EMP
      firmware immediately. There are a couple of different scenarios to
      cover.
      
       * The EMP firmware itself can be reloaded by issuing a special update
         to the device called an Embedded Management Processor reset (EMP
         reset). This reset causes the device to reset and reload the EMP
         firmware.
      
       * PCI configuration changes are only reloaded after a cold PCIe reset.
         Unfortunately there is no generic way to trigger this for a PCIe
         device without a system reboot.
      
      When performing a flash update, firmware is capable of responding with
      some information about the specific update requirements.
      
      The driver updates the flash by programming a secondary inactive bank
      with the contents of the new image, and then issuing a command to
      request to switch the active bank starting from the next load.
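
      The two-bank scheme described above can be modelled in a few lines of
      Python. This is only an illustrative sketch, not driver code: the
      DualBankFlash class and its method names are hypothetical, and the
      firmware commands are replaced by plain attribute updates.

```python
class DualBankFlash:
    """Toy model of a flash with one active and one inactive bank."""

    def __init__(self, image):
        self.banks = [image, None]
        self.active = 0              # bank the device is currently running from
        self.switch_pending = False  # set by the "switch banks" request

    def update(self, new_image):
        # Program the inactive bank, then request a bank switch that only
        # takes effect on the next load.
        self.banks[1 - self.active] = new_image
        self.switch_pending = True

    def running(self):
        # Image the device is executing right now.
        return self.banks[self.active]

    def stored(self):
        # Image that will run after the next load.
        if self.switch_pending:
            return self.banks[1 - self.active]
        return self.banks[self.active]

    def load(self):
        # A reset that reloads firmware applies the pending bank switch.
        if self.switch_pending:
            self.active = 1 - self.active
            self.switch_pending = False
        return self.running()
```

      In this model, the gap between stored() and running() after update()
      corresponds to the window in which the new image is flashed but not
      yet activated.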
      
      The response to the final command for updating the inactive NVM flash
      bank includes an indication of the minimum reset required to fully
      update the device. This can be one of the following:
      
       * A full power on is required
       * A cold PCIe reset is required
       * An EMP reset is required
      
      The response to the command to switch flash banks includes an indication
      of whether or not the firmware will allow an EMP reset request.
      
      For most updates, an EMP reset is sufficient to load the new EMP
      firmware without issues. In some cases, this reset is not sufficient
      because the PCI configuration space has changed. When this could cause
      incompatibility with the new EMP image, the firmware is capable of
      rejecting the EMP reset request.
      
      Add logic to ice_fw_update.c to handle the response data from the flash
      update AdminQ commands.
      
      For the reset level, issue a devlink status notification informing the
      user of how to complete the update with a simple suggestion like
      "Activate new firmware by rebooting the system".
      
      Cache the status of whether or not firmware will restrict the EMP reset
      for use in implementing devlink reload.
      
      Implement support for devlink reload with the "fw_activate" flag. This
      allows user space to request the firmware be activated immediately.
      
      For the .reload_down handler, we will issue a request for the EMP reset
      using the appropriate firmware AdminQ command. If we know that the
      firmware will not allow an EMP reset, simply exit with a suitable
      netlink extended ACK message indicating that the EMP reset is not
      available.
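
      As a rough illustration of the .reload_down decision described above,
      the following Python model rejects the request when the cached
      restriction flag is set and otherwise asks for the EMP reset. It is
      a sketch under stated assumptions, not driver code: the Device class,
      the request_emp_reset method, and the message strings are all
      hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for the driver's per-device state."""
    emp_reset_restricted: bool = False   # cached from the bank-switch response
    requests: list = field(default_factory=list)

    def request_emp_reset(self):
        # Placeholder for the firmware AdminQ command that triggers the reset.
        self.requests.append("emp_reset")


def reload_down(dev, action):
    """Return (ok, message) for a reload request; messages are illustrative."""
    if action != "fw_activate":
        return False, "Requested reload action is not supported"
    if dev.emp_reset_restricted:
        # Firmware already indicated it would refuse, so do not send the command.
        return False, "EMP reset is not available"
    dev.request_emp_reset()
    return True, ""
```

      The point of caching the restriction is visible here: the handler can
      fail early with a clear message instead of sending a command that
      firmware would reject.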
      
      For the .reload_up handler, simply wait until the driver has finished
      resetting. Logic to handle processing of an EMP reset already exists in
      the driver as part of its reset and rebuild flows.
      
      Implement support for the devlink reload interface with the
      "fw_activate" action. This allows userspace to request activation of
      firmware without a reboot.
      
      Note that old versions of firmware do not indicate the required reset
      or the EMP reset restriction. The driver can
      determine if the two features are supported by checking the device
      capabilities report. I confirmed support has existed since at least
      version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
      EMP reset request has existed in all versions of the EMP firmware for the
      ice hardware.
      
      Check the device capabilities report to determine whether or not the
      indications are reported by the running firmware. If the reset
      requirement indication is not supported, always assume a full power on
      is necessary. If the reset restriction capability is not supported,
      always assume the EMP reset is available.
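
      The fallback rules above can be written down as a small Python
      function. This is a minimal sketch, not the driver's implementation:
      the capability keys "reset_level_reported" and
      "reset_restriction_reported" and the ResetLevel enum are hypothetical
      names chosen for illustration.

```python
from enum import Enum


class ResetLevel(Enum):
    POWER_ON = "full power on"
    COLD_PCIE = "cold PCIe reset"
    EMP = "EMP reset"


def effective_update_state(caps, reported_level, reported_restricted):
    """Apply the conservative defaults when firmware lacks an indication.

    caps is a dict of hypothetical capability flags taken from the device
    capabilities report.
    """
    if caps.get("reset_level_reported"):
        level = reported_level
    else:
        # No indication available: assume the strongest reset is needed.
        level = ResetLevel.POWER_ON

    if caps.get("reset_restriction_reported"):
        restricted = reported_restricted
    else:
        # No indication available: assume the EMP reset is allowed.
        restricted = False

    return level, restricted
```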
      
      Users can verify if the EMP reset has activated the firmware by using
      the devlink info report to check that the 'running' firmware version has
      updated. For example, a user might do the following:
      
       # Check current version
       $ devlink dev info
      
       # Update the device
       $ devlink dev flash pci/0000:af:00.0 file firmware.bin
      
       # Confirm stored version updated
       $ devlink dev info
      
       # Reload to activate new firmware
       $ devlink dev reload pci/0000:af:00.0 action fw_activate
      
       # Confirm running version updated
       $ devlink dev info
      
      Finally, this change does *not* implement basic driver-only reload
      support. I did look into trying to do this. However, it requires
      a significant refactor of how the ice driver probes and loads everything.
      The ice driver probe and allocation flows were not designed with such
      a reload in mind. Refactoring the flow to support this is beyond the
      scope of this change.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: reduce time to read Option ROM CIVD data · af18d886
      Jacob Keller authored
      During probe and device reset, the ice driver reads some data from the
      NVM image as part of ice_init_nvm. Part of this data includes a section
      of the Option ROM which contains version information.
      
      The function ice_get_orom_civd_data is used to locate the '$CIV' data
      section of the Option ROM.
      
      Timing of ice_probe and ice_rebuild indicates that the
      ice_get_orom_civd_data function takes about 10 seconds to finish
      executing.
      
      The function locates the section by scanning the Option ROM every 512
      bytes. This requires a significant number of NVM read accesses, since
      the Option ROM bank is 500KB. In the worst case it would take about 1000
      reads. Worse, all PFs serialize this operation during reload because of
      acquiring the NVM semaphore.
      
      The CIVD section is located at the end of the Option ROM image data.
      Unfortunately, the driver has no easy method to determine the offset
      manually. Practical experiments have shown that the data could be at
      a variety of locations, so simply reversing the scanning order is not
      sufficient to reduce the overall read time.
      
      Instead, copy the entire contents of the Option ROM into memory. This
      allows reading the data using 4KB pages instead of 512 bytes at a time.
      This reduces the total number of firmware commands by a factor of 8. In
      addition, reading the whole section together at once allows better
      indication to firmware of when we're "done".
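
      The read counts quoted above follow from simple arithmetic. The short
      Python check below assumes that "500KB" means 500 * 1024 bytes and
      that one firmware command is issued per chunk; both are assumptions
      made for this illustration.

```python
OROM_BANK_SIZE = 500 * 1024  # "500KB" from the text, taken here as 500 KiB
STRIDE = 512                 # old approach: one read per 512-byte step
PAGE = 4096                  # new approach: one read per 4KB page


def reads_needed(total_bytes, chunk):
    """Number of chunk-sized reads needed to cover total_bytes (rounded up)."""
    return -(-total_bytes // chunk)


old_reads = reads_needed(OROM_BANK_SIZE, STRIDE)
new_reads = reads_needed(OROM_BANK_SIZE, PAGE)
print(old_reads, new_reads, old_reads // new_reads)  # prints: 1000 125 8
```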
      
      Rewrite ice_get_orom_civd_data to allocate virtual memory to store the
      Option ROM data. Copy the entire Option ROM contents at once using
      ice_read_flash_module. Finally, use this memory copy to scan for the
      '$CIV' section.
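
      Once the Option ROM is held in memory, the scan itself is cheap. The
      Python sketch below walks a byte buffer at 512-byte steps looking for
      the '$CIV' signature. It only models the search: the function name is
      hypothetical, and any further validation the driver performs on
      a candidate section is omitted.

```python
SIGNATURE = b"$CIV"
STRIDE = 512


def find_civd_offset(orom):
    """Return the first 512-byte-aligned offset holding '$CIV', or None."""
    last_start = len(orom) - len(SIGNATURE)
    for offset in range(0, last_start + 1, STRIDE):
        if orom[offset:offset + len(SIGNATURE)] == SIGNATURE:
            return offset
    return None
```

      Because the loop touches only local memory, the position of the
      section within the image no longer affects how many firmware
      commands are needed.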
      
      This change significantly reduces the time to read the Option ROM CIVD
      section from ~10 seconds down to ~1 second. This has a significant
      impact on the total time to complete a driver rebuild or probe.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image · c9f7a483
      Jacob Keller authored
      The ice_devlink_flash_update function performs a few upfront checks and
      then calls ice_flash_pldm_image.
      
      Most of these checks make more sense in the context of code within
      ice_flash_pldm_image. Merge ice_devlink_flash_update and
      ice_flash_pldm_image into one function, placing it in ice_fw_update.c.
      
      Since this is still the entry point for devlink, call the function
      ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a
      single function which handles the devlink parameters and then initiates
      a PLDM update.
      
      With this change, the ice_devlink_flash_update function in
      ice_fw_update.c becomes the main entry point for flash update. It
      eliminates some unnecessary boilerplate code between the two previous
      functions. The ultimate motivation for this is that it eases supporting
      a dry run with the PLDM library in a future change.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: move and rename ice_check_for_pending_update · c356eaa8
      Jacob Keller authored
      The ice_devlink_flash_update function performs a few checks and then
      calls ice_flash_pldm_image. One of these checks is to call
      ice_check_for_pending_update. This function checks if the device has
      a pending update, and cancels it if so. This is necessary to allow
      a new flash update to proceed.
      
      We want to refactor the ice code to eliminate ice_devlink_flash_update,
      moving its checks into ice_flash_pldm_image.
      
      To do this, ice_check_for_pending_update will become static, and only
      called by ice_flash_pldm_image. To make this change easier to review,
      first just move the function up within the ice_fw_update.c file.
      
      While at it, note that the function has a misleading name. Its primary
      action is to cancel a pending update. Using the verb "check" does not
      imply this. Rename it to ice_cancel_pending_update.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: devlink: add shadow-ram region to snapshot Shadow RAM · 78ad87da
      Jacob Keller authored
      We have a region for reading the contents of the NVM flash as
      a snapshot. This region does not allow reading the Shadow RAM, as it
      always passes the FLASH_ONLY bit to the low level firmware interface.
      
      Add a separate shadow-ram region which will allow snapshot of the
      current contents of the Shadow RAM. This data is built from the NVM
      contents but is distinct as the device builds up the Shadow RAM during
      initialization, so being able to snapshot its contents can be useful
      when attempting to debug flash related issues.
      
      Fix the comment description of the nvm-flash region which incorrectly
      stated that it filled the shadow-ram region, and add a comment
      explaining that the nvm-flash region does not actually read the Shadow
      RAM.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ethtool: always write dev in ethnl_parse_header_dev_get · 3bc14ea0
      Jakub Kicinski authored
      Commit 0976b888 ("ethtool: fix null-ptr-deref on ref tracker")
      made the write to req_info.dev conditional. However, as Eric points out
      in a separate follow-up, the structure is often allocated on the
      stack rather than kzalloc()'d, so it seems safer to always write the
      dev in case it contains garbage on input.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add net device refcount tracker to struct packet_type · f1d9268e
      Eric Dumazet authored
      Most notable changes are in af_packet, tipc ones are trivial.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jon Maloy <jmaloy@redhat.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-ipv6-underlay' · ab8c83cf
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Add support for VxLAN with IPv6 underlay
      
      So far, mlxsw only supported VxLAN with IPv4 underlay. This patchset
      extends mlxsw to also support VxLAN with IPv6 underlay. The main
      difference is related to the way IPv6 addresses are handled by the
      device. See patch #1 for a detailed explanation.
      
      Patch #1 creates a common hash table to store the mapping from IPv6
      addresses to KVDL indexes. This table is useful for both IP-in-IP and
      VxLAN tunnels with an IPv6 underlay.
      
      Patch #2 converts the IP-in-IP code to use the new hash table.
      
      Patches #3-#6 are preparations.
      
      Patch #7 finally adds support for VxLAN with IPv6 underlay.
      
      Patch #8 removes a test case that checked that VxLAN configurations with
      IPv6 underlay are vetoed by the driver.
      
      A follow-up patchset will add forwarding selftests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: vxlan: Remove IPv6 test case · fb488be8
      Amit Cohen authored
      Currently, there is a test case to verify that VxLAN with IPv6 underlay
      is forbidden.
      
      Remove this test case as support for VxLAN with IPv6 underlay was added
      by the previous patch.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add support for VxLAN with IPv6 underlay · 06c08f86
      Amit Cohen authored
      Currently, mlxsw driver supports VxLAN with IPv4 underlay only.
      Add support for IPv6 underlay.
      
      The main differences are:
      
      * Learning is not supported for IPv6 FDB entries. Use static entries and
        do not allow the 'learning' flag for IPv6 VxLAN.
      
      * IPv6 addresses for FDB entries should be saved as part of KVDL.
        Use the new API to allocate and release entries for IPv6 addresses.
      
      * Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP
        packets with checksum zero are dropped.
        Force the relevant flags which allow the VxLAN device to generate UDP
        packets with zero checksum and also receive them.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_nve: Keep track of IPv6 addresses used by FDB entries · 0860c764
      Amit Cohen authored
      FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold
      a reference on a resource. Namely, the KVDL entry where the IPv6
      underlay destination IP is stored. When such an FDB entry is deleted, it
      needs to drop the reference from the corresponding KVDL entry.
      
      To that end, maintain a hash table that maps an FDB entry (i.e., {MAC,
      FID}) to the IPv6 address used by it.
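
      The bookkeeping described above can be pictured with a small Python
      model: a dictionary keyed by (MAC, FID) remembers which IPv6 address
      each FDB entry uses, so that deleting the entry can drop the matching
      reference. This is a sketch only. FdbIpv6Tracker and the release_addr
      callback are hypothetical names, and the callback stands in for
      whatever releases the KVDL entry in the driver.

```python
import ipaddress


class FdbIpv6Tracker:
    """Maps an FDB key (MAC, FID) to the underlay IPv6 address it uses."""

    def __init__(self, release_addr):
        # release_addr is a hypothetical callback that drops the reference
        # held on the KVDL entry storing the given address.
        self._release_addr = release_addr
        self._addr_by_fdb = {}

    def add(self, mac, fid, addr):
        key = (mac.lower(), fid)
        if key in self._addr_by_fdb:
            raise KeyError(f"FDB entry {key} is already tracked")
        self._addr_by_fdb[key] = ipaddress.IPv6Address(addr)

    def delete(self, mac, fid):
        # Look up which address the entry used, then drop its reference.
        addr = self._addr_by_fdb.pop((mac.lower(), fid))
        self._release_addr(addr)
        return addr
```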
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add a function to fill IPv6 unicast FDB entries · 4b08c3e6
      Amit Cohen authored
      Add a function to fill IPv6 unicast FDB entries. Use the common function
      for common fields.
      
      Unlike IPv4 entries, the underlay IP address is not filled in the
      register payload, but instead a pointer to KVDL is used.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Split handling of FDB tunnel entries between address families · 1fd85416
      Amit Cohen authored
      Currently, the function which adds/removes unicast tunnel FDB entries is
      shared between IPv4 and IPv6, while for IPv6 it warns because there is
      no support for it.
      
      The code for IPv6 will be more complicated because it needs to
      allocate/release a KVDL pointer for the underlay IPv6 address.
      
      As a preparation for IPv6 underlay support, split the code according to
      address family.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_nve_vxlan: Make VxLAN flags check per address family · 720d683c
      Amit Cohen authored
      As part of 'can_offload' checks, there is a check of VxLAN flags.
      
      The supported flags for IPv6 VxLAN will be different from the existing
      flags because of some limitations.
      
      As preparation for IPv6 underlay support, make this check per address
      family.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping · cf429115
      Amit Cohen authored
      Use the common hash table introduced by the previous patch instead of
      the IP-in-IP specific implementation.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add hash table for IPv6 address mapping · e846efe2
      Amit Cohen authored
      The device supports forwarding entries such as routes and FDBs that
      perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation.
      When the underlay is IPv6, these entries do not encode the 128 bit IPv6
      address used for encapsulation / decapsulation. Instead, these entries
      encode a 24 bit pointer to an array called KVDL where the IPv6 address
      is stored.
      
      Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent
      patches will add support for VxLAN with IPv6 underlay. To avoid
      duplicating the logic required to store and retrieve these IPv6
      addresses, introduce a hash table that will store the mapping between
      IPv6 addresses and their KVDL index.
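
      A rough Python model of such a mapping is shown below. It is a sketch
      under stated assumptions rather than the driver's data structure: the
      class and method names are hypothetical, reference counting of shared
      addresses is assumed, and a simple counter with a free list stands in
      for real KVDL allocation. The only detail taken from the text is that
      the index must fit in 24 bits.

```python
import ipaddress

KVDL_INDEX_BITS = 24  # the text describes a 24 bit pointer into KVDL


class Ipv6KvdlTable:
    """Maps an IPv6 address to a shared, reference-counted KVDL index."""

    def __init__(self):
        self._entries = {}  # IPv6Address -> [kvdl_index, refcount]
        self._free = []     # released indexes available for reuse
        self._next = 0      # next never-used index

    def _alloc_index(self):
        if self._free:
            return self._free.pop()
        if self._next >= 1 << KVDL_INDEX_BITS:
            raise MemoryError("no free KVDL index")
        index = self._next
        self._next += 1
        return index

    def get(self, addr):
        """Take a reference on addr and return its KVDL index."""
        key = ipaddress.IPv6Address(addr)
        entry = self._entries.get(key)
        if entry is None:
            entry = self._entries[key] = [self._alloc_index(), 0]
        entry[1] += 1
        return entry[0]

    def put(self, addr):
        """Drop a reference on addr and free its index on the last put."""
        key = ipaddress.IPv6Address(addr)
        entry = self._entries[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[key]
            self._free.append(entry[0])
```

      With one table shared between tunnel types, two users of the same
      underlay address end up with the same index instead of each storing
      a private copy.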
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2021-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f71f1bcb
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-12-14
      
      Parsing Infrastructure for TC actions:
      
      This series introduces a TC action infrastructure to help
      parse TC actions in a generic way for both FDB and NIC rules.
      
      To help maintain the parsing code of TC actions, we move the parsing code
      into per-action parsers, one per TC action type, in separate files, instead
      of having one big switch-case loop duplicated between the FDB and NIC
      parsers, as was the case before this patchset.
      
      Each TC flow_action->id is represented by a dedicated mlx5e_tc_act handler
      which has callbacks to check whether the specific action can be offloaded
      and to parse it.
      
      We move each case (TC action) handling into the specific handler, which is
      responsible for parsing and determining if the action is supported.
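
      The handler-per-action design can be sketched in Python as a dispatch
      table. This is an illustration of the pattern only, not the mlx5
      code: the action names "drop" and "redirect", the dictionary-based
      action and attribute formats, and all function names are hypothetical.

```python
from typing import Callable, NamedTuple


class TcActHandler(NamedTuple):
    """One handler per action id, with the two callbacks named in the text."""
    can_offload: Callable[[dict], bool]
    parse: Callable[[dict, dict], None]


def _drop_can_offload(act):
    return True


def _drop_parse(act, attr):
    attr["drop"] = True


def _redirect_can_offload(act):
    # Hypothetical rule: a redirect needs a target device to be offloadable.
    return "dev" in act


def _redirect_parse(act, attr):
    attr.setdefault("out_devs", []).append(act["dev"])


HANDLERS = {
    "drop": TcActHandler(_drop_can_offload, _drop_parse),
    "redirect": TcActHandler(_redirect_can_offload, _redirect_parse),
}


def parse_actions(actions):
    """Run every action through its handler, replacing one big switch."""
    attr = {}
    for act in actions:
        handler = HANDLERS.get(act["id"])
        if handler is None or not handler.can_offload(act):
            raise ValueError(f"action {act['id']!r} cannot be offloaded")
        handler.parse(act, attr)
    return attr
```

      Adding support for a new action then means adding one table entry
      and its two callbacks, and the same table can serve both rule types.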
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>