1. 21 Jul, 2020 40 commits
    • net: ena: support new LLQ acceleration mode · 0e3a3f6d
      Arthur Kiyanovski authored
      New devices add a new hardware acceleration engine, which adds some
      restrictions to the driver.
      Metadata descriptor must be present for each packet and the maximum
      burst size between two doorbells is now limited to a number
      advertised by the device.
      
      This patch adds:
      1. A handshake protocol between the driver and the device, so the
      device will enable the accelerated queues only when both sides
      support it.
      
      2. The driver support for the new acceleration engine:
      2.1. Send metadata descriptor for each Tx packet.
      2.2. Limit the number of packets sent between doorbells.(*)
      
      (*) A previous driver implementation of this feature was committed in
      commit 05d62ca2 ("net: ena: add handling of llq max tx burst size");
      however, the design of the interface between the driver and device
      has changed since then. This change is reflected in this commit.
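
A minimal sketch of the two driver-side rules described above (one metadata descriptor per Tx packet, and a cap on packets between doorbells). The `tx_queue` structure and the `txq_*` helpers are hypothetical names used only for illustration; they are not the ENA driver's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a Tx queue; field names are illustrative only. */
struct tx_queue {
	uint32_t max_burst;   /* burst limit advertised by the device */
	uint32_t burst_left;  /* packets still allowed before a doorbell */
	uint32_t pending;     /* packets queued since the last doorbell */
	uint32_t doorbells;   /* doorbells rung so far */
	uint32_t meta_descs;  /* metadata descriptors written so far */
};

static void txq_init(struct tx_queue *q, uint32_t max_burst)
{
	assert(max_burst > 0); /* a zero limit would make sending impossible */
	q->max_burst = max_burst;
	q->burst_left = max_burst;
	q->pending = 0;
	q->doorbells = 0;
	q->meta_descs = 0;
}

/* Hand all pending packets to the device and refill the burst budget. */
static void txq_ring_doorbell(struct tx_queue *q)
{
	if (q->pending == 0)
		return;
	q->doorbells++;
	q->pending = 0;
	q->burst_left = q->max_burst;
}

/*
 * Queue one packet: a metadata descriptor is written for every packet,
 * and the doorbell is rung as soon as the burst budget is exhausted.
 */
static void txq_send(struct tx_queue *q)
{
	q->meta_descs++;
	q->pending++;
	q->burst_left--;
	if (q->burst_left == 0)
		txq_ring_doorbell(q);
}
```

With a limit of 4, sending 10 packets in this model rings the doorbell twice automatically and leaves 2 packets pending until the next explicit doorbell.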
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: move llq configuration from ena_probe to ena_device_init() · c29efeae
      Arthur Kiyanovski authored
      When the ENA device resets to recover from some error state, all LLQ
      configuration values are reset to their defaults, because LLQ is
      initialized only once during ena_probe().
      
      Changes in this commit:
      1. Move the LLQ configuration process into ena_device_init(),
      which is called from both ena_probe() and ena_restore_device(). This
      way, LLQ setup configurations that are different from the default
      values will survive resets.
      
      2. Extract the LLQ bar mapping to ena_map_llq_bar(),
      and call once in the lifetime of the driver from ena_probe(),
      since there is no need to unmap and map the LLQ bar again every reset.
      
      3. Map the LLQ bar if it exists, regardless of whether initialization
      of the LLQ placement policy (ENA_ADMIN_PLACEMENT_POLICY_DEV) succeeded
      or not. Initialization might fail the first time, falling back to the
      ENA_ADMIN_PLACEMENT_POLICY_HOST placement policy, but later succeed
      after device reset, in which case the LLQ bar needs to be mapped
      already.
      Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: enable support of rss hash key and function changes · 0ee60edf
      Arthur Kiyanovski authored
      Add the rss_configurable_function_key bit to driver_supported_feature.
      
      This bit tells the device that the driver in question supports the
      retrieving and updating of RSS function and hash key, and therefore
      the device should allow RSS function and key manipulation.
      
      This commit turns on device support for hash key and RSS function
      management. Without this commit this feature is turned off at the
      device and appears to the user as unsupported.
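
The capability handshake described above can be sketched as a bitmask intersection: the device enables a feature only when the driver has advertised it too. The bit positions and helper names below are assumptions made for illustration; the real layout of driver_supported_features is defined by the device interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define FEAT_RSS_CONFIGURABLE_FUNCTION_KEY (1u << 0)
#define FEAT_TRAFFIC_MIRRORING             (1u << 1)

/* A feature is usable only if both the driver and the device support it. */
static uint32_t negotiate_features(uint32_t driver_supported,
				   uint32_t device_supported)
{
	return driver_supported & device_supported;
}

/* Check whether a single feature bit survived the negotiation. */
static bool feature_enabled(uint32_t negotiated, uint32_t feature)
{
	return (negotiated & feature) != 0;
}
```

In this model, a driver that does not set the RSS bit sees the feature as unsupported even on a capable device, which matches the behavior the commit message describes.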
      
      This commit concludes the following series of already merged commits:
      commit 0af3c4e2 ("net: ena: changes to RSS hash key allocation")
      commit c1bd17e5 ("net: ena: change default RSS hash function to Toeplitz")
      commit f66c2ea3 ("net: ena: allow setting the hash function without changing the key")
      commit e9a1de37 ("net: ena: fix error returning in ena_com_get_hash_function()")
      commit 80f8443f ("net: ena: avoid unnecessary admin command when RSS function set fails")
      commit 6a4f7dc8 ("net: ena: rss: do not allocate key when not supported")
      commit 0d1c3de7 ("net: ena: fix incorrect default RSS key")
      
      The above commits represent the last part of the implementation of
      this feature, and with them merged the feature can be enabled
      in the device.
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: add support for traffic mirroring · 0f505c60
      Arthur Kiyanovski authored
      Add support for traffic mirroring, where the hardware reads the
      buffer from the instance memory directly.
      
      Traffic Mirroring needs access to the rx buffers in the instance.
      To have this access, this patch:
      1. Changes the code to map and unmap the rx buffers bidirectionally.
      2. Enables the relevant bit in driver_supported_features to indicate
         to the FW that this driver supports traffic mirroring.
      
      Rx completion is not generated until mirroring is done to avoid
      the situation where the driver changes the buffer before it is
      mirrored.
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: cosmetic: change ena_com_stats_admin stats to u64 · 0dcec686
      Arthur Kiyanovski authored
      The size of the admin statistics in ena_com_stats_admin is changed
      from 32 bits to 64 bits to align with the sizes of the other
      statistics in the driver (i.e. rx_stats, tx_stats and ena_stats_dev).
      
      This is done as part of an effort to create a unified API to read
      statistics.
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: cosmetic: satisfy gcc warning · 79890d3f
      Arthur Kiyanovski authored
      gcc 4.8 reports a warning when initializing with = {0}.
      Dropping the "0" from the braces fixes the issue.
      This fix is not ANSI compatible but is allowed by gcc.
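
A small sketch of the initializer form mentioned above. It assumes the warning in question is the missing-braces diagnostic that older gcc versions emit when `= {0}` initializes a structure whose first member is itself an aggregate; the struct names are hypothetical.

```c
struct inner {
	int a;
	int b;
};

struct outer {
	struct inner in; /* first member is itself a struct */
	int c;
};

/* Returns 1 if every field of *o is zero. */
static int outer_is_zero(const struct outer *o)
{
	return o->in.a == 0 && o->in.b == 0 && o->c == 0;
}

static struct outer make_zeroed(void)
{
	/*
	 * "= {0}" may trigger a missing-braces warning on older gcc here.
	 * The empty-brace form is a gcc extension (not ANSI C) and
	 * zero-initializes all members.
	 */
	struct outer o = {};

	return o;
}
```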
      Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: add reserved PCI device ID · 866032ab
      Arthur Kiyanovski authored
      Add a reserved PCI device ID to the driver's table.
      It is used for internal testing purposes.
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: avoid unnecessary rearming of interrupt vector when busy-polling · 1e5ae350
      Arthur Kiyanovski authored
      For an overview of the race created by this patch, go to the
      'synchronization' label below.
      
      In napi busy-poll mode, the kernel invokes the napi handler of the
      device repeatedly to poll the NIC's receive queues. This process
      repeats until a timeout, specific for each connection, is up.
      By polling packets in busy-poll mode the user may gain lower latency
      and higher throughput (since the kernel no longer waits for interrupts
      to poll the queues) at the expense of CPU usage.
      
      Upon completing a napi routine, the driver checks whether
      the routine was called by an interrupt handler. If so, the driver
      re-enables interrupts for the device. This is needed since an
      interrupt routine invocation disables future invocations until
      explicitly re-enabled.
      
      The driver avoids re-enabling the interrupts if they were not disabled
      in the first place (e.g. if the driver is in busy-poll mode).
      Originally, the driver checked whether interrupt re-enabling is needed
      by reading the 'ena_napi->unmask_interrupt' variable. This atomic
      variable was set upon interrupt and cleared after re-enabling it.
      
      In the 4.10 Linux version, the 'napi_complete_done' call was changed
      so that it returns 'false' when device should not re-enable
      interrupts, and 'true' otherwise. The change includes reading the
      "NAPIF_STATE_IN_BUSY_POLL" flag to check if the napi call is in
      busy-poll mode, and if so, return 'false'.
      The driver was changed to re-enable interrupts according to this
      routine's return value.
      The Linux community rejected the use of the
      'ena_napi->unmask_interrupt' variable to determine whether
      unmasking is needed, and urged to use the napi_complete_done()
      return value solely.
      See https://lore.kernel.org/patchwork/patch/741149/ for more details
      
      As explained, a busy-poll session exists for a specified timeout
      value, after which it exits the busy-poll mode and re-enters it later.
      This leads to many invocations of the napi handler where
      napi_complete_done() falsely indicates that interrupts should be
      re-enabled.
      This creates a bug in which the interrupts are re-enabled
      unnecessarily.
      To reproduce this bug:
          1) echo 50 | sudo tee /proc/sys/net/core/busy_poll
          2) echo 50 | sudo tee /proc/sys/net/core/busy_read
          3) Add counters that check whether
          'ena_unmask_interrupt(tx_ring, rx_ring);'
          is called without disabling the interrupts in the first
          place (i.e. by calling the interrupt routine
          ena_intr_msix_io())
      
      Steps 1+2 enable busy-poll as the default mode for new connections.
      
      The busy poll routine rearms the interrupts after every session by
      design, and so we need to add an extra check that the interrupts were
      masked in the first place.
      
      synchronization:
      This patch introduces a race between the interrupt handler
      ena_intr_msix_io() and the napi routine ena_io_poll().
      Some macros and instructions were added to prevent this race from leaving
      the interrupts masked. The following specifies the different race
      scenarios in this patch:
      
      1) interrupt handler and napi routine run sequentially
          i) interrupt handler is called, sets 'interrupts_masked' flag and
      	successfully schedules the napi handler via softirq.
      
          In this scenario the napi routine might not see the flag change
          for several reasons:
      	a) The flag is stored in a register by the compiler. For this
      	case the WRITE_ONCE macro is used, which prevents this.
      	b) The compiler might reorder the instructions. For this the
      	smp_wmb() instruction was used, which implies a compiler memory
      	barrier.
      	c) On archs with weak consistency model (like ARM64) the napi
      	routine might be scheduled and start running before the flag
      	STORE instruction is committed to cache/memory. To ensure this
      	doesn't happen, the smp_wmb() instruction was added. It ensures
      	that the flag set instruction is committed before scheduling
      	napi.
      
          ii) compiler reorders the flag's value check in the 'if' with
          the flag set in the napi routine.
      
          This scenario is prevented by smp_rmb() call after the flag check.
      
      2) interrupt handler and napi routine run in parallel (can happen when
      busy poll routine invokes the napi handler)
      
          i) interrupt handler sets the flag in one core, while the napi
          routine reads it in another core.
      
          This scenario also is divided into two cases:
      	a) napi_complete_done() doesn't finish running, in which case
      	napi_sched() would just set NAPIF_STATE_MISSED and the napi
      	routine would reschedule itself without changing the flag's value.
      
      	b) napi_complete_done() finishes running. In this case the
      	napi routine might override the flag's value.
      	This doesn't present any risk since it later unmasks the
      	interrupt vector.
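
The flag handling described under 'synchronization' can be modeled in userspace with C11 atomics. This is only a sketch under stated assumptions: the kernel code uses WRITE_ONCE(), smp_wmb() and smp_rmb() rather than C11 atomics, and `napi_model`, `model_irq_handler()` and `model_poll_done()` are hypothetical names. The `poll_complete` argument stands in for napi_complete_done() returning true.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-queue state. */
struct napi_model {
	atomic_bool interrupts_masked; /* set by the interrupt handler */
	unsigned int unmask_count;     /* times the vector was re-armed */
};

/*
 * Interrupt handler: record that the vector is now masked before
 * polling is scheduled. The release store roughly plays the role of
 * WRITE_ONCE() followed by smp_wmb() in the commit message.
 */
static void model_irq_handler(struct napi_model *n)
{
	atomic_store_explicit(&n->interrupts_masked, true,
			      memory_order_release);
	/* the real handler would schedule the napi routine here */
}

/*
 * End of a poll routine: re-arm the vector only if polling completed
 * AND an interrupt actually masked the vector beforehand.
 */
static void model_poll_done(struct napi_model *n, bool poll_complete)
{
	if (!poll_complete)
		return;

	if (atomic_load_explicit(&n->interrupts_masked,
				 memory_order_acquire)) {
		atomic_store_explicit(&n->interrupts_masked, false,
				      memory_order_relaxed);
		n->unmask_count++; /* stands in for unmasking the vector */
	}
}
```

In this model a poll that was started by busy-polling, with no preceding interrupt, leaves `unmask_count` unchanged, which is the unnecessary re-arm the patch is meant to avoid.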
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix ILT and XRCD bitmap memory leaks · d4eae993
      Yuval Basson authored
      - Free ILT lines used for XRC-SRQ's contexts.
      - Free XRCD bitmap
      
      Fixes: b8204ad8 ("qed: changes to ILT to support XRC")
      Fixes: 7bfb399e ("qed: Add XRC to RoCE")
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Yuval Basson <ybason@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Phylink-PCS-updates' · 11de5770
      David S. Miller authored
      Russell King says:
      
      ====================
      Phylink PCS updates
      
      This series updates the rudimentary phylink PCS support with the
      results of the last four months of development of that.  Phylink
      PCS support was initially added back at the end of March, when it
      became clear that the current approach of treating everything at
      the MAC end as being part of the MAC was inadequate.
      
      However, this rudimentary implementation was fine initially for
      mvneta and similar, but in practice had a fair number of issues,
      particularly when ethtool interfaces were used to change various
      link properties.
      
      It became apparent that relying on the phylink_config structure for
      the PCS was also bad when it became clear that the same PCS was used
      in DSA drivers as well as in NXP's other offerings, and there was a
      desire to re-use that code.
      
      It also became apparent that splitting the "configuration" step on
      an interface mode configuration between the MAC and PCS using just
      mac_config() and pcs_config() methods was not sufficient for some
      setups, as the MAC needed to be "taken down" prior to making changes,
      and once all settings were complete, the MAC could only then be
      resumed.
      
      This series addresses these points, progressing PCS support, and
      has been developed with mvneta and DPAA2 setups, with work on both
      those drivers to prove this approach.  It has been rigorously tested
      with mvneta, as that provides the most flexibility for testing the
      various code paths.
      
      To solve the phylink_config reuse problem, we introduce a struct
      phylink_pcs, which contains the minimal information necessary, and it
      is intended that this is embedded in the PCS private data structure.
      
      To solve the interface mode configuration problem, we introduce two
      new MAC methods, mac_prepare() and mac_finish() which wrap the entire
      interface mode configuration only.  This has the additional benefit of
      relieving MAC drivers from working out whether an interface change has
      occurred, and whether they need to do some major work.
      
      I have not yet updated all the interface documentation for these
      changes; that work remains, but this patch set is provided in the
      hope that those working on PCS support in NXP will find this useful.
      
      Since there is a lot of change here, this is the reason why I strongly
      advise that everyone has converted to the mac_link_up() way of
      configuring the link parameters when the link comes up, rather than
      the old way of using mac_config() - especially as splitting the PCS
      changes how and when phylink calls mac_config(). Although no change
      for existing users is intended, that is something I no longer am able
      to test.
      
      Changes since RFC:
      - fix bisect build failure
      - add patch to use config.an_enabled
      - rename phylink_config_interface to phylink_major_reconfig
      - add expanded documentation for phylink_set_pcs()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: add interface to configure clause 22 PCS PHY · 93eaceb0
      Russell King authored
      Add an interface to configure the advertisement for a clause 22 PCS
      PHY, and set the AN enable flag in the BMCR appropriately.
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: add struct phylink_pcs · 7137e18f
      Russell King authored
      Add a way for MAC PCS to have private data while keeping independence
      from struct phylink_config, which is used for the MAC itself. We need
      this independence as we will have stand-alone code for PCS that is
      independent of the MAC.  Introduce struct phylink_pcs, which is
      designed to be embedded in a driver private data structure.
      
      This structure does not include a mdio_device as there are PCS
      implementations such as the Marvell DSA and network drivers where this
      is not necessary.
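
The embedding pattern described above can be sketched as follows. The generic handle is placed inside the driver's private structure, and the driver recovers its private data from a pointer to the handle. `pcs_handle`, `my_pcs_priv` and the helper functions are hypothetical stand-ins (the members of the real struct phylink_pcs are not reproduced here), and the macro is a simplified version of the kernel's container_of().

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal handle passed around by the generic code. */
struct pcs_handle {
	int poll;
};

/* Hypothetical driver-private structure embedding the handle. */
struct my_pcs_priv {
	int port_id;
	struct pcs_handle pcs;
};

/* Recover the private structure from the embedded handle. */
static struct my_pcs_priv *to_my_pcs(struct pcs_handle *pcs)
{
	return container_of(pcs, struct my_pcs_priv, pcs);
}

/* Example callback: only receives the generic handle. */
static int my_pcs_port(struct pcs_handle *pcs)
{
	return to_my_pcs(pcs)->port_id;
}
```

The design choice this illustrates is that the generic code never needs to know the layout of the driver's private data; it only holds a pointer to the embedded member.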
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: re-implement interface configuration with PCS · b7ad14c2
      Russell King authored
      With PCS support, how we implement interface reconfiguration (or other
      major reconfiguration) is not up to the job; we end up reconfiguring
      the PCS for an interface change while the link could potentially be up.
      In order to solve this, add two additional MAC methods for major
      configuration, one to prepare for the change, and one to finish the
      change.
      
      This allows mvneta and mvpp2 to shutdown what they require prior to the
      MAC and PCS configuration calls, and then restart as appropriate.
      
      This impacts ksettings_set(), which now needs to identify whether the
      change is a minor tweak to the advertisement masks or whether the
      interface mode has changed, and call the appropriate function for that
      update.
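
A sketch of how a prepare hook and a finish hook can bracket a major reconfiguration. The ops table, the step log and the demo hooks are hypothetical and only model the ordering; they are not the phylink API, and the exact sequence inside phylink is an assumption here.

```c
#include <stddef.h>

enum step { STEP_PREPARE, STEP_MAC_CONFIG, STEP_PCS_CONFIG, STEP_FINISH };

#define MAX_STEPS 8

/* Hypothetical trace so that the call order can be inspected. */
struct step_trace {
	enum step steps[MAX_STEPS];
	size_t count;
};

static void trace_step(struct step_trace *t, enum step s)
{
	if (t->count < MAX_STEPS)
		t->steps[t->count++] = s;
}

/* Hypothetical ops table; prepare and finish are optional (may be NULL). */
struct mac_ops_model {
	int (*prepare)(struct step_trace *t);
	void (*mac_config)(struct step_trace *t);
	int (*pcs_config)(struct step_trace *t);
	int (*finish)(struct step_trace *t);
};

/* Major reconfiguration: prepare, configure MAC and PCS, then finish. */
static int major_config(const struct mac_ops_model *ops, struct step_trace *t)
{
	int err;

	if (ops->prepare) {
		err = ops->prepare(t);
		if (err < 0)
			return err;
	}

	ops->mac_config(t);

	err = ops->pcs_config(t);
	if (err < 0)
		return err;

	if (ops->finish)
		return ops->finish(t);

	return 0;
}

/* Demo hooks that only record that they were called. */
static int demo_prepare(struct step_trace *t) { trace_step(t, STEP_PREPARE); return 0; }
static void demo_mac_config(struct step_trace *t) { trace_step(t, STEP_MAC_CONFIG); }
static int demo_pcs_config(struct step_trace *t) { trace_step(t, STEP_PCS_CONFIG); return 0; }
static int demo_finish(struct step_trace *t) { trace_step(t, STEP_FINISH); return 0; }
```

A driver that needs to shut part of the MAC down would do so in its prepare hook and restart it in its finish hook, so the configuration calls in between run while the hardware is quiesced.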
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: in-band pause mode advertisement update for PCS · 1571e700
      Russell King authored
      Re-code the pause in-band advertisement update in light of the addition
      of PCS support, so that we perform the minimum required; only the PCS
      configuration function needs to be called in this case, followed by the
      request to trigger a restart of negotiation if the programmed
      advertisement changed.
      
      We need to change the pcs_config() signature to pass whether resolved
      pause should be passed to the MAC for setups such as mvneta and mvpp2
      where doing so overrides the MAC manual flow controls.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: simplify fixed-link case for ksettings_set method · 1e1bf14a
      Russell King authored
      For fixed links, we only allow the current settings, so this should be
      a matter of merely rejecting an attempt to change the settings.  If the
      settings agree, then there is nothing more we need to do.
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: use config.an_enabled in ksettings_set method · a83c8829
      Russell King authored
      Rather than recomputing whether AN is enabled, use config.an_enabled.
      Suggested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: simplify phy case for ksettings_set method · cbc1bb1e
      Russell King authored
      When we have a PHY attached, an ethtool ksettings_set() call only
      really needs to call through to the phylib equivalent; phylib will
      call back to us when the link changes so we can update our state.
      Therefore, we can bypass most of our ksettings_set() call for this
      case.
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: simplify ksettings_set() implementation · c8cab719
      Russell King authored
      Simplify the ksettings_set() implementation to look more like phylib's
      implementation; use a switch() for validating the autoneg setting, and
      use the linkmode_modify() helper to set the autoneg bit in the
      advertisement mask.
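
A rough sketch of the shape described above: a switch() that validates the requested autoneg setting, followed by a single helper call that sets or clears the autoneg bit in the advertisement mask. The constants, the 32-bit mask and `mask_mod_bit()` are assumptions for illustration; the kernel uses link-mode bitmaps and its own helpers.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical values standing in for the ethtool autoneg settings. */
#define MODEL_AUTONEG_DISABLE 0
#define MODEL_AUTONEG_ENABLE  1

/* Hypothetical position of the autoneg bit in a 32-bit mask. */
#define MODEL_ADV_AUTONEG_BIT (1u << 6)

/* Set or clear one bit depending on a flag. */
static uint32_t mask_mod_bit(uint32_t mask, uint32_t bit, int set)
{
	return set ? (mask | bit) : (mask & ~bit);
}

/*
 * Validate the requested autoneg setting, then update the autoneg bit
 * in *advertising. Returns 0 on success or -EINVAL on a bad request.
 */
static int validate_autoneg(int autoneg, uint32_t supported,
			    uint32_t *advertising)
{
	switch (autoneg) {
	case MODEL_AUTONEG_DISABLE:
		break;
	case MODEL_AUTONEG_ENABLE:
		/* enabling requires that autoneg is supported at all */
		if (!(supported & MODEL_ADV_AUTONEG_BIT))
			return -EINVAL;
		break;
	default:
		return -EINVAL;
	}

	*advertising = mask_mod_bit(*advertising, MODEL_ADV_AUTONEG_BIT,
				    autoneg == MODEL_AUTONEG_ENABLE);
	return 0;
}
```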
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: avoid mac_config calls · 7cceb599
      Russell King authored
      Avoid calling mac_config() when using split PCS, and the interface
      remains the same.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: update PCS when changing interface during resolution · 5005b163
      Russell King authored
      The only PHYs that are used with phylink which change their interface
      are the BCM84881 and MV88X3310 family, both of which only change their
      interface modes on link-up events.  This will break when drivers are
      converted to split-PCS.  Fix this.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: ensure link is down when changing interface · 16319a7d
      Russell King authored
      The only PHYs that are used with phylink which change their interface
      are the BCM84881 and MV88X3310 family, both of which only change their
      interface modes on link-up events.  However, rather than relying upon
      this behaviour by the PHY, we should give a stronger guarantee when
      resolving that the link will be down whenever we change the interface
      mode.  This patch implements that stronger guarantee for resolve.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: rearrange resolve mac_config() call · 319bfafe
      Russell King authored
      Use a boolean to indicate whether mac_config() should be called during
      a resolution. This allows resolution to have a single location where
      mac_config() will be called, which will allow us to make decisions
      about how and what we do.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: rejig link state tracking · b06e5cac
      Russell King authored
      Rejig the link state tracking, so that we can use the current state
      in a future patch.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: update ethtool reporting for fixed-link modes · 1ceb7ee7
      Russell King authored
      Comparing the ethtool output from phylink and non-phylink fixed-link
      setups shows that we have some differences:
      
      - The "auto-negotiation" fields are different; phylink reports these
        as "No", non-phylink reports these as "Yes" for the supported and
        advertising masks.
      - The link partner advertisement is set to the link speed with non-
        phylink, but phylink leaves this unset, causing all link partner
        fields to be omitted.
      
      The phylink ethtool output also disagrees with the software emulated
      PHY dump via the MII registers.
      
      Update the phylink fixed-link parsing code so that we better reflect
      the behaviour of the non-phylink code that this facility replaces, and
      bring the ethtool interface more into line with the report via the
      MII interface.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'enetc-Add-adaptive-interrupt-coalescing' · ccbc6dac
      David S. Miller authored
      Claudiu Manoil says:
      
      ====================
      enetc: Add adaptive interrupt coalescing
      
      Apart from some related cleanup patches, this set
      introduces in a straightforward way the support needed
      to enable and configure interrupt coalescing for ENETC.
      
      Patch 5 introduces the support needed for configuring the
      interrupt coalescing parameters and for switching between
      moderated (int. coalescing) and per-packet interrupt modes.
      When interrupt coalescing is enabled the Rx/Tx time
      thresholds are configurable, packet thresholds are fixed.
      To make this work reliably, patch 5 uses the traffic
      pause procedure introduced in patch 2.
      
      Patch 6 adds DIM (Dynamic Interrupt Moderation) to implement
      adaptive coalescing based on time thresholds, for the Rx 'channel'.
      On the Tx side a default optimal value is used instead, optimized for
      TCP traffic over 1G and 2.5G links.  This default 'optimal' value can
      be overridden anytime via 'ethtool -C tx-usecs'.
      
      netperf -t TCP_MAERTS measurements show a significant CPU load
      reduction correlated w/ reduced interrupt rates. For the
      measurement results refer to the comments in patch 6.
      
      v2: Replaced Tx DIM with predefined optimal value, giving
      better results. This was also suggested by Jakub (cc).
      Switched order of patches 4 and 5, for better grouping.
      
      v3: minor cleanup/improvements
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Add adaptive interrupt coalescing · ae0e6a5d
      Claudiu Manoil authored
      Use the generic dynamic interrupt moderation (dim)
      framework to implement adaptive interrupt coalescing
      on Rx.  With the per-packet interrupt scheme, a high
      interrupt rate has been noted for moderate traffic flows
      leading to high CPU utilization.  The 'dim' scheme
      implemented by the current patch addresses this issue
      improving CPU utilization while using minimal coalescing
      time thresholds in order to preserve a good latency.
      On the Tx side use an optimal time threshold value by
      default.  This value has been optimized for Tx TCP
      streams at a rate of around 85kpps on a 1G link,
      at which rate half of the Tx ring size (128) gets filled
      in 1500 usecs.  Scaling this down to 2.5G links yields
      the current value of 600 usecs, which is conservative
      and gives good enough results for 1G links too (see
      next).
      
      Below are some measurement results for before and after
      this patch (and related dependencies), for a system with
      2 ARM Cortex-A72 CPUs @ 1.3 GHz (32 KB L1 data cache),
      using 60-second-long netperf TCP stream tests on a 1 Gbit link
      (maximum throughput):
      
      1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI
      thread on the same CPU:
                CPU utilization         int rate (ints/sec)
      Before:   50%-60% (over 50%)      92k
      After:    13%-22%                 3.5k-12k
      Comment:  Major CPU utilization improvement for a single
                Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single
                CPU. Usually settles under 16% for longer tests.

      2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):
                Total CPU utilization   Total int rate (ints/sec)
      Before:   ~80% (spikes to 90%)    ~100k
      After:    60% (more steady)       ~4k
      Comment:  Important improvement for this load test, while the
                ping test outcome does not show any notable
                difference compared to before.
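
The Tx threshold figures quoted in this commit message can be checked with a little integer arithmetic. The helper names below are hypothetical, and the sketch assumes that scaling between link speeds is simply linear.

```c
#include <stdint.h>

/* Packets accumulated in `usecs` microseconds at `pps` packets/second. */
static uint64_t packets_in_usecs(uint64_t pps, uint64_t usecs)
{
	return pps * usecs / 1000000u; /* rounds down */
}

/*
 * Scale a time threshold tuned for a `from_mbps` link to a `to_mbps`
 * link, assuming the fill time shrinks linearly with link speed.
 */
static uint64_t scale_usecs(uint64_t usecs, uint64_t from_mbps,
			    uint64_t to_mbps)
{
	return usecs * from_mbps / to_mbps;
}
```

At 85 kpps, 1500 usecs correspond to 127.5 packets (127 after rounding down), close to the 128 the message gives for half the Tx ring, and scaling 1500 usecs from a 1G to a 2.5G link gives the 600 usecs default.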
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Add interrupt coalescing support · 91571081
      Claudiu Manoil authored
      Enable programming of the interrupt coalescing registers
      and allow manual configuration of the coalescing time
      thresholds via ethtool.  Packet thresholds have been fixed
      to predetermined values as there's no point in making them
      run-time configurable, also anticipating the dynamic interrupt
      moderation (DIM) algorithm which uses fixed packet thresholds
      as well.  If the interface is up when the operation mode of
      traffic interrupt events is changed by the user (i.e. switching
      from default per-packet interrupts to coalesced interrupts),
      the traffic needs to be paused in the process.
      This patch also prepares the ground for introducing DIM on Rx.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Drop redundant ____cacheline_aligned_in_smp · 058d9cfa
      Claudiu Manoil authored
      'struct enetc_bdr' is already '____cacheline_aligned_in_smp'.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Fix interrupt coalescing register naming · 12460a0a
      Claudiu Manoil authored
      The interrupt coalescing registers are named ICR in the current
      revision of the Ref Man (RM), deprecating the ICIR name used
      in earlier (draft) versions of the RM.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      12460a0a
    • Claudiu Manoil's avatar
      enetc: Factor out the traffic start/stop procedures · bbb96dc7
      Claudiu Manoil authored
      A reliable traffic pause (and reconfiguration) procedure
      is needed to safely make h/w configuration changes at
      run-time, like changing the mode in which the interrupts
      operate (i.e. with or without coalescing), as opposed to
      making on-the-fly register updates that may be subject to
      h/w or s/w concurrency issues.
      To this end, the run-time device configuration code that
      starts and stops the traffic flow through the device has
      been extracted from the enetc_open/_close procedures into
      the standalone enetc_start/_stop procedures. Traffic stop
      should be as graceful as possible: it lets the executing
      napi threads finish while the interrupts stay disabled.
      But since the napi thread will try to re-enable interrupts
      by clearing the device's unmask register, the enable_irq/
      disable_irq API is used to avoid this potential concurrency
      issue and make the traffic pause procedure more reliable.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbb96dc7
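The concurrency issue described in this commit can be pictured with a toy model. This is a sketch under simplifying assumptions, not the driver's code: the class, its fields and `stop_traffic()` are hypothetical names. It treats an interrupt as deliverable only when both the device-level mask and the interrupt line allow it.

```python
class IrqModel:
    """Toy model: an interrupt fires only if the device-level mask is clear
    AND the interrupt line is enabled at the controller level."""

    def __init__(self):
        self.device_unmasked = True  # models the device's unmask register
        self.line_enabled = True     # models the enable_irq()/disable_irq() state

    def can_fire(self):
        return self.device_unmasked and self.line_enabled

    def napi_complete(self):
        # A finishing napi poll re-enables interrupts at the device level.
        self.device_unmasked = True


def stop_traffic(irq, use_disable_irq):
    """Pause traffic; return True if an interrupt could still fire afterwards."""
    if use_disable_irq:
        irq.line_enabled = False     # analogous to disable_irq()
    irq.device_unmasked = False      # mask interrupts at the device level
    irq.napi_complete()              # an in-flight napi poll finishes late
    return irq.can_fire()
```

In this model, masking at the device level alone is undone by the late napi completion, while also disabling the interrupt line keeps interrupts off until traffic is restarted.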
    • Claudiu Manoil's avatar
      enetc: Refine buffer descriptor ring sizes · 02293dd4
      Claudiu Manoil authored
      It's time to differentiate between Rx and Tx ring sizes.
      Not only are Tx rings processed differently from Rx rings,
      but their default number also differs - i.e. up to 8 Tx rings
      per device (8 traffic classes) vs. 2 Rx rings (one per CPU).
      So let's set the Tx ring sizes to half the size of the Rx
      rings for now, to be conservative.
      The default ring sizes were decreased as well (to the next
      lower power of 2), to reduce the memory footprint, buffering
      etc., since the measurements I've made so far show that the
      rings are very unlikely to get full.
      This change also anticipates the introduction of the
      dynamic interrupt moderation (DIM) algorithm which operates
      on maximum packet thresholds of 256 packets for Rx and 128
      packets for Tx.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02293dd4
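The sizing rule from this commit (the next lower power of 2 for the default, with Tx at half the Rx size) can be sketched as follows. The function names and the starting size of 512 used in the usage note are illustrative assumptions; the commit message does not state the old or new defaults.

```python
def next_lower_power_of_two(n):
    """Return the largest power of two strictly below n (n must be > 1)."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    return 1 << ((n - 1).bit_length() - 1)


def default_ring_sizes(old_rx_size):
    """Return (rx_size, tx_size) derived from a previous Rx default."""
    rx_size = next_lower_power_of_two(old_rx_size)
    tx_size = rx_size // 2  # Tx rings get half the Rx size
    return rx_size, tx_size
```

For a hypothetical previous default of 512 descriptors, this gives 256 for Rx and 128 for Tx.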
    • Jisheng Zhang's avatar
      net: mdio-mux-gpio: use devm_gpiod_get_array() · c17e3178
      Jisheng Zhang authored
      Use devm_gpiod_get_array() to simplify the error handling and exit
      code path.
      Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c17e3178
    • Vladimir Oltean's avatar
      net: dsa: use the ETH_MIN_MTU and ETH_DATA_LEN default values · 71d4364a
      Vladimir Oltean authored
      Now that DSA supports MTU configuration, undo the effects of commit
      8b1efc0f ("net: remove MTU limits on a few ether_setup callers") and
      let DSA interfaces use the default min_mtu and max_mtu specified by
      ether_setup(). This matters most for min_mtu: since DSA is
      Ethernet, its minimum MTU is the same as that of any other Ethernet
      interface, and definitely not zero. For max_mtu, drivers can override
      the default through a callback if they want to.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71d4364a
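A minimal sketch of the resulting MTU bounds, assuming the standard kernel values ETH_MIN_MTU = 68 and ETH_DATA_LEN = 1500. The helper names and the `driver_max_mtu` parameter are hypothetical stand-ins for the driver callback mentioned in the commit message.

```python
ETH_MIN_MTU = 68     # minimum MTU for Ethernet interfaces
ETH_DATA_LEN = 1500  # default maximum payload of an Ethernet frame


def mtu_limits(driver_max_mtu=None):
    """Return (min_mtu, max_mtu); a driver may override only the maximum."""
    max_mtu = ETH_DATA_LEN if driver_max_mtu is None else driver_max_mtu
    return ETH_MIN_MTU, max_mtu


def mtu_is_valid(mtu, driver_max_mtu=None):
    """Check a requested MTU against the limits in effect."""
    low, high = mtu_limits(driver_max_mtu)
    return low <= mtu <= high
```

With these defaults an MTU of zero is rejected, and a jumbo MTU is accepted only when the driver reports a larger maximum.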
    • Jonathan McDowell's avatar
      net: dsa: qca8k: implement the port MTU callbacks · f58d2598
      Jonathan McDowell authored
      This switch has a single max frame size configuration register, so we
      track the requested MTU for each port and apply the largest.
      
      v2:
      - Address review feedback from Vladimir Oltean
      Signed-off-by: Jonathan McDowell <noodles@earth.li>
      Acked-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f58d2598
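The "track per port, apply the largest" approach can be sketched as below. This is an illustrative model, not the qca8k driver: the class and attribute names are hypothetical, and adding the Ethernet header and FCS lengths to the MTU is an assumption about how a frame size would be derived.

```python
ETH_HLEN = 14    # Ethernet header length in bytes
ETH_FCS_LEN = 4  # frame check sequence length in bytes


class SharedFrameSizeSwitch:
    """Switch model with one max-frame-size register shared by all ports."""

    def __init__(self, num_ports, default_mtu=1500):
        self.port_mtu = [default_mtu] * num_ports
        self.max_frame_size_reg = default_mtu + ETH_HLEN + ETH_FCS_LEN

    def change_mtu(self, port, new_mtu):
        # Remember what this port asked for ...
        self.port_mtu[port] = new_mtu
        # ... then program the single register from the largest request.
        largest = max(self.port_mtu)
        self.max_frame_size_reg = largest + ETH_HLEN + ETH_FCS_LEN
        return self.max_frame_size_reg
```

Lowering the MTU on one port shrinks the register value only when no other port still needs the larger size.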
    • Wang Hai's avatar
      net: hsr: remove redundant null check · 2b96692b
      Wang Hai authored
      kfree_skb() already checks for a NULL skb parameter, so the
      additional checks in the callers are unnecessary. Remove them.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b96692b
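The redundancy removed here follows a general pattern: when the release routine itself tolerates a missing buffer, a guard in the caller adds nothing. A small sketch of that pattern, with hypothetical names standing in for kfree_skb() and its callers:

```python
freed = []  # records which buffers were released, for illustration only


def free_buffer(buf):
    """Stand-in for a free routine that tolerates a missing buffer."""
    if buf is None:  # the callee already handles the NULL case
        return
    freed.append(buf)


def cleanup_with_check(buf):
    if buf is not None:  # redundant: free_buffer() repeats this test
        free_buffer(buf)


def cleanup_without_check(buf):
    free_buffer(buf)  # same behaviour, one branch fewer
```

Both cleanup variants behave identically for a missing and for a present buffer.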
    • Christophe JAILLET's avatar
      net/fealnx: switch from 'pci_' to 'dma_' API · 405e30e2
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated, GFP_KERNEL can be used because it is called from
      the probe function (i.e. 'fealnx_init_one()') and no lock is taken.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      405e30e2
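To show the effect of a few of the rules in the semantic patch above, here is a naive regex-based sketch. It is not Coccinelle and is far less robust, since it does not parse C. It covers only a subset of the rules that keep the same argument count, and `convert()` is a hypothetical helper name.

```python
import re

# Functions whose first argument changes from the PCI device to its struct device.
RENAMES = {
    "pci_map_single": "dma_map_single",
    "pci_unmap_single": "dma_unmap_single",
    "pci_free_consistent": "dma_free_coherent",
    "pci_dma_mapping_error": "dma_mapping_error",
}

# Direction constants are renamed one-to-one.
CONSTANTS = {
    "PCI_DMA_BIDIRECTIONAL": "DMA_BIDIRECTIONAL",
    "PCI_DMA_TODEVICE": "DMA_TO_DEVICE",
    "PCI_DMA_FROMDEVICE": "DMA_FROM_DEVICE",
    "PCI_DMA_NONE": "DMA_NONE",
}


def convert(line):
    """Rewrite one line of C source according to the tables above."""
    for old, new in RENAMES.items():
        # Capture the first argument (no commas or parentheses inside it).
        pattern = r"\b" + old + r"\(\s*([^,()]+?)\s*,"
        line = re.sub(pattern, new + r"(&\1->dev,", line)
    for old, new in CONSTANTS.items():
        line = re.sub(r"\b" + old + r"\b", new, line)
    return line
```

The allocation rules are deliberately left out of this sketch, because the GFP_ flag they introduce has to be chosen by hand for each call site.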
    • Christophe JAILLET's avatar
      mISDN: switch from 'pci_' to 'dma_' API · e85da794
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'setup_hw()' (hfcpci.c) GFP_KERNEL can be used
      because it is called from the probe function and no lock is taken.
      The call chain is:
         hfc_probe()
         --> setup_card()
         --> setup_hw()
      
      When memory is allocated in 'inittiger()' (netjet.c) GFP_ATOMIC must be
      used because a spin_lock is taken by the caller (i.e. 'nj_init_card()').
      This is also consistent with the other allocations done in the function.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e85da794
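The reasoning behind the hand-picked GFP_ flags in this commit can be summarised in a small sketch. The table records what the commit message states about each call site; the `pick_gfp()` helper and the table itself are illustrative, not kernel code.

```python
GFP_KERNEL = "GFP_KERNEL"  # allocation may sleep
GFP_ATOMIC = "GFP_ATOMIC"  # allocation must not sleep

# Whether a spin lock is held when the function allocates, per the commit message.
SPINLOCK_HELD = {
    "setup_hw": False,   # reached from the probe function, no lock taken
    "inittiger": True,   # caller nj_init_card() holds a spin lock
}


def pick_gfp(function_name):
    """Choose the allocation flag for a call site listed in SPINLOCK_HELD."""
    return GFP_ATOMIC if SPINLOCK_HELD[function_name] else GFP_KERNEL
```

Sleeping while a spin lock is held is not allowed, which is why the locked call site needs the atomic flag.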
    • Briana Oursler's avatar
      tc-testing: Add tdc to kselftests · 2b9843fb
      Briana Oursler authored
      Add tdc to existing kselftest infrastructure so that it can be run with
      existing kselftests. TDC now generates objects in objdir/kselftest
      without cluttering main objdir, leaves source directory clean, and
      installs correctly in kselftest_install, properly adding itself to
      run_kselftest.sh script.
      
      Add tc-testing as a target of selftests/Makefile. Create tdc.sh to run
      tdc.py targets with correct arguments. To support a single target from
      selftests/Makefile, combine tc-testing/bpf/Makefile and
      tc-testing/Makefile. Move action.c up a directory to tc-testing/.
      
      Tested with:
       make O=/tmp/{objdir} TARGETS="tc-testing" kselftest
       cd /tmp/{objdir}
       cd kselftest
       cd tc-testing
       ./tdc.sh
      
       make -C tools/testing/selftests/ TARGETS=tc-testing run_tests
      
       make TARGETS="tc-testing" kselftest
       cd tools/testing/selftests
       ./kselftest_install.sh /tmp/exampledir
       My VM doesn't run all the kselftests so I commented out all except my
       target and net/pmtu.sh then:
       cd /tmp/exampledir && ./run_kselftest.sh
      Co-developed-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b9843fb
    • Vinay Kumar Yadav's avatar
      crypto/chtls: Enable tcp window scaling option · c3466a76
      Vinay Kumar Yadav authored
      Enable the TCP window scaling option in hardware, based on the sysctl
      settings and on the option carried in the connection request.
      
      v1->v2:
      - Set window scale option based on option in connection request.
      Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3466a76
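A minimal sketch of the decision this commit describes: scaling is used only when the sysctl allows it and the connection request carried the option. The function name, its parameters and the way the shift is derived from the receive space are assumptions for illustration. The limit of 14 comes from RFC 7323.

```python
TCP_MAX_WSCALE = 14          # largest shift count allowed by RFC 7323
MAX_UNSCALED_WINDOW = 65535  # largest window expressible without scaling


def select_wscale(sysctl_window_scaling, peer_offered_wscale, rcv_space):
    """Return the window scale shift to use, or 0 when scaling stays off."""
    if not (sysctl_window_scaling and peer_offered_wscale):
        return 0
    shift = 0
    # Grow the shift until the scaled 16-bit window can cover rcv_space.
    while (MAX_UNSCALED_WINDOW << shift) < rcv_space and shift < TCP_MAX_WSCALE:
        shift += 1
    return shift
```

A receive space of 1 MiB, for example, needs a shift of 5 in this model, and no shift at all if either side has scaling disabled.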
    • David S. Miller's avatar
      Merge branch 'net-atlantic-various-features' · 4f1b4da5
      David S. Miller authored
      Mark Starovoytov says:
      
      ====================
      net: atlantic: various features
      
      This patchset adds more features for Atlantic NICs:
       * media detect;
       * additional per-queue stats;
       * PTP stats;
       * ipv6 support for TCP LSO and UDP GSO;
       * 64-bit operations;
       * A0 ntuple filters;
       * MAC temperature (hwmon).
      
      This work is a joint effort of Marvell developers.
      
      v3:
       * reworked patches related to stats:
         . fixed u64_stats_update_* usage;
         . use simple assignment in _get_stats / _fill_stats_data;
         . made _get_sw_stats / _fill_stats_data return count as return value;
         . split rx and tx per-queue stats;
      
      v2: https://patchwork.ozlabs.org/cover/1329652/
       * removed media detect feature (will be reworked and submitted later);
       * removed irq counter from stats;
       * use u64_stats_update_* to protect 64-bit stats;
       * use io-64-nonatomic-lo-hi.h for readq/writeq fallbacks;
      
      v1: https://patchwork.ozlabs.org/cover/1327894/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f1b4da5
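The readq/writeq fallbacks mentioned in the v2 notes split one 64-bit access into two 32-bit accesses, low word first. The sketch below models that idea on a toy register file. The `Regs32` class is hypothetical, and the two helper functions only mirror the naming of the kernel's lo-hi helpers rather than reproducing them.

```python
class Regs32:
    """Toy register file that only supports 32-bit accesses."""

    def __init__(self):
        self.mem = {}

    def readl(self, offset):
        return self.mem.get(offset, 0)

    def writel(self, value, offset):
        self.mem[offset] = value & 0xFFFFFFFF


def lo_hi_readq(regs, offset):
    """Read 64 bits as two 32-bit reads, low word first (not atomic)."""
    low = regs.readl(offset)
    high = regs.readl(offset + 4)
    return low | (high << 32)


def lo_hi_writeq(regs, value, offset):
    """Write 64 bits as two 32-bit writes, low word first (not atomic)."""
    regs.writel(value & 0xFFFFFFFF, offset)
    regs.writel(value >> 32, offset + 4)
```

Because the two halves are accessed separately, a value that changes between the accesses can be observed torn, which is one reason the 64-bit statistics in this series are additionally protected with u64_stats_update_*.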