1. 01 Nov, 2022 2 commits
2. 31 Oct, 2022 18 commits
3. 30 Oct, 2022 1 commit
4. 29 Oct, 2022 19 commits
    • Merge tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 02a97e02
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2022-10-24
      
      SW steering updates from Yevgeny Kliteynik:
      
      1) First four patches: small fixes / optimizations for SW steering:
      
       - Patch 1: Don't abort destroy flow if failed to destroy table - continue
         and free everything else.
       - Patches 2 and 3 deal with fast teardown:
          + Skip sync during fast teardown, as PCI device is not there any more.
          + Check device state when polling CQ - otherwise SW steering keeps polling
            the CQ forever, because nobody is there to flush it.
       - Patch 4: Removing unneeded function argument.
      
      2) Deal with the hiccups that we get during rules insertion/deletion,
      which sometimes reach 1/4 of a second. While insertion/deletion rate
      improvement was not the focus here, it still is a by-product of removing these
      hiccups.
      
      Another by-product is the reduced standard deviation in measuring the duration
      of rules insertion/deletion bursts.
      
      In the testing we add K rules (warm-up phase), and then continuously do
      insertion/deletion bursts of N rules.
      During the test execution, the driver measures hiccups (amount and duration)
      and total time for insertion/deletion of a batch of rules.
      
      Here are some numbers, before and after these patches:
      
      +--------------------------------------------+-----------------+----------------+
      |                                            |   Create rules  |  Delete rules  |
      |                                            +--------+--------+--------+-------+
      |                                            | Before |  After | Before | After |
      +--------------------------------------------+--------+--------+--------+-------+
      | Max hiccup [msec]                          |    253 |     42 |    254 |    68 |
      +--------------------------------------------+--------+--------+--------+-------+
      | Avg duration of 10K rules add/remove [msec]| 140.07 | 124.32 | 106.99 | 99.51 |
      +--------------------------------------------+--------+--------+--------+-------+
      | Num of hiccups per 100K rules add/remove   |   7.77 |   7.97 |  12.60 | 11.57 |
      +--------------------------------------------+--------+--------+--------+-------+
      | Avg hiccup duration [msec]                 |  36.92 |  33.25 |  36.15 | 33.74 |
      +--------------------------------------------+--------+--------+--------+-------+
      
       - Patch 5: Allocate a short array on the stack instead of dynamically - it is
         destroyed at the end of the function.
       - Patch 6: Rather than cleaning the corresponding chunk's section of
         ste_arrays on chunk deletion, initialize these areas upon chunk creation.
         Chunk destruction tends to come in large batches (during pool syncing),
         so instead of doing huge memory initialization during pool sync,
         we amortize this by doing small initializations on chunk creation.
       - Patch 7: In order to simplify error flow and allow cleaner addition
         of new pools, handle creation/destruction of all the domain's memory pools
         and other memory-related fields in separate init/uninit functions.
       - Patch 8: During rehash, write each table row immediately instead of waiting
         for the whole table to be ready and writing it all - saves allocations
         of ste_send_info structures and improves performance.
       - Patch 9: Instead of allocating/freeing send info objects dynamically,
         manage them in pool. The number of send info objects doesn't depend on
         number of rules, so after pre-populating the pool with an initial batch of
         send info objects, the pool is not expected to grow.
         This way we save alloc/free during writing STEs to ICM, which by itself can
         sometimes take up to 40msec.
       - Patch 10: Allocate icm_chunks from their own slab allocator, which lowered
         the alloc/free "hiccups" frequency.
       - Patch 11: Similar to patch 10, allocate htbl from its own slab allocator.
       - Patch 12: Lower sync threshold for ICM hot memory - set the threshold for
         sync to 1/4 of the pool instead of 1/2 of the pool. Although we will have
         more syncs, each sync will be shorter and will help with insertion rate
         stability. Also, notice that the overall number of hiccups wasn't increased
         due to all the other patches.
       - Patch 13: Keep track of hot ICM chunks in an array instead of list.
         After steering sync, we traverse the hot list and finally free all the
         chunks. It appears that traversing a long list takes an unusually long time
         due to cache misses on many entries, which causes a big "hiccup" during
         rule insertion. This patch replaces the list with a pre-allocated array that
         stores only the bookkeeping information that is needed to later free the
         chunks in its buddy allocator.
       - Patch 14: Remove the unneeded buddy used_list - we don't need to have the
         list of used chunks, we only need the total amount of used memory.
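
The pooling idea described for Patch 9 can be illustrated with a small user-space sketch. This is not the driver's code: `struct send_info`, `struct send_info_pool`, `POOL_BATCH`, and all function names below are hypothetical stand-ins, and `malloc()` takes the place of the kernel allocator. The point is only that after pre-populating the pool, get/put cycles reuse objects instead of allocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_BATCH 4 /* hypothetical batch size, chosen for illustration */

/* Stand-in for the driver's send info object. */
struct send_info {
	struct send_info *next; /* free-list link, used only while pooled */
	int payload;
};

struct send_info_pool {
	struct send_info *free_list;
	size_t allocated; /* objects ever obtained from malloc() */
};

/* Add one batch of objects to the free list; returns 0 on success. */
static int pool_fill(struct send_info_pool *pool)
{
	for (int i = 0; i < POOL_BATCH; i++) {
		struct send_info *obj = malloc(sizeof(*obj));

		if (!obj)
			return -1;
		obj->next = pool->free_list;
		pool->free_list = obj;
		pool->allocated++;
	}
	return 0;
}

/* Take an object from the pool, growing the pool only when it is empty. */
static struct send_info *pool_get(struct send_info_pool *pool)
{
	struct send_info *obj;

	if (!pool->free_list)
		pool_fill(pool);
	if (!pool->free_list)
		return NULL;

	obj = pool->free_list;
	pool->free_list = obj->next;
	return obj;
}

/* Return an object to the pool instead of freeing it. */
static void pool_put(struct send_info_pool *pool, struct send_info *obj)
{
	assert(obj != NULL);
	obj->next = pool->free_list;
	pool->free_list = obj;
}

/* Free every object currently held by the pool. */
static void pool_destroy(struct send_info_pool *pool)
{
	while (pool->free_list) {
		struct send_info *obj = pool->free_list;

		pool->free_list = obj->next;
		free(obj);
	}
}
```

In this sketch the `allocated` counter stays at `POOL_BATCH` as long as no more than one batch of objects is in use at a time, which mirrors the expectation stated above that the pool does not grow after the initial batch.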
      
      * tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: DR, Remove the buddy used_list
        net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list
        net/mlx5: DR, Lower sync threshold for ICM hot memory
        net/mlx5: DR, Allocate htbl from its own slab allocator
        net/mlx5: DR, Allocate icm_chunks from their own slab allocator
        net/mlx5: DR, Manage STE send info objects in pool
        net/mlx5: DR, In rehash write the line in the entry immediately
        net/mlx5: DR, Handle domain memory resources init/uninit separately
        net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation
        net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically
        net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy
        net/mlx5: DR, Check device state when polling CQ
        net/mlx5: DR, Fix the SMFS sync_steering for fast teardown
        net/mlx5: DR, In destroy flow, free resources even if FW command failed
      ====================
      
      Link: https://lore.kernel.org/r/20221027145643.6618-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-ipa-start-adding-ipa-v5-0-functionality' · eb288cbd
      Jakub Kicinski authored
      Alex Elder says:
      
      ====================
      net: ipa: start adding IPA v5.0 functionality
      
      The biggest change for IPA v5.0 is that it supports more than 32
      endpoints.  However there are two other unrelated changes:
        - The STATS_TETHERING memory region is not required
        - Filter tables no longer support a "global" filter
      
      Beyond this, refactoring some code makes supporting more than 32
      endpoints (in an upcoming series) easier.  So this series includes
      a few other changes (not in this order):
        - The maximum endpoint ID in use is determined during config
        - Loops over all endpoints only involve those in use
        - Endpoint IDs and their directions are checked for validity
          differently to simplify comparison against the maximum
      ====================
      
      Link: https://lore.kernel.org/r/20221027122632.488694-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: record and use the number of defined endpoint IDs · b7aaff0b
      Alex Elder authored
      Define a new field in the IPA structure that records the maximum
      number of entries that will be used in the IPA endpoint array.  Use
      that value rather than IPA_ENDPOINT_MAX to determine the end
      condition for two loops that iterate over all endpoints.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: determine the maximum endpoint ID · 5274c715
      Alex Elder authored
      Each endpoint ID has an entry in the IPA endpoint array.  But the
      size of that array is defined at compile time.  Instead, rename
      ipa_endpoint_data_valid() to be ipa_endpoint_max() and have it
      return the maximum endpoint ID defined in configuration data.
      That function will still validate configuration data.
      
      Zero is returned on error; it's a valid endpoint ID, but we need
      more than one, so it can't be the maximum.  The next patch makes use
      of the returned maximum value.
      
      Finally, rename the "initialized" mask of endpoints defined by
      configuration data to be "defined".
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
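
A minimal sketch of the "maximum endpoint ID, zero on error" convention, together with a loop bounded by the number of endpoints in use rather than by the compile-time array size. The names `ENDPOINT_ARRAY_SIZE`, `endpoint_max()`, and `count_defined()` are hypothetical, and the "defined" set is modeled as a plain 32-bit mask; the real function also validates configuration data, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define ENDPOINT_ARRAY_SIZE 32u /* hypothetical compile-time array size */

/*
 * Return the highest endpoint ID set in the "defined" mask.
 * Zero doubles as the error value: more than one endpoint is needed,
 * so zero can never be a usable maximum.
 */
static uint32_t endpoint_max(uint32_t defined)
{
	uint32_t max_id = 0;

	for (uint32_t id = 0; id < ENDPOINT_ARRAY_SIZE; id++)
		if (defined & (UINT32_C(1) << id))
			max_id = id;

	return max_id;
}

/*
 * Visit only the endpoint IDs that can be in use, using a while loop
 * indexed by endpoint_id and bounded by endpoint_count (max ID + 1).
 */
static uint32_t count_defined(uint32_t defined, uint32_t endpoint_count)
{
	uint32_t endpoint_id = 0;
	uint32_t count = 0;

	assert(endpoint_count <= ENDPOINT_ARRAY_SIZE);

	while (endpoint_id < endpoint_count) {
		if (defined & (UINT32_C(1) << endpoint_id))
			count++;
		endpoint_id++;
	}

	return count;
}
```

With a mask of `0x26` (IDs 1, 2, and 5 defined), `endpoint_max()` yields 5, so a caller would record an endpoint count of 6 and stop iterating there.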
    • net: ipa: refactor endpoint loops · e359ba89
      Alex Elder authored
      Change two functions that iterate over all endpoints to use while
      loops, using "endpoint_id" as the index variable in both spots.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: more completely check endpoint validity · 2b87d721
      Alex Elder authored
      Ensure all defined TX endpoints are in the range [0, CONS_PIPES) and
      defined RX endpoints are within [PROD_LOWEST, PROD_LOWEST+PROD_PIPES).
      
      Modify the way local variables are used to make the checks easier
      to understand.  Check for each endpoint being in valid range in the
      loop, and drop the logical-AND check of initialized against
      unavailable IDs.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: no more global filtering starting with IPA v5.0 · bd552493
      Alex Elder authored
      IPA v5.0 eliminates the global filter table entry.  As a result,
      there is no need to shift the filtered endpoint bitmap when it is
      written to IPA local memory.
      
      Update comments to explain this.  Also delete a redundant block of
      comments above the function.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
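
A rough sketch of the bitmap handling described above, under the assumption that versions before IPA v5.0 reserve the lowest bit of the written value for the global filter entry, which is why the endpoint bitmap had to be shifted. The function name and the `has_global_filter` parameter are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the value written to local memory for a bitmap of filtered
 * endpoints. Assumption: when a global filter entry exists it occupies
 * bit 0, so endpoint bits move up by one position.
 */
static uint64_t filter_bitmap_to_write(uint64_t filtered_endpoints,
				       bool has_global_filter)
{
	if (!has_global_filter)
		return filtered_endpoints; /* IPA v5.0+: written as-is */

	/* The top bit must be clear or it would be lost by the shift. */
	assert((filtered_endpoints >> 63) == 0);

	return filtered_endpoints << 1;
}
```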
    • net: ipa: change an IPA v5.0 memory requirement · 5ba5faa2
      Alex Elder authored
      Don't require IPA v5.0 to have a STATS_TETHERING memory region.
      Downstream defines its size to 0, so it apparently is unused.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: define IPA v5.0 · 5783c68a
      Alex Elder authored
      In preparation for adding support for IPA v5.0, define it as an
      understood version.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/packet: add PACKET_FANOUT_FLAG_IGNORE_OUTGOING · 58ba4263
      Willem de Bruijn authored
      Extend packet socket option PACKET_IGNORE_OUTGOING to fanout groups.
      
      The socket option sets ptype.ignore_outgoing, which makes
      dev_queue_xmit_nit skip the socket.
      
      When the socket joins a fanout group, the option is not reflected in
      the struct ptype of the group. dev_queue_xmit_nit only tests the
      fanout ptype, so the flag is ignored once a socket joins a
      fanout group.
      
      Inheriting the option from a socket would change established behavior.
      Different sockets in the group can set different flags, and can also
      change them at runtime.
      
      Testing in packet_rcv_fanout defeats the purpose of the original
      patch, which is to avoid skb_clone in dev_queue_xmit_nit (esp. for
      MSG_ZEROCOPY packets).
      
      Instead, introduce a new fanout group flag with the same behavior.
      
      Tested with https://github.com/wdebruij/kerneltools/blob/master/tests/test_psock_fanout_ignore_outgoing.c
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221027211014.3581513-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
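
For readers unfamiliar with fanout flags, here is a sketch of how the new flag would be combined into the integer argument of the PACKET_FANOUT socket option: the group ID sits in the low 16 bits and the fanout type ORed with flags in the high 16 bits. The constants are redefined locally only to keep the example self-contained; they are intended to mirror `<linux/if_packet.h>`, and `fanout_arg()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies for illustration; real code should include <linux/if_packet.h>. */
#ifndef PACKET_FANOUT_HASH
#define PACKET_FANOUT_HASH 0
#endif
#ifndef PACKET_FANOUT_FLAG_IGNORE_OUTGOING
#define PACKET_FANOUT_FLAG_IGNORE_OUTGOING 0x4000
#endif

/* Build the PACKET_FANOUT option value for a group, type, and flag set. */
static uint32_t fanout_arg(uint16_t group_id, uint16_t type, uint16_t flags)
{
	/* The fanout type occupies the low byte of the upper half. */
	assert((type & 0xff00) == 0);

	return (uint32_t)group_id | ((uint32_t)(type | flags) << 16);
}
```

The resulting value would be stored in an `int` and passed to `setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg))` on a packet socket. Because the flag belongs to the group rather than to an individual socket, it would be supplied when joining the group.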
    • ice: Add additional CSR registers to ETHTOOL_GREGS · 637639cb
      Lukasz Czapnik authored
      In the event of a Tx hang it can be useful to read a variety of hardware
      registers to capture some state about why the transmit queue got stuck.
      
      Extend the ETHTOOL_GREGS dump provided by the ice driver with several CSR
      registers that provide relevant information about the hardware Tx state.
      Capturing this data helps debug such a Tx hang.
      Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Link: https://lore.kernel.org/r/20221027104239.1691549-1-jacob.e.keller@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'clean-up-sfp-register-definitions' · 00643631
      Jakub Kicinski authored
      Russell King says:
      
      ====================
      Clean up SFP register definitions
      
      This two-part patch series cleans up the SFP register definitions by:
      1. converting them from hex to decimal. All the definitions in the
         documents use decimal, so this makes it easier to cross-reference.
      2. moving the bit definitions for each register alongside their
         register address definition.
      ====================
      
      Link: https://lore.kernel.org/r/Y1qFvaDlLVM1fHdG@shell.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sfp: move field definitions along side register index · d83845d2
      Russell King (Oracle) authored
      Just as we do for the A2h enum, arrange the A0h enum to have the
      field definitions next to their corresponding register index.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sfp: convert register indexes from hex to decimal · 17dd3611
      Russell King (Oracle) authored
      The register indexes in the standards are in decimal rather than hex,
      so let's specify them in decimal in the header file so we can easily
      cross-reference without converting between hex and decimal.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-mtk_eth_soc-improve-pcs-implementation' · e3855920
      Jakub Kicinski authored
      Russell King says:
      
      ====================
      net: mtk_eth_soc: improve PCS implementation
      
      As a result of investigations from Frank Wunderlich, we know a lot more
      about the Mediatek "SGMII" PCS block, and can implement the PCS support
      correctly. This series achieves that, and Frank has tested the final
      result and reports that it works for him. The series could do with
      further testing by others, but based on past experience with this
      driver, I suspect that is unlikely to happen until it is merged.
      
      Briefly, the patches in order:
      
      1. Add a new helper to get the link timer duration in nanoseconds
      2. Add definitions for the newly discovered registers and updates to
         bit definitions, including bitmasks for the BMCR, BMSR and two
         advertisement registers.
      3. Remove unnecessary/unused error handling (functions always returning
         zero.)
      4. Adding the missing pcs_get_state() implementation.
      5. Converting the code to use regmap_update_bits() rather than
         open-coding read-modify-write sequences.
      6. Adding out-of-band speed and duplex forcing for all non-inband modes,
         not just the 802.3z link modes as the code currently does.
      7. Moving the release of the PHY power down to the main pcs_config()
         function.
      8. Moving the interface speed selection to the main pcs_config()
         function.
      9. Adding advertisement programming.
      10. Adding correct link timer programming using the new helper in the
          first patch.
      11. Adding support for 802.3z negotiation.
      
      There is one remaining issue - when configuring the PCS for in-band,
      for some reason the AN restart bit is always set. This should not be
      necessary, but requires further investigation with the hardware to
      find out whether it is really necessary. I suspect this was a work
      around for a previous poor implementation.
      ====================
      
      Link: https://lore.kernel.org/r/Y1qDMw+DJLAJHT40@shell.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mtk_eth_soc: add support for in-band 802.3z negotiation · 81b0f12a
      Russell King (Oracle) authored
      As a result of help from Frank Wunderlich to investigate and test, we
      now know how to program this PCS for in-band 802.3z negotiation. Add
      support for this by moving the contents of the two functions into the
      common mtk_pcs_config() function and adding the register settings for
      802.3z negotiation.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mtk_eth_soc: move and correct link timer programming · 3027d89f
      Russell King (Oracle) authored
      Program the link timer appropriately for the interface mode being
      used, using the newly introduced phylink helper that provides the
      nanosecond link timer interval.
      
      The intervals are 1.6ms for SGMII based protocols and 10ms for
      802.3z based protocols.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
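
The two intervals quoted in this commit message can be captured in a small lookup that returns nanoseconds. This is a stand-alone sketch, not the phylink helper: the `enum iface_mode` values and `link_timer_ns()` are hypothetical, and treating both 1000BASE-X and 2500BASE-X as "802.3z based" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interface modes, for illustration only. */
enum iface_mode {
	IFACE_SGMII,
	IFACE_1000BASEX,
	IFACE_2500BASEX,
	IFACE_OTHER,
};

/* Return the link timer interval in nanoseconds, or -1 if none applies. */
static int64_t link_timer_ns(enum iface_mode mode)
{
	/* Catch values outside the enum range early. */
	assert(mode >= IFACE_SGMII && mode <= IFACE_OTHER);

	switch (mode) {
	case IFACE_SGMII:
		return 1600000;  /* 1.6 ms for SGMII-based protocols */
	case IFACE_1000BASEX:
	case IFACE_2500BASEX:
		return 10000000; /* 10 ms for 802.3z-based protocols */
	default:
		return -1;       /* no link timer defined for this mode */
	}
}
```

A driver would still need to convert the nanosecond value into whatever units its link timer register expects; that conversion is hardware-specific and is not shown here.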
    • net: mtk_eth_soc: add advertisement programming · c125c66e
      Russell King (Oracle) authored
      Program the advertisement into the mtk PCS block.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mtk_eth_soc: move interface speed selection · f752c0df
      Russell King (Oracle) authored
      Move the selection of the underlying interface speed to the pcs_config
      function, so we always program the interface speed.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>