1. 15 Oct, 2021 40 commits
    • ice: make use of ice_for_each_* macros · 2faf63b6
      Maciej Fijalkowski authored
      Go through the code base and use ice_for_each_* macros.  While at it,
      introduce ice_for_each_xdp_txq() macro that can be used for looping over
      xdp_rings array.
      
      Commit is not introducing any new functionality.
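A loop helper of the kind described above could look roughly like the sketch below. The names `vsi_sketch`, `for_each_xdp_txq_sketch` and the `num_xdp_txq` field are stand-ins assumed for illustration, not the driver's actual definitions.

```c
/* Hypothetical, trimmed-down stand-in for the VSI structure. */
struct vsi_sketch {
	unsigned short num_xdp_txq;	/* number of XDP Tx rings (assumed field) */
	int *xdp_rings;			/* stand-in for the xdp_rings array */
};

/* Loop over every XDP Tx ring index of a VSI. */
#define for_each_xdp_txq_sketch(vsi, i) \
	for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)

/* Example user: sum a per-ring value without open-coding the loop. */
static int sum_xdp_rings(const struct vsi_sketch *vsi)
{
	int i, sum = 0;

	for_each_xdp_txq_sketch(vsi, i)
		sum += vsi->xdp_rings[i];
	return sum;
}
```

The benefit is mostly readability: the loop bound lives in one place, so callers cannot accidentally iterate with the wrong ring count.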
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: introduce XDP_TX fallback path · 22bf877e
      Maciej Fijalkowski authored
      Under rare circumstances there might be a situation where a requirement
      of having XDP Tx queue per CPU could not be fulfilled and some of the Tx
      resources have to be shared between CPUs. This yields a need for placing
      accesses to xdp_ring inside a critical section protected by spinlock.
      These accesses happen to be in the hot path, so let's introduce the
      static branch that will be triggered from the control plane when driver
      could not provide Tx queue dedicated for XDP on each CPU.
      
      Currently, the design that has been picked is to allow any number of XDP
      Tx queues that is at least half of the number of CPUs the platform has.
      For a lower number the driver will bail out and tell the user that there
      were not enough Tx resources to configure XDP. The sharing of rings is
      signalled by enabling the static branch, which in turn indicates that the
      lock for xdp_ring accesses needs to be taken in the hot path.
      
      Approach based on static branch has no impact on performance of a
      non-fallback path. One thing worth mentioning is that the static
      branch will act as a global driver switch, meaning that
      if one PF got out of Tx resources, then other PFs that ice driver is
      servicing will suffer. However, given the fact that HW that ice driver
      is handling has 1024 Tx queues per each PF, this is currently an
      unlikely scenario.
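The queue-count policy described in this commit message can be sketched as below. This is a minimal illustration, not the driver's code: the enum, both function names, the rounding used for "at least half" and the modulo mapping of CPUs onto shared rings are all assumptions.

```c
/* Outcome of sizing the XDP Tx rings (hypothetical names). */
enum xdp_txq_mode {
	XDP_TXQ_PER_CPU,	/* one ring per CPU, no lock needed */
	XDP_TXQ_SHARED,		/* rings shared between CPUs, lock needed */
	XDP_TXQ_NONE,		/* too few queues, XDP setup is refused */
};

/* Apply the "at least half of the CPU count" rule from the text. */
static enum xdp_txq_mode pick_xdp_txq_mode(unsigned int cpus,
					    unsigned int avail_txq)
{
	if (avail_txq >= cpus)
		return XDP_TXQ_PER_CPU;
	if (2 * avail_txq >= cpus)
		return XDP_TXQ_SHARED;
	return XDP_TXQ_NONE;
}

/* Assumed mapping: CPUs wrap around the rings. Requires num_rings > 0. */
static unsigned int xdp_ring_for_cpu(unsigned int cpu, unsigned int num_rings)
{
	return cpu % num_rings;
}
```

In the shared case two CPUs can land on the same ring, which is why the lock around xdp_ring accesses becomes necessary only in that mode.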
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: optimize XDP_TX workloads · 9610bd98
      Maciej Fijalkowski authored
      Optimize Tx descriptor cleaning for XDP. Current approach doesn't
      really scale and chokes when multiple flows are handled.
      
      Introduce two ring fields, @next_dd and @next_rs, that keep track of
      the descriptor that should be looked at when the need for cleaning arises
      and the descriptor that should have the RS bit set, respectively.
      
      Note that at this point the threshold is a constant (32), but it is
      something that we could make configurable.
      
      First thing is to get away from setting the RS bit on each descriptor.
      Let's do this only once NTU (next_to_use) is higher than the current
      @next_rs value. In such a case, grab tx_desc[next_rs], set the RS bit in
      that descriptor and advance @next_rs by 32.
      
      Second thing is to clean the Tx ring only when there are fewer than 32
      free entries. For that case, look up tx_desc[next_dd] for a DD bit.
      This bit is written back by HW to let the driver know that xmit was
      successful. It will happen only for those descriptors that had the RS bit
      set. Clean only 32 descriptors and advance @next_dd by 32.
      
      Actual cleaning routine is moved from ice_napi_poll() down to the
      ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any
      SKBs in there that would rely on interrupts for the cleaning. Nice side
      effect is that for rare case of Tx fallback path (that next patch is
      going to introduce) we don't have to trigger the SW irq to clean the
      ring.
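The batching scheme described so far can be modelled in user space as below. This is a simplified sketch under stated assumptions, not the driver's implementation: the ring size of 256, the two-flag descriptor, all function names and the `hw_complete()` stand-in for hardware write-back are invented for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define RING_SIZE 256u	/* assumed; must be a multiple of TX_THRESH */
#define TX_THRESH 32u	/* the constant threshold from the text */

struct tx_desc { bool rs; bool dd; };

struct xdp_ring {
	struct tx_desc desc[RING_SIZE];
	unsigned int next_to_use, next_to_clean, next_rs, next_dd;
};

static void ring_init(struct xdp_ring *r)
{
	memset(r, 0, sizeof(*r));
	r->next_rs = r->next_dd = TX_THRESH - 1;
}

/* Free descriptors; one slot always stays empty. */
static unsigned int ring_unused(const struct xdp_ring *r)
{
	return (r->next_to_clean > r->next_to_use ? 0 : RING_SIZE) +
	       r->next_to_clean - r->next_to_use - 1;
}

/* Stand-in for hardware: report completion wherever RS was requested. */
static void hw_complete(struct xdp_ring *r)
{
	for (unsigned int i = 0; i < RING_SIZE; i++) {
		if (r->desc[i].rs) {
			r->desc[i].rs = false;
			r->desc[i].dd = true;
		}
	}
}

/* Reclaim one batch of TX_THRESH descriptors if DD is set at next_dd. */
static void ring_clean(struct xdp_ring *r)
{
	if (!r->desc[r->next_dd].dd)
		return;
	r->desc[r->next_dd].dd = false;
	r->next_to_clean = (r->next_to_clean + TX_THRESH) % RING_SIZE;
	r->next_dd = (r->next_dd + TX_THRESH) % RING_SIZE;
}

/* Queue one frame; returns false when the ring is full (tx_busy). */
static bool ring_xmit(struct xdp_ring *r)
{
	/* Cleaning happens here, on the xmit path, not from an interrupt. */
	if (ring_unused(r) < TX_THRESH)
		ring_clean(r);
	if (!ring_unused(r))
		return false;

	r->desc[r->next_to_use] = (struct tx_desc){ false, false };
	r->next_to_use = (r->next_to_use + 1) % RING_SIZE;

	/* Request a write-back only once NTU has moved past next_rs. */
	if (r->next_to_use > r->next_rs || r->next_to_use == 0) {
		r->desc[r->next_rs].rs = true;
		r->next_rs = (r->next_rs + TX_THRESH) % RING_SIZE;
	}
	return true;
}
```

In this model only one descriptor in every 32 asks for a write-back, and a single DD check frees a whole batch, so the per-frame cost of completion handling stays small.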
      
      With those two concepts, the ring is kept almost full, but it is
      guaranteed that the driver will be able to produce Tx descriptors.
      
      This approach seems to work out well even though the Tx descriptors are
      produced in one-by-one manner. Test was conducted with the ice HW
      bombarded with packets from HW generator, configured to generate 30
      flows.
      
      Xdp2 sample yields the following results:
      <snip>
      proto 17:   79973066 pkt/s
      proto 17:   80018911 pkt/s
      proto 17:   80004654 pkt/s
      proto 17:   79992395 pkt/s
      proto 17:   79975162 pkt/s
      proto 17:   79955054 pkt/s
      proto 17:   79869168 pkt/s
      proto 17:   79823947 pkt/s
      proto 17:   79636971 pkt/s
      </snip>
      
      As that sample reports the Rx'ed frames, let's look at sar output.
      It says that what we Rx'ed we do actually Tx, no noticeable drops.
      Average:   IFACE      rxpck/s      txpck/s      rxkB/s      txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
      Average:  ens4f1  79842324.00  79842310.40  4678261.17  4678260.38     0.00     0.00      0.00    38.32
      
      with tx_busy staying calm.
      
      When compared to a state before:
      Average:   IFACE      rxpck/s      txpck/s      rxkB/s      txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
      Average:  ens4f1  90919711.60  42233822.60  5327326.85  2474638.04     0.00     0.00      0.00    43.64
      
      it can be observed that the amount of txpck/s is almost doubled, meaning
      that the performance is improved by around 90%. All of this due to the
      drops in the driver, previously the tx_busy stat was bumped at a 7mpps
      rate.
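As a quick check of the "around 90%" figure, using the txpck/s averages from the two sar runs above:

```latex
\[
\frac{79842310.40}{42233822.60} \approx 1.890
\quad\Longrightarrow\quad
\text{gain} \approx 89\%
\]
```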
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: propagate xdp_ring onto rx_ring · eb087cd8
      Maciej Fijalkowski authored
      With rings being split, it is now convenient to introduce a pointer to
      XDP ring within the Rx ring. For XDP_TX workloads this means that
      xdp_rings array access will be skipped, which was executed per each
      processed frame.
      
      Also, read the XDP prog once per NAPI and if prog is present, set up the
      local xdp_ring pointer. Reading prog a single time was discussed in [1]
      with some concern raised by Toke around dispatcher handling and having
      the need for going through the RCU grace period in the ndo_bpf driver
      callback, but ice currently tears down NAPI instances regardless of
      the prog presence on VSI.
      
      Although the pointer to XDP ring introduced to Rx ring makes things a
      lot slimmer/simpler, I still feel that single prog read per NAPI
      lifetime is beneficial.
      
      A further patch that introduces the fallback path will also benefit
      from that, as the xdp_ring pointer will be set during the XDP rings
      setup.
      
      [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: do not create xdp_frame on XDP_TX · a55e16fa
      Maciej Fijalkowski authored
      xdp_frame is not needed for XDP_TX data path in ice driver case.
      For this data path cleaning of sent descriptor will not happen anywhere
      outside of the driver, which means that carrying the information about
      the underlying memory model via xdp_frame will not be used. Therefore,
      this conversion can be simply dropped, which would relieve CPU a bit.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: unify xdp_rings accesses · 0bb4f9ec
      Maciej Fijalkowski authored
      There has been a long lasting issue of improper xdp_rings indexing for
      XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index
      is mixed with smp_processor_id(), there could be a situation where Tx
      descriptors are produced onto XDP Tx ring, but tail is never bumped,
      for example when a particular queue id is pinned to a non-matching IRQ
      line.
      
      Address this problem by ignoring the user ring count setting and always
      initializing the xdp_rings array to be of num_possible_cpus() size. Then,
      always use smp_processor_id() as an index to the xdp_rings array. This
      provides serialization, as at any given time only a single softirq can
      run on a particular CPU.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: split ice_ring onto Tx/Rx separate structs · e72bba21
      Maciej Fijalkowski authored
      While it was convenient to have a generic ring structure that served
      both Tx and Rx sides, next commits are going to introduce several
      Tx-specific fields, so in order to avoid hurting the Rx side, let's
      split the generic ring into new ice_tx_ring and ice_rx_ring structs.
      
      Rx ring could be handled by the old ice_ring which would reduce the code
      churn within this patch, but this would make things asymmetric.
      
      Make the union out of the ring container within ice_q_vector so that it
      is possible to iterate over newly introduced ice_tx_ring.
      
      Remove the @size as it's only accessed from control path and it can be
      calculated pretty easily.
      
      Change definitions of ice_update_ring_stats and
      ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
      used for both Rx and Tx rings.
      
      Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
      Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
      difference now.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: move ice_container_type onto ice_ring_container · dc23715c
      Maciej Fijalkowski authored
      Currently ice_container_type is scoped only for ice_ethtool.c. Next
      commit that will split the ice_ring struct onto Rx/Tx specific ring
      structs is going to also modify the type of linked list of rings that is
      within ice_ring_container. Therefore, the functions that are taking the
      ice_ring_container as an input argument will need to be aware of a ring
      type that will be looked up.
      
      Embed ice_container_type within ice_ring_container and initialize it
      properly when allocating the q_vectors.
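A container that carries its own type tag could be laid out roughly as follows. The `_sketch` struct names, the enum values and the anonymous union are assumptions for illustration; only the idea of embedding the type in the container comes from the commit message.

```c
/* Which kind of rings a container links together (hypothetical names). */
enum container_type { RX_CONTAINER, TX_CONTAINER };

struct rx_ring_sketch { struct rx_ring_sketch *next; };
struct tx_ring_sketch { struct tx_ring_sketch *next; };

struct ring_container_sketch {
	union {				/* head of the linked list of rings */
		struct rx_ring_sketch *rx_ring;
		struct tx_ring_sketch *tx_ring;
	};
	enum container_type type;	/* tells callers which member is valid */
};

/* A type-aware helper: walk whichever list the container holds. */
static unsigned int count_rings(const struct ring_container_sketch *rc)
{
	unsigned int n = 0;

	if (rc->type == TX_CONTAINER) {
		for (const struct tx_ring_sketch *t = rc->tx_ring; t; t = t->next)
			n++;
	} else {
		for (const struct rx_ring_sketch *x = rc->rx_ring; x; x = x->next)
			n++;
	}
	return n;
}
```

With the tag stored in the container, a function that receives only the container can pick the right list without an extra argument from the caller.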
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: remove ring_active from ice_ring · e93d1c37
      Maciej Fijalkowski authored
      This field is dead and driver is not making any use of it. Simply remove
      it.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge branch 'dpaa2-irq-coalescing' · 295711fa
      David S. Miller authored
      Ioana Ciornei says:
      
      ====================
      dpaa2-eth: add support for IRQ coalescing
      
      This patch set adds support for interrupts coalescing in dpaa2-eth.
      The first patches add support for the hardware level configuration of
      the IRQ coalescing in the dpio driver, while the ones that touch the
      dpaa2-eth driver are responsible for the ethtool user interaction.
      
      With the adaptive IRQ coalescing in place and enabled we have observed
      the following changes in interrupt rates on one A72 core @2.2GHz
      (LX2160A) while running a Rx TCP flow.  The TCP stream is sent on a
      10Gbit link and the only cpu that does Rx is fully utilized.
                Throughput        IRQ rate (irqs / sec)
      before:   4.59 Gbits/sec    24k
      after:    5.67 Gbits/sec    1.3k
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dpaa2: add adaptive interrupt coalescing · fc398bec
      Ioana Ciornei authored
      Add support for adaptive interrupt coalescing to the dpaa2-eth driver.
      First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported
      coalesce parameter and the requested state is configured through the
      dpio APIs added in the previous patch.
      
      Besides the ethtool API interaction, we keep track of how many bytes and
      frames are dequeued per CDAN (Channel Data Availability Notification)
      and update the Net DIM instance through the dpaa2_io_update_net_dim()
      API.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: fsl: dpio: add Net DIM integration · 69651bd8
      Ioana Ciornei authored
      Use the generic dynamic interrupt moderation (dim) framework to
      implement adaptive interrupt coalescing on Rx. With the per-packet
      interrupt scheme, a high interrupt rate has been noted for moderate
      traffic flows leading to high CPU utilization.
      
      The dpio driver exports new functions to enable/disable adaptive IRQ
      coalescing on a DPIO object, to query the state or to update Net DIM
      with a new set of bytes and frames dequeued.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dpaa2: add support for manual setup of IRQ coalesing · a64b4421
      Ioana Ciornei authored
      Use the newly exported dpio driver API to manually configure the IRQ
      coalescing parameters requested by the user.
      The .get_coalesce() and .set_coalesce() net_device callbacks are
      implemented and directly export or set up the rx-usecs on all the
      configured channels.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: fsl: dpio: add support for irq coalescing per software portal · ed1d2143
      Ioana Ciornei authored
      In DPAA2 based SoCs, the IRQ coalescing support per software portal has 2
      configurable parameters:
       - the IRQ timeout period (QBMAN_CINH_SWP_ITPR): how many periods of 256
         QBMAN cycles need to pass until a dequeue interrupt is asserted.
       - the IRQ threshold (QBMAN_CINH_SWP_DQRR_ITR): how many dequeue
         responses in the DQRR ring would generate an IRQ.
      
      Add support for setting up and querying these IRQ coalescing related
      parameters.
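Since the timeout period counts units of 256 QBMAN cycles, a holdoff time in microseconds has to be converted using the QBMAN clock frequency. The helpers below are a sketch of that arithmetic only; their names are hypothetical, the 700 MHz clock in the usage note is an invented example value, and clamping to the register field width is deliberately not modelled.

```c
#include <stdint.h>

#define QBMAN_ITP_UNIT_CYCLES 256u	/* one timeout unit = 256 QBMAN cycles */

/* Convert a holdoff in microseconds into timeout-period units (rounds down).
 * Assumes qbman_clk_hz is the clock in Hz and is non-zero.
 */
static uint32_t holdoff_us_to_itp(uint32_t holdoff_us, uint32_t qbman_clk_hz)
{
	uint64_t cycles = (uint64_t)holdoff_us * qbman_clk_hz / 1000000u;

	return (uint32_t)(cycles / QBMAN_ITP_UNIT_CYCLES);
}

/* Inverse conversion, used when reporting the setting back (rounds down). */
static uint32_t itp_to_holdoff_us(uint32_t itp, uint32_t qbman_clk_hz)
{
	uint64_t cycles = (uint64_t)itp * QBMAN_ITP_UNIT_CYCLES;

	return (uint32_t)(cycles * 1000000u / qbman_clk_hz);
}
```

For example, with an assumed 700 MHz clock, a 100 us holdoff is 70000 cycles, which rounds down to 273 units; converting 273 units back gives 99 us because of the rounding.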
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: fsl: dpio: extract the QBMAN clock frequency from the attributes · 2cf0b6fe
      Ioana Ciornei authored
      Through the dpio_get_attributes() firmware call the dpio driver has
      access to the QBMAN clock frequency. Extend the structure which holds
      the firmware's response so that we can have access to this information.
      
      This will be needed in the next patches which also add support for
      interrupt coalescing which needs to be configured based on the
      frequency.
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'L4S-style-ce_threshold_ect1-marking' · f3fafbcb
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net/sched: implement L4S style ce_threshold_ect1 marking
      
      As suggested by Ingemar Johansson, Neal Cardwell, and others, fq_codel can be used
      for Low Latency, Low Loss, Scalable Throughput (L4S) with a small change.
      
      In ce_threshold_ect1 mode, only ECT(1) packets can be marked to CE if
      their sojourn time is above the threshold.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fq_codel: implement L4S style ce_threshold_ect1 marking · e72aeb9e
      Eric Dumazet authored
      Add TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency,
      Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold.
      
      If enabled, only packets with ECT(1) can be transformed to CE
      if their sojourn time is above the ce_threshold.
      
      Note that this new option does not change rules for codel law.
      In particular, if TCA_FQ_CODEL_ECN is left enabled (this is
      the default when fq_codel qdisc is created), ECT(0) packets can
      still get CE if codel law (as governed by limit/target) decides so.
      
      Section 4.3.b of current draft [1] states:
      
      b.  A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can
          be used for L4S.  For instance within each queue of an FQ-CoDel
          system, as well as a CoDel AQM, there is typically also ECN
          marking at an immediate (unsmoothed) shallow threshold to support
          use in data centres (see Sec.5.2.7 of [RFC8290]).  This can be
          modified so that the shallow threshold is solely applied to
          ECT(1) packets.  Then if there is a flow of non-ECN or ECT(0)
          packets in the per-flow-queue, the Classic AQM (e.g.  CoDel) is
          applied; while if there is a flow of ECT(1) packets in the queue,
          the shallower (typically sub-millisecond) threshold is applied.
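The shallow-threshold rule described above can be sketched as a small predicate. This is an illustration rather than the qdisc's code: the function name and parameters are hypothetical, while the ECN codepoint values follow RFC 3168. It covers only the ce_threshold step, not the separate CoDel law.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECN codepoints in the two low bits of the DS field (RFC 3168). */
#define ECN_MASK	0x03
#define ECN_NOT_ECT	0x00
#define ECN_ECT_1	0x01
#define ECN_ECT_0	0x02
#define ECN_CE		0x03

/* Decide whether the ce_threshold rule applies a CE mark to this packet. */
static bool ce_threshold_applies(uint8_t dsfield, uint64_t sojourn_ns,
				 uint64_t ce_threshold_ns, bool ect1_only)
{
	uint8_t ecn = dsfield & ECN_MASK;

	if (sojourn_ns <= ce_threshold_ns)
		return false;		/* queue delay is still shallow */
	if (ecn == ECN_NOT_ECT)
		return false;		/* not ECN-capable, cannot carry CE */
	if (ect1_only && ecn != ECN_ECT_1)
		return false;		/* L4S mode: only ECT(1) is eligible */
	return true;
}
```

With `ect1_only` set, an ECT(0) packet above the threshold is left alone by this step and can still be marked later if the CoDel law decides so.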
      
      Tested:
      
      tc qd replace dev eth1 root fq_codel ce_threshold_ect1 50usec
      
      netperf ... -t TCP_STREAM -- -K dctcp
      
      tc -s -d qd sh dev eth1
      qdisc fq_codel 8022: root refcnt 32 limit 10240p flows 1024 quantum 9212 target 5ms ce_threshold_ect1 49us interval 100ms memory_limit 32Mb ecn drop_batch 64
       Sent 14388596616 bytes 9543449 pkt (dropped 0, overlimits 0 requeues 152013)
       backlog 0b 0p requeues 152013
        maxpacket 68130 drop_overlimit 0 new_flow_count 95678 ecn_mark 0 ce_mark 7639
        new_flows_len 0 old_flows_len 0
      
      [1] L4S current draft:
      https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
      Cc: Tom Henderson <tomh@tomh.org>
      Cc: Bob Briscoe <in@bobbriscoe.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add skb_get_dsfield() helper · 70e939dd
      Eric Dumazet authored
      skb_get_dsfield(skb) gets dsfield from skb, or -1
      if an error was found.
      
      This is basically a wrapper around ipv4_get_dsfield()
      and ipv6_get_dsfield().
      
      Used by following patch for fq_codel.
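The "DS field or -1" contract can be illustrated on a raw network header as below. This is a user-space sketch, not the kernel helper: the function name is hypothetical, and it dispatches on the IP version nibble of a byte buffer instead of working on an skb.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the DS field (0..255) of an IPv4/IPv6 header, or -1 on error. */
static int get_dsfield_sketch(const uint8_t *l3, size_t len)
{
	if (len < 2)
		return -1;		/* header too short to inspect */

	switch (l3[0] >> 4) {
	case 4:
		return l3[1];		/* IPv4: the TOS byte */
	case 6:
		/* IPv6: traffic class straddles the first two bytes */
		return ((l3[0] & 0x0f) << 4) | (l3[1] >> 4);
	default:
		return -1;		/* neither IPv4 nor IPv6 */
	}
}
```

Returning an int lets callers tell a valid DS field of 0 apart from the error case, which an unsigned byte return could not express.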
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
      Cc: Tom Henderson <tomh@tomh.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: switch orphan_count to bare per-cpu counters · 19757ceb
      Eric Dumazet authored
      Use of percpu_counter structure to track count of orphaned
      sockets is causing problems on modern hosts with 256 cpus
      or more.
      
      Stefan Bach reported a serious spinlock contention in real workloads,
      that I was able to reproduce with a netfilter rule dropping
      incoming FIN packets.
      
          53.56%  server  [kernel.kallsyms]      [k] queued_spin_lock_slowpath
                  |
                  ---queued_spin_lock_slowpath
                     |
                      --53.51%--_raw_spin_lock_irqsave
                                |
                                 --53.51%--__percpu_counter_sum
                                           tcp_check_oom
                                           |
                                           |--39.03%--__tcp_close
                                           |          tcp_close
                                           |          inet_release
                                           |          inet6_release
                                           |          sock_close
                                           |          __fput
                                           |          ____fput
                                           |          task_work_run
                                           |          exit_to_usermode_loop
                                           |          do_syscall_64
                                           |          entry_SYSCALL_64_after_hwframe
                                           |          __GI___libc_close
                                           |
                                            --14.48%--tcp_out_of_resources
                                                      tcp_write_timeout
                                                      tcp_retransmit_timer
                                                      tcp_write_timer_handler
                                                      tcp_write_timer
                                                      call_timer_fn
                                                      expire_timers
                                                      __run_timers
                                                      run_timer_softirq
                                                      __softirqentry_text_start
      
      As explained in commit cf86a086 ("net/dst: use a smaller percpu_counter
      batch for dst entries accounting"), default batch size is too big
      for the default value of tcp_max_orphans (262144).
      
      But even if we reduce batch sizes, there would still be cases
      where the estimated count of orphans is beyond the limit,
      and where tcp_too_many_orphans() has to call the expensive
      percpu_counter_sum_positive().
      
      One solution is to use plain per-cpu counters, and have
      a timer to periodically refresh this cache.
      
      Updating this cache every 100ms seems about right, tcp pressure
      state is not radically changing over shorter periods.
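The "plain per-cpu counters plus a periodically refreshed cache" idea can be sketched in user space as below. All names and the CPU count are hypothetical; real per-cpu data, the timer and the memory-ordering annotations are left out, so this shows only the data flow.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4		/* assumed CPU count for the model */

static int orphan_percpu[NR_CPUS_SKETCH];	/* updated locally, no lock */
static int orphan_cache;			/* last computed total */

/* Hot path: each CPU only touches its own slot. */
static void orphan_add(unsigned int cpu, int delta)
{
	orphan_percpu[cpu] += delta;
}

/* Would run from a timer (every 100ms in the text): fold the slots. */
static void orphan_cache_refresh(void)
{
	int total = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += orphan_percpu[cpu];
	orphan_cache = total > 0 ? total : 0;
}

/* Readers consult the cache only; no summing and no lock on this path. */
static bool too_many_orphans_sketch(int shift, int max_orphans)
{
	return (orphan_cache << shift) > max_orphans;
}
```

The trade-off is staleness: the limit check may lag behind the true count by up to one refresh period, which the commit message argues is acceptable for TCP pressure state.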
      
      percpu_counter was nice 15 years ago when hosts had fewer
      than 16 cpus, but not anymore by current standards.
      
      v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
          reported by kernel test robot <lkp@intel.com>
          Remove unused socket argument from tcp_too_many_orphans()
      
      Fixes: dd24c001 ("net: Use a percpu_counter for orphan_count")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Stefan Bach <sfb@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mctp: Avoid leak of mctp_sk_key · 0b93aed2
      Matt Johnston authored
      mctp_key_alloc() returns a key already referenced.
      
      The mctp_route_input() path receives a packet for a bind socket and
      allocates a key. It passes the key to mctp_key_add() which takes a
      refcount and adds the key to lists. mctp_route_input() should then
      release its own refcount when setting the key pointer to NULL.
      
      In the mctp_alloc_local_tag() path (for mctp_local_output()) we
      similarly need to unref the key before returning (mctp_reserve_tag()
      takes a refcount and adds the key to lists).
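The reference-counting rule behind this fix can be modelled with a toy key object. The `_sketch` names are hypothetical and the list is reduced to a flag; the point is only that an allocation reference and a list reference are two separate counts that both have to be dropped.

```c
#include <stdbool.h>
#include <stdlib.h>

struct key_sketch {
	int refs;		/* number of outstanding references */
	bool on_list;		/* stand-in for membership in the key lists */
};

/* Allocation hands the caller a key that is already referenced once. */
static struct key_sketch *key_alloc_sketch(void)
{
	struct key_sketch *k = calloc(1, sizeof(*k));

	if (k)
		k->refs = 1;
	return k;
}

/* Publishing the key takes a second reference on behalf of the lists. */
static void key_add_sketch(struct key_sketch *k)
{
	k->refs++;
	k->on_list = true;
}

/* Drop one reference; returns true if that freed the key. */
static bool key_unref_sketch(struct key_sketch *k)
{
	if (--k->refs > 0)
		return false;
	free(k);
	return true;
}
```

A caller that allocates, adds and then simply forgets its pointer leaves the count at 2, so removing the key from the lists later only brings it down to 1 and the memory is never freed. Dropping the caller's own reference after the add is what lets the final list removal free the key.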
      
      Fixes: 73c61845 ("mctp: locking, lifetime and validity changes for sk_keys")
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qca8337-improvements' · bf136673
      David S. Miller authored
      Ansuel Smith says:
      
      ====================
      Multiple improvement for qca8337 switch
      
      This series is the final step of a long process of porting 80+ devices
      to use the new qca8k driver instead of the hacky qca one based on the
      never-merged swconfig platform.
      Some background to justify all these additions.
      QCA used a special binding to declare raw initvals to set up the switch. I
      made a script to convert all these magic values, converted 80+ dts and
      scanned them for all the needed "unsupported regs". We found a baseline
      of common and used regs, so in theory we hopefully don't have to add
      anything more.
      We discovered lots of things with this, especially about how differently
      qca8327 works compared to qca8337.
      
      In short, we found that qca8327 has some problems with suspend/resume of
      its internal phy. It instead sets some dedicated regs that suspend the
      phy without setting the standard bit. The first 4 patches fix this.
      There is also a patch about preferring master. This is taken directly from
      the original driver and it seems to be needed to prevent some problems
      with the pause frame.
      
      Every ipq806x target sets the mac power sel and this specific reg
      regulates the output voltage of the regulator. Without this some
      instability can occur.
      
      Some configurations (for some reason) swap mac6 with mac0. We add support
      for this.
      Also, we discovered that some devices don't work at all with pll enabled
      for the sgmii line. In the original code this was based on the switch
      revision. In later revisions the pll regs were decided based on the switch
      type (disabled for qca8327 and enabled for qca8337), but some devices
      still had that disabled in the initval regs.
      Considering we found at least one qca8337 device that required pll
      disabled to work (no traffic problem), we decided to introduce a binding
      to enable pll and set it only with that.
      
      Lastly, we add support for led open drain, which requires power-on-sel
      to be set. Also, some devices have only power-on-sel set in the initval,
      so we also add support for that. This is needed for the correct function
      of the switch leds.
      Qca8327 has a special reg in the pws regs that sets it to a reduced
      48pin layout. This is needed or the switch doesn't work.
      
      These are all the special configurations we found on all these devices,
      which are from various targets, mostly ath79, ipq806x and bcm53xx.
      Changes v7:
      - Fix missing newline in yaml
      - Handle error with wrong cpu port detected
      - Move yaml commit as last to fix bot error
      
      Changes v6:
      - Convert Documentation to yaml
      - Add extra check for cpu port and invalid phy mode
      - Add co developed by tag to give credits to Matthew
      
      Changes v5:
      - Swap patch. Document first then implement.
      - Fix some grammar error reported.
      - Rework function. Remove phylink mac_config DT scan and move everything
        to dedicated function in probe.
      - Introduce new logic for delay selection, which is also supported with
        internal delay declared and rgmii set as phy mode
      - Start working on yaml conversion. Will later post this in v6 when we
        finally take final decision about mac swap.
      
      Changes v4:
      - Fix typo in SGMII falling edge about using PHY id instead of
        switch id
      
      Changes v3:
      - Drop phy patches (proposed separately)
      - Drop special pwr binding. Rework to ipq806x specific
      - Better describe compatible and add serial print on switch chip
      - Drop mac exchange. Rework falling edge and move it to mac_config
      - Add support for port 6 cpu port. Drop hardcoded cpu port to port0
      - Improve port stability with sgmii. QCA source has internal delay also
        for sgmii
      - Add warning with pll enabled on wrong configuration
      
      Changes v2:
      - Reword Documentation patch to dt-bindings
      - Propose first 2 phy patch to net
      - Better describe and add hint on how to use all the new
        bindings
      - Rework delay scan function and move to phylink mac_config
      - Drop package48 wrong binding
      - Introduce support for qca8328 switch
      - Fix wrong binding name power-on-sel
      - Return error on wrong config with led open drain and
        ignore-power-on-sel not set
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: dsa: qca8k: convert to YAML schema · d291fbb8
      Matthew Hagan authored
      Convert the qca8k bindings to YAML format.
      Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
      Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: ipq8064-mdio: fix warning with new qca8k switch · e52073a8
      Ansuel Smith authored
      Fix warning now that we have qca8k switch Documentation using yaml.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: qca8k: move port config to dedicated struct · fd0bb28c
      Ansuel Smith authored
      Move ports related config to dedicated struct to keep things organized.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: qca8k: set internal delay also for sgmii · cef08115
      Ansuel Smith authored
      The original QCA code reports port instability and says that SGMII also
      requires the internal delay to be set. Generalize the rgmii delay function
      and apply the advised values if they are not defined in DT.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cef08115
    • Ansuel Smith's avatar
      net: dsa: qca8k: add support for QCA8328 · f477d1c8
      Ansuel Smith authored
      The QCA8328 switch is the bigger brother of the qca8327: same regs,
      different chip. Change the function to set the correct pin layout and
      introduce a new match_data to differentiate the 2 switches, as they have
      the same ID and their internal PHYs have the same ID.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f477d1c8
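      The match_data idea in the commit above can be sketched in C as a
      lookup keyed on the compatible string. This is only an illustration:
      the struct layout, the reduced_package flag, the sketch_* names, and
      the ID value 0x12 are assumptions made for this example and are not
      taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-compatible data; field names are assumptions. */
struct sketch_match_data {
	uint8_t id;           /* chip ID read back from the switch */
	bool reduced_package; /* assumed flag selecting the pin layout */
};

struct sketch_compat_entry {
	const char *compatible;
	struct sketch_match_data data;
};

/* Both chips report the same ID, so only the compatible tells them apart. */
static const struct sketch_compat_entry sketch_table[] = {
	{ "qca,qca8327", { 0x12, true } },  /* 0x12 is a placeholder value */
	{ "qca,qca8328", { 0x12, false } },
};

/* Return the match data for a compatible string, or NULL if unknown. */
static const struct sketch_match_data *sketch_find_match(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
		if (strcmp(sketch_table[i].compatible, compatible) == 0)
			return &sketch_table[i].data;
	}
	return NULL;
}
```

      Because the ID alone cannot separate the two chips, the pin layout
      decision in this sketch hangs entirely off the flag attached to the
      compatible string.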
    • Ansuel Smith's avatar
      dt-bindings: net: dsa: qca8k: document support for qca8328 · ed7988d7
      Ansuel Smith authored
      QCA8328 is the bigger brother of qca8327. Document the new compatible
      binding and add some information to help understand the various switch
      compatibles.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed7988d7
    • Ansuel Smith's avatar
      net: dsa: qca8k: add support for pws config reg · 362bb238
      Ansuel Smith authored
      Some qca8327 switches require forcing the power-on-sel strapping to be
      ignored. Some switches require setting the LED open drain mode in regs
      instead of using strapping. While most devices implement this the
      correct way, using pin strapping, there are still some broken devices
      that need to be set up using sw regs.
      Introduce a new binding and support these special configurations.
      As LED open drain requires pin strapping to be ignored in order to work,
      the probe fails with an EINVAL error on incorrect configuration.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      362bb238
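      The dependency described in the commit above (LED open drain is only
      valid when power-on strapping is ignored, otherwise EINVAL) can be
      sketched in C as follows. The struct and function names are
      hypothetical and exist only for this example; the two fields stand for
      the presence of the qca,ignore-power-on-sel and qca,led-open-drain
      properties.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical container for the two devicetree booleans. */
struct sketch_pws_config {
	bool ignore_power_on_sel; /* qca,ignore-power-on-sel present */
	bool led_open_drain;      /* qca,led-open-drain present */
};

/*
 * Return 0 when the combination is acceptable, -EINVAL otherwise.
 * LED open drain only takes effect when pin strapping is ignored,
 * so requesting it alone is treated as a configuration error.
 */
static int sketch_pws_validate(const struct sketch_pws_config *cfg)
{
	if (cfg->led_open_drain && !cfg->ignore_power_on_sel)
		return -EINVAL;

	return 0;
}
```

      Rejecting the bad combination up front, rather than silently ignoring
      the LED setting, matches the commit's statement that the probe fails.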
    • Ansuel Smith's avatar
      dt-bindings: net: dsa: qca8k: Document qca,led-open-drain binding · 924087c5
      Ansuel Smith authored
      Document the new binding qca,ignore-power-on-sel, used to ignore
      power-on strapping and use sw regs instead.
      Document qca,led-open-drain to set the LEDs to open drain mode;
      qca,ignore-power-on-sel is mandatory with this enabled or an error will
      be reported.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      924087c5
    • Ansuel Smith's avatar
      net: dsa: qca8k: add explicit SGMII PLL enable · bbc4799e
      Ansuel Smith authored
      Support enabling the PLL on the SGMII CPU port. Some devices require
      this special configuration or no traffic is transmitted and the switch
      doesn't work at all. A dedicated binding is added to the CPU port node
      to apply the correct reg on mac config.
      Fail to correctly configure sgmii with qca8327 switch and warn if pll is
      used on qca8337 with a revision greater than 1.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbc4799e
    • Ansuel Smith's avatar
      dt-bindings: net: dsa: qca8k: Document qca,sgmii-enable-pll · 13ad5ccc
      Ansuel Smith authored
      Document qca,sgmii-enable-pll binding used in the CPU nodes to
      enable SGMII PLL on MAC config.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      13ad5ccc
    • Ansuel Smith's avatar
      net: dsa: qca8k: rework rgmii delay logic and scan for cpu port 6 · 5654ec78
      Ansuel Smith authored
      Future-proofing commit. This switch has 2 CPU ports, and one valid
      configuration is the first CPU port set to sgmii and the second CPU port
      set to rgmii-id. The current implementation detects delay only for CPU
      port zero set to rgmii and doesn't account for any delay set on a
      secondary CPU port. Drop the current delay scan function and move it to
      the sgmii parser function to generalize and implicitly add support for a
      secondary CPU port set to rgmii-id. Introduce new logic where delay is
      also enabled when an internal delay binding is declared and rgmii is set
      as the PHY mode.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5654ec78
    • Ansuel Smith's avatar
      net: dsa: qca8k: add support for cpu port 6 · 3fcf734a
      Ansuel Smith authored
      Currently the CPU port is always hardcoded to port 0. This switch has 2
      CPU ports. The original intention of this driver seems to be to use the
      mac06_exchange bit to swap MAC0 with MAC6 in the unusual configuration
      where the device has only CPU port 6 connected. To avoid introducing a
      new binding, rework the driver to address the secondary CPU port as
      primary and drop any reference to the hardcoded port.
      With the mac06 exchange configuration, just skip the definition of port0
      and define the CPU port as a secondary. The driver will autoconfigure
      the switch to use that as the primary CPU port.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3fcf734a
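      The selection logic described in the commit above can be sketched in C
      roughly as follows. The struct, the port count, and the function names
      are hypothetical; the sketch assumes that port 0 is preferred when it is
      defined as a CPU port and that port 6 becomes the primary CPU port only
      when port 0 is left undefined.

```c
#include <stdbool.h>

#define SKETCH_NUM_PORTS 7 /* ports 0..6; an assumption for this sketch */

/* Hypothetical view of which ports the devicetree marks as CPU ports. */
struct sketch_switch {
	bool is_cpu_port[SKETCH_NUM_PORTS];
};

/*
 * Pick the primary CPU port: port 0 if it is defined as a CPU port,
 * otherwise port 6. Returns -1 when neither is defined.
 */
static int sketch_pick_cpu_port(const struct sketch_switch *sw)
{
	if (sw->is_cpu_port[0])
		return 0;
	if (sw->is_cpu_port[6])
		return 6;
	return -1;
}

/* In this sketch, the MAC0/MAC6 swap is only needed when port 6 is primary. */
static bool sketch_needs_mac06_exchange(const struct sketch_switch *sw)
{
	return sketch_pick_cpu_port(sw) == 6;
}
```

      With both ports defined, the sketch keeps port 0 as primary, which
      leaves port 6 as the secondary CPU port mentioned in the series.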
    • Ansuel Smith's avatar
      dt-bindings: net: dsa: qca8k: Document support for CPU port 6 · 731d6133
      Ansuel Smith authored
      The switch now supports setting the CPU port to 6 instead of it being
      hardcoded to 0. Document support for it and describe the selection logic.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      731d6133
    • Ansuel Smith's avatar
      net: dsa: qca8k: add support for sgmii falling edge · 6c43809b
      Ansuel Smith authored
      Add support for this in the qca8k driver. Also add support for SGMII
      rx/tx clock falling edge. This is only present for pad0; pad5 and pad6
      have these bits reserved according to the documentation. Add a comment
      that this is hardcoded to PAD0, as qca8327/28/34/37 have a single sgmii
      line and setting falling edge in port0 applies to both configurations,
      with sgmii used for port0 or port6.
      Co-developed-by: Matthew Hagan <mnhagan88@gmail.com>
      Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c43809b
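      A minimal C sketch of how the two falling-edge options from the commit
      above might be folded into a PORT0_PAD_CTRL register value. The macro
      names, the function name, and the bit positions (18 and 19) are
      assumptions for illustration only; the real positions should be taken
      from the switch documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, used here only to make the example concrete. */
#define SKETCH_PAD0_SGMII_TXCLK_FALLING_EDGE (UINT32_C(1) << 18)
#define SKETCH_PAD0_SGMII_RXCLK_FALLING_EDGE (UINT32_C(1) << 19)

/*
 * Return the pad0 control value with the two clock-edge bits set
 * according to the requested options. All other bits are preserved.
 */
static uint32_t sketch_pad0_apply_clk_edges(uint32_t val, bool rx_falling,
					    bool tx_falling)
{
	/* Clear both bits first so stale settings do not leak through. */
	val &= ~(SKETCH_PAD0_SGMII_RXCLK_FALLING_EDGE |
		 SKETCH_PAD0_SGMII_TXCLK_FALLING_EDGE);

	if (rx_falling)
		val |= SKETCH_PAD0_SGMII_RXCLK_FALLING_EDGE;
	if (tx_falling)
		val |= SKETCH_PAD0_SGMII_TXCLK_FALLING_EDGE;

	return val;
}
```

      Only pad0 is touched, in line with the commit's note that the bits are
      reserved on pad5 and pad6.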
    • Ansuel Smith's avatar
      dt-bindings: net: dsa: qca8k: Add SGMII clock phase properties · fdbf35df
      Ansuel Smith authored
      Add names and descriptions of additional PORT0_PAD_CTRL properties.
      qca,sgmii-(rx|tx)clk-falling-edge are for setting the respective clock
      phase to falling edge.
      Co-developed-by: Matthew Hagan <mnhagan88@gmail.com>
      Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdbf35df
    • Ansuel Smith's avatar
      dsa: qca8k: add mac_power_sel support · d8b6f5ba
      Ansuel Smith authored
      Add missing mac power sel support needed for the ipq8064/5 SoCs, which
      require 1.8v for the internal regulator port instead of the default 1.5v.
      If other devices need this, consider adding a dedicated binding to
      support it.
      Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8b6f5ba
    • Colin Ian King's avatar
      xen-netback: Remove redundant initialization of variable err · bacc8daf
      Colin Ian King authored
      The variable err is initialized with a value that is never read; it
      is updated immediately afterwards. The assignment is redundant and
      can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bacc8daf
    • Yunsheng Lin's avatar
      page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA · d00e60ee
      Yunsheng Lin authored
      32-bit architectures with 64-bit DMA seem to be rare these days,
      and page pool might carry a lot of code and complexity for
      systems that possibly do not exist.
      
      So disable dma mapping support for such systems. If drivers
      really want to work on such systems, they have to implement
      their own DMA-mapping fallback tracking outside page_pool.
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d00e60ee
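      The kind of check the commit above describes can be sketched in C as a
      size comparison between the DMA address type and the native word. This
      is not the kernel code: the function name is hypothetical, the sizes are
      passed in as parameters so the example runs anywhere, and the choice of
      -EINVAL as the error is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Reject DMA mapping when a DMA address does not fit in a native word,
 * which is the "32-bit arch with 64-bit DMA" case. dma_addr_size and
 * long_size stand in for sizeof(dma_addr_t) and sizeof(unsigned long).
 */
static int sketch_check_dma_map(size_t dma_addr_size, size_t long_size,
				bool want_dma_map)
{
	if (want_dma_map && dma_addr_size > long_size)
		return -EINVAL; /* assumed error code for this sketch */

	return 0;
}
```

      A caller that does not request DMA mapping is unaffected, so drivers
      doing their own mapping outside the pool would still pass this check.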
    • Jakub Kicinski's avatar
      Merge branch 'octeontx2-af-miscellaneous-changes-for-cpt' · 40088915
      Jakub Kicinski authored
      Srujana Challa says:
      
      ====================
      octeontx2-af: Miscellaneous changes for CPT
      
      This patchset consists of miscellaneous changes for CPT.
      First patch enables the CPT HW interrupts, second patch
      adds support for CPT LF teardown in non FLR path and
      final patch does CPT CTX flush in FLR handler.
      
      v2:
      - Fixed a warning reported by kernel test robot.
      ====================
      
      Link: https://lore.kernel.org/r/20211013055621.1812301-1-schalla@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      40088915