1. 16 Jan, 2023 3 commits
  2. 14 Jan, 2023 23 commits
  3. 13 Jan, 2023 14 commits
    • sock: add tracepoint for send recv length · 6e6eda44
      Yunhui Cui authored
      Add 2 tracepoints to monitor the tcp/udp traffic
      of per process and per cgroup.
      
      Regarding monitoring the tcp/udp traffic of each process, there are two
      existing solutions, the first one is https://www.atoptool.nl/netatop.php.
      The second is via kprobe/kretprobe.
      
      Netatop solution is implemented by registering the hook function at the
      hook point provided by the netfilter framework.
      
      These hook functions may be in the soft interrupt context and cannot
      directly obtain the pid. Some data structures are added to bind packets
      and processes. For example, struct taskinfobucket, struct taskinfo ...
      
      Every time the process sends and receives packets it needs multiple
      hashmaps, resulting in low performance, and it has the problem of inaccurate
      tcp/udp traffic statistics (for example: multiple threads share sockets).
      
      We can obtain the information with kretprobe, but as we know, kprobe gets
      the result by trapping in an exception, which loses performance compared
      to tracepoint.
      
      We compared the performance of tracepoints with the above two methods, and
      the results are as follows:
      
      ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html
      without trace:
      Time per request: 39.660 [ms] (mean)
      Time per request: 0.040 [ms] (mean, across all concurrent requests)
      
      netatop:
      Time per request: 50.717 [ms] (mean)
      Time per request: 0.051 [ms] (mean, across all concurrent requests)
      
      kretprobe:
      Time per request: 43.168 [ms] (mean)
      Time per request: 0.043 [ms] (mean, across all concurrent requests)
      
      tracepoint:
      Time per request: 41.004 [ms] (mean)
      Time per request: 0.041 [ms] (mean, across all concurrent requests)
      
      It can be seen that tracepoint has better performance.
      Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
      Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
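The relative cost of each method follows directly from the mean times per request quoted in the commit message above. The short sketch below only redoes that arithmetic; the names `BASELINE_MS`, `RESULTS_MS` and `overhead_percent` are made up for illustration and the figures are copied from the message, not re-measured.

```python
# Mean time per request in ms, copied from the ab results in the commit message.
BASELINE_MS = 39.660  # "without trace"
RESULTS_MS = {
    "netatop": 50.717,
    "kretprobe": 43.168,
    "tracepoint": 41.004,
}


def overhead_percent(measured_ms: float, baseline_ms: float = BASELINE_MS) -> float:
    """Relative slowdown versus the untraced run, in percent."""
    return (measured_ms - baseline_ms) / baseline_ms * 100.0


# Overhead of each method, rounded to one decimal place.
OVERHEAD = {name: round(overhead_percent(ms), 1) for name, ms in RESULTS_MS.items()}

if __name__ == "__main__":
    for name, pct in OVERHEAD.items():
        print(f"{name:<10} +{pct:.1f}%")
```

On these numbers the tracepoint run is roughly 3.4% slower than the untraced baseline, kretprobe roughly 8.8%, and netatop roughly 27.9%.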
    • Merge branch 'rmnet-tx-pkt-aggregation' · 8e8b6c63
      David S. Miller authored
      Daniele Palmas says:
      
      ====================
      net: add tx packets aggregation to ethtool and rmnet
      
      Hello maintainers and all,
      
      this patchset implements tx qmap packets aggregation in rmnet and generic
      ethtool support for that.
      
      Some low-cat Thread-x based modems are not capable of properly reaching the maximum
      allowed throughput both in tx and rx during a bidirectional test if tx packets
      aggregation is not enabled.
      
      I verified this problem with rmnet + qmi_wwan by using a MDM9207 Cat. 4 based modem
      (50Mbps/150Mbps max throughput). What is actually happening is pictured at
      https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view
      
      Testing with iperf TCP, when rx and tx flows are tested singularly there's no issue
      in tx and minor issues in rx (not able to reach max throughput). When there are concurrent
      tx and rx flows, tx throughput has a huge drop. rx has a minor one, but still present.
      
      The same scenario with tx aggregation enabled is pictured at
      https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view
      showing a regular graph.
      
      This issue does not happen with high-cat modems (e.g. SDX20), or at least it
      does not happen at the throughputs I'm able to test currently: maybe the same
      could happen when moving close to the maximum rates supported by those modems.
      Anyway, having the tx aggregation enabled should not hurt.
      
      The first attempt to solve this issue was in qmi_wwan qmap implementation,
      see the discussion at https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/
      
      However, it turned out that rmnet was a better candidate for the implementation.
      
      Moreover, Greg and Jakub also suggested using ethtool for the configuration:
      not sure if I got their advice right, but this patchset also adds generic ethtool
      support for tx aggregation.
      
      The patches have been tested mainly against an MDM9207 based modem through USB
      and SDX55 through PCI (MHI).
      
      v2 should address the comments highlighted in the review: the implementation is
      still in rmnet, due to Subash's request of keeping tx aggregation there.
      
      v3 fixes ethtool-netlink.rst content out of table bounds and a W=1 build warning
      for patch 2.
      
      v4 solves a race related to egress_agg_params.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: add ethtool support for configuring tx aggregation · db8a563a
      Daniele Palmas authored
      Add support for ETHTOOL_COALESCE_TX_AGGR for configuring the tx
      aggregation settings.
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: add tx packets aggregation · 64b5d1f8
      Daniele Palmas authored
      Add tx packets aggregation.
      
      Bidirectional TCP throughput tests through iperf with low-cat
      Thread-x based modems revealed performance issues both in tx
      and rx.
      
      The Windows driver does not show this issue: inspecting USB
      packets revealed that the only notable change is the driver
      enabling tx packets aggregation.
      
      Tx packets aggregation is by default disabled and can be enabled
      by increasing the value of ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES.
      
      The maximum aggregated size is by default set to a reasonably low
      value in order to support the majority of modems.
      
      This implementation is based on patches available in Code Aurora
      repositories (msm kernel) whose main authors are
      
      Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add tx aggregation parameters · 31de2842
      Daniele Palmas authored
      Add the following ethtool tx aggregation parameters:
      
      ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES
      Maximum size in bytes of a tx aggregated block of frames.
      
      ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
      Maximum number of frames that can be aggregated into a block.
      
      ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS
      Time in usecs after the first packet arrival in an aggregated
      block for the block to be sent.
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
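How the three parameters above might interact can be sketched with a small user-space model. This is not the rmnet implementation: the class `TxAggregator`, its methods, and the exact flush ordering are assumptions made for illustration. A block is sent when it reaches the frame limit, when it reaches (or the next frame would exceed) the byte limit, or when the timer started by its first frame expires.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TxAggregator:
    """Hypothetical model of tx aggregation driven by three limits."""
    max_bytes: int    # cf. ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES
    max_frames: int   # cf. ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
    time_usecs: int   # cf. ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS
    _frames: List[bytes] = field(default_factory=list)
    _size: int = 0
    _first_arrival_us: Optional[int] = None

    def _flush(self) -> List[bytes]:
        """Hand over the pending block and reset the state."""
        block, self._frames = self._frames, []
        self._size = 0
        self._first_arrival_us = None
        return block

    def poll(self, now_us: int) -> List[bytes]:
        """Return the pending block if its timer has expired, else []."""
        if (self._first_arrival_us is not None
                and now_us - self._first_arrival_us >= self.time_usecs):
            return self._flush()
        return []

    def submit(self, frame: bytes, now_us: int) -> List[List[bytes]]:
        """Queue one frame; return every block that became ready to send."""
        ready = []
        expired = self.poll(now_us)
        if expired:
            ready.append(expired)
        # The new frame would overflow the byte limit: send what is pending first.
        if self._frames and self._size + len(frame) > self.max_bytes:
            ready.append(self._flush())
        if not self._frames:
            self._first_arrival_us = now_us  # timer starts at first packet arrival
        self._frames.append(frame)
        self._size += len(frame)
        if len(self._frames) >= self.max_frames or self._size >= self.max_bytes:
            ready.append(self._flush())
        return ready
```

In this model a frame limit of 1 sends every frame on its own, which matches the idea that aggregation only takes effect once the maximum number of frames is raised.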
    • Merge branch 'dsa-microchip-ptp' · 9a06cce6
      David S. Miller authored
      Arun Ramadoss says:
      
      ====================
      net: dsa: microchip: add PTP support for KSZ9563/KSZ8563 and LAN937x
      
      KSZ9563/KSZ8563 and LAN937x switches are capable of supporting the IEEE 1588 PTP
      protocol.  LAN937x has a PTP register set similar to KSZ9563, hence the
      implementation has been made common for the KSZ switches.  KSZ9563 does not
      support two step timestamping but LAN937x supports both.  Tested the 1step &
      2step p2p timestamping in LAN937x and p2p1step timestamping in KSZ9563.
      
      This patch series is based on the Christian Eggers PTP support for KSZ9563.
      Applied the Christian patch and updated as per the latest refactoring of KSZ
      series code. The features added on top are PTP packet Interrupt
      implementation based on nested handler, LAN937x two step timestamping and
      programmable per_out pins.
      
      Link: https://www.spinics.net/lists/netdev/msg705531.html
      
      Patch v7 -> v8
      - set skb->ip_summed = CHECKSUM_NONE after updating the checksum
      
      Patch v6 -> v7
      - Corrected the misplaced spaces and tabs
      - Added mutex lock in do_aux_work
      - Replaced 0/1 with false/true for ts_en
      - SKB_TX_INPROGRESS flag is set before dsa_enqueue_skb
      - Removed the fallthrough keyword
      - pdelay_resp header correction is performed based on
        KSZ_SKB_CB(skb)->update_correction instead of clone
      
      Patch v5 -> v6
      - Rebased to latest net-next and renamed from RFC to patch net-next.
      
      Patch v4 -> v5
      - Replaced irq_domain_add_simple with irq_domain_add_linear
      - Used the helper diff_by_scaled_ppm() for adjfine.
      
      Patch v3 -> v4
      - removed IRQF_TRIGGER_FALLING from the request_threaded_irq of ptp msg
      - addressed review comments on patch 10 periodic output
      - added sign off in patch 6 & 9
      - reverted to set PTP_1STEP bit for lan937x which is missed during v3 regression
      
      Patch v2-> v3
      - used port_rxtstamp for reconstructing the absolute timestamp instead of
      tagger function pointer.
      - Reverted to setting of 802.1As bit.
      
      Patch v1 -> v2
      - GPIO perout enable bit is different for LAN937x and KSZ9x. Added new patch
      for configuring LAN937x programmable pins.
      - PTP enabled in hardware based on both tx and rx timestamping of all the user
      ports.
      - Replaced setting of 802.1AS bit with P2P bit in PTP_MSG_CONF1 register.
      
      RFC v2 -> Patch v1
      - Changed the patch author based on past patch submission
      - Changed the commit message prefix as net: dsa: microchip: ptp
      Individual patch changes are listed in corresponding commits.
      
      RFC v1 -> v2
      - Added the p2p1step timestamping and conditional execution of 2 step for
        LAN937x only.
      - Added the periodic output support
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ptp: lan937x: Enable periodic output in LED pins · 168a5940
      Arun Ramadoss authored
      There is a difference in the implementation of per_out pins between KSZ9563
      and LAN937x. In KSZ9563, bit 6 of the Timestamping control register (0x052C)
      selects timestamp input when 1 and trigger output when 0. It is the opposite
      for LAN937x: 1 selects trigger output and 0 selects timestamp input.
      As for the per_out gpio pins, KSZ9563 has four LED pins and two dedicated
      gpio pins. In LAN937x the dedicated gpio pins are removed; instead there
      are up to 10 LED pins, out of which LED_0 and LED_1 can be mapped to PTP
      tou 0, 1 or 2. This patch sets bit 6 in the 0x052C register and
      configures the LED override and source registers for the LAN937x series of
      switches alone.
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ptp: lan937x: add 2 step timestamping · d6261f0b
      Arun Ramadoss authored
      LAN937x series of switches support the 2 step timestamping mechanism. There
      are timestamp correction calculations performed in ksz_rcv_timestamp and
      ksz_xmit_timestamp which are applicable only for p2p1step. To check
      in tag_ksz.c whether 2 step is enabled, a helper function is introduced
      in tagger_data to query it from ksz_ptp.c. Based on whether 2
      step is enabled or not, the timestamp calculations are performed.
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ptp: add support for perout programmable pins · 343d3bd8
      Arun Ramadoss authored
      There are two programmable pins available for the Trigger output unit to
      generate periodic pulses. This patch adds verify_pin for the available 2
      pins and configures them with respect to the GPIO index for the TOU unit.
      
      Tested using testptp
      ./testptp -i 0 -L 0,2
      ./testptp -i 0 -d /dev/ptp0 -p 1000000000
      ./testptp -i 1 -L 1,2
      ./testptp -i 1 -d /dev/ptp0 -p 100000000
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ptp: add periodic output signal · 1f12ae5b
      Christian Eggers authored
      LAN937x and KSZ PTP supported switches have three Trigger output units (TOU).
      A TOU can be used to generate the periodic signal for PTP. The TOU has a
      cycle width register of 32 bit in size and a period width register of 24
      bit; each value is in units of 8ns, so the pulse width can be maximum 125ms.
      
      Tested using ./testptp -d /dev/ptp0 -p 1000000000 -w 100000000 for
      generating the 10ms pulse width
      Signed-off-by: Christian Eggers <ceggers@arri.de>
      Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
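The register arithmetic described in this commit message can be illustrated with a small conversion helper. This is only a sketch: the function `pulse_width_to_ticks` and the constant names are hypothetical, the 24-bit field is read here as the one holding the pulse width, and the 125 ms ceiling is taken as stated in the message rather than derived from the field width.

```python
TICK_NS = 8                 # each register count represents 8 ns (from the commit message)
PULSE_WIDTH_BITS = 24       # size of the pulse-width field (reading of the commit message)
MAX_PULSE_NS = 125_000_000  # 125 ms maximum, as stated in the commit message


def pulse_width_to_ticks(width_ns: int) -> int:
    """Convert a pulse width in ns to 8 ns register counts, rounding down."""
    if not 0 < width_ns <= MAX_PULSE_NS:
        raise ValueError(f"pulse width {width_ns} ns outside (0, {MAX_PULSE_NS}] ns")
    ticks = width_ns // TICK_NS
    if ticks >= 1 << PULSE_WIDTH_BITS:
        raise ValueError("value does not fit the pulse-width field")
    return ticks


if __name__ == "__main__":
    # The -w argument used in the testptp command above.
    print(pulse_width_to_ticks(100_000_000))
```

The value passed with `-w` in the test command, 100000000 ns, maps to 12500000 counts under these assumptions, and the 125 ms ceiling maps to 15625000 counts, which still fits in 24 bits.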
    • net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tag · a32190b1
      Christian Eggers authored
      For PDelay_Resp messages we will likely have a negative value in the
      correction field. The switch hardware cannot correctly update such
      values (produces an off by one error in the UDP checksum), so it must be
      moved to the time stamp field in the tail tag. Format of the correction
      field is 48 bit ns + 16 bit fractional ns.  After updating the
      correction field, clone is no longer required hence it is freed.
      Signed-off-by: Christian Eggers <ceggers@arri.de>
      Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
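The correction field layout mentioned above (48 bit ns + 16 bit fractional ns) amounts to a signed 64-bit value holding nanoseconds multiplied by 2^16. The sketch below shows that encoding in user space; `encode_correction` and `decode_correction` are illustrative names, not driver functions, and network byte order is assumed as for other PTP header fields.

```python
import struct

FRACTION_BITS = 16  # low 16 bits carry fractional nanoseconds


def encode_correction(ns: float) -> bytes:
    """Pack a (possibly negative) nanosecond value as a signed 64-bit scaled integer."""
    scaled = round(ns * (1 << FRACTION_BITS))
    return struct.pack(">q", scaled)  # big-endian, two's complement


def decode_correction(raw: bytes) -> float:
    """Inverse of encode_correction: return nanoseconds as a float."""
    (scaled,) = struct.unpack(">q", raw)
    return scaled / (1 << FRACTION_BITS)


if __name__ == "__main__":
    print(encode_correction(1.5).hex())
    print(encode_correction(-1).hex())
```

A negative correction, as expected for PDelay_Resp in this commit, sets all upper bits of the field: -1 ns encodes as `ffffffffffff0000`.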
    • net: dsa: microchip: ptp: add packet transmission timestamping · ab32f56a
      Christian Eggers authored
      This patch adds the routines for transmission of ptp packets. When a
      ptp pdelay_req packet is to be transmitted, it uses the deferred xmit
      worker to schedule the packets.
      During irq_setup, interrupts for Sync, Pdelay_req and Pdelay_rsp are
      enabled, so an interrupt is triggered for all three packets. But for
      p2p1step, we require only the time stamp of the Pdelay_req packet. Hence, to
      avoid posting of the completion from the ISR routine for Sync and
      Pdelay_resp packets, the ts_en flag is introduced. This controls which
      packets need to be processed for timestamp.
      After the packet is transmitted, the ISR is triggered. The time at which
      the packet was transmitted is recorded in a separate register.
      This value is reconstructed to absolute time and posted to the user
      application through the socket error queue.
      Signed-off-by: Christian Eggers <ceggers@arri.de>
      Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ptp: add packet reception timestamping · 90188fff
      Christian Eggers authored
      Rx Timestamping is done through 4 additional bytes in tail tag.
      Whenever the ptp packet is received, the 4 byte hardware time stamped
      value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates
      it as PTP frame. This 4 byte value is extracted from the tail tag and
      reconstructed to absolute time and assigned to skb hwtstamp.
      If the received packet is a PDelay_Resp, then the partial ingress timestamp
      is subtracted from the correction field, since user space tools expect
      this to be done in hardware.
      Signed-off-by: Christian Eggers <ceggers@arri.de>
      Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
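The tail layout described in this commit message (a 4 byte hardware timestamp placed before the 1 byte tail tag, with bit 7 of the tag marking a PTP frame) can be sketched as a small parser. This is an illustration, not the tag_ksz.c code: `split_tail` is a hypothetical helper, the timestamp is assumed to be big-endian, and its reconstruction to absolute time is left out because the message does not spell it out.

```python
import struct
from typing import Optional, Tuple

PTP_FLAG = 0x80   # bit 7 of the tail tag marks a PTP frame (from the commit message)
TS_LEN = 4        # hardware timestamp length in bytes (from the commit message)


def split_tail(frame: bytes) -> Tuple[bytes, int, Optional[int]]:
    """Return (payload, tail_tag, raw_timestamp); raw_timestamp is None for non-PTP frames."""
    if not frame:
        raise ValueError("empty frame")
    tag = frame[-1]
    if tag & PTP_FLAG:
        if len(frame) < TS_LEN + 1:
            raise ValueError("frame too short for a timestamp")
        # Assumption: the 4 timestamp bytes are in big-endian order.
        (raw_ts,) = struct.unpack(">I", frame[-(TS_LEN + 1):-1])
        return frame[:-(TS_LEN + 1)], tag, raw_ts
    return frame[:-1], tag, None


if __name__ == "__main__":
    ptp_frame = b"payload" + bytes.fromhex("12345678") + bytes([0x81])
    print(split_tail(ptp_frame))
```

Frames without the PTP bit keep only the single tag byte, so the payload boundary depends on the flag.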
    • net: ptp: add helper for one-step P2P clocks · 2955762b
      Christian Eggers authored
      For P2P delay measurement, the ingress time stamp of the PDelay_Req is
      required for the correction field of the PDelay_Resp. The application
      echoes back the correction field of the PDelay_Req when sending the
      PDelay_Resp.
      
      Some hardware (like the ZHAW InES PTP time stamping IP core) subtracts
      the ingress timestamp autonomously from the correction field, so that
      the hardware only needs to add the egress timestamp on tx. Other
      hardware (like the Microchip KSZ9563) reports the ingress time stamp via
      an interrupt and requires that the software provides this time stamp via
      tail-tag on tx.
      
      In order to avoid introducing a further application interface for this,
      the driver can simply emulate the behavior of the InES device and
      subtract the ingress time stamp in software from the correction field.
      
      On egress, the correction field can either be kept as it is (and the
      time stamp field in the tail-tag is set to zero) or the value can be moved
      from the correction field back to the tail-tag.
      
      Changing the correction field requires updating the UDP checksum (if UDP
      is used as transport).
      Signed-off-by: Christian Eggers <ceggers@arri.de>
      Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
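The two steps described in this commit message, subtracting the ingress time stamp from the correction field and then fixing up the UDP checksum, can be sketched in user space as follows. This is not the kernel helper: all function names are made up for illustration, the correction field is assumed to be an 8 byte signed value in units of 2^-16 ns at an even offset, and the checksum fix-up uses the incremental update from RFC 1624 rather than whatever the driver actually does.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (ones' complement sum)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(data: bytes) -> int:
    """Internet checksum over a buffer, computed from scratch."""
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack(f">{len(data) // 2}H", data)
    return ~fold(sum(words)) & 0xFFFF


def update_checksum(old_csum: int, old_field: bytes, new_field: bytes) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied to each 16-bit word."""
    total = ~old_csum & 0xFFFF
    for old, new in zip(struct.unpack(">4H", old_field), struct.unpack(">4H", new_field)):
        total += (~old & 0xFFFF) + new
    return ~fold(total) & 0xFFFF


def subtract_ingress(field: bytes, ingress_ns: int) -> bytes:
    """Subtract an ingress time stamp (ns) from a scaled correction field."""
    (scaled,) = struct.unpack(">q", field)
    return struct.pack(">q", scaled - (ingress_ns << 16))
```

Updating incrementally avoids re-reading the whole packet; it gives the same result as a full recomputation as long as the changed field is 16-bit aligned within the checksummed data. For UDP specifically, a computed checksum of zero is transmitted as 0xFFFF, which this sketch does not handle.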