Commits · 32d0efabeec09bca0320f4136628699026dfba2f · Kirill Smelkov / linux

18 May, 2022 39 commits

dt-bindings: net: marvell,orion-mdio: Set unevaluatedProperties to false · 32d0efab

Chris Packham authored May 17, 2022

When the binding was converted it appeared necessary to set
'unevaluatedProperties: true' because of the switch devices on the
turris-mox board. Actually the error was because of the reg property
being incorrect causing the rest of the properties to be unevaluated.

After the reg properties are fixed for turris-mox we can set
'unevaluatedProperties: false' as is generally expected.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

32d0efab

arm64: dts: armada-3720-turris-mox: Correct reg property for mdio devices · 9fd914bb

Chris Packham authored May 17, 2022

MDIO devices have #address-cells = <1>, #size-cells = <0>. Now that we
have a schema enforcing this for marvell,orion-mdio we can see that the
turris-mox has a unnecessary 2nd cell for the switch nodes reg property
of it's switch devices. Remove the unnecessary 2nd cell from the
switches reg property.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fd914bb

Merge branch 'dsa-microchip-ksz_switch-refactor' · e8bacf40

David S. Miller authored May 18, 2022

Arun Ramadoss says:

====================
net: dsa: microchip: refactor the ksz switch init function

During the ksz_switch_register function, it calls the individual switches init
functions (ksz8795.c and ksz9477.c). Both these functions have few things in
common like, copying the chip specific data to struct ksz_dev, allocating
ksz_port memory and mib_names memory & cnt. And to add the new LAN937x series
switch, these allocations has to be replicated.
Based on the review feedback of LAN937x part support patch, refactored the
switch init function to move allocations to switch register.

Link:https://patchwork.kernel.org/project/netdevbpf/patch/20220504151755.11737-8-arun.ramadoss@microchip.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e8bacf40

net: dsa: microchip: remove unused members in ksz_device · 008db08b

Arun Ramadoss authored May 17, 2022

The name, regs_size and overrides members in struct ksz_device are
unused. Hence remove it.
And host_mask is used in only place of ksz8795.c file, which can be
replaced by dev->info->cpu_ports
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

008db08b

net: dsa: microchip: add the phylink get_caps · 65ac79e1

Arun Ramadoss authored May 17, 2022

This patch add the support for phylink_get_caps for ksz8795 and ksz9477
series switch. It updates the struct ksz_switch_chip with the details of
the internal phys and xmii interface. Then during the get_caps based on
the bits set in the structure, corresponding phy mode is set.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65ac79e1

net: dsa: move mib->cnt_ptr reset code to ksz_common.c · b094c679

Prasanna Vengateshan authored May 17, 2022

mib->cnt_ptr resetting is handled in multiple places as part of
port_init_cnt(). Hence moved mib->cnt_ptr code to ksz common layer
and removed from individual product files.
Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b094c679

net: dsa: microchip: move get_strings to ksz_common · 997d2126

Arun Ramadoss authored May 17, 2022

ksz8795 and ksz9477 uses the same algorithm for copying the ethtool
strings. Hence moved to ksz_common to remove the redundant code.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

997d2126

net: dsa: microchip: move port memory allocation to ksz_common · 198b3478

Arun Ramadoss authored May 17, 2022

ksz8795 and ksz9477 init function initializes the memory to dev->ports,
mib counters and assigns the ds real number of ports. Since both the
routines are same, moved the allocation of port memory to
ksz_switch_register after init.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

198b3478

net: dsa: microchip: move struct mib_names to ksz_chip_data · a530e6f2

Arun Ramadoss authored May 17, 2022

The ksz88xx family has one set of mib_names. The ksz87xx, ksz9477,
LAN937x based switches has one set of mib_names. In order to remove
redundant declaration, moved the struct mib_names to ksz_chip_data
structure.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a530e6f2

net: dsa: microchip: perform the compatibility check for dev probed · eee16b14

Arun Ramadoss authored May 17, 2022

This patch perform the compatibility check for the device after the chip
detect is done. It is to prevent the mismatch between the device
compatible specified in the device tree and actual device found during
the detect. The ksz9477 device doesn't use any .data in the
of_device_id. But the ksz8795_spi uses .data for assigning the regmap
between 8830 family and 87xx family switch. Changed the regmap
assignment based on the chip_id from the .data.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eee16b14

net: dsa: microchip: move ksz_chip_data to ksz_common · 462d5250

Arun Ramadoss authored May 17, 2022

This patch moves the ksz_chip_data in ksz8795 and ksz9477 to ksz_common.
At present, the dev->chip_id is iterated with the ksz_chip_data and then
copy its value to the ksz_dev structure. These values are declared as
constant.
Instead of copying the values and referencing it, this patch update the
dev->info to the ksz_chip_data based on the chip_id in the init
function. And also update the ksz_chip_data values for the LAN937x based
switches.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

462d5250

net: dsa: microchip: ksz8795: update the port_cnt value in ksz_chip_data · a30bf805

Arun Ramadoss authored May 17, 2022

The port_cnt value in the structure is not used in the switch_init.
Instead it uses the fls(chip->cpu_port), this is due to one of port in
the ksz8794 unavailable. The cpu_port for the 8794 is 0x10, fls(0x10) =
5, hence updating it directly in the ksz_chip_data structure in order to
same with all the other switches in ksz8795.c and ksz9477.c files.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a30bf805

Merge tag 'mlx5-updates-2022-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 6431ce6c

David S. Miller authored May 18, 2022

Saeed Mahameed says:

====================
mlx5-updates-2022-05-17

MISC updates to mlx5 dirver

1) Aya Levin allows relaxed ordering over VFs

2) Gal Pressman Adds support XDP SQs for uplink representors in switchdev mode

3) Add debugfs TC stats and command failure syndrome for debuggability

4) Tariq uses variants of vzalloc where it could help

5) Multiport eswitch support from Elic Cohen:

Eli Cohen Says:
===============

The multiport eswitch feature allows to forward traffic from a
representor net device to the uplink port of an associated eswitch's
uplink port.

This feature requires creating a LAG object. Since LAG can be created
only once for a function, the feature is mutual exclusive with either
bonding or multipath.

Multipath eswitch mode is entered automatically these conditions are
met:
1. No other LAG related mode is active.
2. A rule that explicitly forwards to an uplink port is inserted.

The implementation maintains a reference count on such rules. When the
reference count reaches zero, the LAG is released and other modes may be
used.

When an explicit rule that explicitly forwards to an uplink port is
inserted while another LAG mode is active, that rule will not be
offloaded by the hardware since the hardware cannot guarantee that the
rule will actually be forwarded to that port.

Example rules that forwards to an uplink port is:

$ tc filter add dev rep0 root flower action mirred egress \
  redirect dev uplinkrep0

$ tc filter add dev rep0 root flower action mirred egress \
  redirect dev uplinkrep1

This feature is supported only if LAG_RESOURCE_ALLOCATION firmware
configuration parameter is set to true.

The series consists of three patches:
1. Lag state machine refactor
   This patch does not add new functionality but rather changes the way
   the state of the LAG is maintained.
2. Small fix to remove unused argument.
3. The actual implementation of the feature.
===============

====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6431ce6c

net/mlx5: Support multiport eswitch mode · 94db3317

Eli Cohen authored Jan 31, 2022

Multiport eswitch mode is a LAG mode that allows to add rules that
forward traffic to a specific physical port without being affected by LAG
affinity configuration.

This mode of operation is mutual exclusive with the other LAG modes used
by multipath and bonding.

To make the transition between the modes, we maintain a counter on the
number of rules specifying one of the uplink representors as the target
of mirred egress redirect action.

An example of such rule would be:

$ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
  00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0

If the reference count just grows to one and LAG is not in use, we
create the LAG in multiport eswitch mode. Other mode changes are not
allowed while in this mode. When the reference count reaches zero, we
destroy the LAG and let other modes be used if needed.

logic also changed such that if forwarding to some uplink destination
cannot be guaranteed, we fail the operation so the rule will eventually
be in software and not in hardware.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

94db3317

net/mlx5: Remove unused argument · a4a9c87e

Eli Cohen authored Feb 06, 2022

Argument ndev is not used in mlx5_handle_changeupper_event()
Remove it.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

a4a9c87e

net/mlx5: Lag, refactor lag state machine · ef9a3a4a

Eli Cohen authored Jan 24, 2022

LAG state machine is implemented using bit flags. However, all these bit
flags, except for MLX5_LAG_FLAG_HASH_BASED, are really mutual exclusive.

In addition, MLX5_LAG_FLAG_READY is used by bonding to mark if we have
our netdevices successfully added to lag and does not really belong in
the same flags variable as the other flags.

Rename MLX5_LAG_FLAG_READY to MLX5_LAG_FLAG_NDEVS_READY to better
reflect its purpose and put it in a new flags variable.

For the rest of the flags, we introduce a mode enum to hold the state
of the LAG.

Remove the shared fdb boolean flag from struct mlx5_lag and store this
configuration as a mode flag.

Change all flag related operations to use standard Linux APIs.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

ef9a3a4a

net/mlx5e: Add XDP SQs to uplink representors steering tables · 65810a2d

Gal Pressman authored Mar 02, 2022

This patch adds the XDP SQs to the uplink representors steering tables
in swichdev mode and enables XDP usage on them.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

65810a2d

net/mlx5e: Correct the calculation of max channels for rep · 6d0ba493

Moshe Tal authored Apr 27, 2022

Correct the calculation of maximum channels of rep to better utilize
the hardware resources and allow a larger scale of reps.

This will allow creation of all virtual ports configured.

Fixes: 473baf2e ("net/mlx5e: Allow profile-specific limitation on max num of channels")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

6d0ba493

net/mlx5e: CT: Add ct driver counters · 77422a8f

Saeed Mahameed authored May 17, 2022

Connection offload is translated to multiple rules over several
hardware flow tables. Unhandled end-cases may cause a hardware
resource leak causing multiple system symptoms such as a host
memory leak, decreased performance and other scale related issues.

Export the current number of firmware FTEs related to the CT table
as a debugfs counter. Also add a dropped packets counter to help
debug packets dropped on restore failure.

To show the offloaded count:
cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/offloaded

To show the dropped count:
cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/rx_dropped
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Roi Dayan <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>

77422a8f

net/mlx5e: Allow relaxed ordering over VFs · f05ec8d9

Aya Levin authored Apr 11, 2022

By PCI spec, the config space of the VF always report relaxed ordering
not supported while it inherits this property from its PF. Hence using
pcie_relaxed_ordering_enable(), always disables the relaxed ordering on
all VFs. Remove this check and rely on the firmware which queries the
config space of the PF and set the capability bit accordingly.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Marina Varshaver <marinav@nvidia.com>
Reviewed-by: Gal Shalom <galshalom@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

f05ec8d9

net/mlx5e: Support partial GSO for tunnels over vlans · 682adfa6

Gal Pressman authored Mar 31, 2022

Offloading outer checksum on tunnels requires GSO partial, add it to
'vlan_features' to allow offloading tunnels over vlans.
For example, running GENEVE over vlan & ipv6 (mandatory UDP checksum)
now allows for hardware TSO instead of software segmentation in GSO
only.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

682adfa6

net/mlx5e: IPoIB, Improve ethtool rxnfc callback structure in IPoIB · 675b9d51

Gal Pressman authored Apr 11, 2022

Followup commit
79ce39be ("net/mlx5e: Improve ethtool rxnfc callback structure")
and handle CONFIG_MLX5_EN_RXNFC enabled/disabled inside the fs layer so
the ethtool callbacks are always available. The fs layer will provide
stubs when CONFIG_MLX5_EN_RXNFC is compiled out.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

675b9d51

net/mlx5e: Allocate virtually contiguous memory for reps structures · 597c1123

Tariq Toukan authored Apr 27, 2021

Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

597c1123

net/mlx5e: Allocate virtually contiguous memory for VLANs list · 035e0dd5

Tariq Toukan authored Apr 27, 2021

Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

035e0dd5

net/mlx5: Allocate virtually contiguous memory in pci_irq.c · 88468311

Tariq Toukan authored Apr 27, 2021

Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

88468311

net/mlx5: Allocate virtually contiguous memory in vport.c · 773c104d

Tariq Toukan authored Apr 27, 2021

Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

773c104d

net/mlx5: Inline db alloc API function · 9b45bde8

Tariq Toukan authored Jan 25, 2022

Take the wrapper version which picks default node into a header file.
This reduces the number of exported functions.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

9b45bde8

net/mlx5: Add last command failure syndrome to debugfs · 1d2c717b

Moshe Shemesh authored May 13, 2022

Add syndrome of last command failure per command type to debugfs to ease
debugging of such failure.
last_failed_syndrome - last command failed syndrome returned by FW.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

1d2c717b

net/mlx5: sparse: error: context imbalance in 'mlx5_vf_get_core_dev' · 4c7c8a6d
Saeed Mahameed authored Apr 18, 2022
```
Removing the annotation resolves the issue for some reason.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
```
4c7c8a6d

octeontx2-pf: Add support for adaptive interrupt coalescing · 6e144b47

Suman Ghosh authored May 17, 2022

Added support for adaptive IRQ coalescing. It uses net_dim
algorithm to find the suitable delay/IRQ count based on the
current packet rate.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20220517044055.876158-1-sumang@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6e144b47

dn_route: set rt neigh to blackhole_netdev instead of loopback_dev in ifdown · 9cc34128

Xin Long authored May 16, 2022

Like other places in ipv4/6 dst ifdown, change to use blackhole_netdev
instead of pernet loopback_dev in dn dst ifdown.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/0cdf10e5a4af509024f08644919121fb71645bc2.1652751029.git.lucien.xin@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9cc34128

ptp: ptp_clockmatrix: return -EBUSY if phase pull-in is in progress · 7c7dcd66

Min Li authored May 16, 2022

Also removes PEROUT_ENABLE_OUTPUT_MASK
Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/1652712427-14703-2-git-send-email-min.li.xe@renesas.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7c7dcd66

ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS support · bec67592

Min Li authored May 16, 2022

Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY
for gettime and settime exclusively. Before this change,
TOD_READ_PRIMARY was used for both extts and gettime/settime,
which would result in changing TOD read/write triggers between
operations. Using TOD_READ_SECONDARY would make extts
independent of gettime/settime operation
Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/1652712427-14703-1-git-send-email-min.li.xe@renesas.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bec67592

net: smc911x: replace ternary operator with min() · 5ff0348b

Guo Zhengkui authored May 16, 2022

Fix the following coccicheck warning:

drivers/net/ethernet/smsc/smc911x.c:483:20-22: WARNING opportunity for min()
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Link: https://lore.kernel.org/r/20220516115627.66363-1-guozhengkui@vivo.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5ff0348b

net: thunderx: remove null check after call container_of() · ab4d6357

Haowen Bai authored May 16, 2022

container_of() will never return NULL, so remove useless code.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Link: https://lore.kernel.org/r/1652696212-17516-1-git-send-email-baihaowen@meizu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ab4d6357

octeontx2-pf: Use memset_startat() helper in otx2_stop() · 76e1e5df

Xiu Jianfeng authored May 16, 2022

Use memset_startat() helper to simplify the code, there is no functional
change in this patch.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220516092337.131653-1-xiujianfeng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

76e1e5df

Merge branch 'net-smc-send-and-write-inline-optimization-for-smc' · 68a0bd67

Jakub Kicinski authored May 17, 2022

Guangguan Wang says:

====================
net/smc: send and write inline optimization for smc

Send cdc msgs and write data inline if qp has sufficent inline
space, helps latency reducing.

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-1.3us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize     before       after
1B          11.9 us      10.6 us (-1.3 us)
2B          11.7 us      10.7 us (-1.0 us)
4B          11.7 us      10.7 us (-1.0 us)
8B          11.6 us      10.6 us (-1.0 us)
16B         11.7 us      10.7 us (-1.0 us)
32B         11.7 us      10.6 us (-1.1 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.3 us (-0.5 us)
1KB         11.9 us      11.5 us (-0.4 us)
2KB         12.1 us      11.5 us (-0.6 us)
====================

Link: https://lore.kernel.org/r/20220516055137.51873-1-guangguan.wang@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

68a0bd67

net/smc: rdma write inline if qp has sufficient inline space · 793a7df6

Guangguan Wang authored May 16, 2022

Rdma write with inline flag when sending small packages,
whose length is shorter than the qp's max_inline_data, can
help reducing latency.

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.5us-0.7us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize     before       after
1B          11.2 us      10.6 us (-0.6 us)
2B          11.2 us      10.7 us (-0.5 us)
4B          11.3 us      10.7 us (-0.6 us)
8B          11.2 us      10.6 us (-0.6 us)
16B         11.3 us      10.7 us (-0.6 us)
32B         11.3 us      10.6 us (-0.7 us)
64B         11.2 us      11.2 us (0 us)
128B        11.2 us      11.2 us (0 us)
256B        11.2 us      11.2 us (0 us)
512B        11.4 us      11.3 us (-0.1 us)
1KB         11.4 us      11.5 us (0.1 us)
2KB         11.5 us      11.5 us (0 us)
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

793a7df6

net/smc: send cdc msg inline if qp has sufficient inline space · b632eb06

Guangguan Wang authored May 16, 2022

As cdc msg's length is 44B, cdc msgs can be sent inline in
most rdma devices, which can help reducing sending latency.

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-0.7us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize     before       after
1B          11.9 us      11.2 us (-0.7 us)
2B          11.7 us      11.2 us (-0.5 us)
4B          11.7 us      11.3 us (-0.4 us)
8B          11.6 us      11.2 us (-0.4 us)
16B         11.7 us      11.3 us (-0.4 us)
32B         11.7 us      11.3 us (-0.4 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.4 us (-0.4 us)
1KB         11.9 us      11.4 us (-0.5 us)
2KB         12.1 us      11.5 us (-0.6 us)
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b632eb06

17 May, 2022 1 commit

net: phy: marvell: Add errata section 5.1 for Alaska PHY · 65a9dedc

Leszek Polak authored May 16, 2022

As per Errata Section 5.1, if EEE is intended to be used, some register
writes must be done once after every hardware reset. This patch now adds
the necessary register writes as listed in the Marvell errata.

Without this fix we experience ethernet problems on some of our boards
equipped with a new version of this ethernet PHY (different supplier).

The fix applies to Marvell Alaska 88E1510/88E1518/88E1512/88E1514
Rev. A0.
Signed-off-by: Leszek Polak <lpolak@arri.de>
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220516070859.549170-1-sr@denx.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>

65a9dedc