- 25 Aug, 2021 40 commits
-
-
Nithin Dabilpuram authored
NIX_AF_TX_LINKX_NORM_CREDIT holds a running counter of the tx credits available per link, but tx credits must be configured based on the MTU, so an MTU change requires a tx credit count update. An issue exists whereby, when both PF and VF are enabled and PF traffic is flowing, a VF MTU update that rewrites the NORM_CREDIT register corrupts the credit count (since the register holds a running count) and subsequently deadlocks the tx link. This patch provides a workaround: pause link traffic using NIX_AF_TL1X_SW_XOFF, wait for existing packets to drain and the in-use credits to be returned, and only then program the new credit count. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
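As a rough sketch of the workaround sequence described above (the nix_* helpers, the context struct, and the polling step are placeholders; only the two register names come from the commit message):

    /* Placeholder sketch, not the actual AF driver code. */
    static void nix_update_tx_link_credits(struct nix_hw *nix, int link,
                                           u64 new_credits)
    {
            /* 1. XOFF the TL1 scheduler queue feeding this link. */
            nix_af_write(nix, NIX_AF_TL1X_SW_XOFF(link), 1);

            /* 2. Wait for in-flight packets to drain and for the credits
             *    they consumed to return to the running counter.
             */
            nix_af_wait_for_credit_return(nix, link);

            /* 3. Only now is it safe to load the new credit count. */
            nix_af_write(nix, NIX_AF_TX_LINKX_NORM_CREDIT(link), new_credits);

            /* 4. Resume traffic on the link. */
            nix_af_write(nix, NIX_AF_TL1X_SW_XOFF(link), 0);
    }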
-
Nithin Dabilpuram authored
Clear and disable the interrupt before queueing work, as the work might complete faster on another core and re-enable the interrupt (as part of the work) before the interrupt is disabled in the interrupt context. That would leave the interrupt permanently disabled. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
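A minimal sketch of the ordering the patch enforces; the device context, register offsets, and helpers below are placeholders, not the real octeontx2 AF code:

    #include <linux/interrupt.h>
    #include <linux/io.h>
    #include <linux/workqueue.h>

    struct af_dev {                              /* placeholder context */
            void __iomem *regs;
            struct workqueue_struct *flr_wq;
            struct work_struct flr_work;
    };

    #define AF_INT          0x0                  /* placeholder offsets */
    #define AF_INT_ENA_W1C  0x8

    static irqreturn_t af_flr_intr_handler(int irq, void *data)
    {
            struct af_dev *af = data;
            u64 pending = readq(af->regs + AF_INT);

            /* Clear and disable the interrupt *before* queueing the work.
             * If the work were queued first, it could complete on another
             * core and re-enable the interrupt before we disable it here,
             * leaving the interrupt permanently disabled.
             */
            writeq(pending, af->regs + AF_INT);          /* W1C: clear source */
            writeq(pending, af->regs + AF_INT_ENA_W1C);  /* disable interrupt */

            queue_work(af->flr_wq, &af->flr_work);       /* worker re-enables it */
            return IRQ_HANDLED;
    }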
-
Geetha sowjanya authored
Set the NPA batch allocation engine to process 35 cache lines per turn on the CN10k platform. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shaokun Zhang authored
Function 'mctp_dev_get_rtnl' is declared twice, so remove the repeated declaration. Cc: Jeremy Kerr <jk@codeconstruct.com.au> Cc: Matt Johnston <matt@codeconstruct.com.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-next-for-5.15-20210825' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2021-08-25 this is a pull request of 4 patches for net-next/master. The first patch is by Cai Huoqing, and enables COMPILE_TEST for the rcar CAN drivers. Lad Prabhakar contributes a patch for the rcar_canfd driver, fixing a redundant assignment. The last 2 patches are by Tang Bin, target the mscan driver, and clean up the driver by converting it to of_device_get_match_data() and removing a useless BUG_ON. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Biju Das says: ==================== Add Factorisation code to support Gigabit Ethernet driver The DMAC and EMAC blocks of the Gigabit Ethernet IP found on the RZ/G2L SoC are similar to the R-Car Ethernet AVB IP. The Gigabit Ethernet IP consists of an Ethernet controller (E-MAC), an Internal TCP/IP Offload Engine (TOE) and a Dedicated Direct Memory Access Controller (DMAC). With a few changes in the driver we can support both IPs. This patch series aims to add factorisation code to support the RZ/G2L SoC, hardware feature bits for the gPTP feature, the multiple-irq feature, and optional reset support. Ref:- * https://lore.kernel.org/linux-renesas-soc/TYCPR01MB59334319695607A2683C1A5E86E59@TYCPR01MB5933.jpnprd01.prod.outlook.com/T/#t ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
Reset support is present on R-Car. Let's support it, if it is available. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
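In practice, optional reset handling at probe time typically looks like the sketch below; the wrapper function is an assumption, but devm_reset_control_get_optional_exclusive() and reset_control_deassert() are the standard kernel API and become no-ops when the device describes no reset line:

    #include <linux/err.h>
    #include <linux/reset.h>

    /* Sketch: acquire an optional reset line and release it at probe. */
    static int ravb_get_and_deassert_reset(struct device *dev,
                                           struct reset_control **rstc)
    {
            *rstc = devm_reset_control_get_optional_exclusive(dev, NULL);
            if (IS_ERR(*rstc))
                    return PTR_ERR(*rstc);

            /* A NULL reset control (no reset available) is a no-op here. */
            return reset_control_deassert(*rstc);
    }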
-
Biju Das authored
The E-MAC IP on the R-Car AVB module has different initialization parameters for RX frame size and duplex settings, a different offset for the transfer speed setting, and magic packet detection support, compared to the E-MAC on the RZ/G2L Gigabit Ethernet module. Factorise the ravb_emac_init function to support the latter SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
The DMAC IP on the R-Car AVB module has different initialization parameters for RCR, TGC, TCCR, RIC0, RIC2, and TIC compared to the DMAC IP on the RZ/G2L Gigabit Ethernet module. Factorise the ravb_dmac_init function to support the latter SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
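The factorisation pattern in this and the previous patch is essentially a per-SoC hook table selected via OF match data; a simplified sketch (names and layout are illustrative, not the exact driver structures):

    #include <linux/netdevice.h>

    struct ravb_hw_info {                 /* illustrative subset only */
            void (*emac_init)(struct net_device *ndev);  /* E-MAC setup */
            void (*dmac_init)(struct net_device *ndev);  /* DMAC setup  */
    };

    static void ravb_emac_init_rcar(struct net_device *ndev)
    {
            /* R-Car specific RX frame size, duplex and speed settings */
    }

    static void ravb_dmac_init_rcar(struct net_device *ndev)
    {
            /* R-Car specific RCR/TGC/TCCR/RIC0/RIC2/TIC programming */
    }

    static const struct ravb_hw_info ravb_rcar_hw_info = {
            .emac_init = ravb_emac_init_rcar,
            .dmac_init = ravb_dmac_init_rcar,
    };

The shared code then only dispatches through info->emac_init() and info->dmac_init(), so an RZ/G2L entry can plug in its own hooks without touching the common paths.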
-
Biju Das authored
RZ/G2L supports HW checksumming on both RX and TX, whereas R-Car supports it on RX only. Factorise ravb_set_features to support this feature. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
R-Car supports 100 and 1000 Mbps transfer speeds, whereas RZ/G2L additionally supports 10 Mbps. Factorise the ravb_adjust_link function in order to support the 10 Mbps speed. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
R-Car uses an extended descriptor in RX, whereas RZ/G2L uses a normal descriptor. Factorise the ravb_rx function to support the latter SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
The ravb_ring_init function uses an extended descriptor in RX for R-Car and a normal descriptor for RZ/G2L. Add a helper function for RX ring buffer allocation to support the latter SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
The ravb_ring_format function uses an extended descriptor in RX for R-Car, compared to the normal descriptor for RZ/G2L. Factorise the RX ring buffer buildup to extend support to the latter SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
R-Car uses an extended descriptor in RX, whereas RZ/G2L uses a normal descriptor. Factorise the ravb_ring_free function so that it can support the latter SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
There are some H/W differences for the gPTP feature between R-Car Gen3, R-Car Gen2, and RZ/G2L as below. 1) On R-Car Gen3, gPTP support is active in config mode. 2) On R-Car Gen2, gPTP support is not active in config mode. 3) RZ/G2L does not support the gPTP feature. Add a ptp_cfg_active hw feature bit to struct ravb_hw_info for supporting gPTP active in config mode for R-Car Gen3. This patch also removes enum ravb_chip_id, chip_id from both struct ravb_hw_info and struct ravb_private, as it is unused. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
There are some H/W differences for the gPTP feature between R-Car Gen3, R-Car Gen2, and RZ/G2L as below. 1) On R-Car Gen2, gPTP support is not active in config mode. 2) On R-Car Gen3, gPTP support is active in config mode. 3) RZ/G2L does not support the gPTP feature. Add a no_ptp_cfg_active hw feature bit to struct ravb_hw_info for handling gPTP for R-Car Gen2. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Biju Das authored
R-Car Gen3 supports separate interrupts for E-MAC and DMA queues, whereas R-Car Gen2 and RZ/G2L have a single interrupt instead. Add a multi_irq hw feature bit to struct ravb_hw_info to enable this only for R-Car Gen3. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
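Taken together, these last three patches grow struct ravb_hw_info with simple capability bits; the sketch below uses the bit names from the commit messages, but the layout is illustrative and the real structure carries more fields:

    struct ravb_hw_info {
            /* ... per-SoC hooks and limits ... */
            unsigned ptp_cfg_active:1;    /* R-Car Gen3: gPTP active in config mode     */
            unsigned no_ptp_cfg_active:1; /* R-Car Gen2: gPTP not active in config mode */
            unsigned multi_irq:1;         /* R-Car Gen3: separate E-MAC/DMA interrupts  */
    };

    static const struct ravb_hw_info ravb_gen3_hw_info = {
            .ptp_cfg_active = 1,
            .multi_irq = 1,
    };

    static const struct ravb_hw_info ravb_gen2_hw_info = {
            .no_ptp_cfg_active = 1,
    };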
-
Biju Das authored
To address the 4-byte alignment restriction on the transmission buffer for R-Car Gen2, we use 2 descriptors, whereas a single descriptor is used in the other cases. Replace the NUM_TX_DESC_GEN[23] macros with magic numbers and add a comment to explain them. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vladimir Oltean says: ==================== Make sja1105 treat tag_8021q VLANs more like real DSA tags This series solves a nuisance with the sja1105 driver, which is that non-DSA-tagged packets sent directly by the DSA master would still exit the switch just fine. We also had an issue with packets coming from the outside world carrying a crafted DSA tag: the switch would not reject that tag but would treat it as valid. ====================
-
Vladimir Oltean authored
Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid function solved quite a different problem than our needs are now. Then, we used best-effort VLAN filtering and we were using the xmit_tpid to tunnel packets coming from an 8021q upper through the TX VLAN allocated by tag_8021q to that egress port. The need for a different VLAN protocol depending on switch revision came from the fact that this in itself was more of a hack to trick the hardware into accepting tunneled VLANs in the first place. Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we supported them again, we would not do that using the same method of {tunneling the VLAN on egress, retagging the VLAN on ingress} that we had in the best-effort VLAN filtering mode. It seems rather simpler that we just allocate a VLAN in the VLAN table that is simply not used by the bridge at all, or by any other port. Anyway, I have 2 gripes with the current sja1105_xmit_tpid: 1. When sending packets on behalf of a VLAN-aware bridge (with the new TX forwarding offload framework) plus untagged (with the tag_8021q VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent through the DSA master have a VLAN protocol of 0x8100 and others of 0x88a8. This is strange and there is no reason for it now. If we have a bridge and are therefore forced to send using that bridge's TPID, we can as well blend with that bridge's VLAN protocol for all packets. 2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver, because it looks inside dp->priv. It is desirable to keep as much separation between taggers and switch drivers as possible. Now it doesn't do that anymore. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The sja1105 driver is a bit special in its use of VLAN headers as DSA tags. This is because in VLAN-aware mode, the VLAN headers use an actual TPID of 0x8100, which is understood even by the DSA master as an actual VLAN header. Furthermore, control packets such as PTP and STP are transmitted with no VLAN header as a DSA tag, because, depending on switch generation, there are ways to steer these control packets towards a precise egress port other than VLAN tags. Transmitting control packets as untagged means leaving a door open for traffic in general to be transmitted as untagged from the DSA master, and for it to traverse the switch and exit a random switch port according to the FDB lookup. This behavior is a bit out of line with other DSA drivers which have native support for DSA tagging. There, it is to be expected that the switch only accepts DSA-tagged packets on its CPU port, dropping everything that does not match this pattern. We perhaps rely a bit too much on the switches' hardware dropping on the CPU port, and place no other restrictions in the kernel data path to avoid that. For example, sja1105 is also a bit special in that STP/PTP packets are transmitted using "management routes" (sja1105_port_deferred_xmit): when sending a link-local packet from the CPU, we must first write a SPI message to the switch to tell it to expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route it towards port 3 when it gets it. This entry expires as soon as it matches a packet received by the switch, and it needs to be reinstalled for the next packet etc. All in all quite a ghetto mechanism, but it is all that the sja1105 switches offer for injecting a control packet. The driver takes a mutex for serializing control packets and making the pairs of SPI writes of a management route and its associated skb atomic, but to be honest, a mutex is only relevant as long as all parties agree to take it. With the DSA design, it is possible to open an AF_PACKET socket on the DSA master net device, and blast packets towards 01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use, it all goes kaput because management routes installed by the driver will match skbs sent by the DSA master, and not skbs generated by the driver itself. So they will end up being routed on the wrong port. So through the lens of that, maybe it would make sense to avoid that from happening by doing something in the network stack, like: introduce a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around dev_hard_start_xmit(), introduce the following check: if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa) kfree_skb(skb); Ok, maybe that is a bit drastic, but that would at least prevent a bunch of problems. For example, right now, even though the majority of DSA switches drop packets without DSA tags sent by the DSA master (and therefore the majority of garbage that user space daemons like avahi and udhcpcd and friends create), it is still conceivable that an aggressive user space program can open an AF_PACKET socket and inject a spoofed DSA tag directly on the DSA master. We have no protection against that; the packet will be understood by the switch and be routed wherever user space says. Furthermore: there are some DSA switches where we even have register access over Ethernet, using DSA tags. So even user space drivers are possible in this way. This is a huge hole. 
However, the biggest thing that bothers me is that udhcpcd attempts to ask for an IP address on all interfaces by default, and with sja1105, it will attempt to get a valid IP address on both the DSA master as well as on sja1105 switch ports themselves. So with IP addresses in the same subnet on multiple interfaces, the routing table will be messed up and the system will be unusable for traffic until it is configured manually to not ask for an IP address on the DSA master itself. It turns out that it is possible to avoid that in the sja1105 driver, at least very superficially, by requesting the switch to drop VLAN-untagged packets on the CPU port. With the exception of control packets, all traffic originated from tag_sja1105.c is already VLAN-tagged, so only STP and PTP packets need to be converted. For that, we need to uphold the equivalence between an untagged and a pvid-tagged packet, and to remember that the CPU port of sja1105 uses a pvid of 4095. Now that we drop untagged traffic on the CPU port, non-aggressive user space applications like udhcpcd stop bothering us, and sja1105 effectively becomes just as vulnerable to the aggressive kind of user space programs as other DSA switches are (ok, users can also create 8021q uppers on top of the DSA master in the case of sja1105, but in future patches we can easily deny that, but it still doesn't change the fact that VLAN-tagged packets can still be injected over raw sockets). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Currently it is possible for an attacker to craft packets with a fake DSA tag and send them to us, and our user ports will accept them and preserve that VLAN when transmitting towards the CPU. Then the tagger will be misled into thinking that the packets came on a different port than they really came on. Up until recently there wasn't a good option to prevent this from happening. In SJA1105P and later, the MAC Configuration Table introduced two options called: - DRPSITAG: Drop Single Inner Tagged Frames - DRPSOTAG: Drop Single Outer Tagged Frames Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags), it would be in principle possible to enable the DRPSOTAG bit on ports using tag_8021q, and drop on ingress all packets which have a VLAN tag. When the switch is VLAN-unaware, this works, because it uses a custom TPID of 0xdadb, so any "tagged" packets received on a user port are probably a spoofing attempt. But when the switch overall is VLAN-aware, and some ports are standalone (therefore they use tag_8021q), the TPID is 0x8100, and the port can receive a mix of untagged and VLAN-tagged packets. The untagged ones will be classified to the tag_8021q pvid, and the tagged ones to the VLAN ID from the packet header. Yes, it is true that since commit 4fbc08bd ("net: dsa: sja1105: deny 8021q uppers on ports") we no longer support this mixed mode, but that is a temporary limitation which will eventually be lifted. It would be nice to not introduce one more restriction via DRPSOTAG, which would make the standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged packets. Also, the DRPSOTAG bit is not available on the first generation of switches (SJA1105E, SJA1105T). So since one of the key features of this driver is compatibility across switch generations, this makes it an even less desirable approach. The breakthrough comes from commit bef0746c ("net: dsa: sja1105: make sure untagged packets are dropped on ingress ports with no pvid"), where it became obvious that untagged packets are not dropped even if the ingress port is not in the VMEMB_PORT vector of that port's pvid. However, VLAN-tagged packets are subject to VLAN ingress checking/dropping. This means that instead of using the catch-all DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a per-VLAN basis, and this is already compatible with SJA1105E/T. This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q VLANs still need to be allowed, of course, on ingress to DSA ports and CPU ports. We also need to refine the drop_untagged check in sja1105_commit_pvid to make it not freak out about this new configuration. Currently it will try to keep the configuration consistent between untagged and pvid-tagged packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT, packets tagged with VID 1 will behave the same as untagged packets, and be dropped. This behavior is what we want for ports under a VLAN-aware bridge, but for the ports with a tag_8021q pvid, we want untagged packets to be accepted, but packets tagged with a header recognized by the switch as a tag_8021q VLAN to be dropped. So only restrict the drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
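Condensed, the policy described above looks roughly like this; the allowed_ingress argument to sja1105_vlan_add() follows the commit message, while the surrounding code is a simplified stand-in for the real driver:

    /* Sketch: user ports get tag_8021q VLANs as egress-only entries, so a
     * packet arriving already tagged with a tag_8021q VID (i.e. a spoofed
     * DSA tag) is dropped by VLAN ingress checking.
     */
    static int sja1105_dsa_8021q_vlan_add(struct dsa_switch *ds, int port,
                                          u16 vid, u16 flags)
    {
            struct sja1105_private *priv = ds->priv;
            bool allowed_ingress = true;

            /* tag_8021q VLANs must still be accepted on ingress to DSA
             * links and the CPU port; only user ports deny them.
             */
            if (dsa_is_user_port(ds, port))
                    allowed_ingress = false;

            return sja1105_vlan_add(priv, port, vid, flags, allowed_ingress);
    }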
-
DENG Qingfang authored
The driver was relying on dsa_slave_vlan_rx_add_vid to add VLAN ID 0. After the blamed commit, VLAN ID 0 won't be set up anymore, breaking software bridging fallback on VLAN-unaware bridges. Manually set up VLAN ID 0 to fix this. Fixes: 06cfb2df ("net: dsa: don't advertise 'rx-vlan-filter' when not needed") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Haiyang Zhang says: ==================== net: mana: Add support for EQ sharing The existing code uses (1 + #vPorts * #Queues) MSIXs, which may exceed the device limit. Support EQ sharing, so that multiple vPorts can share the same set of MSIXs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
This case is not normally expected. Add a WARN_ON_ONCE on CQE read overflow instead of failing silently. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
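The change amounts to the pattern below; the queue bookkeeping is a placeholder for the mana driver's real completion-queue state:

    #include <linux/bug.h>

    struct cq_state {                    /* placeholder bookkeeping */
            unsigned int head;
            unsigned int tail;
            unsigned int depth;
    };

    static int cq_read_one(struct cq_state *cq)
    {
            unsigned int outstanding = cq->head - cq->tail;

            /* More unread entries than the queue can hold means CQEs were
             * overwritten: not expected, so warn once instead of failing
             * silently.
             */
            if (WARN_ON_ONCE(outstanding > cq->depth))
                    return -1;

            cq->tail++;
            return 0;
    }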
-
Haiyang Zhang authored
The existing code uses (1 + #vPorts * #Queues) MSIXs, which may exceed the device limit. Support EQ sharing, so that multiple vPorts (NICs) can share the same set of MSIXs. Report the EQ-sharing capability bit to the host, which means the host can potentially offer more vPorts and queues to the VM. Also update the resource limit checking and error handling for better robustness. Now we support up to 256 virtual ports per VF (previously 16 per VF) and up to 64 queues per vPort (previously 16). Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
The existing code has NAPI threads polling on the EQ directly. To prepare for EQ sharing among vPorts, move NAPI from the EQ to the CQ so that one EQ can serve multiple CQs from different vPorts. The "arm bit" is only set when CQ processing is completed, to reduce the number of EQ entries, which in turn reduces the number of interrupts on the EQ. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
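A simplified view of the new flow; the types and helpers are placeholders, and the real poll routine does more:

    #include <linux/netdevice.h>

    struct my_cq {                            /* placeholder CQ context */
            struct napi_struct napi;
    };

    /* NAPI is now bound to a CQ rather than to the EQ; the EQ entry only
     * schedules this poller. The CQ is re-armed only once polling is
     * complete, which keeps EQ entries (and EQ interrupts) to a minimum.
     */
    static int mana_cq_poll_sketch(struct napi_struct *napi, int budget)
    {
            struct my_cq *cq = container_of(napi, struct my_cq, napi);
            int done = process_completions(cq, budget);   /* assumed helper */

            if (done < budget && napi_complete_done(napi, done))
                    arm_cq(cq);   /* set the "arm bit" only when drained */

            return done;
    }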
-
Shaokun Zhang authored
Function 'netxen_rom_fast_read' is declared twice, so remove the repeated declaration. Cc: Manish Chopra <manishc@marvell.com> Cc: Rahul Verma <rahulv@marvell.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nathan Chancellor authored
Clang warns: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2785:2: error: variable 'kw_offset' is uninitialized when used here [-Werror,-Wuninitialized] FIND_VPD_KW(i, "RV"); ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2776:39: note: expanded from macro 'FIND_VPD_KW' var = pci_vpd_find_info_keyword(vpd, kw_offset, vpdr_len, name); \ ^~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2748:34: note: initialize the variable 'kw_offset' to silence this warning unsigned int vpdr_len, kw_offset, id_len; ^ = 0 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2785:2: error: variable 'vpdr_len' is uninitialized when used here [-Werror,-Wuninitialized] FIND_VPD_KW(i, "RV"); ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2776:50: note: expanded from macro 'FIND_VPD_KW' var = pci_vpd_find_info_keyword(vpd, kw_offset, vpdr_len, name); \ ^~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2748:23: note: initialize the variable 'vpdr_len' to silence this warning unsigned int vpdr_len, kw_offset, id_len; ^ = 0 2 errors generated. The series "PCI/VPD: Convert more users to the new VPD API functions" was applied to net-next when it should have been applied to the PCI tree because of build errors. However, commit 82e34c8a ("Revert "Revert "cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()""") reapplied a change, resulting in the warning above. Properly revert commit 8d63ee60 ("cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()") to fix the warning and restore proper functionality. This also reverts commit 3a93bede ("cxgb4: Remove unused vpd_param member ec") to avoid future merge conflicts, as that change has been applied to the PCI tree. Link: https://lore.kernel.org/r/20210823120929.7c6f7a4f@canb.auug.org.au/ Link: https://lore.kernel.org/r/1ca29408-7bc7-4da5-59c7-87893c9e0442@gmail.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Mat Martineau says: ==================== mptcp: Optimize output options and add MP_FAIL This patch set contains two groups of changes that we've been testing in the MPTCP tree. The first optimizes the code path and data structure for populating MPTCP option headers when transmitting. Patch 1 reorganizes code to reduce the number of conditionals that need to be evaluated in common cases. Patch 2 rearranges struct mptcp_out_options to save 80 bytes (on x86_64). The next five patches add partial support for the MP_FAIL option as defined in RFC 8684. MP_FAIL is an option header used to cleanly handle MPTCP checksum failures. When the MPTCP checksum detects an error in the MPTCP DSS header or the data mapped by that header, the receiver uses a TCP RST with MP_FAIL to close the subflow that experienced the error and provide associated MPTCP sequence number information to the peer. RFC 8684 also describes how a single-subflow connection can discard corrupt data and remain connected under certain conditions using MP_FAIL, but that feature is not implemented here. Patches 3-5 implement MP_FAIL transmit and receive, and integrate with checksum validation. Patches 6 & 7 add MP_FAIL selftests and the MIBs required for those tests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch added a function chk_fail_nr to check the mibs for MP_FAIL. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch added the mibs for MP_FAIL: MPTCP_MIB_MPFAILTX and MPTCP_MIB_MPFAILRX. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
When a bad checksum is detected, set the send_mp_fail flag to send out the MP_FAIL option. Add a new function mptcp_has_another_subflow() to check whether there's only a single subflow. When multiple subflows are in use, close the affected subflow with a RST that includes an MP_FAIL option and discard the data with the bad checksum. Setting the sk_state of the subsocket to TCP_CLOSE causes the MPTCP_WORK_CLOSE_SUBFLOW flag to be set in subflow_sched_work_if_closed, and the subflow will then be closed. When a single subflow is in use, this is temporarily handled by sending MP_FAIL with a RST too. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
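Condensed, the receive-side handling described above is roughly as follows; locking and the exact call sites in the subflow code are omitted, and mptcp_has_another_subflow() is the new helper named in the message:

    /* Sketch of the checksum-failure path; simplified. */
    static void subflow_csum_failed(struct sock *ssk)
    {
            struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);

            /* Ask the option writer to attach MP_FAIL to the outgoing RST. */
            subflow->send_mp_fail = 1;

            /* With multiple subflows, only the affected subflow is closed
             * and the bad data discarded; with a single subflow the same
             * RST+MP_FAIL path is used for now, per the commit message.
             */
            if (!mptcp_has_another_subflow(ssk))
                    pr_debug("single subflow: sending MP_FAIL with RST too");

            mptcp_subflow_reset(ssk);
    }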
-
Geliang Tang authored
This patch added handling for receiving the MP_FAIL suboption. Add new members, mp_fail and fail_seq, to struct mptcp_options_received. When the MP_FAIL suboption is received, set mp_fail to 1 and save the sequence number to fail_seq. Then invoke mptcp_pm_mp_fail_received to deal with the MP_FAIL suboption. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch added MP_FAIL suboption sending support. Add a new flag named send_mp_fail to struct mptcp_subflow_context; if this flag is set, send out the MP_FAIL suboption. Add a new member, fail_seq, to struct mptcp_out_options to save the data sequence number to put into the MP_FAIL suboption. An MP_FAIL option can be included in a RST or on a subflow-level ACK. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
After the previous patch we can alias several fields in mptcp_out_options with a union. That struct is stack-allocated and memset() for each plain TCP output packet, so every saved byte counts. Before: pahole -EC mptcp_out_options # ... /* size: 136, cachelines: 3, members: 17 */ After: pahole -EC mptcp_out_options # ... /* size: 56, cachelines: 1, members: 9 */ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
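The saving comes from the usual trick of overlaying mutually exclusive members in a union; a toy, compile-and-run illustration (not the real mptcp_out_options layout, and the field sizes are arbitrary):

    #include <stdio.h>

    /* Fields used for different packet types never coexist, so they can
     * share storage.
     */
    struct opts_separate {
            unsigned long long dss_fields[8];
            unsigned long long addr_fields[6];
            unsigned long long fail_seq;
    };

    struct opts_union {
            union {
                    unsigned long long dss_fields[8];   /* DSS packets      */
                    unsigned long long addr_fields[6];  /* ADD_ADDR packets */
                    unsigned long long fail_seq;        /* MP_FAIL packets  */
            };
    };

    int main(void)
    {
            printf("separate: %zu bytes, union: %zu bytes\n",
                   sizeof(struct opts_separate), sizeof(struct opts_union));
            return 0;
    }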
-
Paolo Abeni authored
Currently we have several protocol constraints on MPTCP option generation (e.g. the MPC and MPJ subopts are mutually exclusive) and some additional ones required by our implementation (e.g. almost all ADD_ADDR variants are mutually exclusive with everything else). We can leverage the above to optimize out-option generation: we check DSS/MPC/MPJ presence in a mutually exclusive way, avoiding many unneeded conditionals in the common cases. Additionally, extend the existing constraints on the ADD_ADDR option to all its subvariants, so that it becomes fully mutually exclusive with the above and we can skip another conditional statement in the common case. This change is also needed by the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
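The shape of the optimization is a single chain of mutually exclusive checks; a self-contained sketch (flag and function names are illustrative, not the kernel's):

    /* DSS, MPC and MPJ can never appear in the same packet, and ADD_ADDR
     * is now made exclusive with all of them, so one else-if chain
     * replaces several independent conditionals in the common case.
     */
    enum {
            OPT_DSS      = 1 << 0,
            OPT_MPC      = 1 << 1,
            OPT_MPJ      = 1 << 2,
            OPT_ADD_ADDR = 1 << 3,
    };

    static void write_dss(void)      { /* ... */ }
    static void write_mpc(void)      { /* ... */ }
    static void write_mpj(void)      { /* ... */ }
    static void write_add_addr(void) { /* ... */ }

    static void write_options(unsigned int suboptions)
    {
            if (suboptions & OPT_DSS)
                    write_dss();
            else if (suboptions & OPT_MPC)
                    write_mpc();
            else if (suboptions & OPT_MPJ)
                    write_mpj();
            else if (suboptions & OPT_ADD_ADDR)
                    write_add_addr();
    }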
-
David S. Miller authored (merge of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue)
Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-08-24 Vinicius Costa Gomes says: This adds support for PCIe PTM (Precision Time Measurement) to the igc driver. PCIe PTM allows the NIC and Host clocks to be compared more precisely, improving the clock synchronization accuracy. Patch 1/4 reverts a commit that made pci_enable_ptm() private to the PCI subsystem; reverting makes it possible for it to be called from drivers. Patch 2/4 adds the pcie_ptm_enabled() helper. Patch 3/4 calls pci_enable_ptm() from the igc driver. Patch 4/4 implements the PCIe PTM support. Exposing it via the .getcrosststamp() API implies that the time measurements are made synchronously with the ioctl(). The hardware was implemented such that the most convenient way to retrieve that information would be asynchronous. So, to follow the expectations of the ioctl(), we have to use less convenient ways, triggering a PCIe PTM dialog every time an ioctl() is received. Some questions are raised (also pointed out in the commit message): 1. Using convert_art_ns_to_tsc() is too x86 specific; there should be a common way to create a 'system_counterval_t' from a timestamp. 2. convert_art_ns_to_tsc() says that it should only be used when X86_FEATURE_TSC_KNOWN_FREQ is true, but during tests it works even when it returns false. Should that check be done? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
John Efstathiades says: ==================== LAN7800 driver improvements This patch set introduces a number of improvements and fixes for problems found during testing of a modification to add a NAPI-style approach to packet handling to improve performance. NOTE: the NAPI changes are not part of this patch set and the issues fixed by this patch set are not coupled to the NAPI changes. Patch 1 fixes white space and style issues Patch 2 removes an unused timer Patch 3 introduces macros to set the internal packet FIFO flow control levels, which makes it easier to update the levels in future. Patch 4 removes an unused queue Patch 5 (updated for v2) introduces function return value checks and error propagation to various parts of the driver where a return code was captured but then ignored. This patch is completely different to patch 5 in version 1 of this patch set. The changes in the v1 patch 5 are being set aside for the time being. Patch 6 updates the LAN7800 MAC reset code to ensure there is no PHY register access in progress when the MAC is reset. This change prevents a kernel exception that can otherwise occur. Patch 7 fixes problems with system suspend and resume handling while the device is transmitting and receiving data. Patch 8 fixes problems with auto-suspend and resume handling and depends on changes introduced by patch 7. Patch 9 fixes problems with device disconnect handling that can result in kernel exceptions and/or hang. Patch 10 limits the rate at which driver warning messages are emitted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-