Commits · 242aaf03dc9be30027d3159a13b935856dab3ba0 · Kirill Smelkov / linux

15 Sep, 2020 13 commits

selftests: add a test for ethtool pause stats · 242aaf03

Jakub Kicinski authored Sep 14, 2020

Make sure the empty nest is reported even without stats.
Make sure reporting only selected stats works fine.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: add pause frame stats · ff1f7c17

Jakub Kicinski authored Sep 14, 2020

Add minimal ethtool interface for testing ethtool pause stats.

v2: add missing static on nsim_ethtool_ops
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: include the new ethtool pause stats in the stats doc · 8c00bd93

Jakub Kicinski authored Sep 14, 2020

Tell people that there now is an interface for querying pause frames.
A little bit of restructuring is needed given this is the first source
of such statistics.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: add standard pause stats · 9a27a330

Jakub Kicinski authored Sep 14, 2020

Currently drivers have to report their pause frames statistics
via ethtool -S, and there is a wide variety of names used for
these statistics.

Add the two statistics defined in IEEE 802.3x to the standard
API. Create a new ethtool request header flag for including
statistics in the response to GET commands.

Always create the ETHTOOL_A_PAUSE_STATS nest in replies when
flag is set. Testing if driver declares the op is not a reliable
way of checking if any stats will actually be included and therefore
we don't want to give the impression that presence of
ETHTOOL_A_PAUSE_STATS indicates driver support.
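
The reply rule described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the dict layout, the function name build_pause_reply, the counter names, and the all-ones "not set" sentinel are illustrative and are not taken from the patch itself.

```python
# Sentinel meaning "the driver did not fill this counter in".
# Assumption: an all-ones 64-bit value; the name is illustrative.
STAT_NOT_SET = (1 << 64) - 1


def build_pause_reply(want_stats, driver_stats=None):
    """Model a pause GET reply as a nested dict.

    want_stats   -- True when the request header carries the stats flag
    driver_stats -- counters filled in by the driver, or None when the
                    driver reports no statistics at all
    """
    reply = {"autoneg": False, "rx": True, "tx": True}
    if not want_stats:
        return reply

    # Start from "uninitialized" counters so untouched fields are skipped.
    stats = {"tx_pause_frames": STAT_NOT_SET, "rx_pause_frames": STAT_NOT_SET}
    if driver_stats is not None:
        stats.update(driver_stats)

    # The nest is emitted unconditionally; only set counters go inside it.
    reply["stats"] = {k: v for k, v in stats.items() if v != STAT_NOT_SET}
    return reply
```

In this model a request with the flag set always yields a "stats" key, possibly empty, so its presence says nothing about whether the driver reports any counters.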

Note that this patch does not include PFC counters, which may fit
better in dcbnl? But mostly I don't need them/have a setup to test
them so I haven't looked deeply into exposing them :)

v3:
 - add a helper for "uninitializing" stats, rather than a cryptic
   memset() (Andrew)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-qeth-next' · 0f9ad4e7

David S. Miller authored Sep 15, 2020

Julian Wiedmann says:

====================
s390/qeth: updates 2020-09-10

subject to positive review by the bridge maintainers on patch 5,
please apply the following patch series to netdev's net-next tree.

Alexandra adds BR_LEARNING_SYNC support to qeth. In addition to the
main qeth changes (controlling the feature, and raising switchdev
events), this also needs
- Patch 1 and 2 for some s390/cio infrastructure improvements
  (acked by Heiko to go in via net-next), and
- Patch 5 to introduce a new switchdev_notifier_type, so that a driver
  can clear all previously learned entries from the bridge FDB in case
  things go out-of-sync later on.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: implement ndo_bridge_setlink for learning_sync · 521c65b6

Alexandra Winter authored Sep 10, 2020

Documentation/networking/switchdev.txt and 'man bridge' indicate that the
learning_sync bridge attribute is used to control whether a given
device will sync MAC addresses learned on its device port to a master
bridge FDB, where they will show up as 'extern_learn offload'. So we map
qeth_l2_dev2br_an_set() to the learning_sync bridge link attribute.

Turning off learning_sync will flush all extern_learn entries from the
bridge fdb and all pending events from the card's work queue.

When the hardware interface goes offline with learning_sync on
(e.g. for HW recovery), all extern_learn entries will be flushed from the
bridge fdb and all pending events from the card's work queue. When the
interface goes online again, it will send new notifications for all then
valid MACs. learning_sync attribute can not be modified while interface is
offline. See
'commit e6e771b3 ("s390/qeth: detach netdevice while card is offline")'

An alternative implementation would be to always offload the 'learning'
attribute of a software bridge to the hardware interface attached to it
and thus implicitly enable fdb notification. This was not chosen for 2
reasons:
1) In our case the software bridge is NOT a representation of a hardware
switch. It is just connected to a smart NIC that is able to inform
about the addresses attached to it. It is not necessarily using source
MAC learning for this and other bridgeports can be attached to other
NICs with different properties.
2) We want a means to enable this notification explicitly. There may be
cases where a bridgeport is set to 'learning', but we do not want to
enable the notification.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: implement ndo_bridge_getlink for learning_sync · 780b6e7d

Alexandra Winter authored Sep 10, 2020

Documentation/networking/switchdev.txt and 'man bridge' indicate that the
learning_sync bridge attribute is used to indicate whether a given
device will sync MAC addresses learned on its device port to a master
bridge FDB.

learning_sync attribute can not be read while interface is offline (down).
See
'commit e6e771b3 ("s390/qeth: detach netdevice while card is offline")'
We return EOPNOTSUPP and not ENODEV in this case, because EOPNOTSUPP is the
only rc that is tolerated by 'bridge -d link show'.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: Reset address notification in case of buffer overflow · 817741a8

Alexandra Winter authored Sep 10, 2020

In case hardware sends more device-to-bridge-address-change notifications
than the qeth-l2 driver can handle, the hardware will send an overflow
event and then stop sending any events. It expects software to flush its
FDB and start over again. Re-enabling address-change-notification will
report all current addresses.

In order to re-enable address-change-notification this patch defines
the functions qeth_l2_dev2br_an_set() and qeth_l2_dev2br_an_set_cb
to enable or disable dev-to-bridge-address-notification.

A following patch will use the learning_sync bridgeport flag to trigger
enabling or disabling of address-change-notification, so we define
priv->brport_features to store the current setting. BRIDGE_INFO and
ADDR_INFO functionality are mutually exclusive, whereas ADDR_INFO and
qeth_l2_vnicc* can be used together.

Alternative implementations to handle buffer overflow:
Just re-enabling notification and adding all newly reported addresses
would cover any lost 'add' events, but not the lost 'delete' events.
Then these invalid addresses would stay in the bridge FDB as long as the
device exists.
Setting the net device down and up, would be an alternative, but is a bit
drastic. If the net device has many secondary addresses this will create
many delete/add events at its peers which could de-stabilize the
network segment.
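
The difference between the first alternative and the chosen flush-and-restart approach can be illustrated with a small Python model. It is a sketch only: the FDB is reduced to a set of MAC strings and both function names are hypothetical.

```python
def recover_without_flush(bridge_fdb, hw_addresses):
    """Re-enable notification only: all current addresses are reported
    again and added, but nothing is ever removed."""
    return set(bridge_fdb) | set(hw_addresses)


def recover_with_flush(bridge_fdb, hw_addresses):
    """Flush the learned entries first, then take the fresh report."""
    fdb = set(bridge_fdb)
    fdb.clear()                 # stands in for flushing the bridge FDB
    fdb |= set(hw_addresses)    # re-enabling reports all current addresses
    return fdb


# State before the overflow, and what the hardware holds afterwards:
# the 'delete' of aa:... and the 'add' of cc:... were both lost.
before = {"02:00:00:00:00:aa", "02:00:00:00:00:bb"}
hardware = {"02:00:00:00:00:bb", "02:00:00:00:00:cc"}

stale = recover_without_flush(before, hardware) - hardware
clean = recover_with_flush(before, hardware)
```

Here stale ends up holding the address whose 'delete' event was lost, while clean matches the hardware view exactly.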
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Add SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier · d05e8e68

Alexandra Winter authored Sep 10, 2020

so the switchdev can notify the bridge to flush non-permanent fdb entries
for this port. This is useful whenever the hardware fdb of the switchdev
is reset, but the netdev and the bridgeport are not deleted.

Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute.

CC: Jiri Pirko <jiri@resnulli.us>
CC: Ivan Vecera <ivecera@redhat.com>
CC: Roopa Prabhu <roopa@nvidia.com>
CC: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: Translate address events into switchdev notifiers · 10a6cfc0

Alexandra Winter authored Sep 10, 2020

A qeth-l2 HiperSockets card can show switch-ish behaviour in the sense
that it can report all MACs that are reachable via this interface. Just
like a switch device, it can notify the software bridge about changes
to its fdb. This patch exploits this device-to-bridge-notification and
extracts the relevant information from the hardware events to generate
notifications to an attached software bridge.

There are 2 sources for this information:
1) The reply message of Perform-Network-Subchannel-Operations (PNSO)
(operation code ADDR_INFO) reports all addresses that are currently
reachable (implemented in a later patch).
2) As long as device-to-bridge-notification is enabled, hardware will
generate address change notification events, whenever the content of
the hardware fdb changes (this patch).

The bridge_hostnotify feature (PNSO operation code BRIDGE_INFO) uses
the same address change notification events. We need to distinguish
between qeth_pnso_mode QETH_PNSO_BRIDGEPORT and QETH_PNSO_ADDR_INFO
and call a different handler. In both cases deadlocks must be
prevented, if the workqueue is drained under lock and QETH_PNSO_NONE,
when notification is disabled.

bridge_hostnotify generates udev events; there is no intent to do the same
for dev2br. Instead this patch will generate SWITCHDEV_FDB_ADD_TO_BRIDGE
and SWITCHDEV_FDB_DEL_TO_BRIDGE notifications, that will cause the
software bridge to add (or delete) entries to its fdb as 'extern_learn
offload'.

Documentation/networking/switchdev.txt proposes to add
"depends NET_SWITCHDEV" to driver's Kconfig. This is not done here,
so even in absence of the NET_SWITCHDEV module, the QETH_L2 module will
still be built, but then the switchdev notifiers will have no effect.

No VLAN filtering is done on the entries and VLAN information is not
passed on to the bridge fdb entries. This could be added later.
For now VLAN interfaces can be defined on the upper bridge interface.

Multicast entries are not passed on to the bridge fdb.
This could be added later. For now mcast flooding can be used in the
bridge.

The card reports all MACs that are in its FDB, but we must not pass on
MACs that are registered for this interface.
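
The filtering rules listed above (no multicast entries, no MACs registered for this interface) can be sketched as a small Python model. The event tuples and the translate() helper are hypothetical; only the two notifier names come from the text.

```python
ADD = "SWITCHDEV_FDB_ADD_TO_BRIDGE"
DEL = "SWITCHDEV_FDB_DEL_TO_BRIDGE"


def is_multicast(mac):
    """A MAC is multicast when the lowest bit of its first octet is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def translate(events, own_macs):
    """Map hardware address events to bridge notifications.

    events   -- iterable of ("add" | "del", mac) tuples (illustrative format)
    own_macs -- MACs registered for this interface; these are not passed on
    """
    notifications = []
    for kind, mac in events:
        if mac in own_macs or is_multicast(mac):
            continue
        notifications.append((ADD if kind == "add" else DEL, mac))
    return notifications
```

VLAN information is deliberately absent from this model, matching the statement that it is not passed on to the bridge fdb entries.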
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: Detect PNSO OC3 capability · fa115adf

Alexandra Winter authored Sep 10, 2020

This patch detects whether device-to-bridge-notification, provided
by the Perform Network Subchannel Operation (PNSO) operation code
ADDR_INFO (OC3), is supported by this card. A following patch will
map this to the learning_sync bridgeport flag, so we store it in
priv->brport_hw_features in bridgeport flag format.

Only IQD cards provide PNSO.
There is a feature bit to indicate whether the machine provides OC3,
unfortunately it is not set on old machines.
So PNSO is called to find out. As this will disable notification
and is exclusive with bridgeport_notification, this must be done
during card initialisation before previous settings are restored.

PNSO functionality requires some configuration values that are added to
the qeth_card.info structure. Some helper functions are defined to fill
them out when the card is brought online and some other places are
adapted, that can also benefit from these fields.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/cio: Helper functions to read CSSID, IID, and CHID · b983aa1f

Alexandra Winter authored Sep 10, 2020

Add helper functions to expose Channel Subsystem ID (CSSID), MIF Image Id
(IID), Channel ID (CHID) and Channel Path ID (CHPID).
These values are required by the qeth driver's exploitation of network-
address-change-notifications to determine which entries belong to this
interface.

Store the Partition identifier in System log, as this may be used to map
a Linux view to a Hardware view for debugging purpose.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/cio: Add new Operation Code OC3 to PNSO · 4fea49a7

Alexandra Winter authored Sep 10, 2020

Add support for operation code 3 (OC3) of the
Perform-Network-Subchannel-Operations (PNSO) function
of the Channel-Subsystem-Call (CHSC) instruction.

PNSO provides 2 operation codes:
OC0 - BRIDGE_INFO
OC3 - ADDR_INFO (new)

Extend the function calls to *pnso* to pass the OC and
add new response code 0108.

Support for OC3 is indicated by a flag in the css_general_characteristics.
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Sep, 2020 27 commits

tcp: schedule EPOLLOUT after a partial sendmsg · afb83012

Soheil Hassas Yeganeh authored Sep 14, 2020

For EPOLLET, applications must call sendmsg until they get EAGAIN.
Otherwise, there is no guarantee that EPOLLOUT is sent if there was
a failure upon memory allocation.

As a result on high-speed NICs, userspace observes multiple small
sendmsgs after a partial sendmsg until EAGAIN, since TCP can send
1-2 TSOs in between two sendmsg syscalls:

// One large partial send due to memory allocation failure.
sendmsg(20MB)   = 2MB
// Many small sends until EAGAIN.
sendmsg(18MB)   = 64KB
sendmsg(17.9MB) = 128KB
sendmsg(17.8MB) = 64KB
...
sendmsg(...)    = EAGAIN
// At this point, userspace can assume an EPOLLOUT.

To fix this, set the SOCK_NOSPACE on all partial sendmsg scenarios
to guarantee that we send EPOLLOUT after partial sendmsg.

After this commit userspace can assume that it will receive an EPOLLOUT
after the first partial sendmsg. This EPOLLOUT will benefit from
sk_stream_write_space() logic delaying the EPOLLOUT until significant
space is available in write queue.
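
From the application side, the change in what an edge-triggered sender may assume can be sketched in Python. This is an illustration, not code from the patch: make_fake_send() is a hypothetical stand-in for a non-blocking socket's send(), and both loop helpers are invented names.

```python
import errno


def make_fake_send(accept_sizes):
    """Return a send() stand-in that accepts the given byte counts in turn
    and then fails with EAGAIN, like a full non-blocking socket."""
    sizes = iter(accept_sizes)

    def send(buf):
        try:
            limit = next(sizes)
        except StopIteration:
            raise BlockingIOError(errno.EAGAIN, "would block") from None
        return min(limit, len(buf))

    return send


def send_until_blocked(send, payload):
    """Older edge-triggered pattern: keep calling send() until EAGAIN."""
    sent = calls = 0
    while sent < len(payload):
        calls += 1
        try:
            sent += send(payload[sent:])
        except BlockingIOError:
            break               # only now may EPOLLOUT be relied on
    return sent, calls


def send_until_partial(send, payload):
    """Pattern the commit makes safe: stop at the first partial send."""
    sent = calls = 0
    while sent < len(payload):
        calls += 1
        remaining = len(payload) - sent
        try:
            n = send(payload[sent:])
        except BlockingIOError:
            break
        sent += n
        if n < remaining:
            break               # partial send: wait for EPOLLOUT
    return sent, calls
```

With a fake socket that accepts 400, 50 and 50 bytes, the first loop makes four calls before it can go back to waiting, while the second returns after one.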
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limit · 8ba3c9d1

Soheil Hassas Yeganeh authored Sep 14, 2020

If there was any event available on the TCP socket, tcp_poll()
will be called to retrieve all the events.  In tcp_poll(), we call
sk_stream_is_writeable() which returns true as long as we are at least
one byte below notsent_lowat.  This results in quite a few
spurious EPOLLOUT events and frequent tiny sendmsg() calls.

Similar to sk_stream_write_space(), use __sk_stream_is_writeable
with a wake value of 1, so that we set EPOLLOUT only if half the
space is available for write.
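
The effect of the wake value can be shown with a one-line model. It assumes that wake is applied as a left shift on the number of not-yet-sent bytes before comparing against notsent_lowat, and it ignores every other condition the real check may apply; the function name is illustrative.

```python
def stream_is_writeable(notsent_bytes, notsent_lowat, wake):
    """Simplified writeability test.

    wake=0: writeable as soon as the backlog is one byte below the limit.
    wake=1: writeable only once the backlog is below half the limit.
    """
    return (notsent_bytes << wake) < notsent_lowat


lowat = 128 * 1024
just_below = stream_is_writeable(lowat - 1, lowat, 0)   # old behaviour
still_busy = stream_is_writeable(lowat - 1, lowat, 1)   # with wake=1
```

In this model a socket sitting one byte under the limit counts as writeable with wake=0 but not with wake=1, which is what removes the spurious wakeups.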
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: fix up debugfs after queue swap · ed6d9b02

Shannon Nelson authored Sep 13, 2020

Clean and rebuild the debugfs info for the queues being swapped.

Fixes: a34e25ab ("ionic: change the descriptor ring length without full reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

__netif_receive_skb_core: don't untag vlan from skb on DSA master · b14a9fc4

Vladimir Oltean authored Sep 12, 2020

A DSA master interface has upper network devices, each representing an
Ethernet switch port attached to it. Demultiplexing the source ports and
setting skb->dev accordingly is done through the catch-all ETH_P_XDSA
packet_type handler. Catch-all because DSA vendors have various header
implementations, which can be placed anywhere in the frame: before the
DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA
handler acts like an rx_handler more than anything.

It is unlikely for the DSA master interface to have any other upper than
the DSA switch interfaces themselves. Only maybe a bridge upper*, but it
is very likely that the DSA master will have no 8021q upper. So
__netif_receive_skb_core() will try to untag the VLAN, despite the fact
that the DSA switch interface might have an 8021q upper. So the skb will
never reach that.

So far, this hasn't been a problem because most of the possible
placements of the DSA switch header mentioned in the first paragraph
will displace the VLAN header when the DSA master receives the frame, so
__netif_receive_skb_core() will not actually execute any VLAN-specific
code for it. This only becomes a problem when the DSA switch header does
not displace the VLAN header (for example with a tail tag).

What the patch does is it bypasses the untagging of the skb when there
is a DSA switch attached to this net device. So, DSA is the only
packet_type handler which requires seeing the VLAN header. Once skb->dev
will be changed, __netif_receive_skb_core() will be invoked again and
untagging, or delivery to an 8021q upper, will happen in the RX of the
DSA switch interface itself.
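
The two-pass behaviour described above can be sketched as a toy Python model. Frame, should_untag() and the device names are hypothetical and only illustrate the decision, not the kernel data structures.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    dev: str            # net device the frame is currently attributed to
    vlan_tagged: bool   # True while the VLAN header is still in the frame


def should_untag(frame, dsa_masters):
    """Untag in the generic receive path unless the device is a DSA master."""
    return frame.vlan_tagged and frame.dev not in dsa_masters


masters = {"eth0"}
frame = Frame(dev="eth0", vlan_tagged=True)

first_pass = should_untag(frame, masters)    # on the DSA master
frame.dev = "swp0"                           # DSA handler picks the port
second_pass = should_untag(frame, masters)   # on the switch port
```

The first pass leaves the tag alone so the DSA handler sees it; the second pass, on the switch port, is where untagging or delivery to an 8021q upper takes place.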

*see commit 9eb8eff0 ("net: bridge: allow enslaving some DSA master
network devices"). This is actually the reason why I prefer keeping DSA
as a packet_type handler of ETH_P_XDSA rather than converting to an
rx_handler. Currently the rx_handler code doesn't support chaining, and
this is a problem because a DSA master might be bridged.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-next-dsa-mt7530-add-support-for-MT7531' · 0ca6d8b7

David S. Miller authored Sep 14, 2020

Landen Chao says:

====================
net-next: dsa: mt7530: add support for MT7531

This patch series adds support for MT7531.

MT7531 is the next generation of MT7530 which could be found on Mediatek
router platforms such as MT7622 or MT7629.

It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and
the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface.
Cpu port 5 supports either RGMII or SGMII in different HW SKU, but cannot
be muxed to PHY of port 0/4 like mt7530. Due to support for SGMII
interface, pll, and pad setting are different from MT7530.

MT7531 SGMII interface can be configured in following mode:
- 'SGMII AN mode' with in-band negotiation capability
    which is compatible with PHY_INTERFACE_MODE_SGMII.
- 'SGMII force mode' without in-band negotiation
    which is compatible with 10B/8B encoding of
    PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
- 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
    which is compatible with 10B/8B encoding of
    PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.

v4 -> v5
- Add fixed-link node to dsa cpu port in dts file by suggestion of
  Vladimir Oltean.

v3 -> v4
- Adjust the coding style by suggestion of Jakub Kicinski.
  Remove unnecessary jumping label, merge continuous numeric 'switch
  cases' into one line, and keep the variables longest to shortest
  (reverse xmas tree).

v2 -> v3
- Keep the same setup logic of mt7530/mt7621 because these series of
  patches is for adding mt7531 hardware.
- Do not adjust rgmii delay when vendor phy driver presents in order to
  prevent double adjustment by suggestion of Andrew Lunn.
- Remove redundant 'Example 4' from dt-bindings by suggestion of
  Rob Herring.
- Fix typo.

v1 -> v2
- change phylink_validate callback function to support full-duplex
  gigabit only to match hardware capability.
- add description of SGMII interface.
- configure mt7531 cpu port in fastest speed by default.
- parse SGMII control word for in-band negotiation mode.
- configure RGMII delay based on phy.rst.
- Rename the definition in the header file to avoid potential conflicts.
- Add wrapper function for mdio read/write to support both C22 and C45.
- correct fixed-link speed of 2500base-x in dts.
- add MT7531 port mirror setting.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: mt7622: add mt7531 dsa to bananapi-bpi-r64 board · 79a675e6

Landen Chao authored Sep 11, 2020

Add mt7531 dsa to bananapi-bpi-r64 board for 5 giga Ethernet ports support.
Signed-off-by: Landen Chao <landen.chao@mediatek.com>
Tested-By: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: mt7622: add mt7531 dsa to mt7622-rfb1 board · 6af06448

Landen Chao authored Sep 11, 2020

Add mt7531 dsa to mt7622-rfb1 board for 5 giga Ethernet ports support.
mt7622 only supports 1 sgmii interface, so either gmac0 or gmac1 can be
configured as sgmii interface. In this patch, change to connect mt7622
gmac0 and mt7531 port6 through sgmii interface.
Signed-off-by: Landen Chao <landen.chao@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mt7530: Add the support of MT7531 switch · c288575f

Landen Chao authored Sep 11, 2020

Add new support for MT7531:

MT7531 is the next generation of MT7530. It is also a 7-ports switch with
5 giga embedded phys, 2 cpu ports, and the same MAC logic of MT7530. Cpu
port 6 only supports SGMII interface. Cpu port 5 supports either RGMII
or SGMII in different HW sku, but cannot be muxed to PHY of port 0/4 like
mt7530. Due to SGMII interface support, pll, and pad setting are different
from MT7530. This patch adds different initial setting, and SGMII phylink
handlers of MT7531.

MT7531 SGMII interface can be configured in following mode:
- 'SGMII AN mode' with in-band negotiation capability
    which is compatible with PHY_INTERFACE_MODE_SGMII.
- 'SGMII force mode' without in-band negotiation
    which is compatible with 10B/8B encoding of
    PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause.
- 2.5 times faster clocked 'SGMII force mode' without in-band negotiation
    which is compatible with 10B/8B encoding of
    PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause.
Signed-off-by: Landen Chao <landen.chao@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: add new MT7531 binding to support MT7531 · 27834b02

Landen Chao authored Sep 11, 2020

Add devicetree binding to support the compatible mt7531 switch as used
in the MediaTek MT7531 switch.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Landen Chao <landen.chao@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mt7530: Extend device data ready for adding a new hardware · 88bdef8b

Landen Chao authored Sep 11, 2020

Add a structure holding required operations for each device such as device
initialization, PHY port read or write, a checker whether PHY interface is
supported on a certain port, MAC port setup for either bus pad or a
specific PHY interface.

This patch prepares for adding the new MT7531 hardware while keeping the
setup logic of the existing hardware unchanged.
Signed-off-by: Landen Chao <landen.chao@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mt7530: Refine message in Kconfig · dc8ef938

Landen Chao authored Sep 11, 2020

Refine the Kconfig message by fixing a typo and stating MT7621 support explicitly.
Signed-off-by: Landen Chao <landen.chao@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/wan/x25_asy: Remove an unnecessary x25_type_trans call · 4b468385

Xie He authored Sep 11, 2020

x25_type_trans only needs to be called before we call netif_rx to pass
the skb to upper layers.

It does not need to be called before lapb_data_received. The LAPB module
does not need the fields that are set by calling it.

In the other two X.25 drivers, lapbether and hdlc_x25, x25_type_trans
is only called before netif_rx and not before lapb_data_received.

Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: try to avoid unneeded backlog flush · 2de79ee2

Paolo Abeni authored Sep 10, 2020

flush_all_backlogs() may cause deadlock on systems
running processes with FIFO scheduling policy.

The above is critical in -RT scenarios, where user-space
specifically ensures no network activity is scheduled on
the CPU running the mentioned FIFO process, but still gets
stuck.

This commit tries to address the problem checking the
backlog status on the remote CPUs before scheduling the
flush operation. If the backlog is empty, we can skip it.
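
The idea of checking before scheduling can be sketched in Python. The per-CPU backlog is reduced to a packet count, and both function names are illustrative rather than taken from the patch.

```python
def cpus_needing_flush(backlogs):
    """backlogs maps a CPU id to the number of packets queued in its backlog.
    Only CPUs with something queued need flush work scheduled."""
    return {cpu for cpu, queued in backlogs.items() if queued > 0}


def flush_backlogs(backlogs):
    """Flush only the non-empty backlogs; return the CPUs that were touched."""
    scheduled = cpus_needing_flush(backlogs)
    for cpu in scheduled:
        backlogs[cpu] = 0   # stands in for running the flush work on that CPU
    return scheduled
```

A CPU that runs an isolated FIFO task and never receives packets has an empty backlog, so in this model no work is ever queued on it.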

v1 -> v2:
 - explicitly clear flushed cpu mask - Eric
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Derive-SBIB-from-maximum-port-speed-and-MTU' · 7b2d1b8d

David S. Miller authored Sep 14, 2020

Ido Schimmel says:

====================
mlxsw: Derive SBIB from maximum port speed & MTU

Petr says:

Internal buffer is a part of port headroom used for packets that are
mirrored due to triggers that the Spectrum ASIC considers "egress". Besides
ACL mirroring on port egress this includes also packets mirrored due to
ECN marking.

This patchset changes the way the internal mirroring buffer is reserved.
Currently the buffer reflects port MTU and speed accurately. In the future,
mlxsw should support dcbnl_setbuffer hook to allow the users to set buffer
sizes by hand. In that case, there might not be enough space for growth of
the internal mirroring buffer due to MTU and speed changes. While vetoing
MTU changes would be merely confusing, port speed changes cannot be vetoed,
and such change would simply lead to issues in packet mirroring.

For these reasons, with these patches the internal mirroring buffer is
derived from maximum MTU and maximum speed achievable on the port.

Patches #1 and #2 introduce a new callback to determine the maximum speed a
given port can achieve.

With patches #3 and #4, the information about, respectively, maximum MTU
and maximum port speed, is kept in struct mlxsw_sp_port.

In patch #5, maximum MTU and maximum speed are used to determine the size
of the internal buffer. MTU update and speed update hooks are dropped,
because they are no longer necessary.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU · 532b49e4

Petr Machata authored Sep 13, 2020

The SBIB register configures the size of an internal buffer that the
Spectrum ASICs use when mirroring traffic on egress. This size should be
taken into account when validating that the port headroom buffers are not
larger than the chip can handle. Up until now this was not done, which is
incidentally not a problem, because the priority group buffers that mlxsw
auto-configures are small enough that the boundary condition could not be
violated.

However when dcbnl_setbuffer is implemented, the user has control over
sizes of PG buffers, and they might overshoot the headroom capacity.
However the size of the SBIB buffer depends on port speed, and that cannot
be vetoed. Therefore SBIB size should be deduced from maximum port speed.

Additionally, once the buffers are configured by hand, the user could get
into an uncomfortable situation where their MTU change requests get vetoed,
because the SBIB does not fit anymore. Therefore derive SBIB size from
maximum permissible MTU as well.

Remove all the code that adjusted the SBIB size whenever speed or MTU
changed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Keep maximum speed around · 3232e8c6

Petr Machata authored Sep 13, 2020

The maximum port speed depends on link modes supported by the port, and for
Ethernet ports is constant. The maximum speed will be handy when setting
SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in
struct mlxsw_sp_port for easy access.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Keep maximum MTU around · 2ecf87ae

Petr Machata authored Sep 13, 2020

The maximum port MTU depends on port type. On Spectrum, mlxsw configures
all ports as Ethernet ports, and the maximum MTU therefore never changes.
Besides checking MTU configuration, maximum MTU will also be handy when
setting SBIB, the internal buffer used for traffic mirroring. Therefore,
keep it in struct mlxsw_sp_port for easy access.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_ethtool: Introduce ptys_max_speed callback · 60fbc521

Petr Machata authored Sep 13, 2020

The SBIB register configures the size of an internal buffer that the
Spectrum ASICs use when mirroring traffic on egress. This size should be
taken into account when validating that the port headroom buffers are not
larger than the chip can handle. Up until now this was not done, which is
incidentally not a problem, because the priority group buffers that mlxsw
auto-configures are small enough that the boundary condition could not be
violated.

When dcbnl_setbuffer is implemented, the user gets control over sizes of PG
buffers, and they might overshoot the headroom capacity. However the size
of the SBIB buffer depends on port speed, which cannot be vetoed. There is
obviously no way to retroactively push back on requests for overlarge PG
buffers, or reject an overlarge MTU, or cancel losslessness of a certain
PG.

Therefore, instead of taking into account the current speed when
calculating SBIB buffer size, take into account the maximum speed that a
port with given Ethernet protocol capabilities can have.

To that end, add a new ethtool callback, ptys_max_speed, which determines
this maximum speed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_ethtool: Extract a helper to get Ethernet attributes · d24ca6c0

Petr Machata authored Sep 13, 2020

In order to allow reusing the logic, extract from
mlxsw_sp_port_get_link_ksettings() the code to obtain Ethernet protocol
attributes, mlxsw_sp_port_ptys_query().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 7952d7ed

David S. Miller authored Sep 14, 2020

Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2020-09-14

This series contains updates to i40e driver only.

Li RongQing removes binding affinity mask to a fixed CPU and sets
prefetch of Rx buffer page to occur conditionally.

Björn provides AF_XDP performance improvements by not prefetching HW
descriptors, using 16 byte descriptors, and moving buffer allocation
out of Rx processing loop.

v2: Define prefetch_page_address in a common header for patch 2.
Dropped the previous patch 5, as it is being reworked to be more
generalized.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7952d7ed

Merge tag 'rxrpc-next-20200914' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · e0d9ae69

David S. Miller authored Sep 14, 2020

David Howells says:

====================
rxrpc: Fixes for the connection manager rewrite

Here are some fixes for the connection manager rewrite:

 (1) Fix a goto to the wrong place in error handling.

 (2) Fix a missing NULL pointer check.

 (3) The stored allocation error needs to be stored signed.

 (4) Fix a leak of connection bundle when clearing connections due to
     net namespace exit.

 (5) Fix an overget of the bundle when setting up a new client conn.
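
As a rough sketch of why item (3) matters, assume the stored error is a
negative errno value such as -ENOMEM. The structures and toy_* names below are
hypothetical and only stand in for whatever field rxrpc actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical structures; field names and types are assumptions. */
struct toy_bundle_unsigned {
	unsigned short alloc_error;
};

struct toy_bundle_signed {
	short alloc_error;
};

/* Store err in an unsigned field and read it back. */
int toy_store_unsigned(int err)
{
	struct toy_bundle_unsigned b;

	b.alloc_error = (unsigned short)err;	/* a negative errno wraps around */
	return b.alloc_error;
}

/* Store err in a signed field and read it back. */
int toy_store_signed(int err)
{
	struct toy_bundle_signed b;

	b.alloc_error = (short)err;	/* a negative errno keeps its sign */
	return b.alloc_error;
}

/* Usage: only the signed field returns the error that was stored. */
void toy_alloc_error_example(void)
{
	assert(toy_store_signed(-ENOMEM) == -ENOMEM);
	assert(toy_store_unsigned(-ENOMEM) == (int)USHRT_MAX + 1 - ENOMEM);
	assert(toy_store_unsigned(-ENOMEM) > 0);
}
```

With the unsigned field, a caller testing for a negative return value would
never see the failure.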
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e0d9ae69

hinic: add vxlan segmentation and cs offload support · 33acd755

Luo bin authored Sep 14, 2020

Add NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_UDP_TUNNEL_CSUM features
to support VXLAN segmentation and checksum offload. The hardware
treats IPIP and IPv6 tunnel packets as non-tunnel packets; for other
types of tunnel packets, checksum offload is disabled.
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

33acd755

net: qlcnic: remove unused variable 'val' in qlcnic_83xx_cam_unlock() · f3694707

Zhang Changzhong authored Sep 14, 2020

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c:661:6: warning:
 variable 'val' set but not used [-Wunused-but-set-variable]
  661 |  u32 val;
      |      ^~~

After commit 7f966452 ("qlcnic: 83xx memory map and HW access
routines"), variable 'val' is never used in qlcnic_83xx_cam_unlock(), so
remove it to avoid the build warning.
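
A generic sketch of this kind of cleanup follows, assuming the value assigned
to the unused variable comes from a call whose side effect must be kept (for
example a register read). Whether that applies to qlcnic_83xx_cam_unlock() is
not stated in this message; toy_readl() and toy_unlock() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counts reads, standing in for a hardware side effect. */
static unsigned int toy_read_count;

/* Hypothetical stand-in for a register read accessor. */
uint32_t toy_readl(const volatile uint32_t *addr)
{
	toy_read_count++;
	return *addr;
}

/*
 * Before (triggers -Wunused-but-set-variable):
 *
 *	uint32_t val;
 *
 *	val = toy_readl(reg);
 */
void toy_unlock(const volatile uint32_t *reg)
{
	/* After: keep the read for its side effect, drop the variable. */
	(void)toy_readl(reg);
}

/* Usage: the read still happens exactly once. */
void toy_unlock_example(void)
{
	volatile uint32_t reg = 0;
	unsigned int before = toy_read_count;

	toy_unlock(&reg);
	assert(toy_read_count == before + 1);
}
```

If the call had no side effect, the whole statement could be deleted along
with the variable.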
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3694707

net: pxa168_eth: remove unused variable 'retval' in pxa168_eth_change_mtu() · f7ab0f04

Zhang Changzhong authored Sep 14, 2020

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/marvell/pxa168_eth.c:1190:6: warning:
 variable 'retval' set but not used [-Wunused-but-set-variable]
 1190 |  int retval;
      |      ^~~~~~

Function pxa168_eth_change_mtu() always returns zero, so variable 'retval'
is redundant; just remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7ab0f04

net: fec: ptp: remove unused variable 'ns' in fec_time_keep() · 992bae7e

Zhang Changzhong authored Sep 14, 2020

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/freescale/fec_ptp.c:523:6: warning:
 variable 'ns' set but not used [-Wunused-but-set-variable]
  523 |  u64 ns;
      |      ^~

After commit 6605b730 ("FEC: Add time stamping code and a PTP
hardware clock"), variable 'ns' is never used in fec_time_keep(),
so remove it to avoid the build warning.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

992bae7e

net: dnet: remove unused variable 'tx_status' in dnet_start_xmit() · 85743cea

Zhang Changzhong authored Sep 14, 2020

Fixes the following W=1 kernel build warning(s):

drivers/net/ethernet/dnet.c:510:6: warning:
 variable 'tx_status' set but not used [-Wunused-but-set-variable]
  u32 tx_status, irq_enable;
      ^~~~~~~~~

After commit 47964174 ("dnet: Dave DNET ethernet controller driver
(updated)"), variable 'tx_status' is never used in dnet_start_xmit(),
so remove it to avoid the build warning.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85743cea

tcp: remove SOCK_QUEUE_SHRUNK · 0cbe6a8f

Eric Dumazet authored Sep 14, 2020

SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state
that remembers if some room has been made in the rtx queue
by an incoming ACK packet.

This is later used from tcp_check_space() before
deciding whether to send EPOLLOUT.

The problem: if we receive SACK packets and no packet
is removed from the rtx queue, we can still send fresh packets,
moving them from the write queue to the rtx queue and eventually
emptying the write queue. Since SOCK_QUEUE_SHRUNK was not set,
EPOLLOUT is not sent and the sender stalls.

This stall can happen if TCP_NOTSENT_LOWAT is used.

With this fix, we no longer risk stalling sends while holes
are repaired, and we can fully use socket sndbuf.

This also removes a cache line dirtying for typical RPC
workloads.
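
The stall can be illustrated with a deliberately simplified model. This is not
the kernel code: struct toy_sock and the toy_* functions are assumptions made
for the example, and writability is reduced to a single comparison against a
TCP_NOTSENT_LOWAT-like threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the relevant socket state; NOT the kernel's struct sock. */
struct toy_sock {
	bool queue_shrunk;	/* models SOCK_QUEUE_SHRUNK: an ACK freed rtx data */
	uint32_t notsent_bytes;	/* bytes still waiting in the write queue */
	uint32_t notsent_lowat;	/* models the TCP_NOTSENT_LOWAT threshold */
};

/* The socket counts as writable once unsent data drops below the threshold. */
bool toy_writable(const struct toy_sock *sk)
{
	return sk->notsent_bytes < sk->notsent_lowat;
}

/* Old behaviour: a wakeup is only considered if the flag was set. */
bool toy_check_space_old(struct toy_sock *sk)
{
	if (!sk->queue_shrunk)
		return false;
	sk->queue_shrunk = false;	/* the flag is a one-shot marker */
	return toy_writable(sk);
}

/* New behaviour: the decision no longer depends on the flag. */
bool toy_check_space_new(const struct toy_sock *sk)
{
	return toy_writable(sk);
}

/* Usage: SACKs drained the write queue, but no ACK freed rtx data. */
void toy_check_space_example(void)
{
	struct toy_sock sk = {
		.queue_shrunk = false,
		.notsent_bytes = 0,
		.notsent_lowat = 4096,
	};

	assert(!toy_check_space_old(&sk));	/* no EPOLLOUT: the sender stalls */
	assert(toy_check_space_new(&sk));	/* EPOLLOUT would be sent */
}
```

In this model the old check misses the wakeup because writability changed
without the flag ever being set.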

Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cbe6a8f