Commits · 9ee5e5ade033875191a2d2e470033e9cdde44a6a · Kirill Smelkov / linux

08 Jan, 2021 12 commits

tap/tun: add skb_zcopy_init() helper for initialization. · 9ee5e5ad

Jonathan Lemon authored Jan 06, 2021

Replace direct assignments with skb_zcopy_init() for zerocopy
cases where a new skb is initialized, without changing the
reference counts.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9ee5e5ad

skbuff: add flags to ubuf_info for ubuf setup · 04c2d33e

Jonathan Lemon authored Jan 06, 2021

Currently, when an ubuf is attached to a new skb, the shared
flags word is initialized to a fixed value.  Instead of doing
this, set the default flags in the ubuf, and have new skbs
inherit from this default.

This is needed when setting up different zerocopy types.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

04c2d33e

net: group skb_shinfo zerocopy related bits together. · 06b4feb3

Jonathan Lemon authored Jan 06, 2021

In preparation for expanded zerocopy (TX and RX), move
the zerocopy related bits out of tx_flags into their own
flag word.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

06b4feb3

skbuff: rename sock_zerocopy_* to msg_zerocopy_* · 8c793822

Jonathan Lemon authored Jan 06, 2021

At Willem's suggestion, rename the sock_zerocopy_* functions
so that they match the MSG_ZEROCOPY flag, which makes it clear
they are specific to this zerocopy implementation.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8c793822

skbuff: Call skb_zcopy_clear() before unref'ing fragments · 70c43167

Jonathan Lemon authored Jan 06, 2021

RX zerocopy fragment pages which are not allocated from the
system page pool require special handling.  Give the callback
in skb_zcopy_clear() a chance to process them first.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

70c43167

skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort · 236a6b1c

Jonathan Lemon authored Jan 06, 2021

The sock_zerocopy_put_abort function contains logic which is
specific to the current zerocopy implementation.  Add a wrapper
which checks the callback and dispatches apppropriately.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

236a6b1c

skbuff: Add skb parameter to the ubuf zerocopy callback · 36177832

Jonathan Lemon authored Jan 06, 2021

Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear().  This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

36177832

skbuff: replace sock_zerocopy_get with skb_zcopy_get · e76d46cf

Jonathan Lemon authored Jan 06, 2021

Rename the get routines for consistency.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e76d46cf

skbuff: replace sock_zerocopy_put() with skb_zcopy_put() · 59776362

Jonathan Lemon authored Jan 06, 2021

Replace sock_zerocopy_put with the generic skb_zcopy_put()
function.  Pass 'true' as the success argument, as this
is identical to no change.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

59776362

skbuff: Push status and refcounts into sock_zerocopy_callback · 75518851

Jonathan Lemon authored Jan 06, 2021

Before this change, the caller of sock_zerocopy_callback would
need to save the zerocopy status, decrement and check the refcount,
and then call the callback function - the callback was only invoked
when the refcount reached zero.

Now, the caller just passes the status into the callback function,
which saves the status and handles its own refcounts.

This makes the behavior of the sock_zerocopy_callback identical
to the tpacket and vhost callbacks.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

75518851

skbuff: simplify sock_zerocopy_put · d6adf1b1

Jonathan Lemon authored Jan 06, 2021

All 'struct ubuf_info' users should have a callback defined
as of commit 0a4a060b ("sock: fix zerocopy_success regression
with msg_zerocopy").

Remove the dead code path to consume_skb(), which makes
assumptions about how the structure was allocated.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d6adf1b1

skbuff: remove unused skb_zcopy_abort function · 424f481f

Jonathan Lemon authored Jan 06, 2021

skb_zcopy_abort() has no in-tree consumers, remove it.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

424f481f

07 Jan, 2021 28 commits

Merge branch 'dwmac-meson8b-picosecond-precision-rx-delay-support' · 7cd1de76

Jakub Kicinski authored Jan 07, 2021

Martin Blumenstingl says:

====================
dwmac-meson8b: picosecond precision RX delay support

with the help of Jianxin Pan (many thanks!) the meaning of the "new"
PRG_ETH1[19:16] register bits on Amlogic Meson G12A, G12B and SM1 SoCs
are finally known. These SoCs allow fine-tuning the RGMII RX delay in
200ps steps (contrary to what I have thought in the past [0] these are
not some "calibration" values).

The vendor u-boot has code to automatically detect the best RX/TX delay
settings. For now we keep it simple and add a device-tree property with
200ps precision to select the "right" RX delay for each board.

While here, deprecate the "amlogic,rx-delay-ns" property as it's not
used on any upstream .dts (yet). The driver is backwards compatible.

I have tested this on an X96 Air 4GB board (not upstream yet). Testing
with iperf3 gives 938 Mbits/sec in both directions (RX and TX). The
following network settings were used in the .dts (2ns TX delay
generated by the PHY, 800ps RX delay generated by the MAC as the PHY
only supports 0ns or 2ns RX delays):
        &ext_mdio {
                external_phy: ethernet-phy@0 {
                        /* Realtek RTL8211F (0x001cc916) */
                        reg = <0>;
                        eee-broken-1000t;

                        reset-assert-us = <10000>;
                        reset-deassert-us = <30000>;
                        reset-gpios = <&gpio GPIOZ_15 (GPIO_ACTIVE_LOW |
                                                GPIO_OPEN_DRAIN)>;

                        interrupt-parent = <&gpio_intc>;
                        /* MAC_INTR on GPIOZ_14 */
                        interrupts = <26 IRQ_TYPE_LEVEL_LOW>;
                };
        };

        &ethmac {
                status = "okay";

                pinctrl-0 = <&eth_pins>, <&eth_rgmii_pins>;
                pinctrl-names = "default";

                phy-mode = "rgmii-txid";
                phy-handle = <&external_phy>;

                amlogic,rgmii-rx-delay-ps = <800>;
        };

To use the same settings from vendor u-boot (which in my case has broken
Ethernet) the following commands can be used:
  mw.l 0xff634540 0x1621
  mw.l 0xff634544 0x30000
  phyreg w 0x0 0x1040
  phyreg w 0x1f 0xd08
  phyreg w 0x11 0x9
  phyreg w 0x15 0x11
  phyreg w 0x1f 0x0
  phyreg w 0x0 0x9200

Also I have tested this on a X96 Max board without any .dts changes
to confirm that other boards with the same IP block still work fine
with these changes.

Changes since v3 at [3].
- added Florian's Reviewed-by to patch 1 (thank you!)
- rebased on top of net-next

Changes since v2 at [2]:
- use the generic property name "rx-internal-delay-ps" as suggested by
  Rob (thanks!). This affects patches #1 and #3. The biggest change is
  is in patch #1 which is why I didn't add Florian's and Andrew's
  Reviewed-by
- added Andrew's and Florian's Reviewed-by to patches 2, 3, 4, 5 (many
  thanks to both!). I decided to do this despite renaming the property
  to the generic name "rx-internal-delay-ps" as it only affects the
  patch description and one line of code
- updated patch description of patch #3 to explain why there's not a
  lot of validation when parsing the old device-tree property (in
  nanosecond precision)
- dropped RFC status

Changes since v1 at [1]:
- updated patch 1 by making it more clear when the RX delay is applied.
  Thanks to Andrew for the suggestion!
- added a fix to enabling the timing-adjustment clock only when really
  needed. Found by Andrew - thanks!
- added testing not about X96 Max
- v1 did not go to the netdev mailing list, v2 fixes this

[0] https://lore.kernel.org/netdev/CAFBinCATt4Hi9rigj52nMf3oygyFbnopZcsakGL=KyWnsjY3JA@mail.gmail.com/
[1] https://patchwork.kernel.org/project/linux-amlogic/list/?series=384279&state=%2A&archive=both
[2] https://patchwork.kernel.org/project/linux-amlogic/list/?series=384491&state=%2A&archive=both
[3] https://patchwork.kernel.org/project/linux-amlogic/list/?series=406005&state=%2A&archive=both
====================

Link: https://lore.kernel.org/r/20210106134251.45264-1-martin.blumenstingl@googlemail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7cd1de76

net: stmmac: dwmac-meson8b: add support for the RGMII RX delay on G12A · de94fc10

Martin Blumenstingl authored Jan 06, 2021

Amlogic Meson G12A (and newer: G12B, SM1) SoCs have a more advanced RX
delay logic. Instead of fine-tuning the delay in the nanoseconds range
it now allows tuning in 200 picosecond steps. This support comes with
new bits in the PRG_ETH1[19:16] register.

Add support for validating the RGMII RX delay as well as configuring the
register accordingly on these platforms.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

de94fc10

net: stmmac: dwmac-meson8b: move RGMII delays into a separate function · 7985244d

Martin Blumenstingl authored Jan 06, 2021

Newer SoCs starting with the Amlogic Meson G12A have more a precise
RGMII RX delay configuration register. This means more complexity in the
code. Extract the existing RGMII delay configuration code into a
separate function to make it easier to read/understand even when adding
more logic in the future.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7985244d

net: stmmac: dwmac-meson8b: use picoseconds for the RGMII RX delay · 140ddf06

Martin Blumenstingl authored Jan 06, 2021

Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX
delay register which allows picoseconds precision. Parse the new
"rx-internal-delay-ps" property or fall back to the value from the old
"amlogic,rx-delay-ns" property.

No upstream DTB uses the old "amlogic,rx-delay-ns" property (yet).
Only include minimalistic logic to fall back to the old property,
without any special validation (for example if the old and new
property are given at the same time).
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

140ddf06

net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock · 02582288

Martin Blumenstingl authored Jan 06, 2021

The timing-adjustment clock only has to be enabled when a) there is a
2ns RX delay configured using device-tree and b) the phy-mode indicates
that the RX delay should be enabled.

Only enable the RX delay if both are true, instead of (by accident) also
enabling it when there's the 2ns RX delay configured but the phy-mode
incicates that the RX delay is not used.

Fixes: 9308c476 ("net: stmmac: dwmac-meson8b: add support for the RX delay configuration")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

02582288

dt-bindings: net: dwmac-meson: use picoseconds for the RGMII RX delay · 6b5903f5

Martin Blumenstingl authored Jan 06, 2021

Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX
delay register which allows picoseconds precision. Deprecate the old
"amlogic,rx-delay-ns" in favour of the generic "rx-internal-delay-ps"
property.

For older SoCs the only known supported values were 0ns and 2ns. The new
SoCs have support for RGMII RX delays between 0ps and 3000ps in 200ps
steps.

Don't carry over the description for the "rx-internal-delay-ps" property
and inherit that from ethernet-controller.yaml instead.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6b5903f5

Merge branch 'reduce-coupling-between-dsa-and-broadcom-systemport-driver' · 85b277de

Jakub Kicinski authored Jan 07, 2021

Vladimir Oltean says:

====================
Reduce coupling between DSA and Broadcom SYSTEMPORT driver

Upon a quick inspection, it seems that there is some code in the generic
DSA layer that is somehow specific to the Broadcom SYSTEMPORT driver.
The challenge there is that the hardware integration is very tight between
the switch and the DSA master interface. However this does not mean that
the drivers must also be as integrated as the hardware is. We can avoid
creating a DSA notifier just for the Broadcom SYSTEMPORT, and we can
move some Broadcom-specific queue mapping helpers outside of the common
include/net/dsa.h.
====================

Link: https://lore.kernel.org/r/20210107012403.1521114-1-olteanv@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

85b277de

net: dsa: remove the DSA specific notifiers · 1dbb1302

Vladimir Oltean authored Jan 07, 2021

This effectively reverts commit 60724d4b ("net: dsa: Add support for
DSA specific notifiers"). The reason is that since commit 2f1e8ea7
("net: dsa: link interfaces with the DSA master to get rid of lockdep
warnings"), it appears that there is a generic way to achieve the same
purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
converted to use the generic notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1dbb1302

net: systemport: use standard netdevice notifier to detect DSA presence · 1593cd40

Vladimir Oltean authored Jan 07, 2021

The SYSTEMPORT driver maps each port of the embedded Broadcom DSA switch
port to a certain queue of the master Ethernet controller. For that it
currently uses a dedicated notifier infrastructure which was added in
commit 60724d4b ("net: dsa: Add support for DSA specific notifiers").

However, since commit 2f1e8ea7 ("net: dsa: link interfaces with the
DSA master to get rid of lockdep warnings"), DSA is actually an upper of
the Broadcom SYSTEMPORT as far as the netdevice adjacency lists are
concerned. So naturally, the plain NETDEV_CHANGEUPPER net device notifiers
are emitted. It looks like there is enough API exposed by DSA to the
outside world already to make the call_dsa_notifiers API redundant. So
let's convert its only user to plain netdev notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1593cd40

net: dsa: export dsa_slave_dev_check · a5e3c9ba

Vladimir Oltean authored Jan 07, 2021

Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
they are enslaved to e.g. a bridge by calling netif_is_bridge_master().

Export this helper from DSA to get the equivalent functionality of
determining whether the upper interface of a CHANGEUPPER notifier is a
DSA switch interface or not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a5e3c9ba

net: dsa: move the Broadcom tag information in a separate header file · f46b9b8e

Vladimir Oltean authored Jan 07, 2021

It is a bit strange to see something as specific as Broadcom SYSTEMPORT
bits in the main DSA include file. Move these away into a separate
header, and have the tagger and the SYSTEMPORT driver include them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f46b9b8e

Merge branch 'offload-software-learnt-bridge-addresses-to-dsa' · c214cc3a

Jakub Kicinski authored Jan 07, 2021

Vladimir Oltean says:

====================
Offload software learnt bridge addresses to DSA

This series tries to make DSA behave a bit more sanely when bridged with
"foreign" (non-DSA) interfaces and source address learning is not
supported on the hardware CPU port (which would make things work more
seamlessly without software intervention). When a station A connected to
a DSA switch port needs to talk to another station B connected to a
non-DSA port through the Linux bridge, DSA must explicitly add a route
for station B towards its CPU port.

Initial RFC was posted here:
https://patchwork.ozlabs.org/project/netdev/cover/20201108131953.2462644-1-olteanv@gmail.com/

v2 was posted here:
https://patchwork.kernel.org/project/netdevbpf/cover/20201213024018.772586-1-vladimir.oltean@nxp.com/

v3 was posted here:
https://patchwork.kernel.org/project/netdevbpf/cover/20201213140710.1198050-1-vladimir.oltean@nxp.com/

This is a resend of the previous v3 with some added Reviewed-by tags.
====================

Link: https://lore.kernel.org/r/20210106095136.224739-1-olteanv@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c214cc3a

net: dsa: ocelot: request DSA to fix up lack of address learning on CPU port · c54913c1

Vladimir Oltean authored Jan 06, 2021

Given the following setup:

ip link add br0 type bridge
ip link set eno0 master br0
ip link set swp0 master br0
ip link set swp1 master br0
ip link set swp2 master br0
ip link set swp3 master br0

Currently, packets received on a DSA slave interface (such as swp0)
which should be routed by the software bridge towards a non-switch port
(such as eno0) are also flooded towards the other switch ports (swp1,
swp2, swp3) because the destination is unknown to the hardware switch.

This patch addresses the issue by monitoring the addresses learnt by the
software bridge on eno0, and adding/deleting them as static FDB entries
on the CPU port accordingly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c54913c1

net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors · d5f19486

Vladimir Oltean authored Jan 06, 2021

Some DSA switches (and not only) cannot learn source MAC addresses from
packets injected from the CPU. They only perform hardware address
learning from inbound traffic.

This can be problematic when we have a bridge spanning some DSA switch
ports and some non-DSA ports (which we'll call "foreign interfaces" from
DSA's perspective).

There are 2 classes of problems created by the lack of learning on
CPU-injected traffic:
- excessive flooding, due to the fact that DSA treats those addresses as
  unknown
- the risk of stale routes, which can lead to temporary packet loss

To illustrate the second class, consider the following situation, which
is common in production equipment (wireless access points, where there
is a WLAN interface and an Ethernet switch, and these form a single
bridging domain).

 AP 1:
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
       |                                                       ^        ^
       |                                                       |        |
       |                                                       |        |
       |                                                    Client A  Client B
       |
       |
       |
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 AP 2

- br0 of AP 1 will know that Clients A and B are reachable via wlan0
- the hardware fdb of a DSA switch driver today is not kept in sync with
  the software entries on other bridge ports, so it will not know that
  clients A and B are reachable via the CPU port UNLESS the hardware
  switch itself performs SA learning from traffic injected from the CPU.
  Nonetheless, a substantial number of switches don't.
- the hardware fdb of the DSA switch on AP 2 may autonomously learn that
  Client A and B are reachable through swp0. Therefore, the software br0
  of AP 2 also may or may not learn this. In the example we're
  illustrating, some Ethernet traffic has been going on, and br0 from AP
  2 has indeed learnt that it can reach Client B through swp0.

One of the wireless clients, say Client B, disconnects from AP 1 and
roams to AP 2. The topology now looks like this:

 AP 1:
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
       |                                                            ^
       |                                                            |
       |                                                         Client A
       |
       |
       |                                                         Client B
       |                                                            |
       |                                                            v
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 AP 2

- br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
- br0 of AP 1 will (possibly) know that Client B has left wlan0. There
  are cases where it might never find out though. Either way, DSA today
  does not process that notification in any way.
- the hardware FDB of the DSA switch on AP 1 may learn autonomously that
  Client B can be reached via swp0, if it receives any packet with
  Client 1's source MAC address over Ethernet.
- the hardware FDB of the DSA switch on AP 2 still thinks that Client B
  can be reached via swp0. It does not know that it has roamed to wlan0,
  because it doesn't perform SA learning from the CPU port.

Now Client A contacts Client B.
AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
segment.
AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
Hairpinning is disabled => drop.

This problem comes from the fact that these switches have a 'blind spot'
for addresses coming from software bridging. The generic solution is not
to assume that hardware learning can be enabled somehow, but to listen
to more bridge learning events. It turns out that the bridge driver does
learn in software from all inbound frames, in __br_handle_local_finish.
A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
addresses serviced by the bridge on 'foreign' interfaces. The software
bridge also does the right thing on migration, by notifying that the old
entry is deleted, so that does not need to be special-cased in DSA. When
it is deleted, we just need to delete our static FDB entry towards the
CPU too, and wait.

The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
events received on its own interfaces, such as static FDB entries.

Luckily we can change that, and DSA can listen to all switchdev FDB
add/del events in the system and figure out if those events were emitted
by a bridge that spans at least one of DSA's own ports. In case that is
true, DSA will also offload that address towards its own CPU port, in
the eventuality that there might be bridge clients attached to the DSA
switch who want to talk to the station connected to the foreign
interface.

In terms of implementation, we need to keep the fdb_info->added_by_user
check for the case where the switchdev event was targeted directly at a
DSA switch port. But we don't need to look at that flag for snooped
events. So the check is currently too late, we need to move it earlier.
This also simplifies the code a bit, since we avoid uselessly allocating
and freeing switchdev_work.

We could probably do some improvements in the future. For example,
multi-bridge support is rudimentary at the moment. If there are two
bridges spanning a DSA switch's ports, and both of them need to service
the same MAC address, then what will happen is that the migration of one
of those stations will trigger the deletion of the FDB entry from the
CPU port while it is still used by other bridge. That could be improved
with reference counting but is left for another time.

This behavior needs to be enabled at driver level by setting
ds->assisted_learning_on_cpu_port = true. This is because we don't want
to inflict a potential performance penalty (accesses through
MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
because address learning on the CPU port works there.
Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d5f19486

net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB · 5fb4a451

Vladimir Oltean authored Jan 06, 2021

Right now, the following would happen for a switch driver that does not
implement .port_fdb_add or .port_fdb_del.

dsa_slave_switchdev_event returns NOTIFY_OK and schedules:
-> dsa_slave_switchdev_event_work
   -> dsa_port_fdb_add
      -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD)
         -> dsa_switch_fdb_add
            -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP;
   -> an error is printed with dev_dbg, and
      dsa_fdb_offload_notify(switchdev_work) is not called.

We can avoid scheduling the worker for nothing and say NOTIFY_DONE.
Because we don't call dsa_fdb_offload_notify, the static FDB entry will
remain just in the software bridge.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5fb4a451

net: dsa: move switchdev event implementation under the same switch/case statement · 447d290a

Vladimir Oltean authored Jan 06, 2021

We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
events even for interfaces where dsa_slave_dev_check returns false, so
we need that check inside the switch-case statement for SWITCHDEV_FDB_*.

This movement also avoids a useless allocation / free of switchdev_work
on the untreated "default event" case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

447d290a

net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work · c4bb76a9

Vladimir Oltean authored Jan 06, 2021

Currently DSA doesn't add FDB entries on the CPU port, because it only
does so through switchdev, which is associated with a net_device, and
there are none of those for the CPU port.

But actually FDB addresses on the CPU port have some use cases of their
own, if the switchdev operations are initiated from within the DSA
layer. There is just one problem with the existing code: it passes a
structure in dsa_switchdev_event_work which was retrieved directly from
switchdev, so it contains a net_device. We need to generalize the
contents to something that covers the CPU port as well: the "ds, port"
tuple is fine for that.

Note that the new procedure for notifying the successful FDB offload is
inspired from the rocker model.

Also, nothing was being done if added_by_user was false. Let's check for
that a lot earlier, and don't actually bother to schedule the worker
for nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c4bb76a9

net: dsa: be louder when a non-legacy FDB operation fails · 2fd18650

Vladimir Oltean authored Jan 06, 2021

The dev_close() call was added in commit c9eb3e0f ("net: dsa: Add
support for learning FDB through notification") "to indicate inconsistent
situation" when we could not delete an FDB entry from the port.

bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master

It is a bit drastic and at the same time not helpful if the above fails
to only print with netdev_dbg log level, but on the other hand to bring
the interface down.

So increase the verbosity of the error message, and drop dev_close().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2fd18650

net: bridge: notify switchdev of disappearance of old FDB entry upon migration · 90dc8fd3

Vladimir Oltean authored Jan 06, 2021

Currently the bridge emits atomic switchdev notifications for
dynamically learnt FDB entries. Monitoring these notifications works
wonders for switchdev drivers that want to keep their hardware FDB in
sync with the bridge's FDB.

For example station A wants to talk to station B in the diagram below,
and we are concerned with the behavior of the bridge on the DUT device:

                   DUT
 +-------------------------------------+
 |                 br0                 |
 | +------+ +------+ +------+ +------+ |
 | |      | |      | |      | |      | |
 | | swp0 | | swp1 | | swp2 | | eth0 | |
 +-------------------------------------+
      |        |                  |
  Station A    |                  |
               |                  |
         +--+------+--+    +--+------+--+
         |  |      |  |    |  |      |  |
         |  | swp0 |  |    |  | swp0 |  |
 Another |  +------+  |    |  +------+  | Another
  switch |     br0    |    |     br0    | switch
         |  +------+  |    |  +------+  |
         |  |      |  |    |  |      |  |
         |  | swp1 |  |    |  | swp1 |  |
         +--+------+--+    +--+------+--+
                                  |
                              Station B

Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
the following property: frames injected from its control interface bypass
the internal address analyzer logic, and therefore, this hardware does
not learn from the source address of packets transmitted by the network
stack through it. So, since bridging between eth0 (where Station B is
attached) and swp0 (where Station A is attached) is done in software,
the switchdev hardware will never learn the source address of Station B.
So the traffic towards that destination will be treated as unknown, i.e.
flooded.

This is where the bridge notifications come in handy. When br0 on the
DUT sees frames with Station B's MAC address on eth0, the switchdev
driver gets these notifications and can install a rule to send frames
towards Station B's address that are incoming from swp0, swp1, swp2,
only towards the control interface. This is all switchdev driver private
business, which the notification makes possible.

All is fine until someone unplugs Station B's cable and moves it to the
other switch:

                   DUT
 +-------------------------------------+
 |                 br0                 |
 | +------+ +------+ +------+ +------+ |
 | |      | |      | |      | |      | |
 | | swp0 | | swp1 | | swp2 | | eth0 | |
 +-------------------------------------+
      |        |                  |
  Station A    |                  |
               |                  |
         +--+------+--+    +--+------+--+
         |  |      |  |    |  |      |  |
         |  | swp0 |  |    |  | swp0 |  |
 Another |  +------+  |    |  +------+  | Another
  switch |     br0    |    |     br0    | switch
         |  +------+  |    |  +------+  |
         |  |      |  |    |  |      |  |
         |  | swp1 |  |    |  | swp1 |  |
         +--+------+--+    +--+------+--+
               |
           Station B

Luckily for the use cases we care about, Station B is noisy enough that
the DUT hears it (on swp1 this time). swp1 receives the frames and
delivers them to the bridge, who enters the unlikely path in br_fdb_update
of updating an existing entry. It moves the entry in the software bridge
to swp1 and emits an addition notification towards that.

As far as the switchdev driver is concerned, all that it needs to ensure
is that traffic between Station A and Station B is not forever broken.
If it does nothing, then the stale rule to send frames for Station B
towards the control interface remains in place. But Station B is no
longer reachable via the control interface, but via a port that can
offload the bridge port learning attribute. It's just that the port is
prevented from learning this address, since the rule overrides FDB
updates. So the rule needs to go. The question is via what mechanism.

It sure would be possible for this switchdev driver to keep track of all
addresses which are sent to the control interface, and then also listen
for bridge notifier events on its own ports, searching for the ones that
have a MAC address which was previously sent to the control interface.
But this is cumbersome and inefficient. Instead, with one small change,
the bridge could notify of the address deletion from the old port, in a
symmetrical manner with how it did for the insertion. Then the switchdev
driver would not be required to monitor learn/forget events for its own
ports. It could just delete the rule towards the control interface upon
bridge entry migration. This would make hardware address learning be
possible again. Then it would take a few more packets until the hardware
and software FDB would be in sync again.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

90dc8fd3

Merge branch 'r8169-improve-rtl8168g-phy-suspend-quirk' · dd15c4a0

Jakub Kicinski authored Jan 07, 2021

Heiner Kallweit says:

====================
r8169: improve RTL8168g PHY suspend quirk

According to Realtek the ERI register 0x1a8 quirk is needed to work
around a hw issue with the PHY on RTL8168g. The register needs to be
changed before powering down the PHY. Currently we don't meet this
requirement, however I'm not aware of any problems caused by this.
Therefore I see the change as an improvement.

The PHY driver has no means to access the chip ERI registers,
therefore we have to intercept MDIO writes to the BMCR register.
If the BMCR_PDOWN bit is going to be set, then let's apply the
quirk before actually powering down the PHY.
====================

Link: https://lore.kernel.org/r/9303c2cf-c521-beea-c09f-63b5dfa91b9c@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dd15c4a0

r8169: improve RTL8168g PHY suspend quirk · acb58657

Heiner Kallweit authored Jan 06, 2021

According to Realtek the ERI register 0x1a8 quirk is needed to work
around a hw issue with the PHY on RTL8168g. The register needs to be
changed before powering down the PHY. Currently we don't meet this
requirement, however I'm not aware of any problems caused by this.
Therefore I see the change as an improvement.

The PHY driver has no means to access the chip ERI registers,
therefore we have to intercept MDIO writes to BMCR register.
If the BMCR_PDOWN bit is going to be set, then let's apply the
quirk before actually powering down the PHY.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

acb58657

r8169: move ERI access functions to avoid forward declaration · c6cff9df

Heiner Kallweit authored Jan 06, 2021

No functional change here. We just move a code block to avoid a
function forward declaration in a subsequent change.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c6cff9df

net: phy: replace mutex_is_locked with lockdep_assert_held in phylib · e6e918d4

Heiner Kallweit authored Jan 06, 2021

Switch to lockdep_assert_held(_once), similar to what is being done
in other subsystems. One advantage is that there's zero runtime
overhead if lockdep support isn't enabled.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/ccc40b9d-8ee0-43a1-5009-2cc95ca79c85@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e6e918d4

net: phy: bcm7xxx: Add an entry for BCM72116 · 8b86850b

Florian Fainelli authored Jan 06, 2021

BCM72116 features a 28nm integrated EPHY, add an entry to match this PHY
OUI.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210106170944.1253046-1-f.fainelli@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8b86850b

Merge branch 'udp_tunnel_nic-post-conversion-cleanup' · 0b86235d

Jakub Kicinski authored Jan 07, 2021

udp_tunnel_nic: post conversion cleanup

It has been two releases since we added the common infra for UDP
tunnel port offload, and we have not heard of any major issues.
Remove the old direct driver NDOs completely, and perform minor
simplifications in the tunnel drivers.

Link: https://lore.kernel.org/r/20210106210637.1839662-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0b86235d

udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checks · b9ef3fec

Jakub Kicinski authored Jan 06, 2021

Move the NETIF_F_RX_UDP_TUNNEL_PORT feature check into
udp_tunnel_nic_*_port() helpers, since they're always
done right before the call.

Add similar checks before calling the notifier.
udp_tunnel_nic invokes the notifier without checking
features which could result in some wasted cycles.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b9ef3fec

net: remove ndo_udp_tunnel_* callbacks · 30bfce10

Jakub Kicinski authored Jan 06, 2021

All UDP tunnel port management is now routed via udp_tunnel_nic
infra directly. Remove the old callbacks.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

30bfce10

udp_tunnel: remove REGISTER/UNREGISTER handling from tunnel drivers · dedc33e7

Jakub Kicinski authored Jan 06, 2021

udp_tunnel_nic handles REGISTER and UNREGISTER event, now that all
drivers use that infra we can drop the event handling in the tunnel
drivers.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dedc33e7