Commits · 5ebfb4cc3048380b43506ffc71b9cf8b83128989 · Kirill Smelkov / linux

05 Aug, 2021 30 commits

Coco Li authored Aug 05, 2021

To verify that this hash implements the Toeplitz hash function.

Additionally, provide a script toeplitz.sh to run the test in loopback mode
on a networking device of choice (see setup_loopback.sh). Since the
script modifies the NIC setup, it will not be run by selftests
automatically.

Tested:
./toeplitz.sh -i eth0 -irq_prefix <eth0_pattern> -t -6
carrier ready
rxq 0: cpu 14
rxq 1: cpu 20
rxq 2: cpu 17
rxq 3: cpu 23
cpu 14: rx_hash 0x69103ebc [saddr fda8::2 daddr fda8::1 sport 58938 dport 8000] OK rxq 0 (cpu 14)
...
cpu 20: rx_hash 0x257118b9 [saddr fda8::2 daddr fda8::1 sport 59258 dport 8000] OK rxq 1 (cpu 20)
count: pass=111 nohash=0 fail=0
Test Succeeded!
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ebfb4cc

selftests/net: GRO coalesce test · 7d157501

Coco Li authored Aug 05, 2021

Implement a GRO testsuite that expects Linux kernel GRO behavior.
All tests pass with the kernel software GRO stack. Run against a device
with hardware GRO to verify that it matches the software stack.

gro.c generates packets and sends them out through a packet socket. The
receiver in gro.c (run separately) receives the packets on a packet
socket, filters them by destination ports using BPF and checks the
packet geometry to see whether GRO was applied.

gro.sh provides a wrapper to run the gro.c in NIC loopback mode.
It is not included in continuous testing because it modifies network
configuration around a physical NIC: gro.sh sets the NIC in loopback
mode, creates macvlan devices on the physical device in separate
namespaces, and sends traffic generated by gro.c between the two
namespaces to observe coalescing behavior.

GRO coalescing is time sensitive.
Some tests may prove flaky on some hardware.

Note that this test suite tests for software GRO unless hardware GRO is
enabled (ethtool -K $DEV rx-gro-hw on).

To test, run ./gro.sh.
The wrapper will output success or failed test names, and generate
log.txt and stderr.

Sample log.txt result:
...
pure data packet of same size: Test succeeded

large data packets followed by a smaller one: Test succeeded

small data packets followed by a larger one: Test succeeded
...

Sample stderr result:
...
carrier ready
running test ipv4 data
Expected {200 }, Total 1 packets
Received {200 }, Total 1 packets.
...
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d157501

wwan: mhi: Fix build. · ab996c42

David S. Miller authored Aug 05, 2021

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab996c42

net/ipv6/mcast: Use struct_size() helper · e11c0e25

Gustavo A. R. Silva authored Aug 04, 2021

Replace IP6_SFLSIZE() with struct_size() helper in order to avoid any
potential type mistakes or integer overflows that, in the worst
scenario, could lead to heap overflows.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e11c0e25

net/ipv4/igmp: Use struct_size() helper · e6a1f7e0

Gustavo A. R. Silva authored Aug 04, 2021

Replace IP_SFLSIZE() with struct_size() helper in order to avoid any
potential type mistakes or integer overflows that, in the worst
scenario, could lead to heap overflows.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6a1f7e0

net/ipv4/ipv6: Replace one-element arraya with flexible-array members · db243b79

Gustavo A. R. Silva authored Aug 04, 2021

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged and refactor the related code accordingly:

$ pahole -C group_filter net/ipv4/ip_sockglue.o
struct group_filter {
	union {
		struct {
			__u32      gf_interface_aux;     /*     0     4 */

			/* XXX 4 bytes hole, try to pack */

			struct __kernel_sockaddr_storage gf_group_aux; /*     8   128 */
			/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
			__u32      gf_fmode_aux;         /*   136     4 */
			__u32      gf_numsrc_aux;        /*   140     4 */
			struct __kernel_sockaddr_storage gf_slist[1]; /*   144   128 */
		};                                       /*     0   272 */
		struct {
			__u32      gf_interface;         /*     0     4 */

			/* XXX 4 bytes hole, try to pack */

			struct __kernel_sockaddr_storage gf_group; /*     8   128 */
			/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
			__u32      gf_fmode;             /*   136     4 */
			__u32      gf_numsrc;            /*   140     4 */
			struct __kernel_sockaddr_storage gf_slist_flex[0]; /*   144     0 */
		};                                       /*     0   144 */
	};                                               /*     0   272 */

	/* size: 272, cachelines: 5, members: 1 */
	/* last cacheline: 16 bytes */
};

$ pahole -C compat_group_filter net/ipv4/ip_sockglue.o
struct compat_group_filter {
	union {
		struct {
			__u32      gf_interface_aux;     /*     0     4 */
			struct __kernel_sockaddr_storage gf_group_aux __attribute__((__aligned__(4))); /*     4   128 */
			/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
			__u32      gf_fmode_aux;         /*   132     4 */
			__u32      gf_numsrc_aux;        /*   136     4 */
			struct __kernel_sockaddr_storage gf_slist[1] __attribute__((__aligned__(4))); /*   140   128 */
		} __attribute__((__packed__)) __attribute__((__aligned__(4)));                     /*     0   268 */
		struct {
			__u32      gf_interface;         /*     0     4 */
			struct __kernel_sockaddr_storage gf_group __attribute__((__aligned__(4))); /*     4   128 */
			/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
			__u32      gf_fmode;             /*   132     4 */
			__u32      gf_numsrc;            /*   136     4 */
			struct __kernel_sockaddr_storage gf_slist_flex[0] __attribute__((__aligned__(4))); /*   140     0 */
		} __attribute__((__packed__)) __attribute__((__aligned__(4)));                     /*     0   140 */
	} __attribute__((__aligned__(1)));               /*     0   268 */

	/* size: 268, cachelines: 5, members: 1 */
	/* forced alignments: 1 */
	/* last cacheline: 12 bytes */
} __attribute__((__packed__));

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

db243b79

Merge branch 'bridge-ioctl-fixes' · d15040a3

David S. Miller authored Aug 05, 2021

Nikolay Aleksandrov says:

====================
net: bridge: fix recent ioctl changes

These are three fixes for the recent bridge removal of ndo_do_ioctl
done by commit ad2f99ae ("net: bridge: move bridge ioctls out of
.ndo_do_ioctl"). Patch 01 fixes a deadlock of the new bridge ioctl
hook lock and rtnl by taking a netdev reference and always taking the
bridge ioctl lock first then rtnl from within the bridge hook.
Patch 02 fixes old_deviceless() bridge calls device name argument, and
patch 03 checks in dev_ifsioc()'s SIOCBRADD/DELIF cases if the netdevice is
actually a bridge before interpreting its private ptr as net_bridge.

Patch 01 was tested by running old bridge-utils commands with lockdep
enabled. Patch 02 was tested again by using bridge-utils and using the
respective ioctl calls on a "up" bridge device. Patch 03 was tested by
using the addif ioctl on a non-bridge device (e.g. loopback).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d15040a3

net: core: don't call SIOCBRADD/DELIF for non-bridge devices · 9384eacd

Nikolay Aleksandrov authored Aug 05, 2021

Commit ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
changed SIOCBRADD/DELIF to use bridge's ioctl hook (br_ioctl_hook)
without checking if the target netdevice is actually a bridge which can
cause crashes and generally interpreting other devices' private pointers
as net_bridge pointers.

Crash example (lo - loopback):
$ brctl addif lo ens16
 BUG: kernel NULL pointer dereference, address: 000000000000059898
 #PF: supervisor read access in kernel modede
 #PF: error_code(0x0000) - not-present pagege
 PGD 0 P4D 0 ^Ac
 Oops: 0000 [#1] SMP NOPTI
 CPU: 2 PID: 1376 Comm: brctl Kdump: loaded Tainted: G        W         5.14.0-rc3+ #405
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
 RIP: 0010:add_del_if+0x1f/0x7c [bridge]
 Code: 80 bf 1b a0 41 5c e9 c0 3c 03 e1 0f 1f 44 00 00 41 55 41 54 41 89 f4 be 0c 00 00 00 55 48 89 fd 53 48 8b 87 88 00 00 00 89 d3 <4c> 8b a8 98 05 00 00 49 8b bd d0 00 00 00 e8 17 d7 f3 e0 84 c0 74
 RSP: 0018:ffff888109d97cb0 EFLAGS: 00010202^Ac
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff888101239bc0
 RBP: ffff888101239bc0 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff888109d97cd8 R11: 00000000000000a3 R12: 0000000000000012
 R13: 0000000000000000 R14: ffff888101239bc0 R15: ffff888109d97e10
 FS:  00007fc1e365b540(0000) GS:ffff88822be80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000598 CR3: 0000000106506000 CR4: 00000000000006e0
 Call Trace:
  br_ioctl_stub+0x7c/0x441 [bridge]
  br_ioctl_call+0x6d/0x8a
  dev_ifsioc+0x325/0x4e8
  dev_ioctl+0x46b/0x4e1
  sock_do_ioctl+0x7b/0xad
  sock_ioctl+0x2de/0x2f2
  vfs_ioctl+0x1e/0x2b
  __do_sys_ioctl+0x63/0x86
  do_syscall_64+0xcb/0xf2
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fc1e3589427
 Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffc8d501d38 EFLAGS: 00000202 ORIG_RAX: 000000000000001010
 RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fc1e3589427
 RDX: 00007ffc8d501d60 RSI: 00000000000089a3 RDI: 0000000000000003
 RBP: 00007ffc8d501d60 R08: 0000000000000000 R09: fefefeff77686d74
 R10: fffffffffffff8f9 R11: 0000000000000202 R12: 00007ffc8d502e06
 R13: 00007ffc8d502e06 R14: 0000000000000000 R15: 0000000000000000
 Modules linked in: bridge stp llc bonding ipv6 virtio_net [last unloaded: llc]^Ac
 CR2: 0000000000000598

Reported-by: syzbot+79f4a8692e267bdb7227@syzkaller.appspotmail.com
Fixes: ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9384eacd

net: bridge: fix ioctl old_deviceless bridge argument · cbd7ad29

Nikolay Aleksandrov authored Aug 05, 2021

Commit ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
changed the source of the argument copy in bridge's old_deviceless() from
args[1] (user ptr to device name) to uarg (ptr to ioctl arguments) causing
wrong device name to be used.

Example (broken, bridge exists but is up):
$ brctl delbr bridge
bridge bridge doesn't exist; can't delete it

Example (working):
$ brctl delbr bridge
bridge bridge is still up; can't delete it

Fixes: ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbd7ad29

net: bridge: fix ioctl locking · 893b1958

Nikolay Aleksandrov authored Aug 05, 2021

Before commit ad2f99ae ("net: bridge: move bridge ioctls out of
.ndo_do_ioctl") the bridge ioctl calls were divided in two parts:
one was deviceless called by sock_ioctl and didn't expect rtnl to be held,
the other was with a device called by dev_ifsioc() and expected rtnl to be
held. After the commit above they were united in a single ioctl stub, but
it didn't take care of the locking expectations.
For sock_ioctl now we acquire  (1) br_ioctl_mutex, (2) rtnl
and for dev_ifsioc we acquire  (1) rtnl,           (2) br_ioctl_mutex

The fix is to get a refcnt on the netdev for dev_ifsioc calls and drop rtnl
then to reacquire it in the bridge ioctl stub after br_ioctl_mutex has
been acquired. That will avoid playing locking games and make the rules
straight-forward: we always take br_ioctl_mutex first, and then rtnl.

Reported-by: syzbot+34fe5894623c4ab1b379@syzkaller.appspotmail.com
Fixes: ad2f99ae ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

893b1958

net/ipv4: Revert use of struct_size() helper · 4167a960

Gustavo A. R. Silva authored Aug 04, 2021

Revert the use of structr_size() and stay with IP_MSFILTER_SIZE() for
now, as in this case, the size of struct ip_msfilter didn't change with
the addition of the flexible array imsf_slist_flex[]. So, if we use
struct_size() we will be allocating and calculating the size of
struct ip_msfilter with one too many items for imsf_slist_flex[].

We might use struct_size() in the future, but for now let's stay
with IP_MSFILTER_SIZE().

Fixes: 2d3e5caf ("net/ipv4: Replace one-element array with flexible-array member")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4167a960

net: fix GRO skb truesize update · af352460

Paolo Abeni authored Aug 04, 2021

commit 5e10da53 ("skbuff: allow 'slow_gro' for skb carring sock
reference") introduces a serious regression at the GRO layer setting
the wrong truesize for stolen-head skbs.

Restore the correct truesize: SKB_DATA_ALIGN(...) instead of
SKB_TRUESIZE(...)
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Fixes: 5e10da53 ("skbuff: allow 'slow_gro' for skb carring sock reference")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af352460

Merge branch 'ipa-runtime-pm' · 83945480

David S. Miller authored Aug 05, 2021

Alex Elder says:

====================
net: ipa: more work toward runtime PM

The first two patches in this series are basically bug fixes, but in
practice I don't think we've seen the problems they might cause.

The third patch moves clock and interconnect related error messages
around a bit, reporting better information and doing so in the
functions where they are enabled or disabled (rather than those
functions' callers).

The last three patches move power-related code into "ipa_clock.c",
as a step toward generalizing the purpose of that source file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

83945480

net: ipa: move IPA flags field · afb08b7e

Alex Elder authored Aug 04, 2021

The ipa->flags field is only ever used in "ipa_clock.c", related to
suspend/resume activity.

Move the definition of the ipa_flag enumerated type to "ipa_clock.c".
And move the flags field from the ipa structure and to the ipa_clock
structure.  Rename the type and its values to include "power" or
"POWER" in the name.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

afb08b7e

net: ipa: move ipa_suspend_handler() · afe1baa8

Alex Elder authored Aug 04, 2021

Move ipa_suspend_handler() into "ipa_clock.c" from "ipa_main.c", to
group with the reset of the suspend/resume code.  This IPA interrupt
is triggered if an IPA RX endpoint is suspended but has a packet to
be delivered.

Introduce ipa_power_setup() and ipa_power_teardown() to add and
remove the handler for the IPA SUSPEND interrupt at the same place
as before, while allowing the handler to remain private.

The "power" naming convention will be adopted elsewhere in this
file as well (soon).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

afe1baa8

net: ipa: move IPA power operations to ipa_clock.c · 73ff316d

Alex Elder authored Aug 04, 2021

Move ipa_suspend() and ipa_resume(), as well as the definition of
the ipa_pm_ops structure into "ipa_clock.c".  Make ipa_pm_ops public
and declare it as extern in "ipa_clock.h".

This is part of centralizing IPA power management functionality into
"ipa_clock.c" (the file will eventually get a name change).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

73ff316d

net: ipa: improve IPA clock error messages · 8ee7c40a

Alex Elder authored Aug 04, 2021

Rearrange messages reported when errors occur in the IPA clock code,
so that the specific interconnect is identified when an error occurs
enabling or disabling it, or the core clock is indicated when an
error occurs enabling it.

Have ipa_interconnect_disable() return zero or the negative error
value returned by the first interconnect that produced an error
when disabled.  For now, the callers ignore the returned value.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ee7c40a

net: ipa: reorder netdev pointer assignments · 10cc73c4

Alex Elder authored Aug 04, 2021

Assign the ipa->modem_netdev and endpoint->netdev pointers *before*
registering the network device.  As soon as the device is
registered it can be opened, and by that time we'll want those
pointers valid.

Similarly, don't make those pointers NULL until *after* the modem
network device is unregistered in ipa_modem_stop().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10cc73c4

net: ipa: don't suspend/resume modem if not up · 30c2515b

Alex Elder authored Aug 04, 2021

The modem network device is set up by ipa_modem_start().  But its
TX queue is not actually started and endpoints enabled until it is
opened.

So avoid stopping the modem network device TX queue and disabling
endpoints on suspend or stop unless the netdev is marked UP.  And
skip attempting to resume unless it is UP.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30c2515b

Merge branch 'sja1105-H' · 1f52247e

David S. Miller authored Aug 05, 2021

Vladimir Oltean says:

====================
NXP SJA1105 driver support for "H" switch topologies

Changes in v3:
Preserve the behavior of dsa_tree_setup_default_cpu() which is to pick
the first CPU port and not the last.

Changes in v2:
Send as non-RFC, drop the patches for discarding DSA-tagged packets on
user ports and DSA-untagged packets on DSA and CPU ports for now.

NXP builds boards like the Bluebox 3 where there are multiple SJA1110
switches connected to an LX2160A, but they are also connected to each
other. I call this topology an "H" tree because of the lateral
connection between switches. A piece extracted from a non-upstream
device tree looks like this:

&spi_bridge {
        /* SW1 */
        ethernet-switch@0 {
                compatible = "nxp,sja1110a";
                reg = <0>;
                dsa,member = <0 0>;

                ethernet-ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        /* SW1_P1 */
                        port@1 {
                                reg = <1>;
                                label = "con_2x20";
                                phy-mode = "sgmii";

                                fixed-link {
                                        speed = <1000>;
                                        full-duplex;
                                };
                        };

                        port@2 {
                                reg = <2>;
                                ethernet = <&dpmac17>;
                                phy-mode = "rgmii-id";

                                fixed-link {
                                        speed = <1000>;
                                        full-duplex;
                                };
                        };

                        port@3 {
                                reg = <3>;
                                label = "1ge_p1";
                                phy-mode = "rgmii-id";
                                phy-handle = <&sw1_mii3_phy>;
                        };

                        sw1p4: port@4 {
                                reg = <4>;
                                link = <&sw2p1>;
                                phy-mode = "sgmii";

                                fixed-link {
                                        speed = <1000>;
                                        full-duplex;
                                };
                        };

                        port@5 {
                                reg = <5>;
                                label = "trx1";
                                phy-mode = "internal";
                                phy-handle = <&sw1_port5_base_t1_phy>;
                        };

                        port@6 {
                                reg = <6>;
                                label = "trx2";
                                phy-mode = "internal";
                                phy-handle = <&sw1_port6_base_t1_phy>;
                        };

                        port@7 {
                                reg = <7>;
                                label = "trx3";
                                phy-mode = "internal";
                                phy-handle = <&sw1_port7_base_t1_phy>;
                        };

                        port@8 {
                                reg = <8>;
                                label = "trx4";
                                phy-mode = "internal";
                                phy-handle = <&sw1_port8_base_t1_phy>;
                        };

                        port@9 {
                                reg = <9>;
                                label = "trx5";
                                phy-mode = "internal";
                                phy-handle = <&sw1_port9_base_t1_phy>;
                        };

                        port@a {
                                reg = <10>;
                                label = "trx6";
                                phy-mode = "internal";
                                phy-handle = <&sw1_port10_base_t1_phy>;
                        };
                };
        };

        /* SW2 */
        ethernet-switch@2 {
                compatible = "nxp,sja1110a";
                reg = <2>;
                dsa,member = <0 1>;

                ethernet-ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        sw2p1: port@1 {
                                reg = <1>;
                                link = <&sw1p4>;
                                phy-mode = "sgmii";

                                fixed-link {
                                        speed = <1000>;
                                        full-duplex;
                                };
                        };

                        port@2 {
                                reg = <2>;
                                ethernet = <&dpmac18>;
                                phy-mode = "rgmii-id";

                                fixed-link {
                                        speed = <1000>;
                                        full-duplex;
                                };
                        };

                        port@3 {
                                reg = <3>;
                                label = "1ge_p2";
                                phy-mode = "rgmii-id";
                                phy-handle = <&sw2_mii3_phy>;
                        };

                        port@4 {
                                reg = <4>;
                                label = "to_sw3";
                                phy-mode = "2500base-x";

                                fixed-link {
                                        speed = <2500>;
                                        full-duplex;
                                };
                        };

                        port@5 {
                                reg = <5>;
                                label = "trx7";
                                phy-mode = "internal";
                                phy-handle = <&sw2_port5_base_t1_phy>;
                        };

                        port@6 {
                                reg = <6>;
                                label = "trx8";
                                phy-mode = "internal";
                                phy-handle = <&sw2_port6_base_t1_phy>;
                        };

                        port@7 {
                                reg = <7>;
                                label = "trx9";
                                phy-mode = "internal";
                                phy-handle = <&sw2_port7_base_t1_phy>;
                        };

                        port@8 {
                                reg = <8>;
                                label = "trx10";
                                phy-mode = "internal";
                                phy-handle = <&sw2_port8_base_t1_phy>;
                        };

                        port@9 {
                                reg = <9>;
                                label = "trx11";
                                phy-mode = "internal";
                                phy-handle = <&sw2_port9_base_t1_phy>;
                        };

                        port@a {
                                reg = <10>;
                                label = "trx12";
                                phy-mode = "internal";
                                phy-handle = <&sw2_port10_base_t1_phy>;
                        };
                };
        };
};

Basically it is a single DSA tree with 2 "ethernet" properties, i.e. a
multi-CPU-port system. There is also a DSA link between the switches,
but it is not a daisy chain topology, i.e. there is no "upstream" and
"downstream" switch, the DSA link is only to be used for the bridge data
plane (autonomous forwarding between switches, between the RJ-45 ports
and the automotive Ethernet ports), otherwise all traffic that should
reach the host should do so through the dedicated CPU port of the switch.

Of course, plain forwarding in this topology is bound to create packet
loops. I have thought long and hard about strategies to cut forwarding
in such a way as to prevent loops but also not impede normal operation
of the network on such a system, and I believe I have found a solution
that does work as expected. This relies heavily on DSA's recent ability
to perform RX filtering towards the host by installing MAC addresses as
static FDB entries. Since we have 2 distinct DSA masters, we have 2
distinct MAC addresses, and if the bridge is configured to have its own
MAC address that makes it 3 distinct MAC addresses. The bridge core,
plus the switchdev_handle_fdb_add_to_device() extension, handle each MAC
address by replicating it to each port of the DSA switch tree. So the
end result is that both switch 1 and switch 2 will have static FDB
entries towards their respective CPU ports for the 3 MAC addresses
corresponding to the DSA masters and to the bridge net device (and of
course, towards any station learned on a foreign interface).

So I think the basic design works, and it is basically just as fragile
as any other multi-CPU-port system is bound to be in terms of reliance
on static FDB entries towards the host (if hardware address learning on
the CPU port is to be used, MAC addresses would randomly bounce between
one CPU port and the other otherwise). In fact, I think it is even
better to start DSA's support of multi-CPU-port systems with something
small like the NXP Bluebox 3, because we allow some time for the code
paths like dsa_switch_host_address_match(), which were specifically
designed for it, to break in, and this board needs no user space
configuration of CPU ports, like static assignments between user and CPU
ports, or bonding between the CPU ports/DSA masters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1f52247e

net: dsa: sja1105: enable address learning on cascade ports · 81d45898

Vladimir Oltean authored Aug 04, 2021

Right now, address learning is disabled on DSA ports, which means that a
packet received over a DSA port from a cross-chip switch will be flooded
to unrelated ports.

It is desirable to eliminate that, but for that we need a breakdown of
the possibilities for the sja1105 driver. A DSA port can be:

- a downstream-facing cascade port. This is simple because it will
  always receive packets from a downstream switch, and there should be
  no other route to reach that downstream switch in the first place,
  which means it should be safe to learn that MAC address towards that
  switch.

- an upstream-facing cascade port. This receives packets either:
  * autonomously forwarded by an upstream switch (and therefore these
    packets belong to the data plane of a bridge, so address learning
    should be ok), or
  * injected from the CPU. This deserves further discussion, as normally,
    an upstream-facing cascade port is no different than the CPU port
    itself. But with "H" topologies (a DSA link towards a switch that
    has its own CPU port), these are more "laterally-facing" cascade
    ports than they are "upstream-facing". Here, there is a risk that
    the port might learn the host addresses on the wrong port (on the
    DSA port instead of on its own CPU port), but this is solved by
    DSA's RX filtering infrastructure, which installs the host addresses
    as static FDB entries on the CPU port of all switches in a "H" tree.
    So even if there will be an attempt from the switch to migrate the
    FDB entry from the CPU port to the laterally-facing cascade port, it
    will fail to do that, because the FDB entry that already exists is
    static and cannot migrate. So address learning should be safe for
    this configuration too.

Ok, so what about other MAC addresses coming from the host, not
necessarily the bridge local FDB entries? What about MAC addresses
dynamically learned on foreign interfaces, isn't there a risk that
cascade ports will learn these entries dynamically when they are
supposed to be delivered towards the CPU port? Well, that is correct,
and this is why we also need to enable the assisted learning feature, to
snoop for these addresses and write them to hardware as static FDB
entries towards the CPU, to make the switch's learning process on the
cascade ports ineffective for them. With assisted learning enabled, the
hardware learning on the CPU port must be disabled.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

81d45898

net: dsa: sja1105: suppress TX packets from looping back in "H" topologies · 0f9b762c

Vladimir Oltean authored Aug 04, 2021

H topologies like this one have a problem:

         eth0                                                     eth1
          |                                                        |
       CPU port                                                CPU port
          |                        DSA link                        |
 sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
   |             |      |                            |      |             |
 user          user   user                         user   user          user
 port          port   port                         port   port          port

Basically any packet sent by the eth0 DSA master can be flooded on the
interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the
eth1 DSA master too. Basically we are talking to ourselves.

In VLAN-unaware mode, these packets are encoded using a tag_8021q TX
VLAN, which dsa_8021q_rcv() rightfully cannot decode and complains.
Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN
which _can_ be decoded by the tagger running on eth1, so it will attempt
to reinject that packet into the network stack (the bridge, if there is
any port under eth1 that is under a bridge). In the case where the ports
under eth1 are under the same cross-chip bridge as the ports under eth0,
the TX packets will even be learned as RX packets. The only thing that
will prevent loops with the software bridging path, and therefore
disaster, is that the source port and the destination port are in the
same hardware domain, and the bridge will receive packets from the
driver with skb->offload_fwd_mark = true and will not forward between
the two.

The proper solution to this problem is to detect H topologies and
enforce that all packets are received through the local switch and we do
not attempt to receive packets on our CPU port from switches that have
their own. This is a viable solution which works thanks to the fact that
MAC addresses which should be filtered towards the host are installed by
DSA as static MAC addresses towards the CPU port of each switch.

TX from a CPU port towards the DSA port continues to be allowed, this is
because sja1105 supports bridge TX forwarding offload, and the skb->dev
used initially for xmit does not have any direct correlation with where
the station that will respond to that packet is connected. It may very
well happen that when we send a ping through a br0 interface that spans
all switch ports, the xmit packet will exit the system through a DSA
switch interface under eth1 (say sw1p2), but the destination station is
connected to a switch port under eth0, like sw0p0. So the switch under
eth1 needs to communicate on TX with the switch under eth0. The
response, however, will not follow the same path, but instead, this
patch enforces that the response is sent by the first switch directly to
its DSA master which is eth0.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f9b762c

net: dsa: sja1105: increase MTU to account for VLAN header on DSA ports · 777e55e3

Vladimir Oltean authored Aug 04, 2021

Since all packets are transmitted as VLAN-tagged over a DSA link (this
VLAN tag represents the tag_8021q header), we need to increase the MTU
of these interfaces to account for the possibility that we are already
transporting a user-visible VLAN header.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

777e55e3

net: dsa: sja1105: manage VLANs on cascade ports · c5130029

Vladimir Oltean authored Aug 04, 2021

Since commit ed040abc ("net: dsa: sja1105: use 4095 as the private
VLAN for untagged traffic"), this driver uses a reserved value as pvid
for the host port (DSA CPU port). Control packets which are sent as
untagged get classified to this VLAN, and all ports are members of it
(this is to be expected for control packets).

Manage all cascade ports in the same way and allow control packets to
egress everywhere.

Also, all VLANs need to be sent as egress-tagged on all cascade ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5130029

net: dsa: sja1105: manage the forwarding domain towards DSA ports · 3fa21270

Vladimir Oltean authored Aug 04, 2021

Manage DSA links towards other switches, be they host ports or cascade
ports, the same as the CPU port, i.e. allow forwarding and flooding
unconditionally from all user ports.

We send packets as always VLAN-tagged on a DSA port, and we rely on the
cross-chip notifiers from tag_8021q to install the RX VLAN of a switch
port only on the proper remote ports of another switch (the ports that
are in the same bridging domain). So if there is no cross-chip bridging
in the system, the flooded packets will be sent on the DSA ports too,
but they will be dropped by the remote switches due to either
(a) a lack of the RX VLAN in the VLAN table of the ingress DSA port, or
(b) a lack of valid destinations for those packets, due to a lack of the
    RX VLAN on the user ports of the switch

Note that switches which only transport packets in a cross-chip bridge,
but have no user ports of their own as part of that bridge, such as
switch 1 in this case:

                    DSA link                   DSA link
  sw0p0 sw0p1 sw0p2 -------- sw1p0 sw1p2 sw1p3 -------- sw2p0 sw2p2 sw2p3

ip link set sw0p0 master br0
ip link set sw2p3 master br0

will still work, because the tag_8021q cross-chip notifiers keep the RX
VLANs installed on all DSA ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fa21270

net: dsa: sja1105: configure the cascade ports based on topology · 30a100e6

Vladimir Oltean authored Aug 04, 2021

The sja1105 switch family has a feature called "cascade ports" which can
be used in topologies where multiple SJA1105/SJA1110 switches are daisy
chained. Upstream switches set this bit for the DSA link towards the
downstream switches. This is used when the upstream switch receives a
control packet (PTP, STP) from a downstream switch, because if the
source port for a control packet is marked as a cascade port, then the
source port, switch ID and RX timestamp will not be taken again on the
upstream switch, it is assumed that this has already been done by the
downstream switch (the leaf port in the tree) and that the CPU has
everything it needs to decode the information from this packet.

We need to distinguish between an upstream-facing DSA link and a
downstream-facing DSA link, because the upstream-facing DSA links are
"host ports" for the SJA1105/SJA1110 switches, and the downstream-facing
DSA links are "cascade ports".

Note that SJA1105 supports a single cascade port, so only daisy chain
topologies work. With SJA1110, there can be more complex topologies such
as:

                    eth0
                     |
                 host port
                     |
 sw0p0    sw0p1    sw0p2    sw0p3    sw0p4
   |        |                 |        |
 cascade  cascade            user     user
  port     port              port     port
   |        |
   |        |
   |        |
   |       host
   |       port
   |        |
   |      sw1p0    sw1p1    sw1p2    sw1p3    sw1p4
   |                 |        |        |        |
   |                user     user     user     user
  host              port     port     port     port
  port
   |
 sw2p0    sw2p1    sw2p2    sw2p3    sw2p4
            |        |        |        |
           user     user     user     user
           port     port     port     port
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30a100e6

net: dsa: give preference to local CPU ports · 2c0b0325

Vladimir Oltean authored Aug 04, 2021

Be there an "H" switch topology, where there are 2 switches connected as
follows:

         eth0                                                     eth1
          |                                                        |
       CPU port                                                CPU port
          |                        DSA link                        |
 sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
   |             |      |                            |      |             |
 user          user   user                         user   user          user
 port          port   port                         port   port          port

basically one where each switch has its own CPU port for termination,
but there is also a DSA link in case packets need to be forwarded in
hardware between one switch and another.

DSA insists to see this as a daisy chain topology, basically registering
all network interfaces as sw0p0@eth0, ... sw1p0@eth0 and disregarding
eth1 as a valid DSA master.

This is only half the story, since when asked using dsa_port_is_cpu(),
DSA will respond that sw1p1 is a CPU port, however one which has no
dp->cpu_dp pointing to it. So sw1p1 is enabled, but not used.

Furthermore, be there a driver for switches which support only one
upstream port. This driver iterates through its ports and checks using
dsa_is_upstream_port() whether the current port is an upstream one.
For switch 1, two ports pass the "is upstream port" checks:

- sw1p4 is an upstream port because it is a routing port towards the
  dedicated CPU port assigned using dsa_tree_setup_default_cpu()

- sw1p1 is also an upstream port because it is a CPU port, albeit one
  that is disabled. This is because dsa_upstream_port() returns:

	if (!cpu_dp)
		return port;

  which means that if @dp does not have a ->cpu_dp pointer (which is a
  characteristic of CPU ports themselves as well as unused ports), then
  @dp is its own upstream port.

So the driver for switch 1 rightfully says: I have two upstream ports,
but I don't support multiple upstream ports! So let me error out, I
don't know which one to choose and what to do with the other one.

Generally I am against enforcing any default policy in the kernel in
terms of user to CPU port assignment (like round robin or such) but this
case is different. To solve the conundrum, one would have to:

- Disable sw1p1 in the device tree or mark it as "not a CPU port" in
  order to comply with DSA's view of this topology as a daisy chain,
  where the termination traffic from switch 1 must pass through switch 0.
  This is counter-productive because it wastes 1Gbps of termination
  throughput in switch 1.
- Disable the DSA link between sw0p4 and sw1p4 and do software
  forwarding between switch 0 and 1, and basically treat the switches as
  part of disjoint switch trees. This is counter-productive because it
  wastes 1Gbps of autonomous forwarding throughput between switch 0 and 1.
- Treat sw0p4 and sw1p4 as user ports instead of DSA links. This could
  work, but it makes cross-chip bridging impossible. In this setup we
  would need to have 2 separate bridges, br0 spanning the ports of
  switch 0, and br1 spanning the ports of switch 1, and the "DSA links
  treated as user ports" sw0p4 (part of br0) and sw1p4 (part of br1) are
  the gateway ports between one bridge and another. This is hard to
  manage from a user's perspective, who wants to have a unified view of
  the switching fabric and the ability to transparently add ports to the
  same bridge. VLANs would also need to be explicitly managed by the
  user on these gateway ports.

So it seems that the only reasonable thing to do is to make DSA prefer
CPU ports that are local to the switch. Meaning that by default, the
user and DSA ports of switch 0 will get assigned to the CPU port from
switch 0 (sw0p1) and the user and DSA ports of switch 1 will get
assigned to the CPU port from switch 1.

The way this solves the problem is that sw1p4 is no longer an upstream
port as far as switch 1 is concerned (it no longer views sw0p1 as its
dedicated CPU port).

So here we are, the first multi-CPU port that DSA supports is also
perhaps the most uneventful one: the individual switches don't support
multiple CPUs, however the DSA switch tree as a whole does have multiple
CPU ports. No user space assignment of user ports to CPU ports is
desirable, necessary, or possible.

Ports that do not have a local CPU port (say there was an extra switch
hanging off of sw0p0) default to the standard implementation of getting
assigned to the first CPU port of the DSA switch tree. Is that good
enough? Probably not (if the downstream switch was hanging off of switch
1, we would most certainly prefer its CPU port to be sw1p1), but in
order to support that use case too, we would need to traverse the
dst->rtable in search of an optimum dedicated CPU port, one that has the
smallest number of hops between dp->ds and dp->cpu_dp->ds. At the
moment, the DSA routing table structure does not keep the number of hops
between dl->dp and dl->link_dp, and while it is probably deducible,
there is zero justification to write that code now. Let's hope DSA will
never have to support that use case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c0b0325

net: dsa: rename teardown_default_cpu to teardown_cpu_ports · 0e8eb9a1

Vladimir Oltean authored Aug 04, 2021

There is nothing specific to having a default CPU port to what
dsa_tree_teardown_default_cpu() does. Even with multiple CPU ports,
it would do the same thing: iterate through the ports of this switch
tree and reset the ->cpu_dp pointer to NULL. So rename it accordingly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e8eb9a1

net: ipa: fix IPA v4.9 interconnects · 0fd75f57

Alex Elder authored Aug 04, 2021

Three interconnects are defined for IPA version 4.9, but there
should only be two.  They should also use names that match what's
used for other platforms (and specified in the Device Tree binding).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fd75f57

mctp: remove duplicated assignment of pointer hdr · df7ba0eb

Colin Ian King authored Aug 04, 2021

The pointer hdr is being initialized and also re-assigned with the
same value from the call to function mctp_hdr. Static analysis reports
that the initializated value is unused. The second assignment is
duplicated and can be removed.

Addresses-Coverity: ("Unused value").
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df7ba0eb

04 Aug, 2021 10 commits

net: Replace deprecated CPU-hotplug functions. · 372bbdd5

Sebastian Andrzej Siewior authored Aug 03, 2021

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

372bbdd5

virtio_net: Replace deprecated CPU-hotplug functions. · a0d1d0f4

Sebastian Andrzej Siewior authored Aug 03, 2021

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a0d1d0f4

pktgen: Remove redundant clone_skb override · c2eecaa1

Nick Richardson authored Aug 03, 2021

When the netif_receive xmit_mode is set, a line is supposed to set
clone_skb to a default 0 value. This line is made redundant due to a
preceding line that checks if clone_skb is more than zero and returns
-ENOTSUPP.

Overriding clone_skb to 0 does not make any difference to the behavior
because if it was positive we return error. So it can be either 0 or
negative, and in both cases the behavior is the same.

Remove redundant line that sets clone_skb to zero.
Signed-off-by: Nick Richardson <richardsonnick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2eecaa1

ptp: ocp: Expose various resources on the timecard. · 773bda96

Jonathan Lemon authored Aug 03, 2021

The OpenCompute timecard driver has additional functionality besides
a clock.  Make the following resources available:

 - The external timestamp channels (ts0/ts1)
 - devlink support for flashing and health reporting
 - GPS and MAC serial ports
 - board serial number (obtained from i2c device)

Also add watchdog functionality for when GNSS goes into holdover.

The resources are collected under a timecard class directory:

  [jlemon@timecard ~]$ ls -g /sys/class/timecard/ocp1/
  total 0
  -r--r--r--. 1 root 4096 Aug  3 19:49 available_clock_sources
  -rw-r--r--. 1 root 4096 Aug  3 19:49 clock_source
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 device -> ../../../0000:04:00.0/
  -r--r--r--. 1 root 4096 Aug  3 19:49 gps_sync
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 i2c -> ../../xiic-i2c.1024/i2c-2/
  drwxr-xr-x. 2 root    0 Aug  3 19:49 power/
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 pps ->
  ../../../../../virtual/pps/pps1/
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 ptp -> ../../ptp/ptp2/
  -r--r--r--. 1 root 4096 Aug  3 19:49 serialnum
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 subsystem ->
  ../../../../../../class/timecard/
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 ttyGPS -> ../../tty/ttyS7/
  lrwxrwxrwx. 1 root    0 Aug  3 19:49 ttyMAC -> ../../tty/ttyS8/
  -rw-r--r--. 1 root 4096 Aug  3 19:39 uevent

The labeling is needed at the minimum, in order to tell the serial
devices apart.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

773bda96

sock: allow reading and changing sk_userlocks with setsockopt · 04190bf8

Pavel Tikhomirov authored Aug 04, 2021

SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
tcp_sndbuf_expand()). If we've just created a new socket this adjustment
is enabled on it, but if one changes the socket buffer size by
setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.

CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
restore as it first needs to increase buffer sizes for packet queues
restore and second it needs to restore back original buffer sizes. So
after CRIU restore all sockets become non-auto-adjustable, which can
decrease network performance of restored applications significantly.

CRIU need to be able to restore sockets with enabled/disabled adjustment
to the same state it was before dump, so let's add special setsockopt
for it.

Let's also export SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so
that using these interface one can reenable automatic socket buffer
adjustment on their sockets.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04190bf8

tc-testing: Add control-plane selftests for sch_mq · 625af9f0

Peilin Ye authored Aug 03, 2021

Recently we added multi-queue support to netdevsim in commit d4861fc6
("netdevsim: Add multi-queue support"); add a few control-plane selftests
for sch_mq using this new feature.

Use nsPlugin.py to avoid network interface name collisions.
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

625af9f0

Revert "net: build all switchdev drivers as modules when the bridge is a module" · a54182b2

Vladimir Oltean authored Aug 03, 2021

This reverts commit b0e81817. Explicit
driver dependency on the bridge is no longer needed since
switchdev_bridge_port_{,un}offload() is no longer implemented by the
bridge driver but by switchdev.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a54182b2

net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge · 957e2235

Vladimir Oltean authored Aug 03, 2021

With the introduction of explicit offloading API in switchdev in commit
2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge
ports are offloaded"), we started having Ethernet switch drivers calling
directly into a function exported by net/bridge/br_switchdev.c, which is
a function exported by the bridge driver.

This means that drivers that did not have an explicit dependency on the
bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
possible to call a symbol exported by a driver that can be built as
module unless you are a module too.

There was an attempt to solve the dependency issue in the form of commit
b0e81817 ("net: build all switchdev drivers as modules when the
bridge is a module"). Grygorii Strashko, however, says about it:

| In my opinion, the problem is a bit bigger here than just fixing the
| build :(
|
| In case, of ^cpsw the switchdev mode is kinda optional and in many
| cases (especially for testing purposes, NFS) the multi-mac mode is
| still preferable mode.
|
| There were no such tight dependency between switchdev drivers and
| bridge core before and switchdev serviced as independent, notification
| based layer between them, so ^cpsw still can be "Y" and bridge can be
| "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
| will need to be set as "Y", or we will have to update drivers to
| support build with BRIDGE=n and maintain separate builds for
| networking vs non-networking testing. But is this enough? Wouldn't
| it cause 'chain reaction' required to add more and more "Y" options
| (like CONFIG_VLAN_8021Q)?
|
| PS. Just to be sure we on the same page - ARM builds will be forced
| (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
| automation testing will just fail with omap2plus_defconfig.

In the light of this, it would be desirable for some configurations to
avoid dependencies between switchdev drivers and the bridge, and have
the switchdev mode as completely optional within the driver.

Arnd Bergmann also tried to write a patch which better expressed the
build time dependency for Ethernet switch drivers where the switchdev
support is optional, like cpsw/am65-cpsw, and this made the drivers
follow the bridge (compile as module if the bridge is a module) only if
the optional switchdev support in the driver was enabled in the first
place:
https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/

but this still did not solve the fact that cpsw and am65-cpsw now must
be built as modules when the bridge is a module - it just expressed
correctly that optional dependency. But the new behavior is an apparent
regression from Grygorii's perspective.

So to support the use case where the Ethernet driver is built-in,
NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
need a framework that can handle the possible absence of the bridge from
the running system, i.e. runtime bloatware as opposed to build-time
bloatware.

Luckily we already have this framework, since switchdev has been using
it extensively. Events from the bridge side are transmitted to the
driver side using notifier chains - this was originally done so that
unrelated drivers could snoop for events emitted by the bridge towards
ports that are implemented by other drivers (think of a switch driver
with LAG offload that listens for switchdev events on a bonding/team
interface that it offloads).

There are also events which are transmitted from the driver side to the
bridge side, which again are modeled using notifiers.
SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
notifying the bridge that a MAC address has been dynamically learned.
So there is a precedent we can use for modeling the new framework.

The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
that the bridge needs to do when a port becomes offloaded is blocking in
its nature: replay VLANs, MDBs etc. The calling context is indeed
blocking (we are under rtnl_mutex), but the existing switchdev
notification chain that the bridge is subscribed to is only the atomic
one. So we need to subscribe the bridge to the blocking switchdev
notification chain too.

This patch:
- keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
unchanged
- moves the implementation of switchdev_bridge_port_{,un}offload from
the bridge module into the switchdev module.
- makes everybody that is subscribed to the switchdev blocking notifier
chain "hear" offload & unoffload events
- makes the bridge driver subscribe and handle those events
- moves the bridge driver's handling of those events into 2 new
functions called br_switchdev_port_{,un}offload. These functions
contain in fact the core of the logic that was previously in
switchdev_bridge_port_{,un}offload, just that now we go through an
extra indirection layer to reach them.

Unlike all the other switchdev notification structures, the structure
used to carry the bridge port information, struct
switchdev_notifier_brport_info, does not contain a "bool handled".
This is because in the current usage pattern, we always know that a
switchdev bridge port offloading event will be handled by the bridge,
because the switchdev_bridge_port_offload() call was initiated by a
NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
couldn't have happened.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

957e2235

Merge tag 'linux-can-next-for-5.15-20210804' of... · 9c0532f9

David S. Miller authored Aug 04, 2021

Merge tag 'linux-can-next-for-5.15-20210804' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2021-08-04

this is a pull request of 5 patches for net-next/master.

The first patch is by me and fixes a typo in a comment in the CAN
J1939 protocol.

The next 2 patches are by Oleksij Rempel and update the CAN J1939
protocol to send RX status updates via the error queue mechanism.

The next patch is by me and adds a missing variable initialization to
the flexcan driver (the problem was introduced in the current net-next
cycle).

The last patch is by Aswath Govindraju and adds power-domains to the
Bosch m_can DT binding documentation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9c0532f9

dt-bindings: net: can: Document power-domains property · d85165b2

Aswath Govindraju authored Aug 02, 2021

Document power-domains property for adding the Power domain provider.

Link: https://lore.kernel.org/r/20210802091822.16407-1-a-govindraju@ti.comSigned-off-by: Aswath Govindraju <a-govindraju@ti.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

d85165b2