- 17 Feb, 2022 36 commits
-
-
Vladimir Oltean authored
OCELOT_VCAP_IS2_TAG_8021Q_TXVLAN overlaps with OCELOT_VCAP_IS2_MRP_REDIRECT. To avoid this, make OCELOT_VCAP_IS2_MRP_REDIRECT take the cookie region from N to 2 * N - 1 (where N is ocelot->num_phys_ports). To avoid any risk that the singleton (not per port) VCAP IS2 filters overlap with per-port VCAP IS2 filters, we must ensure that the number of singleton filters is smaller than the number of physical ports. This is true right now, but may change in the future as switches with less ports get supported, or more singleton filters get added. So to be future-proof, let's move the singleton filters at the end of the range, where they won't overlap with anything to their right. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The MRP assist code installs a VCAP IS2 trapping rule for each port, but since the key and the action is the same, just the ingress port mask differs, there isn't any need to do this. We can save some space in the TCAM by using a single filter and adjusting the ingress port mask. Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this purpose. Now that the cookies are no longer per port, we need to change the allocation scheme such that MRP traps use a fixed number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
MRP frames are configured to be trapped to the CPU queue 7, and this number is reflected in the extraction header. However, the information isn't used anywhere, so just leave MRP frames to go to CPU queue 0 unless needed otherwise. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP, PTP traps) has hardcoded filter identifiers that worked well enough for that use case alone. But when two or more of those use cases would be used together, some of those identifiers would overlap, leading to breakage. Add definitions for each cookie and centralize them in ocelot_vcap.h, such that the overlaps are more obvious. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The driver uses an identifier equal to (ocelot->num_phys_ports + port) for MRP traps installed when the system is in the role of an MRC, and an identifier equal to (port) otherwise. Use the same identifier in both cases as a consolidation for the various cookie values spread throughout the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller authored
Saeed Mahameed says: ==================== mlx5-updates-2022-02-16 Misc updates for mlx5: 1) Alex Liu Adds support for using xdp->data_meta 2) Aya Levin Adds PTP counters and port time stamp mode for representors and switchdev mode. 3) Tariq Toukan, Striding RQ simple improvements. 4) Roi Dayan (7): Create multiple attr instances per flow Some TC actions use post actions for their implementation. For example CT and sample actions. Create a new flow attr instance after each multi table action and create a post action rule for it as a generic parsing step. Now multi table actions like CT, sample don't require to do it. When flow has multiple attr instances, the first flow attr is being offloaded normally and linked to the next attr (post action rule) by setting an id on reg_c for matching. Post action rule (rule created from second attr instance) match the id on reg_c and does rest of the actions. Example rule with actions CT,goto will be created with 2 attr instances as following: attr1(CT)->attr2(goto) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roi Dayan authored
Allow sample+CT actions but still block sample+CT NAT as it is not supported. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Before this commit post_act can be used for normal rules and didn't handle special cases like CT and sample. With this commit post_act rule can also handle the special cases when needed. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
When tc actions being parsed only the last flow attr created needs the counter flag and the previous flags being reset. Clean the flag from the tc action parsers. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
CT and sample actions use post actions for their implementation. Flag those actions as multi table actions so the post act infrastructure will handle the post actions allocation. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Some TC actions use post actions for their implementation. For example CT and sample actions. Create a new flow attr after each multi table action and create a post action rule for it. First flow attr being offloaded normally and linked to the next attr (post action rule) with setting an id on reg_c. Post action rules match the id on reg_c and continue to the next one. The flow counter is allocated on the last rule. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Introduce mlx5e_tc_post_act_offload() and mlx5e_tc_post_act_unoffload() to be able to unoffload and reoffload existing post action rules handles. For example in neigh update events, the driver removes and readds rules in hardware. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Currently the mlx5_flow object contains a single mlx5_attr instance. However, multi table actions (e.g. CT) instantiate multiple attr instances. Currently action_match_supported() reads the actions flag from the flow's attribute instance. Modify the function to receive the action flags as a parameter which is set by the calling function and pass the aggregated actions to actions_match_supported(). Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
To allow shared tc block offload between two or more reps of the same eswitch, move the tc flow hashtable to be per rep, instead of per eswitch. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
When turning on tx_port_ts (private flag) a PTP-SQ is created. Consider this queue when adding rules matching SQs to VPORTs. Otherwise the traffic on this queue won't reach the wire. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
There is a configuration where the uplink interface is the synchronizer. Add PTP counters for this interface for monitoring. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Tariq Toukan authored
In RQs of type multi-packet WQE (Striding RQ), each WQE is relatively large (typically 256KB) but their number is relatively small (8 in default). Re-mapping the descriptors' buffers before re-posting them is done via UMR (User-Mode Memory Registration) operations. On the one hand, posting UMR WQEs in bulks reduces communication overhead with the HW and better utilizes its processing units. On the other hand, delaying the WQE repost operations for a small RQ (say, of 4 WQEs) might drastically hit its performance, causing packet drops due to no receive buffer, for high or bursty incoming packets rate. Here we restrict the bulk size for too small RQs. Effectively, with the current constants, RQ of size 4 (minimum allowed) would have no bulking, while larger RQs will continue working with bulks of 2. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Tariq Toukan authored
CQE compression is turned on by default on slow pci systems to help reduce the load on pci. In this case, Striding RQ was turned off as CQEs of packets that span several strides were not compressed, significantly reducing the compression effectiveness. This issue does not exist when using the newer mini_cqe format "stride_index". Hence, allow defaulting to Striding RQ in this case. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Tariq Toukan authored
Update the old error message for LRO state modify with the new general name. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Alex Liu authored
Add support for using xdp->data_meta for cross-program communication Pass "true" to the last argument of xdp_prepare_buff(). After SKB is built, call skb_metadata_set() if metadata was pushed. Signed-off-by: Alex Liu <liualex@fb.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Colin Ian King authored
There is a spelling mistake in a NL_SET_ERR_MSG_MOD error message. Fix it. Fixes: 3b49a7ed ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Petr Machata authored
Both get and dump handlers for RTM_GETSTATS require that a filter_mask, a mask of which attributes should be emitted in the netlink response, is unset. rtnl_stats_dump() does include an extack in the bounce, rtnl_stats_get() however does not. Fix the omission. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/01feb1f4bbd22a19f6629503c4f366aed6424567.1645020876.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Mat Martineau says: ==================== mptcp: SO_SNDTIMEO and misc. cleanup Patch 1 adds support for the SO_SNDTIMEO socket option on MPTCP sockets. The remaining patches are various small cleanups: Patch 2 removes an obsolete declaration. Patches 3 and 5 remove unnecessary function parameters. Patch 4 removes an extra cast. Patches 6 and 7 add some const and ro_after_init modifiers. Patch 8 removes extra storage of TCP helpers. ==================== Link: https://lore.kernel.org/r/20220216021130.171786-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
Assign the helpers directly rather than save/restore in the context structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
These structures are initialised from the init hooks, so we can't make them 'const'. But no writes occur afterwards, so we can use ro_after_init. Also, remove bogus EXPORT_SYMBOL, the only access comes from ip stack, not from kernel modules. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
A few pm-related helpers don't touch arguments which lacking the const modifier, let's constify them. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
Drop the port parameter of mptcp_pm_add_addr_signal() and reflect it to avoid passing too many parameters. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
Drop the unneeded type casts to 'unsigned long long' for printing out the hmac values in add_addr_hmac_valid() and subflow_thmac_valid(). Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
The parameter 'sk' became useless since the code using it was dropped from mptcp_get_options() in the commit 8d548ea1 ("mptcp: do not set unconditionally csum_reqd on incoming opt"). Let's drop it. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts authored
Options parsing in now done from mptcp_incoming_options(). mptcp_parse_option() has been removed from mptcp.h when CONFIG_MPTCP is defined but not when it is not. Fixes: cfde141e ("mptcp: move option parsing into mptcp_incoming_options()") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
Add setsockopt support for SO_SNDTIMEO_OLD and SO_SNDTIMEO_NEW to fix this error reported by the mptcp bpf selftest: (network_helpers.c:64: errno: Operation not supported) Failed to set SO_SNDTIMEO test_mptcp:FAIL:115 All error logs: (network_helpers.c:64: errno: Operation not supported) Failed to set SO_SNDTIMEO test_mptcp:FAIL:115 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yang Li authored
The return from the call to dm9051_get_regs() is int, it can be a negative error code, however this is being assigned to an unsigned int variable 'ret', so making 'ret' an int. Eliminate the following coccicheck warning: ./drivers/net/ethernet/davicom/dm9051.c:527:5-8: WARNING: Unsigned expression compared with zero: ret < 0 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20220216014507.109117-1-yang.lee@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
__skb_vlan_pop() needs skb->data to point at the mac_header, while skb_vlan_tag_present() and skb_vlan_tag_get() don't, because they don't look at skb->data at all. So we can avoid uselessly moving around skb->data for the case where the VLAN tag was offloaded by the DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220215204722.2134816-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tom Rix authored
Replacements: queueing to queuing trasfer to transfer aditional to additional adaptor to adapter transactino to transaction Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20220215213802.3043178-1-trix@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
D. Wythe authored
When smc_connect_clc() times out, it will return -EAGAIN(tcp_recvmsg retuns -EAGAIN while timeout), then this value will passed to the application, which is quite confusing to the applications, makes inconsistency with TCP. From the manual of connect, ETIMEDOUT is more suitable, and this patch try convert EAGAIN to ETIMEDOUT in that case. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1644913490-21594-1-git-send-email-alibuda@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
YueHaibing authored
This is unused since commit 8e2288ca ("net: hns3: refactor PF cmdq init and uninit APIs with new common APIs"). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Jie Wang <wangjie125@huawei.com> Link: https://lore.kernel.org/r/20220216113507.22368-1-yuehaibing@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 16 Feb, 2022 4 commits
-
-
David S. Miller authored
Vladimir Oltean says: ==================== Replay and offload host VLAN entries in DSA v2->v3: - make the bridge stop notifying switchdev for !BRENTRY VLANs - create precommit and commit wrappers around __vlan_add_flags(). - special-case the BRENTRY transition from false to true, instead of treating it as a change of flags and letting drivers figure out that it really isn't. - avoid setting *changed unless we know that functions will not error out later. - drop "old_flags" from struct switchdev_obj_port_vlan, nobody needs it now, in v2 only DSA needed it to filter out BRENTRY transitions, that is now solved cleaner. - no BRIDGE_VLAN_INFO_BRENTRY flag checks and manipulations in DSA whatsoever, use the "bool changed" bit as-is after changing what it means. - merge dsa_slave_host_vlan_{add,del}() with dsa_slave_foreign_vlan_{add,del}(), since now they do the same thing, because the host_vlan functions no longer need to mangle the vlan BRENTRY flags and bool changed. v1->v2: - prune switchdev VLAN additions with no actual change differently - no longer need to revert struct net_bridge_vlan changes on error from switchdev - no longer need to first delete a changed VLAN before readding it - pass 'bool changed' and 'u16 old_flags' through switchdev_obj_port_vlan so that DSA can do some additional post-processing with the BRIDGE_VLAN_INFO_BRENTRY flag - support VLANs on foreign interfaces - fix the same -EOPNOTSUPP error in mv88e6xxx, this time on removal, due to VLAN deletion getting replayed earlier than FDB deletion The motivation behind these patches is that Rafael reported the following error with mv88e6xxx when the first switch port joins a bridge: mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95 (-EOPNOTSUPP) The FDB entry that's added is the MAC address of the bridge, in VID 1 (the default_pvid), being replayed as part of br_add_if() -> ... -> nbp_switchdev_sync_objs(). -EOPNOTSUPP is the mv88e6xxx driver's way of saying that VID 1 doesn't exist in the VTU, so it can't program the ATU with a FID, something which it needs. It appears to be a race, but it isn't, since we only end up installing VID 1 in the VTU by coincidence. DSA's approximation of programming VLANs on the CPU port together with the user ports breaks down with host FDB entries on mv88e6xxx, since that strictly requires the VTU to contain the VID. But the user may freely add VLANs pointing just towards the bridge, and FDB entries in those VLANs, and DSA will not be aware of them, because it only listens for VLANs on user ports. To create a solution that scales properly to cross-chip setups and doesn't leak entries behind, some changes in the bridge driver are required. I believe that these are for the better overall, but I may be wrong. Namely, the same refcounting procedure that DSA has in place for host FDB and MDB entries can be replicated for VLANs, except that it's garbage in, garbage out: the VLAN addition and removal notifications from switchdev aren't balanced. So the first 2 patches attempt to deal with that. This patch set has been superficially tested on a board with 3 mv88e6xxx switches in a daisy chain and appears to produce the primary desired effect - the driver no longer returns -EOPNOTSUPP when the first port joins a bridge, and is successful in performing local termination under a VLAN-aware bridge. As an additional side effect, it silences the annoying "p%d: already a member of VLAN %d\n" warning messages that the mv88e6xxx driver produces when coupled with systemd-networkd, and a few VLANs are configured. Furthermore, it advances Florian's idea from a few years back, which never got merged: https://lore.kernel.org/lkml/20180624153339.13572-1-f.fainelli@gmail.com/ v2 has also been tested on the NXP LS1028A felix switch. Some testing: root@debian:~# bridge vlan add dev br0 vid 101 pvid self [ 100.709220] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_add: port 9 vlan 101 [ 100.873426] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_add: port 10 vlan 101 [ 100.892314] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_add: port 9 vlan 101 [ 101.053392] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_add: port 10 vlan 101 [ 101.076994] mv88e6085 d0032004.mdio-mii:12: mv88e6xxx_port_vlan_add: port 9 vlan 101 root@debian:~# bridge vlan add dev br0 vid 101 pvid self root@debian:~# bridge vlan add dev br0 vid 101 pvid self root@debian:~# bridge vlan port vlan-id eth0 1 PVID Egress Untagged lan9 1 PVID Egress Untagged lan10 1 PVID Egress Untagged lan11 1 PVID Egress Untagged lan12 1 PVID Egress Untagged lan13 1 PVID Egress Untagged lan14 1 PVID Egress Untagged lan15 1 PVID Egress Untagged lan16 1 PVID Egress Untagged lan17 1 PVID Egress Untagged lan18 1 PVID Egress Untagged lan19 1 PVID Egress Untagged lan20 1 PVID Egress Untagged lan21 1 PVID Egress Untagged lan22 1 PVID Egress Untagged lan23 1 PVID Egress Untagged lan24 1 PVID Egress Untagged sfp 1 PVID Egress Untagged lan1 1 PVID Egress Untagged lan2 1 PVID Egress Untagged lan3 1 PVID Egress Untagged lan4 1 PVID Egress Untagged lan5 1 PVID Egress Untagged lan6 1 PVID Egress Untagged lan7 1 PVID Egress Untagged lan8 1 PVID Egress Untagged br0 1 Egress Untagged 101 PVID root@debian:~# bridge vlan del dev br0 vid 101 pvid self [ 108.340487] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_del: port 9 vlan 101 [ 108.379167] mv88e6085 d0032004.mdio-mii:11: mv88e6xxx_port_vlan_del: port 10 vlan 101 [ 108.402319] mv88e6085 d0032004.mdio-mii:12: mv88e6xxx_port_vlan_del: port 9 vlan 101 [ 108.425866] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_del: port 9 vlan 101 [ 108.452280] mv88e6085 d0032004.mdio-mii:10: mv88e6xxx_port_vlan_del: port 10 vlan 101 root@debian:~# bridge vlan del dev br0 vid 101 pvid self root@debian:~# bridge vlan del dev br0 vid 101 pvid self root@debian:~# bridge vlan port vlan-id eth0 1 PVID Egress Untagged lan9 1 PVID Egress Untagged lan10 1 PVID Egress Untagged lan11 1 PVID Egress Untagged lan12 1 PVID Egress Untagged lan13 1 PVID Egress Untagged lan14 1 PVID Egress Untagged lan15 1 PVID Egress Untagged lan16 1 PVID Egress Untagged lan17 1 PVID Egress Untagged lan18 1 PVID Egress Untagged lan19 1 PVID Egress Untagged lan20 1 PVID Egress Untagged lan21 1 PVID Egress Untagged lan22 1 PVID Egress Untagged lan23 1 PVID Egress Untagged lan24 1 PVID Egress Untagged sfp 1 PVID Egress Untagged lan1 1 PVID Egress Untagged lan2 1 PVID Egress Untagged lan3 1 PVID Egress Untagged lan4 1 PVID Egress Untagged lan5 1 PVID Egress Untagged lan6 1 PVID Egress Untagged lan7 1 PVID Egress Untagged lan8 1 PVID Egress Untagged br0 1 Egress Untagged root@debian:~# bridge vlan del dev br0 vid 101 pvid self ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
DSA now explicitly handles VLANs installed with the 'self' flag on the bridge as host VLANs, instead of just replicating every bridge port VLAN also on the CPU port and never deleting it, which is what it did before. However, this leaves a corner case uncovered, as explained by Tobias Waldekranz: https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260 Forwarding towards a bridge port VLAN installed on a bridge port foreign to DSA (separate NIC, Wi-Fi AP) used to work by virtue of the fact that DSA itself needed to have at least one port in that VLAN (therefore, it also had the CPU port in said VLAN). However, now that the CPU port may not be member of all VLANs that user ports are members of, we need to ensure this isn't the case if software forwarding to a foreign interface is required. The solution is to treat bridge port VLANs on standalone interfaces in the exact same way as host VLANs. From DSA's perspective, there is no difference between local termination and software forwarding; packets in that VLAN must reach the CPU in both cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it does so on user ports. This is good for basic functionality but has several limitations: - the VLAN group which must reach the CPU may be radically different from the VLAN group that must be autonomously forwarded by the switch. In other words, the admin may want to isolate noisy stations and avoid traffic from them going to the control processor of the switch, where it would just waste useless cycles. The bridge already supports independent control of VLAN groups on bridge ports and on the bridge itself, and when VLAN-aware, it will drop packets in software anyway if their VID isn't added as a 'self' entry towards the bridge device. - Replaying host FDB entries may depend, for some drivers like mv88e6xxx, on replaying the host VLANs as well. The 2 VLAN groups are approximately the same in most regular cases, but there are corner cases when timing matters, and DSA's approximation of replicating VLANs on shared ports simply does not work. - If a user makes the bridge (implicitly the CPU port) join a VLAN by accident, there is no way for the CPU port to isolate itself from that noisy VLAN except by rebooting the system. This is because for each VLAN added on a user port, DSA will add it on shared ports too, but for each VLAN deletion on a user port, it will remain installed on shared ports, since DSA has no good indication of whether the VLAN is still in use or not. Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN addition and removal events, DSA has a simple and straightforward task of separating the bridge port VLANs (these have an orig_dev which is a DSA slave interface, or a LAG interface) from the host VLANs (these have an orig_dev which is a bridge interface), and to keep a simple reference count of each VID on each shared port. Forwarding VLANs must be installed on the bridge ports and on all DSA ports interconnecting them. We don't have a good view of the exact topology, so we simply install forwarding VLANs on all DSA ports, which is what has been done until now. Host VLANs must be installed primarily on the dedicated CPU port of each bridge port. More subtly, they must also be installed on upstream-facing and downstream-facing DSA ports that are connecting the bridge ports and the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB entry may be absent from VTU) is still addressed even if that switch is in a cross-chip setup, and it has no local CPU port. Therefore: - user ports contain only bridge port (forwarding) VLANs, and no refcounting is necessary - DSA ports contain both forwarding and host VLANs. Refcounting is necessary among these 2 types. - CPU ports contain only host VLANs. Refcounting is also necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The switchdev_handle_port_obj_add() helper is good for replicating a port object on the lower interfaces of @dev, if that object was emitted on a bridge, or on a bridge port that is a LAG. However, drivers that use this helper limit themselves to a box from which they can no longer intercept port objects notified on neighbor ports ("foreign interfaces"). One such driver is DSA, where software bridging with foreign interfaces such as standalone NICs or Wi-Fi APs is an important use case. There, a VLAN installed on a neighbor bridge port roughly corresponds to a forwarding VLAN installed on the DSA switch's CPU port. To support this use case while also making use of the benefits of the switchdev_handle_* replication helper for port objects, introduce a new variant of these functions that crawls through the neighbor ports of @dev, in search of potentially compatible switchdev ports that are interested in the event. The strategy is identical to switchdev_handle_fdb_event_to_device(): if @dev wasn't a switchdev interface, then go one step upper, and recursively call this function on the bridge that this port belongs to. At the next recursion step, __switchdev_handle_port_obj_add() will iterate through the bridge's lower interfaces. Among those, some will be switchdev interfaces, and one will be the original @dev that we came from. To prevent infinite recursion, we must suppress reentry into the original @dev, and just call the @add_cb for the switchdev_interfaces. It looks like this: br0 / | \ / | \ / | \ swp0 swp1 eth0 1. __switchdev_handle_port_obj_add(eth0) -> check_cb(eth0) returns false -> eth0 has no lower interfaces -> eth0's bridge is br0 -> switchdev_lower_dev_find(br0, check_cb, foreign_dev_check_cb)) finds br0 2. __switchdev_handle_port_obj_add(br0) -> check_cb(br0) returns false -> netdev_for_each_lower_dev -> check_cb(swp0) returns true, so we don't skip this interface 3. __switchdev_handle_port_obj_add(swp0) -> check_cb(swp0) returns true, so we call add_cb(swp0) (back to netdev_for_each_lower_dev from 2) -> check_cb(swp1) returns true, so we don't skip this interface 4. __switchdev_handle_port_obj_add(swp1) -> check_cb(swp1) returns true, so we call add_cb(swp1) (back to netdev_for_each_lower_dev from 2) -> check_cb(eth0) returns false, so we skip this interface to avoid infinite recursion Note: eth0 could have been a LAG, and we don't want to suppress the recursion through its lowers if those exist, so when check_cb() returns false, we still call switchdev_lower_dev_find() to estimate whether there's anything worth a recursion beneath that LAG. Using check_cb() and foreign_dev_check_cb(), switchdev_lower_dev_find() not only figures out whether the lowers of the LAG are switchdev, but also whether they actively offload the LAG or not (whether the LAG is "foreign" to the switchdev interface or not). The port_obj_info->orig_dev is preserved across recursive calls, so switchdev drivers still know on which device was this notification originally emitted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-