- 18 Mar, 2022 6 commits
-
-
Vladimir Oltean authored
Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an offload for tc-flower) is done by setting the MIRROR_ENA bit from the action vector of the filter. The packet is mirrored to the port mask configured in the ANA:ANA:MIRRORPORTS register (the same port mask as the destinations for port-based mirroring). Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress protocol ip \ flower skip_sw ip_proto icmp \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
Some VCAP filters utilize resources which are global to the switch, like for example VCAP IS2 policers take an index into a global policer pool. In commit c9a7fe12 ("net: mscc: ocelot: add action of police on vcap_is2"), Xiaoliang expressed this by hooking into the low-level ocelot_vcap_filter_add_to_block() and ocelot_vcap_block_remove_filter() functions, and allocating/freeing the policers from there. Evaluating the code, there probably isn't a better place, but we'll need to do something similar for the mirror ports, and the code will start to look even more hacked up than it is right now. Create two ocelot_vcap_filter_{add,del}_aux_resources() functions to contain the madness, and pollute less the body of other functions such as ocelot_vcap_filter_add_to_block(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
Ocelot switches perform port-based ingress mirroring if ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if the port is in ANA:ANA:EMIRRORPORTS. Both ingress-mirrored and egress-mirrored frames are copied to the port mask from ANA:ANA:MIRRORPORTS. So the choice of limiting to a single mirror port via ocelot_mirror_get() and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't map very well to the user space model. If the user wants to mirror the ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there are no tc-matchall rules to describe those actions. Now, we could offload a matchall rule with multiple mirred actions, one per desired mirror port, and force the user to stick to the multi-action rule format for subsequent matchall filters. But both DSA and ocelot have the flow_offload_has_one_action() check for the matchall offload, plus the fact that it will get cumbersome to cross-check matchall mirrors with flower mirrors (which will be added in the next patch). As a result, we limit the configuration to a single mirror port, with the possibility of lifting the restriction in the future. Frames injected from the CPU don't get egress-mirrored, since they are sent with the BYPASS bit in the injection frame header, and this bypasses the analyzer module (effectively also the mirroring logic). I don't know what to do/say about this. Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress \ matchall skip_sw \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
In preparation for adding port mirroring support to the ocelot driver, the dispatching function ocelot_setup_tc_cls_matchall() must be free of action-specific code. Move port policer creation and deletion to separate functions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jonathan Lemon authored
An earlier patch mistakenly changed these variables from u32 to u16, leading to unintended truncation. Restore the original logic. Fixes: a509a7c6 ("ptp: ocp: Add support for selectable SMA directions.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220316165347.599154-1-jonathan.lemon@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oleksij Rempel authored
According to erratum described in DS80000687C[1]: "Module 2: Link drops with some EEE link partners.", we need to "Disable the EEE next page exchange in EEE Global Register 2" 1 - https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ87xx-Errata-DS80000687C.pdfSigned-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220316125529.1489045-1-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 17 Mar, 2022 34 commits
-
-
Jakub Kicinski authored
Tobias Waldekranz says: ==================== net: bridge: Multiple Spanning Trees The bridge has had per-VLAN STP support for a while now, since: https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/ The current implementation has some problems: - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN is managed independently. This is awkward from an MSTP (802.1Q-2018, Clause 13.5) point of view, where the model is that multiple VLANs are grouped into MST instances. Because of the way that the standard is written, presumably, this is also reflected in hardware implementations. It is not uncommon for a switch to support the full 4k range of VIDs, but that the pool of MST instances is much smaller. Some examples: Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs Marvell Prestera: 4k VLANs, but only 128 MSTIs Microchip SparX-5i: 4k VLANs, but only 128 MSTIs - By default, the feature is enabled, and there is no way to disable it. This makes it hard to add offloading in a backwards compatible way, since any underlying switchdevs have no way to refuse the function if the hardware does not support it - The port-global STP state has precedence over per-VLAN states. In MSTP, as far as I understand it, all VLANs will use the common spanning tree (CST) by default - through traffic engineering you can then optimize your network to group subsets of VLANs to use different trees (MSTI). To my understanding, the way this is typically managed in silicon is roughly: Incoming packet: .----.----.--------------.----.------------- | DA | SA | 802.1Q VID=X | ET | Payload ... '----'----'--------------'----'------------- | '->|\ .----------------------------. | +--> | VID | Members | ... | MSTI | PVID -->|/ |-----|---------|-----|------| | 1 | 0001001 | ... | 0 | | 2 | 0001010 | ... | 10 | | 3 | 0001100 | ... | 10 | '----------------------------' | .-----------------------------' | .------------------------. '->| MSTI | Fwding | Lrning | |------|--------|--------| | 0 | 111110 | 111110 | | 10 | 110111 | 110111 | '------------------------' What this is trying to show is that the STP state (whether MSTP is used, or ye olde STP) is always accessed via the VLAN table. If STP is running, all MSTI pointers in that table will reference the same index in the STP stable - if MSTP is running, some VLANs may point to other trees (like in this example). The fact that in the Linux bridge, the global state (think: index 0 in most hardware implementations) is supposed to override the per-VLAN state, is very awkward to offload. In effect, this means that when the global state changes to blocking, drivers will have to iterate over all MSTIs in use, and alter them all to match. This also means that you have to cache whether the hardware state is currently tracking the global state or the per-VLAN state. In the first case, you also have to cache the per-VLAN state so that you can restore it if the global state transitions back to forwarding. This series adds a new mst_enable bridge setting (as suggested by Nik) that can only be changed when no VLANs are configured on the bridge. Enabling this mode has the following effect: - The port-global STP state is used to represent the CST (Common Spanning Tree) (1/15) - Ingress STP filtering is deferred until the frame's VLAN has been resolved (1/15) - The preexisting per-VLAN states can no longer be controlled directly (1/15). They are instead placed under the MST module's control, which is managed using a new netlink interface (described in 3/15) - VLANs can br mapped to MSTIs in an arbitrary M:N fashion, using a new global VLAN option (2/15) Switchdev notifications are added so that a driver can track: - MST enabled state - VID to MSTI mappings - MST port states An offloading implementation is this provided for mv88e6xxx. ==================== Link: https://lore.kernel.org/r/20220316150857.2442916-1-tobias@waldekranz.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Allocate a SID in the STU for each MSTID in use by a bridge and handle the mapping of MSTIDs to VLANs using the SID field of each VTU entry. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Export the raw STU data in a devlink region so that it can be inspected from userspace and compared to the current bridge configuration. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
In early LinkStreet silicon (e.g. 6095/6185), the per-VLAN STP states were kept in the VTU - there was no concept of a SID. Later, the information was split into two tables, where the VTU only tracked memberships and deferred the STP state tracking to the STU via a pointer (SID). This meant that a group of VLANs could share the same STU entry. Most likely, this was done to align with MSTP (802.1Q-2018, Clause 13), which is built on this principle. While the VTU is still 4k lines on most devices, the STU is capped at 64 entries. This means that the current stategy, updating STU info whenever a VTU entry is updated, can not easily support MSTP because: - The maximum number of VIDs would also be capped at 64, as we would have to allocate one SID for every VTU entry - even if many VLANs would effectively share the same MST. - MSTP updates would be unnecessarily slow as you would have to iterate over all VLANs that share the same MST. In order to support MSTP offloading in the future, manage the STU as a separate entity from the VTU. Only add support for newer hardware with separate VTU and STU. VTU-only devices can also be supported, but essentially this requires a software implementation of an STU (fanning out state changed to all VLANs tied to the same MST). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging. When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported. At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tobias Waldekranz authored
Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
There's a number of systems supporting DASH remote management. Driver unload and system shutdown can result in the PHY suspending, thus making DASH unusable. Improve this by handling DASH being enabled very similar to WoL being enabled. Tested-by: Yanko Kaneti <yaneti@declera.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/1de3b176-c09c-1654-6f00-9785f7a4f954@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski authored
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-16 This series contains updates to gtp and ice driver. Wojciech fixes smatch reported inconsistent indenting for gtp and ice. Yang Yingliang fixes a couple of return value checks for GNSS to IS_PTR instead of null. Jacob adds support for trace events on tx timestamps. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: add trace events for tx timestamps ice: fix return value check in ice_gnss.c ice: Fix inconsistent indenting in ice_switch gtp: Fix inconsistent indenting ==================== Link: https://lore.kernel.org/r/20220316204024.3201500-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Colin Ian King authored
There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220316234620.55885-1-colin.i.king@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Colin Ian King authored
There is a spelling mistake in a dev_err message and the MAX_SKB_FRAGS value does not need to be printed between parentheses. Fix this. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220316233455.54541-1-colin.i.king@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bill Wendling authored
When compiling with -Wformat, clang emits the following warning: net/8021q/vlanproc.c:284:22: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] mp->priority, ((mp->vlan_qos >> 13) & 0x7)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213125.2353370-1-morbo@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bill Wendling authored
When compiling with -Wformat, clang emits the following warning: drivers/net/ethernet/freescale/xgmac_mdio.c:243:22: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] phy_id, dev_addr, regnum); ^~~~~~ ./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg' dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ ./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk' _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213114.2352352-1-morbo@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bill Wendling authored
When compiling with -Wformat, clang emits the following warnings: drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:40: warning: format specifies type 'unsigned short' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hx.%hx", num >> 16, num); ~~~ ^~~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:51: warning: format specifies type 'unsigned short' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hx.%hx", num >> 16, num); ~~~ ^~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:47: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:58: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:68: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~ %x The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213104.2351651-1-morbo@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Bill Wendling authored
When compiling with -Wformat, clang emits the following warning: drivers/net/ethernet/freescale/enetc/enetc_mdio.c:151:22: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] phy_id, dev_addr, regnum); ^~~~~~ ./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg' dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ ./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk' _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: Bill Wendling <morbo@google.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220316213109.2352015-1-morbo@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, ipsec, and wireless. A few last minute revert / disable and fix patches came down from our sub-trees. We're not waiting for any fixes at this point. Current release - regressions: - Revert "netfilter: nat: force port remap to prevent shadowing well-known ports", restore working conntrack on asymmetric paths - Revert "ath10k: drop beacon and probe response which leak from other channel", restore working AP and mesh mode on QCA9984 - eth: intel: fix hang during reboot/shutdown Current release - new code bugs: - netfilter: nf_tables: disable register tracking, it needs more work to cover all corner cases Previous releases - regressions: - ipv6: fix skb_over_panic in __ip6_append_data when (admin-only) extension headers get specified - esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return value more selectively - bnx2x: fix driver load failure when FW not present in initrd Previous releases - always broken: - vsock: stop destroying unrelated sockets in nested virtualization - packet: fix slab-out-of-bounds access in packet_recvmsg() Misc: - add Paolo Abeni to networking maintainers!" * tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits) iavf: Fix hang during reboot/shutdown net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload net: bcmgenet: skip invalid partial checksums bnx2x: fix built-in kernel driver load failure net: phy: mscc: Add MODULE_FIRMWARE macros net: dsa: Add missing of_node_put() in dsa_port_parse_of net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() Revert "ath10k: drop beacon and probe response which leak from other channel" hv_netvsc: Add check for kvmalloc_array iavf: Fix double free in iavf_reset_task ice: destroy flow director filter mutex after releasing VSIs ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats() Add Paolo Abeni to networking maintainers atm: eni: Add check for dma_map_single net/packet: fix slab-out-of-bounds access in packet_recvmsg() net: mdio: mscc-miim: fix duplicate debugfs entry net: phy: marvell: Fix invalid comparison in the resume and suspend functions esp6: fix check on ipv6_skip_exthdr's return value net: dsa: microchip: add spi_device_id tables netfilter: nf_tables: disable register tracking ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull ACPI fix from Rafael Wysocki: "Revert recent commit that caused multiple systems to misbehave due to firmware issues" * tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: scan: Do not add device IDs from _CID if _HID is not valid"
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "Four patches. Subsystems affected by this patch series: mm/swap, kconfig, ocfs2, and selftests" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: selftests: vm: fix clang build error multiple output files ocfs2: fix crash when initialize filecheck kobj fails configs/debug: restore DEBUG_INFO=y for overriding mm: swap: get rid of livelock in swapin readahead
-
Yosry Ahmed authored
When building the vm selftests using clang, some errors are seen due to having headers in the compilation command: clang -Wall -I ../../../../usr/include -no-pie gup_test.c ../../../../mm/gup_test.h -lrt -lpthread -o .../tools/testing/selftests/vm/gup_test clang: error: cannot specify -o when generating multiple output files make[1]: *** [../lib.mk:146: .../tools/testing/selftests/vm/gup_test] Error 1 Rework to add the header files to LOCAL_HDRS before including ../lib.mk, since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in file lib.mk. Link: https://lkml.kernel.org/r/20220304000645.1888133-1-yosryahmed@google.comSigned-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joseph Qi authored
Once s_root is set, genric_shutdown_super() will be called if fill_super() fails. That means, we will call ocfs2_dismount_volume() twice in such case, which can lead to kernel crash. Fix this issue by initializing filecheck kobj before setting s_root. Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com Fixes: 5f483c4a ("ocfs2: add kobject for online file check") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Qian Cai authored
Previously, I failed to realize that Kees' patch [1] has not been merged into the mainline yet, and dropped DEBUG_INFO=y too eagerly from the mainline. As the results, "make debug.config" won't be able to flip DEBUG_INFO=n from the existing .config. This should close the gaps of a few weeks before Kees' patch is there, and work regardless of their merging status anyway. Link: https://lore.kernel.org/all/20220125075126.891825-1-keescook@chromium.org/ [1] Link: https://lkml.kernel.org/r/20220308153524.8618-1-quic_qiancai@quicinc.comSigned-off-by: Qian Cai <quic_qiancai@quicinc.com> Reported-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Guo Ziliang authored
In our testing, a livelock task was found. Through sysrq printing, same stack was found every time, as follows: __swap_duplicate+0x58/0x1a0 swapcache_prepare+0x24/0x30 __read_swap_cache_async+0xac/0x220 read_swap_cache_async+0x58/0xa0 swapin_readahead+0x24c/0x628 do_swap_page+0x374/0x8a0 __handle_mm_fault+0x598/0xd60 handle_mm_fault+0x114/0x200 do_page_fault+0x148/0x4d0 do_translation_fault+0xb0/0xd4 do_mem_abort+0x50/0xb0 The reason for the livelock is that swapcache_prepare() always returns EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it cannot jump out of the loop. We suspect that the task that clears the SWAP_HAS_CACHE flag never gets a chance to run. We try to lower the priority of the task stuck in a livelock so that the task that clears the SWAP_HAS_CACHE flag will run. The results show that the system returns to normal after the priority is lowered. In our testing, multiple real-time tasks are bound to the same core, and the task in the livelock is the highest priority task of the core, so the livelocked task cannot be preempted. Although cond_resched() is used by __read_swap_cache_async, it is an empty function in the preemptive system and cannot achieve the purpose of releasing the CPU. A high-priority task cannot release the CPU unless preempted by a higher-priority task. But when this task is already the highest priority task on this core, other tasks will not be able to be scheduled. So we think we should replace cond_resched() with schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will call set_current_state first to set the task state, so the task will be removed from the running queue, so as to achieve the purpose of giving up the CPU and prevent it from running in kernel mode for too long. (akpm: ugly hack becomes uglier. But it fixes the issue in a backportable-to-stable fashion while we hopefully work on something better) Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.comSigned-off-by: Guo Ziliang <guo.ziliang@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Acked-by: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roger Quadros <rogerq@kernel.org> Cc: Ziliang Guo <guo.ziliang@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ivan Vecera authored
Recent commit 97457801 ("iavf: Add waiting so the port is initialized in remove") adds a wait-loop at the beginning of iavf_remove() to ensure that port initialization is finished prior unregistering net device. This causes a regression in reboot/shutdown scenario because in this case callback iavf_shutdown() is called and this callback detaches the device, makes it down if it is running and sets its state to __IAVF_REMOVE. Later shutdown callback of associated PF driver (e.g. ice_shutdown) is called. That callback calls among other things sriov_disable() that calls indirectly iavf_remove() (see stack trace below). As the adapter state is already __IAVF_REMOVE then the mentioned loop is end-less and shutdown process hangs. The patch fixes this by checking adapter's state at the beginning of iavf_remove() and skips the rest of the function if the adapter is already in remove state (shutdown is in progress). Reproducer: 1. Create VF on PF driven by ice or i40e driver 2. Ensure that the VF is bound to iavf driver 3. Reboot [52625.981294] sysrq: SysRq : Show Blocked State [52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2 [52625.996732] Call Trace: [52625.999187] __schedule+0x2d1/0x830 [52626.007400] schedule+0x35/0xa0 [52626.010545] schedule_hrtimeout_range_clock+0x83/0x100 [52626.020046] usleep_range+0x5b/0x80 [52626.023540] iavf_remove+0x63/0x5b0 [iavf] [52626.027645] pci_device_remove+0x3b/0xc0 [52626.031572] device_release_driver_internal+0x103/0x1f0 [52626.036805] pci_stop_bus_device+0x72/0xa0 [52626.040904] pci_stop_and_remove_bus_device+0xe/0x20 [52626.045870] pci_iov_remove_virtfn+0xba/0x120 [52626.050232] sriov_disable+0x2f/0xe0 [52626.053813] ice_free_vfs+0x7c/0x340 [ice] [52626.057946] ice_remove+0x220/0x240 [ice] [52626.061967] ice_shutdown+0x16/0x50 [ice] [52626.065987] pci_device_shutdown+0x34/0x60 [52626.070086] device_shutdown+0x165/0x1c5 [52626.074011] kernel_restart+0xe/0x30 [52626.077593] __do_sys_reboot+0x1d2/0x210 [52626.093815] do_syscall_64+0x5b/0x1a0 [52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 97457801 ("iavf: Add waiting so the port is initialized in remove") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://lore.kernel.org/r/20220317104524.2802848-1-ivecera@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
ACL rules can be offloaded to VCAP IS2 either through chain 0, or, since the blamed commit, through a chain index whose number encodes a specific PAG (Policy Action Group) and lookup number. The chain number is translated through ocelot_chain_to_pag() into a PAG, and through ocelot_chain_to_lookup() into a lookup number. The problem with the blamed commit is that the above 2 functions don't have special treatment for chain 0. So ocelot_chain_to_pag(0) returns filter->pag = 224, which is in fact -32, but the "pag" field is an u8. So we end up programming the hardware with VCAP IS2 entries having a PAG of 224. But the way in which the PAG works is that it defines a subset of VCAP IS2 filters which should match on a packet. The default PAG is 0, and previous VCAP IS1 rules (which we offload using 'goto') can modify it. So basically, we are installing filters with a PAG on which no packet will ever match. This is the hardware equivalent of adding filters to a chain which has no 'goto' to it. Restore the previous functionality by making ACL filters offloaded to chain 0 go to PAG 0 and lookup number 0. The choice of PAG is clearly correct, but the choice of lookup number isn't "as before" (which was to leave the lookup a "don't care"). However, lookup 0 should be fine, since even though there are ACL actions (policers) which have a requirement to be used in a specific lookup, that lookup is 0. Fixes: 226e9cd8 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220316192117.2568261-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-