- 21 Jul, 2023 29 commits
-
-
Naveen Mamindlapalli authored
unlike strict priority, where number of classes are limited to max 8, there is no restriction on the number of dwrr child nodes unless the count increases the max number of child nodes supported. Hardware expects strict priority transmit schedular indexes mapped to their priority. This patch adds defines transmit schedular allocation algorithm such that the above requirement is honored. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Petr Machata says: ==================== mlxsw: Permit enslavement to netdevices with uppers The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. Similarly, detaching a front panel port from a configured topology means unoffloading of this whole topology -- VLAN uppers, next hops, etc. Attaching the port back is then not permitted at all. If it were, it would not result in a working configuration, because much of mlxsw is written to react to changes in immediate configuration. There is nothing that would go visit netdevices in the attached-to topology and offload existing routes and VLAN memberships, for example. In this patchset, introduce a number of replays to be invoked so that this sort of post-hoc offload is supported. Then remove the vetoes that disallowed enslavement of front panel ports to other netdevices with uppers. The patchset progresses as follows: - In patch #1, fix an issue in the bridge driver. To my knowledge, the issue could not have resulted in a buggy behavior previously, and thus is packaged with this patchset instead of being sent separately to net. - In patch #2, add a new helper to the switchdev code. - In patch #3, drop mlxsw selftests that will not be relevant after this patchset anymore. - Patches #4, #5, #6, #7 and #8 prepare the codebase for smoother introduction of the rest of the code. - Patches #9, #10, #11, #12, #13 and #14 replay various aspects of upper configuration when a front panel port is introduced into a topology. Individual patches take care of bridge and LAG RIF memberships, switchdev replay, nexthop and neighbors replay, and MACVLAN offload. - Patches #15 and #16 introduce RIFs for newly-relevant netdevices when a front panel port is enslaved (in which case all uppers are newly relevant), or, respectively, deslaved (in which case the newly-relevant netdevice is the one being deslaved). - Up until this point, the introduced scaffolding was not really used, because mlxsw still forbids enslavement of mlxsw netdevices to uppers with uppers. In patch #17, this condition is finally relaxed. A sizable selftest suite is available to test all this new code. That will be sent in a separate patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. In the previous patches, a number of replays have been added. Those ensure that various bits of state, such as next hops or switchdev objects, are offloaded when they become relevant due to a mlxsw lower being introduced into the topology. However the act of actually, for example, enslaving a front-panel port to a bridge with uppers, has been vetoed so far. In this patch, remove the vetoes and permit the operation. mlxsw currently validates creation of "interesting" uppers. Thus creating VLAN netdevices on top of 802.1ad bridges is forbidden if the bridge has an mlxsw lower, but permitted in general. This validation code never gets run when a port is introduced as a lower of an existing netdevice structure. Thus when enslaving an mlxsw netdevice to netdevices with uppers, invoke the PRECHANGEUPPER event handler for each netdevice above the one that the front panel port is being enslaved to. This way the tower of netdevices above the attachment point is validated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When a netdevice is removed from a bridge or a LAG, and it has an IP address, it should join the router and gain a RIF. Do that by replaying address addition event on the netdevice. When handling deslavement of LAG or its upper from a bridge device, the replay should be done after all the lowers of the LAG have left the bridge. Thus these scenarios are handled by passing replay_deslavement of false, and by invoking, after the lowers have been processed, a new helper, mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper handling, and in particular invokes the replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Enslaving of front panel ports (and their uppers) to netdevices that already have uppers is currently forbidden. When this is permitted, any uppers with IP addresses need to have the NETDEV_UP inetaddr event replayed, so that any RIFs are created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
As neighbours are created, mlxsw is involved through the netevent notifications. When at the time there is no RIF for a given neighbour, the notification is not acted upon. When the RIF is later created, these outstanding neighbours are left unoffloaded and cause traffic to go through the SW datapath. In order to fix this issue, as a RIF is created, walk the ARP and ND tables and find neighbours for the netdevice that represents the RIF. Then schedule neighbour work for them, allowing them to be offloaded. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
If IP address is added to a MACVLAN netdevice, the effect is of configuring VRRP on the RIF for the netdevice linked to the MACVLAN. Because the MACVLAN offload is tied to existence of a RIF at the linked netdevice, adding a MACVLAN is currently not allowed until a RIF is present. If this requirement stays, it will never be possible to attach a first port into a topology that involves a MACVLAN. Thus topologies would need to be built in a certain order, which is impractical. Additionally, IP address removal, which leads to disappearance of the RIF that the MACVLAN depends on, cannot be vetoed. Thus even as things stand now it is possible to get to a state where a MACVLAN netdevice exists without a RIF, despite having mlxsw lowers. And once the MACVLAN is un-offloaded due to RIF getting destroyed, recreating the RIF does not bring it back. In this patch, accept that MACVLAN can be created out of order and support that use case. One option would seem to be to simply recognize MACVLAN netdevices as "interesting", and let the existing replay mechanisms take care of the offload. However, that does not address the necessity to reoffload MACVLAN once a RIF is created. Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(), called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the existing offloads, applies the configuration as if the netdevice were created just now. Additionally, remove all vetoes and warning messages that checked for presence of a RIF at the linked device. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
As RIF is created, refresh each netxhop group tracked at the CRIF for which the RIF was created. Note that nothing needs to be done for IPIP nexthops. The RIF for these is either available from the get-go, or will never be available, so no after the fact offloading needs to be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
In the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join not only RIF for the immediate LAG, as is currently the case, but also RIFs for VLAN netdevices upper to the LAG. In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the uppers of a LAG being joined, and also join any VLAN ones. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG has any upper (e.g. is enslaved), enlaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. At that point there are no bridge objects (such as port VLANs) to replay. Those are added afterwards, and notified as they are created. This holds even for the PVID. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to replay the existing bridge objects. Without this replay, e.g. the mlxsw bridge_port_vlan objects are not instantiated, which causes issues later, as a lot of code relies on their presence. To that end, add a new notifier block whose sole role is to filter out events related to the one relevant upper, and forward those to the existing switchdev notifier block. Pass the new notifier block to switchdev_bridge_port_offload() when the bridge port is created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Currently it never happens that a netdevice that is already a bridge slave would suddenly become mlxsw upper. The only case where this might be possible as far as mlxsw is concerned, is with LAG netdevices. But if a LAG already has an upper, enslaving mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and a mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. However in the following patches, the requirement that ports be only enslaved to masters without uppers, is going to be relaxed. It will therefore be necessary to join bridges of LAG uppers. Without this replay, the mlxsw bridge_port objects are not instantiated, which causes issues later, as a lot of code relies on their presence. Therefore in this patch, when the first mlxsw physical netdevice is enslaved to a LAG, consider bridges upper to the LAG (both the direct master, if any, and any bridge masters of VLAN uppers), and have the relevant netdevices join their bridges. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When handling deslavement of LAG or its upper from a bridge device, when the deslaved netdevice has an IP address, it should join the router. This should be done after all the lowers of the LAG have left the bridge. The replay intended to cause the device to join the router therefore cannot be invoked unconditionally in the event handlers themselves. It can be done right away if the handler is invoked for a sole device, but when it is invoked repeated for each LAG lower, the replay needs to be postponed until after this processing is done. To that end, add a boolean parameter, replay_deslavement, to mlxsw_sp_netdevice_port_upper_event(), mlxsw_sp_netdevice_port_vlan_event() and one helper on the call path. Have the invocations that are done for sole netdevices pass true, and those done for LAG lowers pass false. Nothing depends on this flag at this point, but it removes some noise from the patch that introduces the replay itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Currently the bridge-related handlers bail out when the event is related to a netdevice that is not an upper of one of the front-panel ports. In order to allow enslavement of front-panel ports to bridges that already have uppers, it will be necessary to replay CHANGEUPPER events to validate that the configuration is offloadable. In order for the replay to be effective, it must be possible to ignore unsupported configuration in the context of an actual notifier event, but to still "veto" these configurations when the validation is performed. To that end, introduce two parameters to a number of handlers: mlxsw_sp, because it will not be possible to deduce that from the netdevice lowers; and process_foreign to indicate whether netdevices that are not front panel uppers should be validated. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Move the meat of mlxsw_sp_netdevice_event() to a separate function that does just the validation. This separate helper will be possible to call later for recursive ascent when validating attachment of a front panel port to a bridge with uppers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
This will come in handy for neighbour replay. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Currently the IP address event handlers bail out when the event is related to a netdevice that is a bridge port or a member of a LAG. In order to create a RIF when a bridged or LAG'd port is unenslaved, these event handlers will be replayed. However, at the point in time when the NETDEV_CHANGEUPPER event is delivered, informing of the loss of enslavement, the port is still formally enslaved. In order for the operation to have any effect, these handlers need an extra parameter to indicate that the check for bridge or LAG membership should not be done. In this patch, add an argument "nomaster" to several event handlers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Support for enslaving ports to LAGs with uppers will be added in the following patches. Selftests to make sure it actually does the right thing are ready and will be sent as a follow-up. Similarly, ordering of MACVLAN creation and RIF creation will be relaxed and it will be permitted to create a MACVLAN first. Thus these two tests are obsolete. Drop them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on the bridge port. When the bridge port is offloaded by the driver for the first time, this can be achieved by passing a notifier to switchdev_bridge_port_offload(). The notifier is then invoked for the individual objects (such as VLANs) configured on the bridge, and can look for the interesting ones. Calling switchdev_bridge_port_offload() when the second port joins the bridge lower is unnecessary, but the replay is still needed. To that end, add a new function, switchdev_bridge_port_replay(), which does only the replay part of the _offload() function in exactly the same way as that function. Cc: Jiri Pirko <jiri@resnulli.us> Cc: Ivan Vecera <ivecera@redhat.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
There are two kinds of MDB entries to be replayed: port MDB entries, and host MDB entries. They are both replayed by br_switchdev_mdb_replay(). If the driver supports one kind, but lacks the other, the first -EOPNOTSUPP returned terminates the whole replay, including any further still-supported objects in the list. For this to cause issues, there must be MDB entries for both the host and the port being replayed. In that case, if the driver bails out from handling the host entry, the port entries are never replayed. However, the replay is currently only done when a switchdev port joins a bridge. There would be no port memberships at that point. Thus despite being erroneous, the code does not cause observable bugs. This is not an issue with other object kinds either, because there, each function replays one object kind. If a driver does not support that kind, it makes sense to bail out early. -EOPNOTSUPP is then ignored in nbp_switchdev_sync_objs(). For MDB, suppress the -EOPNOTSUPP error code in br_switchdev_mdb_replay() already, so that the whole list gets replayed. The reason we need this patch is that a future patch will introduce a replay that should be used when a front-panel port netdevice is enslaved to a bridge lower, in particular a LAG. The LAG netdevice can already have both host and port MDB entries. The port entries need to be replayed so that they are offloaded on the port that joins the LAG. Cc: Jiri Pirko <jiri@resnulli.us> Cc: Ivan Vecera <ivecera@redhat.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lorenzo Bianconi authored
Introduce MTK_FOE_ENTRY_V{1,2}_SIZE macros in order to make more explicit foe_entry size for different chipset revisions. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Benjamin Poirier says: ==================== nexthop: Refactor and fix nexthop selection for multipath routes In order to select a nexthop for multipath routes, fib_select_multipath() is used with legacy nexthops and nexthop_select_path_hthr() is used with nexthop objects. Those two functions perform a validity test on the neighbor related to each nexthop but their logic is structured differently. This causes a divergence in behavior and nexthop_select_path_hthr() may return a nexthop that failed the neighbor validity test even if there was one that passed. Refactor nexthop_select_path_hthr() to make it more similar to fib_select_multipath() and fix the problem mentioned above. v1: https://lore.kernel.org/netdev/20230529201914.69828-1-bpoirier@nvidia.com/ ==================== Link: https://lore.kernel.org/r/20230719-nh_select-v2-0-04383e89f868@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Benjamin Poirier authored
Add test cases for hash threshold (multipath) nexthop groups with invalid neighbors. Check that a nexthop with invalid neighbor is not selected when there is another nexthop with a valid neighbor. Check that there is no crash when there is no nexthop with a valid neighbor. The first test fails before the previous commit in this series. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230719-nh_select-v2-4-04383e89f868@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Benjamin Poirier authored
With legacy nexthops, when net.ipv4.fib_multipath_use_neigh is set, fib_select_multipath() will never set res->nhc to a nexthop that is not good (as per fib_good_nh()). OTOH, with nexthop objects, nexthop_select_path_hthr() may return a nexthop that failed the nexthop_is_good_nh() test even if there was one that passed. Refactor nexthop_select_path_hthr() to follow a selection logic more similar to fib_select_multipath(). The issue can be demonstrated with the following sequence of commands. The first block shows that things work as expected with legacy nexthops. The last sequence of `ip rou get` in the second block shows the problem case - some routes still use the .2 nexthop. sysctl net.ipv4.fib_multipath_use_neigh=1 ip link add dummy1 up type dummy ip rou add 198.51.100.0/24 nexthop via 192.0.2.1 dev dummy1 onlink nexthop via 192.0.2.2 dev dummy1 onlink for i in {10..19}; do ip -o rou get 198.51.100.$i; done ip neigh add 192.0.2.1 dev dummy1 nud failed echo ".1 failed:" # results should not use .1 for i in {10..19}; do ip -o rou get 198.51.100.$i; done ip neigh del 192.0.2.1 dev dummy1 ip neigh add 192.0.2.2 dev dummy1 nud failed echo ".2 failed:" # results should not use .2 for i in {10..19}; do ip -o rou get 198.51.100.$i; done ip link del dummy1 ip link add dummy1 up type dummy ip nexthop add id 1 via 192.0.2.1 dev dummy1 onlink ip nexthop add id 2 via 192.0.2.2 dev dummy1 onlink ip nexthop add id 1001 group 1/2 ip rou add 198.51.100.0/24 nhid 1001 for i in {10..19}; do ip -o rou get 198.51.100.$i; done ip neigh add 192.0.2.1 dev dummy1 nud failed echo ".1 failed:" # results should not use .1 for i in {10..19}; do ip -o rou get 198.51.100.$i; done ip neigh del 192.0.2.1 dev dummy1 ip neigh add 192.0.2.2 dev dummy1 nud failed echo ".2 failed:" # results should not use .2 for i in {10..19}; do ip -o rou get 198.51.100.$i; done ip link del dummy1 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230719-nh_select-v2-3-04383e89f868@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Benjamin Poirier authored
For legacy nexthops, there is fib_good_nh() to check the neighbor validity. In order to make the nexthop object code more similar to the legacy nexthop code, factor out the nexthop object neighbor validity check into its own function. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230719-nh_select-v2-2-04383e89f868@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Benjamin Poirier authored
The loop in nexthop_select_path_hthr() includes code to check for neighbor validity. Since this does not apply to fdb nexthops, simplify the loop by moving the fdb nexthop selection to its own function. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230719-nh_select-v2-1-04383e89f868@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== eth: bnxt: handle invalid Tx completions more gracefully bnxt trusts the events generated by the device which may lead to kernel crashes. These are extremely rare but they do happen. For a while I thought crashing may be intentional, because device reporting invalid completions should never happen, and having a core dump could be useful if it does. But in practice I haven't found any clues in the core dumps, and panic_on_warn exists. Series was tested by forcing the recovery path manually. Because of how rare the real crashes are I can't confirm it works for the actual device errors until it's been widely deployed. v1: https://lore.kernel.org/all/20230710205611.1198878-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230720010440.1967136-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Invalid Tx completions should never happen (tm) but when they do they crash the host, because driver blindly trusts that there is a valid skb pointer on the ring. The completions I've seen appear to be some form of FW / HW miscalculation or staleness, they have typical (small) values (<100), but they are most often higher than number of queued descriptors. They usually happen after boot. Instead of crashing, print a warning and schedule a reset. Link: https://lore.kernel.org/r/20230720010440.1967136-4-kuba@kernel.orgReviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Most callers of bnxt_queue_sp_work() set a bit to indicate what work to perform right before calling it. Pass it to the function instead. Link: https://lore.kernel.org/r/20230720010440.1967136-3-kuba@kernel.orgReviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Move the reset helpers, subsequent patches will need some of them on the Tx path. While at it rename bnxt_sched_reset(), on more recent chips it schedules a queue reset, instead of a fuller reset. Link: https://lore.kernel.org/r/20230720010440.1967136-2-kuba@kernel.orgReviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 20 Jul, 2023 11 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Including fixes from BPF, netfilter, bluetooth and CAN. Current release - regressions: - eth: r8169: multiple fixes for PCIe ASPM-related problems - vrf: fix RCU lockdep splat in output path Previous releases - regressions: - gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set - dsa: mv88e6xxx: do a final check before timing out when polling - nf_tables: fix sleep in atomic in nft_chain_validate Previous releases - always broken: - sched: fix undoing tcf_bind_filter() in multiple classifiers - bpf, arm64: fix BTI type used for freplace attached functions - can: gs_usb: fix time stamp counter initialization - nft_set_pipapo: fix improper element removal (leading to UAF) Misc: - net: support STP on bridge in non-root netns, STP prevents packet loops so not supporting it results in freezing systems of unsuspecting users, and in turn very upset noises being made - fix kdoc warnings - annotate various bits of TCP state to prevent data races" * tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: phy: prevent stale pointer dereference in phy_init() tcp: annotate data-races around fastopenq.max_qlen tcp: annotate data-races around icsk->icsk_user_timeout tcp: annotate data-races around tp->notsent_lowat tcp: annotate data-races around rskq_defer_accept tcp: annotate data-races around tp->linger2 tcp: annotate data-races around icsk->icsk_syn_retries tcp: annotate data-races around tp->keepalive_probes tcp: annotate data-races around tp->keepalive_intvl tcp: annotate data-races around tp->keepalive_time tcp: annotate data-races around tp->tsoffset tcp: annotate data-races around tp->tcp_tx_delay Bluetooth: MGMT: Use correct address for memcpy() Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014 Bluetooth: SCO: fix sco_conn related locking and validity issues Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor() Bluetooth: coredump: fix building with coredump disabled Bluetooth: ISO: fix iso_conn related locking and validity issues Bluetooth: hci_event: call disconnect callback before deleting conn ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothJakub Kicinski authored
Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix building with coredump disabled - Fix use-after-free in hci_remove_adv_monitor - Use RCU for hci_conn_params and iterate safely in hci_sync - Fix locking issues on ISO and SCO - Fix bluetooth on Intel Macbook 2014 * tag 'for-net-2023-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: MGMT: Use correct address for memcpy() Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014 Bluetooth: SCO: fix sco_conn related locking and validity issues Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor() Bluetooth: coredump: fix building with coredump disabled Bluetooth: ISO: fix iso_conn related locking and validity issues Bluetooth: hci_event: call disconnect callback before deleting conn Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync ==================== Link: https://lore.kernel.org/r/20230720190201.446469-1-luiz.dentz@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski authored
Florian Westphal says: ==================== Netfilter fixes for net: The following patchset contains Netfilter fixes for net: 1. Fix spurious -EEXIST error from userspace due to padding holes, this was broken since 4.9 days when 'ignore duplicate entries on insert' feature was added. 2. Fix a sched-while-atomic bug, present since 5.19. 3. Properly remove elements if they lack an "end range". nft userspace always sets an end range attribute, even when its the same as the start, but the abi doesn't have such a restriction. Always broken since it was added in 5.6, all three from myself. 4 + 5: Bound chain needs to be skipped in netns release and on rule flush paths, from Pablo Neira. * tag 'nf-23-07-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: skip bound chain on rule flush netfilter: nf_tables: skip bound chain in netns release path netfilter: nft_set_pipapo: fix improper element removal netfilter: nf_tables: can't schedule in nft_chain_validate netfilter: nf_tables: fix spurious set element insertion failure ==================== Link: https://lore.kernel.org/r/20230720165143.30208-1-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
mdio_bus_init() and phy_driver_register() both have error paths, and if those are ever hit, ethtool will have a stale pointer to the phy_ethtool_phy_ops stub structure, which references memory from a module that failed to load (phylib). It is probably hard to force an error in this code path even manually, but the error teardown path of phy_init() should be the same as phy_exit(), which is now simply not the case. Fixes: 55d8f053 ("net: phy: Register ethtool PHY operations") Link: https://lore.kernel.org/netdev/ZLaiJ4G6TaJYGJyU@shell.armlinux.org.uk/Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230720000231.1939689-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Eric Dumazet says: ==================== tcp: add missing annotations This series was inspired by one syzbot (KCSAN) report. do_tcp_getsockopt() does not lock the socket, we need to annotate most of the reads there (and other places as well). This is a first round, another series will come later. ==================== Link: https://lore.kernel.org/r/20230719212857.3943972-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
This field can be read locklessly. Fixes: 1536e285 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
This field can be read locklessly from do_tcp_getsockopt() Fixes: dca43c75 ("tcp: Add TCP_USER_TIMEOUT socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-11-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
tp->notsent_lowat can be read locklessly from do_tcp_getsockopt() and tcp_poll(). Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-10-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
do_tcp_getsockopt() reads rskq_defer_accept while another cpu might change its value. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-9-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
do_tcp_getsockopt() reads tp->linger2 while another cpu might change its value. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-8-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-