- 25 Jun, 2022 1 commit
-
-
Eric Dumazet authored
I accidentally broke IPv4 traceroute, by swapping iph->saddr and iph->daddr. Probably because raw_icmp_error() and raw_v4_input() use different order for iph->saddr and iph->daddr. Fixes: ba44f818 ("raw: use more conventional iterators") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220623193540.2851799-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 24 Jun, 2022 39 commits
-
-
Lukas Wunner authored
usbnet uses the work usbnet_deferred_kevent() to perform tasks which may sleep. On disconnect, completion of the work was originally awaited in ->ndo_stop(). But in 2003, that was moved to ->disconnect() by historic commit "[PATCH] USB: usbnet, prevent exotic rtnl deadlock": https://git.kernel.org/tglx/history/c/0f138bbfd83c The change was made because back then, the kernel's workqueue implementation did not allow waiting for a single work. One had to wait for completion of *all* work by calling flush_scheduled_work(), and that could deadlock when waiting for usbnet_deferred_kevent() with rtnl_mutex held in ->ndo_stop(). The commit solved one problem but created another: It causes a use-after-free in USB Ethernet drivers aqc111.c, asix_devices.c, ax88179_178a.c, ch9200.c and smsc75xx.c: * If the drivers receive a link change interrupt immediately before disconnect, they raise EVENT_LINK_RESET in their (non-sleepable) ->status() callback and schedule usbnet_deferred_kevent(). * usbnet_deferred_kevent() invokes the driver's ->link_reset() callback, which calls netif_carrier_{on,off}(). * That in turn schedules the work linkwatch_event(). Because usbnet_deferred_kevent() is awaited after unregister_netdev(), netif_carrier_{on,off}() may operate on an unregistered netdev and linkwatch_event() may run after free_netdev(), causing a use-after-free. In 2010, usbnet was changed to only wait for a single instance of usbnet_deferred_kevent() instead of *all* work by commit 23f333a2 ("drivers/net: don't use flush_scheduled_work()"). Unfortunately the commit neglected to move the wait back to ->ndo_stop(). Rectify that omission at long last. Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/netdev/20220315113841.GA22337@pengutronix.de/Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Acked-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/d1c87ebe9fc502bffcd1576e238d685ad08321e4.1655987888.git.lukas@wunner.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ciara Loftus authored
Similar to how it's done in the ice driver since 'eb087cd8 ("ice: propagate xdp_ring onto rx_ring")', read the XDP program once per NAPI instead of once per descriptor cleaned. I measured an improvement in throughput of 2% for the AF_XDP xdpsock l2fwd benchmark for zero copy mode and 1% for copy mode. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Link: https://lore.kernel.org/r/20220623100852.7867-1-ciara.loftus@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jilin Yuan authored
Delete the redundant word 'set'. Delete the redundant word 'a'. Delete the redundant word 'in'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Link: https://lore.kernel.org/r/20220623043115.60482-1-yuanjilin@cdjrlc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Richard Gobert authored
Move the len fields manipulation in the skbs to a helper function. There is a comment specifically requesting this and there are several other areas in the code displaying the same pattern which can be refactored. This improves code readability. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Link: https://lore.kernel.org/r/20220622160853.GA6478@debianSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
liujing authored
when we modfying kernel, commit it to our environment building. we find a error that is "tools/testing/selftests/tc-testing/plugins" failed: No such file or directory" we find plugins directory is ignored in "tools/testing/selftests/tc-testing/.gitignore", but the plugins directory is need in "tools/testing/selftests/tc-testing/Makefile" Signed-off-by: liujing <liujing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20220622121237.5832-1-liujing@cmss.chinamobile.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dimitris Michailidis authored
Handle skbs with SKB_GSO_UDP_L4, advertise the offload in features, and add an ethtool counter for it. Small change to existing TSO code due to UDP's different header length. Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Link: https://lore.kernel.org/r/20220622223703.59886-1-dmichail@fungible.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Russell King says: ==================== net: pcs: lynx: consolidate gigabit code This series consolidates the gigabit setup code in the Lynx PCS driver. In order to do this properly, we first need to fix phylink's advertisement encoding function to handle QSGMII. ==================== Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/YrRbjOEEww38JFIK@shell.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Consolidate lynx_pcs_config_1000basex() and lynx_pcs_config_sgmii() into a single function. The differences between these two are: - The value that the link timer is set to. - The value of the IF_MODE register. Everything else is identical. This patch depends on "net: phylink: add QSGMII support to phylink_mii_c22_pcs_encode_advertisement()". Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
The QSGMII MAC-to-PHY reply is the same as the SGMII MAC-to-PHY reply. Add support for this to phylink_mii_c22_pcs_encode_advertisement(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dan Carpenter authored
There is a copy and paste bug in lan743x_sgmii_config() so it checks if (ret < 0) instead of if (mii_ctl < 0). Fixes: 46b777ad ("net: lan743x: Add support to SGMII 1G and 2.5G") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/YrRry7K66BzKezl8@kiliSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Ido Schimmel says: ==================== mlxsw: Unified bridge conversion - part 3/6 This is the third part of the conversion of mlxsw to the unified bridge model. Like the second part, this patchset does not begin the conversion, but instead prepares the FID code for it. The individual changes are relatively small and self-contained with detailed description and motivation in the commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
The function was designed to configure both VLAN and FID RIFs, but currently the driver does not use VLAN RIFs. Instead, it emulates VLAN RIFs using FID RIFs. As part of the conversion to the unified bridge model, the driver will need to use VLAN RIFs, but they will be configured differently from FID RIFs. As a preparation for this change, rename the function to reflect the fact that it is specific to FID RIFs and do not pass the RIF type as an argument. This leaves mlxsw_reg_ritr_fid_set() unused, so remove it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
Currently, the driver emulates 802.1Q FIDs using 802.1D FIDs. As such, the RIFs configured on top of these FIDs are FID RIFs and not VLAN RIFs. As part of converting the driver to the unified bridge model, 802.1Q FIDs and VLAN RIFs will be used. As a preparation for this change, rename the emulated VLAN RIFs from 'MLXSW_SP_RIF_TYPE_VLAN' to 'MLXSW_SP_RIF_TYPE_VLAN_EMU'. After the conversion the emulated VLAN RIFs will be removed. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
Egress VID for layer 2 multicast is determined from two tables, the MPE and PGT tables. The MPE table is a two dimensional table indexed by local port and SMPE index, which should be thought of as a FID index. In Spectrum-1 the SMPE index is derived from the PGT entry, whereas in Spectrum-2 and newer ASICs the SMPE index is a FID attribute configured via the SFMR register. The validity of the SMPE index in SFMR is influenced from two factors: 1. FID family. SMPE index is reserved for rFIDs, as their flooding is handled by firmware. 2. ASIC generation. SMPE index is always reserved for Spectrum-1. As such, the validity of the SMPE index should be an attribute of the FID family and have different arrays of FID families per-ASIC type. As a preparation for SMPE index configuration, create separate arrays of FID families for different ASICs. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
The function configures {Port, VID}->FID classification entries using the SVFA register. In the unified bridge model such entries will need to be programmed with an ingress RIF parameter, which is a FID attribute. As a preparation for this change, pass the FID structure itself to the function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
The function gets several arguments derived from the FID structure itself. In the future, it will need to be extended to configure additional FID attributes. Prepare for that change and reduce the arguments list by passing the FID structure itself. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
After the previous patch, all the callers of the function pass arguments extracted from the FID structure itself. Reduce the arguments list by simply passing the FID structure itself. This makes the function more generic as it can be easily extended to edit any FID attributes. Rename it to mlxsw_sp_fid_edit_op() to reflect that. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
Currently, the only FID attributes that are edited after FID creation are its VNI and NVE tunnel flood pointer. This is achieved by eventually invoking mlxsw_sp_fid_vni_op() with an updated set of arguments. In the future, more FID attributes will need to be edited, such as the ingress RIF configured on top of the FID. Therefore, it makes sense to encapsulate all the FID edit logic in one function that will perform the edit based on an updated FID structure. To that end, update the FID structure before invoking the various edit operations that eventually call into mlxsw_sp_fid_vni_op(). Use the updated structure as the sole argument of the edit operations. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
In the unified bridge model, FID classification mappings (e.g., {Port, VID}->FID) and layer 3 egress VID classification mappings (i.e., {eRIF, ePort}->VID) will need to be updated when a RIF is configured on top of a FID. This requires the driver to be aware of all the {Port, VID} pairs mapped to a FID. To that end, extend the FID structure with a linked list of {Port, VID} pairs. Add an entry to the list when a {Port, VID} is mapped to a FID and remove it upon unmap. Keep the list sorted by local port as it will be useful for {eRIF, ePort}->VID mappings via REIV register in the future. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== ipmr: get rid of rwlocks We need to get rid of rwlocks in networking stacks, if read_lock() is (ab)used from softirq context. As discussed recently [1], rwlock are unfair by design in this case, and writers can starve and trigger soft lockups. This series convert ipmr code (both IPv4 and IPv6 families) to RCU and spinlocks. [1] https://lkml.org/lkml/2022/6/17/272 v2: fixed two typos, and resent because patch 19/19 did not make it to patchwork. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
mrt_lock is only held in write mode, from process context only. We can switch to a mere spinlock, and avoid blocking BH. Also, vif_dev_read() is always called under standard rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
mrt_lock is only held in write mode, from process context only. We can switch to a mere spinlock, and avoid blocking BH. Also, vif_dev_read() is always called under standard rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We can use standard rcu_read_lock(), to get rid of last read_lock(&mrt_lock) call points. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We no longer need to acquire mrt_lock() in mr_dump, using rcu_read_lock() is enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Like ipmr_get_route(), we can use standard RCU here. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ip6_mr_forward() uses standard RCU protection already. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
rcu_read_lock() protection is good enough. ip6mr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
rcu_read_lock() protection is good enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
rcu_read_lock() protection is more than enough. vif_dev_read() supports either mrt_lock or rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ip6mr_cache_report() first argument can be marked const, and we change the caller convention about which lock needs to be held. Instead of read_lock(&mrt_lock), we can use rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
mr_fill_mroute() uses standard rcu_read_unlock(), no need to grab mrt_lock anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ip_mr_forward() uses standard RCU protection already. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
rcu_read_lock() protection is good enough. ipmr_cache_unresolved() uses a dedicated spinlock (mfc_unres_lock) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
rcu_read_lock() protection is good enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
rcu_read_lock() protection is more than enough. vif_dev_read() supports either mrt_lock or rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ipmr_cache_report() first argument can be marked const, and we change the caller convention about which lock needs to be held. Instead of read_lock(&mrt_lock), we can use rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
igmpmsg_netlink_event() first argument can be marked const. igmpmsg_netlink_event() reads mrt->net and mrt->id, both being set once in mr_table_alloc(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We will soon use RCU instead of rwlock in ipmr & ip6mr This preliminary patch adds proper rcu verbs to read/write (struct vif_device)->dev Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
pim6_rcv() is called under rcu_read_lock(), there is no need to use dev_hold()/dev_put() pair. IPv4 side was handled in commit 55747a0a ("ipmr: __pim_rcv() is called under rcu_read_lock") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-