- 06 May, 2022 25 commits
-
-
Michal Swiatkowski authored
Link port representors to parent PCI device to benefit from systemd defined naming scheme. Example from ip tool: - without linking: eth0 ... - with linking: eth0 ... altname enp24s0f0npf0vf0 The port representor name is being shown in altname, because the name is longer than IFNAMSIZ (16) limit. Altname can be used in ip tool. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
David S. Miller authored
Jakub Kicinski says: ==================== net: disambiguate the TSO and GSO limits This series separates the device-reported TSO limitations from the user space-controlled GSO limits. It used to be that we only had the former (HW limits) but they were named GSO. This probably lead to confusion and letting user override them. The problem came up in the BIG TCP discussion between Eric and Alex, and seems like something we should address. Targeting net-next because (a) nobody is reporting problems; and (b) there is a tiny but non-zero chance that some actually wants to lift the HW limitations. ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
These are now internal to the core, no need to expose them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Drivers should call the TSO setting helper, GSO is controllable by user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Up until commit 46e6b992 ("rtnetlink: allow GSO maximums to be set on device creation") the gso_max_segs and gso_max_size of a device were not controlled from user space. The quoted commit added the ability to control them because of the following setup: netns A | netns B veth<->veth eth0 If eth0 has TSO limitations and user wants to efficiently forward traffic between eth0 and the veths they should copy the TSO limitations of eth0 onto the veths. This would happen automatically for macvlans or ipvlan but veth users are not so lucky (given the loose coupling). Unfortunately the commit in question allowed users to also override the limits on real HW devices. It may be useful to control the max GSO size and someone may be using that ability (not that I know of any user), so create a separate set of knobs to reliably record the TSO limitations. Validate the user requests. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
To make later patches smaller create a helper for inheriting the TSO limitations of a lower device. The TSO in the name is not an accident, subsequent patches will replace GSO with TSO in more names. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Simon Horman says: ==================== nfp: flower: decap neighbour table rework Louis Peens says: This patch series reworks the way in which flow rules that outputs to OVS internal ports gets handled by the nfp driver. Previously this made use of a small pre_tun_table, but this only used destination MAC addresses, and made the implicit assumption that there is only a single source MAC":"destination MAC" mapping per tunnel. In hindsight this seems to be a pretty obvious oversight, but this was hidden in plain sight for quite some time. This series changes the implementation to make use of the same Neighbour table for decap that is in use for the tunnel encap solution. It stores any new Neighbour updates in this table. Previously this path was only triggered for encapsulation candidates, and the entries were send and forget, not saved on the host as it is after this series. It also keeps track of any flow rule that outputs to OVS internal ports (and some other criteria not worth mentioning here), very similar to how it was done previously, except now these flows are kept track of in a list. When a new Neighbour entry gets added this list gets iterated for potential matches, in which case the table gets updated with a reference to the flow, and the Neighbour entry on the card gets updated with the relevant host_ctx. The same happens when a new qualifying flow gets added - the Neighbour table gets iterated for applicable matches, and once again the firmware gets updated with the host_ctx when any matches are found. Since this also requires a firmware change we add a new capability bit, and keep the old behaviour in case of older firmware without this bit set. This series starts by doing some preparation, then adding the new list and table entries. Next the functionality to link/unlink these entries are added, and finally this new functionality is enabled by adding the DECAP_V2 bit to the driver feature list. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
Finally enable the decap_v2 feature bit now that all the other bits are in place to configure it correctly. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
With the neighbour entries now stored in a dedicated table there is no use to make use of the tunnel route cache anymore, so remove this. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
Add helper functions that can create links between flow rules and cached neighbour entries. Also add the relevant calls to these functions. * When a new neighbour entry gets added cycle through the saved pre_tun flow list and link any relevant matches. Update the neighbour table on the nfp with this new information. * When a new pre_tun flow rule gets added iterate through the save neighbour entries and link any relevant matches. Once again update the nfp neighbour table with any new links. * Do the inverse when deleting - remove any created links and also inform the nfp of this. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
This patch updates the way in which the tunnel neighbour entries are handled. Previously they were mostly send-and-forget, with just the destination IP's cached in a list. This update changes to a scheme where the neighbour entry information is stored in a hash table. The reason for this is that the neighbour table will now also be used on the decapsulation path, whereas previously it was only used for encapsulation. We need to save more of the neighbour information in order to link them with flower flows in follow up patches. Updating of the neighbour table is now also handled by the same function, instead of separate *_write_neigh_vX functions. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
Prepare for more rework in following patches by updating the existing nfp_neigh_structs. The update allows for the same headers to be used for both old and new firmware, with a slight length adjustment when sending the control message to the firmware. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
When a callback is received to invalidate a neighbour entry there is no need to try and populate any other flow information. Only the flowX->daddr information is needed as lookup key to delete an entry from the NFP neighbour table. Fix this by only doing the lookup if the callback is for a new entry. As part of this cleanup remove the setting of flow6.flowi6_proto, as this is not needed either, it looks to be a possible leftover from a previous implementation. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
Make sure that the rule also matches on source MAC address. On top of that also now save the src and dst MAC addresses similar to how vlan_tci is saved - this will be used in later comparisons with neighbour entries. Indicate if the flow matched on ipv4 or ipv6. Populate the vlan_tpid field that got added to the pre_run_rule struct as well. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
Add calls to add and remove flows to the predt_table. This very simply just allocates and add a new pretun entry if detected as such, and removes it when encountered on a delete flow. Compatibility for older firmware is kept in place through the DECAP_V2 feature bit. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Louis Peens authored
The previous implementation of using a pre_tun_table for decap has some limitations, causing flows to end up unoffloaded when in fact we are able to offload them. This is because the pre_tun_table does not have enough matching resolution. The next step is to instead make use of the neighbour table which already exists for the encap direction. This patch prepares for this by: - Moving nfp_tun_neigh/_v6 to main.h. - Creating two new "wrapping" structures, one to keep track of neighbour entries (previously they were send-and-forget), and another to keep track of pre_tun flows. - Create a new list in nfp_flower_priv to keep track of pre_tunnel flows - Create a new table in nfp_flower_priv to keep track of next neighbour entries - Initialising and destroying these new list/tables - Extending nfp_fl_payload->pre_tun_rule to save more information for future use. Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller authored
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-05-05 This series contains updates to ice driver only. Wan Jiabing converts an open coded min selection to min_t(). Maciej commonizes on a single find VSI function and removes the duplicated implementation. Wojciech adjusts the return value when exceeding ICE_MAX_CHAIN_WORDS to, a more appropriate, -ENOSPC and allows for the error to be propagated. Michal adds support for ndo_get_devlink_port(). Jake does some cleanup related to virtualization code. Mainly involving function header comments and wording changes. NULL checks are added to ice_get_vf_vsi() calls in order to prevent static analysis tools from complaining that a NULL value could be dereferenced. --- v2: Dropped patch 1: "ice: Add support for classid based queue selection" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Mat Martineau says: ==================== mptcp: Improve MPTCP-level window tracking This series improves MPTCP receive window compliance with RFC 8684 and helps increase throughput on high-speed links. Note that patch 3 makes a change in tcp_output.c For the details, Paolo says: I've been chasing bad/unstable performance with multiple subflows on very high speed links. It looks like the root cause is due to the current mptcp-level congestion window handling. There are apparently a few different sub-issues: - the rcv_wnd is not effectively shared on the tx side, as each subflow takes in account only the value received by the underlaying TCP connection. This is addressed in patch 1/5 - The mptcp-level offered wnd right edge is currently allowed to shrink. Reading section 3.3.4.: """ The receive window is relative to the DATA_ACK. As in TCP, a receiver MUST NOT shrink the right edge of the receive window (i.e., DATA_ACK + receive window). The receiver will use the data sequence number to tell if a packet should be accepted at the connection level. """ I read the above as we need to reflect window right-edge tracking on the wire, see patch 4/5. - The offered window right edge tracking can happen concurrently on multiple subflows, but there is no mutex protection. We need an additional atomic operation - still patch 4/5 This series additionally bumps a few new MIBs to track all the above (ensure/observe that the suspected races actually take place). I could not access again the host where the issue was so noticeable, still in the current setup the tput changes from [6-18] Gbps to 19Gbps very stable. ==================== Link: https://lore.kernel.org/r/20220504215408.349318-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Track the exceptional handling of MPTCP-level offered window with a few more counters for observability. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
As per RFC, the offered MPTCP-level window should never shrink. While we currently track the right edge, we don't enforce the above constraint on the wire. Additionally, concurrent xmit on different subflows can end-up in erroneous right edge update. Address the above explicitly updating the announced window and protecting the update with an additional atomic operation (sic) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
The MPTCP RFC requires that the MPTCP-level receive window's right edge never moves backward. Currently the MPTCP code enforces such constraint while tracking the right edge, but it does not reflects it on the wire, as MPTCP lacks a suitable hook to update accordingly the TCP header. This change modifies the existing mptcp_write_options() hook, providing the current packet's TCP header to the MPTCP protocol, so that the next patch could implement the above mentioned constraint. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Bump a counter for counter when snd_wnd is shared among subflow, for observability's sake. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
As per RFC, mptcp subflows use a "shared" snd_wnd: the effective window is the maximum among the current values received on all subflows. Without such feature a data transfer using multiple subflows could block. Window sharing is currently implemented in the RX side: __tcp_select_window uses the mptcp-level receive buffer to compute the announced window. That is not enough: the TCP stack will stick to the window size received on the given subflow; we need to propagate the msk window value on each subflow at xmit time. Change the packet scheduler to ignore the subflow level window and use instead the msk level one Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Andy Shevchenko authored
There is export_uuid() function which exports uuid_t to the u8 array. Use it instead of open coding variant. This allows to hide the uuid_t internals. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220504091407.70661-1-andriy.shevchenko@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David Ahern authored
msg_zerocopy_alloc is only used by msg_zerocopy_realloc; remove the export and make static in skbuff.c Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220504170947.18773-1-dsahern@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 05 May, 2022 15 commits
-
-
Jakub Kicinski authored
Make the drivers with custom tx napi weight call netif_napi_add_tx_weight(). Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220504163725.550782-2-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Switch net callers to the new API not requiring the NAPI_POLL_WEIGHT argument. Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Remove a define which looks like a OS abstraction layer and makes spatch conversions on this driver problematic. Link: https://lore.kernel.org/r/20220504163939.551231-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe Leroy authored
powerpc's asm/prom.h includes some headers that it doesn't need itself. In order to clean powerpc's asm/prom.h up in a further step, first clean all files that include asm/prom.h Some files don't need asm/prom.h at all. For those ones, just remove inclusion of asm/prom.h Some files don't need any of the items provided by asm/prom.h, but need some of the headers included by asm/prom.h. For those ones, add the needed headers that are brought by asm/prom.h at the moment and remove asm/prom.h Some files really need asm/prom.h but also need some of the headers included by asm/prom.h. For those one, leave asm/prom.h but also add the needed headers so that they can be removed from asm/prom.h in a later step. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/09a13d592d628de95d30943e59b2170af5b48110.1651663857.git.christophe.leroy@csgroup.euSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe Leroy authored
powerpc's <asm/prom.h> includes some headers that it doesn't need itself. In order to clean powerpc's <asm/prom.h> up in a further step, first clean all files that include <asm/prom.h> sungem_phy.c doesn't use any object provided by <asm/prom.h>. But removing inclusion of <asm/prom.h> leads to the following errors: CC drivers/net/sungem_phy.o drivers/net/sungem_phy.c: In function 'bcm5421_init': drivers/net/sungem_phy.c:448:42: error: implicit declaration of function 'of_get_parent'; did you mean 'dget_parent'? [-Werror=implicit-function-declaration] 448 | struct device_node *np = of_get_parent(phy->platform_data); | ^~~~~~~~~~~~~ | dget_parent drivers/net/sungem_phy.c:448:42: warning: initialization of 'struct device_node *' from 'int' makes pointer from integer without a cast [-Wint-conversion] drivers/net/sungem_phy.c:450:35: error: implicit declaration of function 'of_get_property' [-Werror=implicit-function-declaration] 450 | if (np == NULL || of_get_property(np, "no-autolowpower", NULL)) | ^~~~~~~~~~~~~~~ Remove <asm/prom.h> from included headers but add <linux/of.h> to handle the above. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/f7a7fab3ec5edf803d934fca04df22631c2b449d.1651662885.git.christophe.leroy@csgroup.euSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eyal Birger authored
The commit referenced in the "Fixes" tag added the SO_RCVMARK socket option for receiving the skb mark in the ancillary data. Since this is a new capability, and exposes admin configured details regarding the underlying network setup to sockets, let's align the needed capabilities with those of SO_MARK. Fixes: 6fd1d51c ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20220504095459.2663513-1-eyal.birger@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
This reverts commit 5e927a9f, reversing changes made to cfc1d91a. The discussion is still ongoing so let's remove the uAPI until the discussion settles. Link: https://lore.kernel.org/all/20220425090021.32e9a98f@kernel.org/Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220504154037.539442-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
tools/testing/selftests/net/forwarding/Makefile f62c5acc ("selftests/net/forwarding: add missing tests to Makefile") 50fe062c ("selftests: forwarding: new test, verify host mdb entries") https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jacob Keller authored
The ice_for_each_vf macros have comments describing the implementation. One of the arguments has a period on the end, which is not our typical style. Remove the unnecessary period. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
This function definition was missing a comment describing its implementation. Add one. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
The comment explaining ice_reset_vf has an extraneous "the" with the "if the resets are disabled". Remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
Since commit fe99d1c0 ("ice: make ice_reset_all_vfs void"), the ice_reset_all_vfs function has not returned anything. The function comment still indicated it did. Fix this. While here, also add a line to clarify the function resets all VFs at once in response to hardware resets such as a PF reset. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
The ice_get_vf_vsi function can return NULL in some cases, such as if handling messages during a reset where the VSI is being removed and recreated. Several places throughout the driver do not bother to check whether this VSI pointer is valid. Static analysis tools maybe report issues because they detect paths where a potentially NULL pointer could be dereferenced. Fix this by checking the return value of ice_get_vf_vsi everywhere. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
The debug print in ice_vf_fdir_dump_info does not end in newlines. This can look confusing when reading the kernel log, as the next print will immediately continue on the same line. Fix this by adding the forgotten newline. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Michal Swiatkowski authored
Switch id should be the same for each netdevice on a driver. The id must be unique between devices on the same system, but does not need to be unique between devices on different systems. The switch id is used to locate ports on a switch and to know if aggregated ports belong to the same switch. To meet this requirements, use pci_get_dsn as switch id value, as this is unique value for each devices on the same system. Implementing switch id is needed by automatic tools for kubernetes. Set switch id by setting devlink port attribiutes and calling devlink_port_attrs_set while creating pf (for uplink) and vf (for representator) devlink port. To get switch id (in switchdev mode): cat /sys/class/net/$PF0/phys_switch_id Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-