- 05 Feb, 2022 39 commits
-
-
Eric Dumazet authored
We have plans for increasing MAX_SKB_FRAGS, but sk_msg_sg::copy is currently an unsigned long, limiting MAX_SKB_FRAGS to 30 on 32bit arches. Convert it to a bitmap, as Jakub suggested. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Instead of disabling TSO at compile time if MAX_SKB_FRAGS > 32, implement ndo_features_check() method for this driver for a more dynamic handling. If skb has more than 32 frags and is a GSO packet, force software segmentation. Most locally generated packets will use a small number of fragments anyway. For forwarding workloads, we can limit gro_max_size at ingress, we might also implement gro_max_segs if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
It seems this one-element array is not actually being used as an array of variable size, so we can just replace it with just a non-array object of type struct desc_frag and refactor a bit the rest of the code. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). This issue was found with the help of Coccinelle and audited and fixed, manually. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). This issue was found with the help of Coccinelle and audited and fixed, manually. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Haiyang Zhang says: ==================== net: mana: Add handling of CQE_RX_TRUNCATED and a cleanup Add handling of CQE_RX_TRUNCATED and a cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
The switch statement already ensures cqe_type == CQE_RX_OKAY at that point. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
The proper way to drop this kind of CQE is advancing rxq tail without indicating the packet to the upper network layer. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== net: device tracking improvements Main goal of this series is to be able to detect the following case which apparently is still haunting us. dev_hold_track(dev, tracker_1, GFP_ATOMIC); dev_hold(dev); dev_put(dev); dev_put(dev); // Should complain loudly here. dev_put_track(dev, tracker_1); // instead of here (as before this series) v2: third patch: I replaced the dev_put() in linkwatch_do_dev() with __dev_put(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We are still chasing some syzbot reports where we think a rogue dev_put() is called with no corresponding prior dev_hold(). Unfortunately it eats a reference on dev->dev_refcnt taken by innocent dev_hold_track(), meaning that the refcount saturation splat comes too late to be useful. Make sure that 'not tracked' dev_put() and dev_hold() better use CONFIG_NET_DEV_REFCNT_TRACKER=y debug infrastructure: Prior patch in the series allowed ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and to use a separate refcount only to detect too many put() even in the following case: dev_hold_track(dev, tracker_1, GFP_ATOMIC); dev_hold(dev); dev_put(dev); dev_put(dev); // Should complain loudly here. dev_put_track(dev, tracker_1); // instead of here Add clarification about netdev_tracker_alloc() role. v2: I replaced the dev_put() in linkwatch_do_dev() with __dev_put() because callers called netdev_tracker_free(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We are still chasing a netdev refcount imbalance, and we suspect we have one rogue dev_put() that is consuming a reference taken from a dev_hold_track() To detect this case, allow ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and use a dedicated refcount_t just for them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Whenever ref_tracker_dir_init() is called, mark the struct ref_tracker_dir as dead. Test the dead status from ref_tracker_alloc() and ref_tracker_free() This should detect buggy dev_put()/dev_hold() happening too late in netdevice dismantle process. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== ipv6: mc_forwarding changes First patch removes minor data-races, as mc_forwarding can be locklessly read in fast path. Second patch adds a short cut in ip6mr_sk_done() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
In many cases, ip6mr_sk_done() is called while no ipmr socket has been registered. This removes 4 rtnl acquisitions per netns dismantle, with following callers: igmp6_net_exit(), tcpv6_net_exit(), ndisc_net_exit() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We generally default the vendor to y and the drivers itself to n. NET_DSA_REALTEK, however, selects a whole bunch of things, so it's not a pure "vendor selection" knob. Let's default it all to n. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
Phylink will use PCS polling whenever phylink_config.pcs_poll or the phylink_pcs poll member is set. As this driver sets both, remove the former. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
phylink_set_10g_modes() is no longer used with the conversion of drivers to phylink_generic_validate(), so we can remove it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Paolo Abeni says: ==================== gro: a couple of minor optimization This series collects a couple of small optimizations for the GRO engine, reducing slightly the number of cycles for dev_gro_receive(). The delta is within noise range in tput tests, but with big TCP coming every cycle saved from the GRO engine will count - I hope ;) v1 -> v2: - a few cleanup suggested from Alexander(s) - moved away the more controversial 3rd patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
While inspecting some perf report, I noticed that the compiler emits suboptimal code for the napi CB initialization, fetching and storing multiple times the memory for flags bitfield. This is with gcc 10.3.1, but I observed the same with older compiler versions. We can help the compiler to do a nicer work clearing several fields at once using an u32 alias. The generated code is quite smaller, with the same number of conditional. Before: objdump -t net/core/gro.o | grep " F .text" 0000000000000bb0 l F .text 0000000000000357 dev_gro_receive After: 0000000000000bb0 l F .text 000000000000033c dev_gro_receive v1 -> v2: - use struct_group (Alexander and Alex) RFC -> v1: - use __struct_group to delimit the zeroed area (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
After commit 5e10da53 ("skbuff: allow 'slow_gro' for skb carring sock reference") and commit af352460 ("net: fix GRO skb truesize update") the truesize of the skb with stolen head is properly updated by the GRO engine, we don't need anymore resetting it at recycle time. v1 -> v2: - clarify the commit message (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
This is a copy and paste bug. It was supposed to check "clear_skb" instead of "write_skb". Fixes: 2cd54856 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Horatiu Vultur says: ==================== net: lan966x: add support for mcast snooping Implement the switchdev callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED to allow to enable/disable multicast snooping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
When the multicast snooping is disabled, the mdb entries should be removed from the HW, but they still need to be kept in memory for when the mcast_snooping will be enabled again. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
The callback allows to enable/disable multicast snooping. When the snooping is enabled, all IGMP and MLD frames are redirected to the CPU, therefore make sure not to set the skb flag 'offload_fwd_mark'. The HW will not flood multicast ipv4/ipv6 data frames. When the snooping is disabled, the HW will flood IGMP, MLD and multicast ipv4/ipv6 frames according to the mcast_flood flag. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
When enabling the multicast snooping, the forwarding of the IPV6 frames has it's own forwarding mask. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paul Blakey authored
Currently tc skb extension is used to send miss info from tc to ovs datapath module, and driver to tc. For the tc to ovs miss it is currently always allocated even if it will not be used by ovs datapath (as it depends on a requested feature). Export the static key which is used by openvswitch module to guard this code path as well, so it will be skipped if ovs datapath doesn't need it. Enable this code path once ovs datapath needs it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Mat Martineau says: ==================== mptcp: Improve set-flags command and update self tests Patches 1-3 allow more flexibility in the combinations of features and flags allowed with the MPTCP_PM_CMD_SET_FLAGS netlink command, and add self test case coverage for the new functionality. Patches 4-6 and 9 refactor the mptcp_join.sh self tests to allow them to configure all of the test cases using either the pm_nl_ctl utility (part of the mptcp self tests) or the 'ip mptcp' command (from iproute2). The default remains to use pm_nl_ctl. Patches 7 and 8 update the pm_netlink.sh self tests to cover the use of endpoint ids to set endpoint flags (instead of just addresses). ==================== Link: https://lore.kernel.org/r/20220205000337.187292-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added a command line option '-i' for mptcp_join.sh to use 'ip mptcp' commands instead of using 'pm_nl_ctl' commands to deal with PM netlink. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added the setting flags test cases, using both addr-based and id-based lookups for the setting address. The output looks like this: set flags (backup) [ OK ] (nobackup) [ OK ] (fullmesh) [ OK ] (nofullmesh) [ OK ] (backup,fullmesh) [ OK ] Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added the id argument for setting the address flags in pm_nl_ctl. Usage: pm_nl_ctl set id 1 flags backup Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch implemented a new function named pm_nl_set_endpoint(), wrapped the PM netlink commands 'ip mptcp endpoint change flags' and 'pm_nl_ctl set flags' in it, and used a new argument 'ip_mptcp' to choose which one to use to set the flags of the PM endpoint. 'ip mptcp' used the ID number argument to find out the address to change flags, while 'pm_nl_ctl' used the address and port number arguments. So we need to parse the address ID from the PM dump output as well as the address and port number. Used this wrapper in do_transfer() instead of using the pm_nl_ctl command directly. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch implemented a new function named pm_nl_show_endpoints(), wrapped the PM netlink commands 'ip mptcp endpoint show' and 'pm_nl_ctl dump' in it, used a new argument 'ip_mptcp' to choose which one to use to show all the PM endpoints. Used this wrapper in do_transfer() instead of using the pm_nl_ctl commands directly. The original 'pos+=5' in the remoing tests only works for the output of 'pm_nl_ctl show': id 1 flags subflow 10.0.1.1 It doesn't work for the output of 'ip mptcp endpoint show': 10.0.1.1 id 1 subflow So implemented a more flexible approach to get the address ID from the PM dump output to fit for both commands. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added four basic 'ip mptcp' wrappers: pm_nl_set_limits() pm_nl_add_endpoint() pm_nl_del_endpoint() pm_nl_flush_endpoint(). Wrapped the PM netlink commands 'ip mptcp' and 'pm_nl_ctl' in them, and used a new argument 'ip_mptcp' to choose which one to use for setting the PM limits, adding or deleting the PM endpoint. Used the wrappers in all the selftests in mptcp_join.sh instead of using the pm_nl_ctl commands directly. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added the backup testcase using an address with a port number. The original backup tests only work for the output of 'pm_nl_ctl dump' without the port number. It chooses the last item in the dump to parse the address in it, and in this case, the address is showed at the end of the item. But it doesn't work for the dump with the port number, in this case, the port number is showed at the end of the item, not the address. So implemented a more flexible approach to get the address and the port number from the dump to fit for the port number case. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added the port argument for setting the address flags in pm_nl_ctl. Usage: pm_nl_ctl set 10.0.2.1 flags backup port 10100 Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
It's illegal to use both port and non-signal flags for adding address. But it's legal to use both of them for setting flags, which always uses non-signal flags, backup or fullmesh. This patch moves this non-signal flag with port check from mptcp_pm_parse_addr() to mptcp_nl_cmd_add_addr(). Do the check only when adding addresses, not setting flags or deleting addresses. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Justin Iurman says: ==================== Support for the IOAM insertion frequency The insertion frequency is represented as "k/n", meaning IOAM will be added to {k} packets over {n} packets, with 0 < k <= n and 1 <= {k,n} <= 1000000. Therefore, it provides the following percentages of insertion frequency: [0.0001% (min) ... 100% (max)]. Not only this solution allows an operator to apply dynamic frequencies based on the current traffic load, but it also provides some flexibility, i.e., by distinguishing similar cases (e.g., "1/2" and "2/4"). "1/2" = Y N Y N Y N Y N ... "2/4" = Y Y N N Y Y N N ... ==================== Link: https://lore.kernel.org/r/20220202142554.9691-1-justin.iurman@uliege.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Justin Iurman authored
Add support for the IOAM insertion frequency inside its lwtunnel output function. This patch introduces a new (atomic) counter for packets, based on which the algorithm will decide if IOAM should be added or not. Default frequency is "1/1" (i.e., applied to all packets) for backward compatibility. The iproute2 patch is ready and will be submitted as soon as this one is accepted. Previous iproute2 command: ip -6 ro ad fc00::1/128 encap ioam6 [ mode ... ] ... New iproute2 command: ip -6 ro ad fc00::1/128 encap ioam6 [ freq k/n ] [ mode ... ] ... Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Justin Iurman authored
Add the insertion frequency uapi for IOAM lwtunnels. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 04 Feb, 2022 1 commit
-
-
Jakub Kicinski authored
Nothing in ipv6.h needs ndisc.h, drop it. Link: https://lore.kernel.org/r/20220203043457.2222388-1-kuba@kernel.orgAcked-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Link: https://lore.kernel.org/r/20220203231240.2297588-1-kuba@kernel.orgReviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-