- 12 Nov, 2020 17 commits
-
-
Parav Pandit authored
Cited commit in fixes tag overwrites the port attributes for the registered port. Avoid such error by checking registered flag before setting attributes. Fixes: 71ad8d55 ("devlink: Replace devlink_port_attrs_set parameters with a struct") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20201111034744.35554-1-parav@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Martin Willi authored
VRF devices use an optimized direct path on output if a default qdisc is involved, calling Netfilter hooks directly. This path, however, does not consider Netfilter rules completing asynchronously, such as with NFQUEUE. The Netfilter okfn() is called for asynchronously accepted packets, but the VRF never passes that packet down the stack to send it out over the slave device. Using the slower redirect path for this seems not feasible, as we do not know beforehand if a Netfilter hook has asynchronously completing rules. Fix the use of asynchronously completing Netfilter rules in OUTPUT and POSTROUTING by using a special completion function that additionally calls dst_output() to pass the packet down the stack. Also, slightly adjust the use of nf_reset_ct() so that is called in the asynchronous case, too. Fixes: dcdd43c4 ("net: vrf: performance improvements for IPv4") Fixes: a9ec54d1 ("net: vrf: performance improvements for IPv6") Signed-off-by: Martin Willi <martin@strongswan.org> Link: https://lore.kernel.org/r/20201106073030.3974927-1-martin@strongswan.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wang Hai authored
If memory allocation for 'kbuf' succeed, cosa_write() doesn't have a corresponding kfree() in exception handling. Thus add kfree() for this function implementation. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Link: https://lore.kernel.org/r/20201110144614.43194-1-wanghai38@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Move to the kernel.org patchwork instance, it has significantly lower latency for accessing from Europe and the US. Other quirks include the reply bot. Link: https://lore.kernel.org/r/20201110035120.642746-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Rohit Maheshwari says: ==================== cxgb4/ch_ktls: Fixes in nic tls code This series helps in fixing multiple nic ktls issues. Series is broken into 12 patches. Patch 1 avoids deciding tls packet based on decrypted bit. If its a retransmit packet which has tls handshake and finish (for encryption), decrypted bit won't be set there, and so we can't rely on decrypted bit. Patch 2 helps supporting linear skb. SKBs were assumed non-linear. Corrected the length extraction. Patch 3 fixes the checksum offload update in WR. Patch 4 fixes kernel panic happening due to creating new skb for each record. As part of fix driver will use same skb to send out one tls record (partial data) of the same SKB. Patch 5 fixes the problem of skb data length smaller than remaining data of the record. Patch 6 fixes the handling of SKBs which has tls header alone pkt, but not starting from beginning. Patch 7 avoids sending extra data which is used to make a record 16 byte aligned. We don't need to retransmit those extra few bytes. Patch 8 handles the cases where retransmit packet has tls starting exchanges which are prior to tls start marker. Patch 9 fixes the problem os skb free before HW knows about tcp FIN. Patch 10 handles the small packet case which has partial TAG bytes only. HW can't handle those, hence using sw crypto for such pkts. Patch 11 corrects the potential tcb update problem. Patch 12 stops the queue if queue reaches threshold value. v1->v2: - Corrected fixes tag issue. - Marked chcr_ktls_sw_fallback() static. v2->v3: - Replaced GFP_KERNEL with GFP_ATOMIC. - Removed mixed fixes. v3->v4: - Corrected fixes tag issue. v4->v5: - Separated mixed fixes from patch 4. v5-v6: - Fixes tag should be at the end. ==================== Link: https://lore.kernel.org/r/20201109105142.15398-1-rohitm@chelsio.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
Stop the queue and ask for the credits if queue reaches to threashold. Fixes: 5a4b9fe7 ("cxgb4/chcr: complete record tx handling") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
context id and port id should be filled while sending tcb update. Fixes: 5a4b9fe7 ("cxgb4/chcr: complete record tx handling") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
If TCP congestion caused a very small packets which only has some part fo the TAG, and that too is not till the end. HW can't handle such case, so falling back to sw crypto in such cases. v1->v2: - Marked chcr_ktls_sw_fallback() static. Fixes: dc05f3df ("chcr: Handle first or middle part of record") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
If its a last packet and fin is set. Make sure FIN is informed to HW before skb gets freed. Fixes: 429765a1 ("chcr: handle partial end part of a record") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
There could be a case where ACK for tls exchanges prior to start marker is missed out, and by the time tls is offloaded. This pkt should not be discarded and handled carefully. It could be plaintext alone or plaintext + finish as well. Fixes: 5a4b9fe7 ("cxgb4/chcr: complete record tx handling") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
If a record starts in middle, reset TCB UNA so that we could avoid sending out extra packet which is needed to make it 16 byte aligned to start AES CTR. Check also considers prev_seq, which should be what is actually sent, not the skb data length. Avoid updating partial TAG to HW at any point of time, that's why we need to check if remaining part is smaller than TAG size, then reset TX_MAX to be TAG starting sequence number. Fixes: 5a4b9fe7 ("cxgb4/chcr: complete record tx handling") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
If an skb has only header part which doesn't start from beginning, is not being handled properly. Fixes: dc05f3df ("chcr: Handle first or middle part of record") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
trimmed length calculation goes wrong if skb has only tag part to send. It should be zero if there is no data bytes apart from TAG. Fixes: dc05f3df ("chcr: Handle first or middle part of record") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
Creating SKB per tls record and freeing the original one causes panic. There will be race if connection reset is requested. By freeing original skb, refcnt will be decremented and that means, there is no pending record to send, and so tls_dev_del will be requested in control path while SKB of related connection is in queue. Better approach is to use same SKB to send one record (partial data) at a time. We still have to create a new SKB when partial last part of a record is requested. This fix introduces new API cxgb4_write_partial_sgl() to send partial part of skb. Present cxgb4_write_sgl can only provide feasibility to start from an offset which limits to header only and it can write sgls for the whole skb len. But this new API will help in both. It can start from any offset and can end writing in middle of the skb. v4->v5: - Removed extra changes. Fixes: 429765a1 ("chcr: handle partial end part of a record") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
Checksum update was missing in the WR. Fixes: 429765a1 ("chcr: handle partial end part of a record") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
There is a possibility of linear skbs coming in. Correcting the length extraction logic. v2->v3: - Separated un-related changes from this patch. Fixes: 5a4b9fe7 ("cxgb4/chcr: complete record tx handling") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rohit Maheshwari authored
If skb has retransmit data starting before start marker, e.g. ccs, decrypted bit won't be set for that, and if it has some data to encrypt, then it must be given to crypto ULD. So in place of decrypted, check if socket is tls offloaded. Also, unless skb has some data to encrypt, no need to give it for tls offload handling. v2->v3: - Removed ifdef. Fixes: 5a4b9fe7 ("cxgb4/chcr: complete record tx handling") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 11 Nov, 2020 10 commits
-
-
Martin Schiller authored
This fixes a regression for blocking connects introduced by commit 4becb7ee ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect"). The x25->neighbour is already set to "NULL" by x25_disconnect() now, while a blocking connect is waiting in x25_wait_for_connection_establishment(). Therefore x25->neighbour must not be accessed here again and x25->state is also already set to X25_STATE_0 by x25_disconnect(). Fixes: 4becb7ee ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Michael Walle authored
Since commit 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX") the network port of the Kontron sl28 board is broken. After the migration to phylink the device tree has to specify the in-band-mode property. Add it. Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201109110436.5906-1-michael@walle.ccSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wang Hai authored
kmemleak report a memory leak as follows: unreferenced object 0xffff88810a596800 (size 512): comm "ip", pid 21558, jiffies 4297568990 (age 112.120s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 00 83 60 b0 ff ff ff ff ..........`..... backtrace: [<0000000022bbe21f>] tipc_topsrv_init_net+0x1f3/0xa70 [<00000000fe15ddf7>] ops_init+0xa8/0x3c0 [<00000000138af6f2>] setup_net+0x2de/0x7e0 [<000000008c6807a3>] copy_net_ns+0x27d/0x530 [<000000006b21adbd>] create_new_namespaces+0x382/0xa30 [<00000000bb169746>] unshare_nsproxy_namespaces+0xa1/0x1d0 [<00000000fe2e42bc>] ksys_unshare+0x39c/0x780 [<0000000009ba3b19>] __x64_sys_unshare+0x2d/0x40 [<00000000614ad866>] do_syscall_64+0x56/0xa0 [<00000000a1b5ca3c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 'srv' is malloced in tipc_topsrv_start() but not free before leaving from the error handling cases. We need to free it. Fixes: 5c45ab24 ("tipc: make struct tipc_server private for server.c") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201109140913.47370-1-wanghai38@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Julian Wiedmann says: ==================== net/iucv: fixes 2020-11-09 One fix in the shutdown path for af_iucv sockets. This is relevant for stable as well. Also sending along an update for the Maintainers file. v1 -> v2: use the correct Fixes tag in patch 1 (Jakub) ==================== Link: https://lore.kernel.org/r/20201109075706.56573-1-jwi@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ursula Braun authored
I am retiring soon. Thus this patch removes myself from the MAINTAINERS file (s390 network). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> [jwi: fix up the subject] Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ursula Braun authored
syzbot reported the following KASAN finding: BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 Read of size 2 at addr 000000000000021e by task syz-executor907/519 CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: [<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline] [<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135 [<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline] [<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118 [<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383 [<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline] [<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534 [<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 [<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457 [<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204 [<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline] [<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210 [<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 There is nothing to shutdown if a connection has never been established. Besides that iucv->hs_dev is not yet initialized if a socket is in IUCV_OPEN state and iucv->path is not yet initialized if socket is in IUCV_BOUND state. So, just skip the shutdown calls for a socket in these states. Fixes: eac3731b ("[S390]: Add AF_IUCV socket support") Fixes: 82492a35 ("af_iucv: add shutdown for HS transport") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> [jwi: correct one Fixes tag] Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sven Van Asbroeck authored
In the net core, the struct net_device_ops -> ndo_set_rx_mode() callback is called with the dev->addr_list_lock spinlock held. However, this driver's ndo_set_rx_mode callback eventually calls lan743x_dp_write(), which acquires a mutex. Mutex acquisition may sleep, and this is not allowed when holding a spinlock. Fix by removing the dp_lock mutex entirely. Its purpose is to prevent concurrent accesses to the data port. No concurrent accesses are possible, because the dev->addr_list_lock spinlock in the core only lets through one thread at a time. Fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201109203828.5115-1-TheSven73@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
zhangxiaoxu authored
When mv88e6xxx_fid_map return error, we lost free the table. Fix it. Fixes: bfb25542 ("net: dsa: mv88e6xxx: Add devlink regions") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhangxiaoxu <zhangxiaoxu5@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20201109144416.1540867-1-zhangxiaoxu5@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mao Wenan authored
When net.ipv4.tcp_syncookies=1 and syn flood is happened, cookie_v4_check or cookie_v6_check tries to redo what tcp_v4_send_synack or tcp_v6_send_synack did, rsk_window_clamp will be changed if SOCK_RCVBUF is set, which will make rcv_wscale is different, the client still operates with initial window scale and can overshot granted window, the client use the initial scale but local server use new scale to advertise window value, and session work abnormally. Fixes: e88c64f0 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt") Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
The RTL8401-internal PHY identifies as RTL8201CP, and the init sequence in r8169, copied from vendor driver r8168, uses paged operations. Therefore set the same paged operation callbacks as for the other Realtek PHY's. Fixes: cdafdc29 ("r8169: sync support for RTL8401 with vendor driver") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/69882f7a-ca2f-e0c7-ae83-c9b6937282cd@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 10 Nov, 2020 6 commits
-
-
Sven Van Asbroeck authored
Commit 6f197fb6 ("lan743x: Added fixed link and RGMII support") assumes that chips with an internal PHY will never have a devicetree entry. This is incorrect: even for these chips, a devicetree entry can be useful e.g. to pass the mac address from bootloader to chip: &pcie { status = "okay"; host@0 { reg = <0 0 0 0 0>; #address-cells = <3>; #size-cells = <2>; lan7430: ethernet@0 { /* LAN7430 with internal PHY */ compatible = "microchip,lan743x"; status = "okay"; reg = <0 0 0 0 0>; /* filled in by bootloader */ local-mac-address = [00 00 00 00 00 00]; }; }; }; If a devicetree entry is present, the driver will not attach the chip to its internal phy, and the chip will be non-operational. Fix by tweaking the phy connection algorithm: - first try to connect to a phy specified in the devicetree (could be 'real' phy, or just a 'fixed-link') - if that doesn't succeed, try to connect to an internal phy, even if the chip has a devnode Tested on a LAN7430 with internal PHY. I cannot test a device using fixed-link, as I do not have access to one. Fixes: 6f197fb6 ("lan743x: Added fixed link and RGMII support") Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430 Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201108171224.23829-1-TheSven73@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paul Moore authored
The current NetLabel code doesn't correctly keep track of the netlink dump state in some cases, in particular when multiple interfaces with large configurations are loaded. The problem manifests itself by not reporting the full configuration to userspace, even though it is loaded and active in the kernel. This patch fixes this by ensuring that the dump state is properly reset when necessary inside the netlbl_unlabel_staticlist() function. Fixes: 8cc44579 ("NetLabel: Introduce static network labels for unlabeled connections") Signed-off-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/160484450633.3752.16512718263560813473.stgit@siflSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vlad Buslov authored
Iproute2 tc classifier terse dump has been accepted with modified syntax. Update the tests accordingly. Signed-off-by: Vlad Buslov <vlad@buslov.dev> Fixes: e7534fd4 ("selftests: implement flower classifier terse dump tests") Link: https://lore.kernel.org/r/20201107111928.453534-1-vlad@buslov.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
The mptcp proto struct currently does not provide the required limit for forward memory scheduling. Under pressure sk_rmem_schedule() will unconditionally try to use such field and will oops. Address the issue inheriting the tcp limit, as we already do for the wmem one. Fixes: 9c3f94e1 ("mptcp: add missing memory scheduling in the rx path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/37af798bd46f402fb7c79f57ebbdd00614f5d7fa.1604861097.git.pabeni@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jonathan Neuschäfer authored
2.5 times faster would be 3.5 Gbps (4.375 Gbaud after 8b/10b encoding). Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/r/20201107220822.1291215-1-j.neuschaefer@gmx.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alexander Lobakin authored
After updating userspace Ethtool from 5.7 to 5.9, I noticed that NETDEV_FEAT_CHANGE is no more raised when changing netdev features through Ethtool. That's because the old Ethtool ioctl interface always calls netdev_features_change() at the end of user request processing to inform the kernel that our netdevice has some features changed, but the new Netlink interface does not. Instead, it just notifies itself with ETHTOOL_MSG_FEATURES_NTF. Replace this ethtool_notify() call with netdev_features_change(), so the kernel will be aware of any features changes, just like in case with the ioctl interface. This does not omit Ethtool notifications, as Ethtool itself listens to NETDEV_FEAT_CHANGE and drops ETHTOOL_MSG_FEATURES_NTF on it (net/ethtool/netlink.c:ethnl_netdev_event()). From v1 [1]: - dropped extra new line as advised by Jakub; - no functional changes. [1] https://lore.kernel.org/netdev/AlZXQ2o5uuTVHCfNGOiGgJ8vJ3KgO5YIWAnQjH0cDE@cp3-web-009.plabs.ch Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/ahA2YWXYICz5rbUSQqNG4roJ8OlJzzYQX7PTiG80@cp4-web-028.plabs.chSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Nov, 2020 2 commits
-
-
Stefano Brivio authored
Jianlin reports that a bridged IPv6 VXLAN endpoint, carrying IPv6 packets over a link with a PMTU estimation of exactly 1350 bytes, won't trigger ICMPv6 Packet Too Big replies when the encapsulated datagrams exceed said PMTU value. VXLAN over IPv6 adds 70 bytes of overhead, so an ICMPv6 reply indicating 1280 bytes as inner MTU would be legitimate and expected. This comes from an off-by-one error I introduced in checks added as part of commit 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets"), whose purpose was to prevent sending ICMPv6 Packet Too Big messages with an MTU lower than the smallest permissible IPv6 link MTU, i.e. 1280 bytes. In iptunnel_pmtud_check_icmpv6(), avoid triggering a reply only if the advertised MTU would be less than, and not equal to, 1280 bytes. Also fix the analogous comparison for IPv4, that is, skip the ICMP reply only if the resulting MTU is strictly less than 576 bytes. This becomes apparent while running the net/pmtu.sh bridged VXLAN or GENEVE selftests with adjusted lower-link MTU values. Using e.g. GENEVE, setting ll_mtu to the values reported below, in the test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() test function, we can see failures on the following tests: test | ll_mtu -------------------------------|-------- pmtu_ipv4_br_geneve4_exception | 626 pmtu_ipv6_br_geneve4_exception | 1330 pmtu_ipv6_br_geneve6_exception | 1350 owing to the different tunneling overheads implied by the corresponding configurations. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Link: https://lore.kernel.org/r/4f5fc2f33bfdf8409549fafd4f952b008bf04d63.1604681709.git.sbrivio@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oliver Herms authored
Due to the legacy usage of hard_header_len for SIT tunnels while already using infrastructure from net/ipv4/ip_tunnel.c the calculation of the path MTU in tnl_update_pmtu is incorrect. This leads to unnecessary creation of MTU exceptions for any flow going over a SIT tunnel. As SIT tunnels do not have a header themsevles other than their transport (L3, L2) headers we're leaving hard_header_len set to zero as tnl_update_pmtu is already taking care of the transport headers sizes. This will also help avoiding unnecessary IPv6 GC runs and spinlock contention seen when using SIT tunnels and for more than net.ipv6.route.gc_thresh flows. Fixes: c5441932 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201103104133.GA1573211@twsSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 07 Nov, 2020 5 commits
-
-
Vadym Kochan authored
With CONFIG_BRIDGE=m the compilation fails: ld: drivers/net/ethernet/marvell/prestera/prestera_switchdev.o: in function `prestera_bridge_port_event': prestera_switchdev.c:(.text+0x2ebd): undefined reference to `br_vlan_enabled' in case the driver is statically enabled. Fix it by adding 'BRIDGE || BRIDGE=n' dependency. Fixes: e1189d9a ("net: marvell: prestera: Add Switchdev driver implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20201106161128.24069-1-vadym.kochan@plvision.euSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxJakub Kicinski authored
Saeed Mahameed says: ==================== mlx5 fixes 2020-11-03 v1->v2: - Fix fixes line tag in patch #1 - Toss ktls refcount leak fix, Maxim will look further into the root cause. - Toss eswitch chain 0 prio patch, until we determine if it is needed for -rc and net. * tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix incorrect access of RCU-protected xdp_prog net/mlx5e: Fix VXLAN synchronization after function reload net/mlx5: E-switch, Avoid extack error log for disabled vport net/mlx5: Fix deletion of duplicate rules net/mlx5e: Use spin_lock_bh for async_icosq_lock net/mlx5e: Protect encap route dev from concurrent release net/mlx5e: Fix modify header actions memory leak ==================== Link: https://lore.kernel.org/r/20201105202129.23644-1-saeedm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
RTL8125B has same or similar short packet hw padding bug as RTL8168evl. The main workaround has been extended accordingly, however we have to disable also hw checksumming for short packets on affected new chip versions. Instead of checking for an affected chip version let's simply disable hw checksumming for short packets in general. v2: - remove the version checks and disable short packet hw csum in general - reflect this in commit title and message Fixes: 0439297b ("r8169: add support for RTL8125B") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/7fbb35f0-e244-ef65-aa55-3872d7d38698@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
The caller of rtl8169_tso_csum_v2() frees the skb if false is returned. eth_skb_pad() internally frees the skb on error what would result in a double free. Therefore use __skb_put_padto() directly and instruct it to not free the skb on error. Fixes: b423e9ae ("r8169: fix offloaded tx checksum for small packets.") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/f7e68191-acff-9ded-4263-c016428a8762@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Alexei Starovoitov says: ==================== pull-request: bpf 2020-11-06 1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David. 2) Tighten bpf_lsm function check, from KP. 3) Fix bpftool attaching to flow dissector, from Lorenz. 4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard. * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Update verification logic for LSM programs bpf: Zero-fill re-used per-cpu map element bpf: BPF_PRELOAD depends on BPF_SYSCALL tools/bpftool: Fix attaching flow dissector libbpf: Fix possible use after free in xsk_socket__delete libbpf: Fix null dereference in xsk_socket__delete libbpf, hashmap: Fix undefined behavior in hash_bits bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE tools, bpftool: Remove two unused variables. tools, bpftool: Avoid array index warnings. xsk: Fix possible memory leak at socket close bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs samples/bpf: Set rlimit for memlock to infinity in all samples bpf: Fix -Wshadow warnings selftest/bpf: Fix profiler test using CO-RE relocation for enums ==================== Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-