An error occurred fetching the project authors.
- 21 Jul, 2017 1 commit
-
-
Eric Dumazet authored
commit 6f64ec74 upstream. Similar to the fix provided by Dominik Heidler in commit 9b3dc0a1 ("l2tp: cast l2tp traffic counter to unsigned") we need to take care of 32bit kernels in dev_get_stats(). When using atomic_long_read(), we add a 'long' to u64 and might misinterpret high order bit, unless we cast to unsigned. Fixes: caf586e5 ("net: add a core netdev->rx_dropped counter") Fixes: 015f0688 ("net: net: add a core netdev->tx_dropped counter") Fixes: 6e7333d3 ("net: add rx_nohandler stat counter") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 Jul, 2017 1 commit
-
-
Alexander Potapenko authored
[ Upstream commit c28294b9 ] KMSAN reported a use of uninitialized memory in dev_set_alias(), which was caused by calling strlcpy() (which in turn called strlen()) on the user-supplied non-terminated string. Signed-off-by:
Alexander Potapenko <glider@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 Apr, 2017 3 commits
-
-
Willem de Bruijn authored
[ Upstream commit 104ba78c ] When transmitting on a packet socket with PACKET_VNET_HDR and PACKET_QDISC_BYPASS, validate device support for features requested in vnet_hdr. Drop TSO packets sent to devices that do not support TSO or have the feature disabled. Note that the latter currently do process those packets correctly, regardless of not advertising the feature. Because of SKB_GSO_DODGY, it is not sufficient to test device features with netif_needs_gso. Full validate_xmit_skb is needed. Switch to software checksum for non-TSO packets that request checksum offload if that device feature is unsupported or disabled. Note that similar to the TSO case, device drivers may perform checksum offload correctly even when not advertising it. When switching to software checksum, packets hit skb_checksum_help, which has two BUG_ON checksum not in linear segment. Packet sockets always allocate at least up to csum_start + csum_off + 2 as linear. Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c ethtool -K eth0 tso off tx on psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N ethtool -K eth0 tx off psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N v2: - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list) Fixes: d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by:
Willem de Bruijn <willemb@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Collins authored
[ Upstream commit 93409033 ] This is a respin of a patch to fix a relatively easily reproducible kernel panic related to the all_adj_list handling for netdevs in recent kernels. The following sequence of commands will reproduce the issue: ip link add link eth0 name eth0.100 type vlan id 100 ip link add link eth0 name eth0.200 type vlan id 200 ip link add name testbr type bridge ip link set eth0.100 master testbr ip link set eth0.200 master testbr ip link add link testbr mac0 type macvlan ip link delete dev testbr This creates an upper/lower tree of (excuse the poor ASCII art): /---eth0.100-eth0 mac0-testbr- \---eth0.200-eth0 When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one reference to eth0 is added, so this results in a panic. This change adds reference count propagation so things are handled properly. Matthias Schiffer reported a similar crash in batman-adv: https://github.com/freifunk-gluon/gluon/issues/680 https://www.open-mesh.org/issues/247 which this patch also seems to resolve. Signed-off-by:
Andrew Collins <acollins@cradlepoint.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 5fa8bbda ] Dmitry reported a warning [1] showing that we were calling net_disable_timestamp() -> static_key_slow_dec() from a non process context. Grabbing a mutex while holding a spinlock or rcu_read_lock() is not allowed. As Cong suggested, we now use a work queue. It is possible netstamp_clear() exits while netstamp_needed_deferred is not zero, but it is probably not worth trying to do better than that. netstamp_needed_deferred atomic tracks the exact number of deferred decrements. [1] [ INFO: suspicious RCU usage. ] 4.10.0-rc5+ #192 Not tainted ------------------------------- ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 0 2 locks held by syz-executor14/23111: #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock include/net/sock.h:1454 [inline] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919 #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook include/linux/netfilter.h:201 [inline] #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160 stack backtrace: CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline] ___might_sleep+0x560/0x650 kernel/sched/core.c:7748 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x600 net/socket.c:848 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 vfs_writev+0x87/0xc0 fs/read_write.c:911 do_writev+0x110/0x2c0 fs/read_write.c:944 SYSC_writev fs/read_write.c:1017 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1014 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x445559 RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559 RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005 RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000 R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752 in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14 INFO: lockdep is turned off. CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x600 net/socket.c:848 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 vfs_writev+0x87/0xc0 fs/read_write.c:911 do_writev+0x110/0x2c0 fs/read_write.c:944 SYSC_writev fs/read_write.c:1017 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1014 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x445559 Fixes: b90e5794 ("net: dont call jump_label_dec from irq context") Suggested-by:
Cong Wang <xiyou.wangcong@gmail.com> Reported-by:
Dmitry Vyukov <dvyukov@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 Jan, 2017 1 commit
-
-
Jesse Gross authored
[ Upstream commit fac8e0f5 ] When drivers express support for TSO of encapsulated packets, they only mean that they can do it for one layer of encapsulation. Supporting additional levels would mean updating, at a minimum, more IP length fields and they are unaware of this. No encapsulation device expresses support for handling offloaded encapsulated packets, so we won't generate these types of frames in the transmit path. However, GRO doesn't have a check for multiple levels of encapsulation and will attempt to build them. UDP tunnel GRO actually does prevent this situation but it only handles multiple UDP tunnels stacked on top of each other. This generalizes that solution to prevent any kind of tunnel stacking that would cause problems. Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by:
Jesse Gross <jesse@kernel.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <alexander.levin@verizon.com>
-
- 28 Sep, 2015 1 commit
-
-
Julian Anastasov authored
[ Upstream commit 2c17d27c ] Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 27 Sep, 2015 2 commits
-
-
Julian Anastasov authored
[ Upstream commit e9e4dd32 ] commit 381c759d ("ipv4: Avoid crashing in ip_error") fixes a problem where processed packet comes from device with destroyed inetdev (dev->ip_ptr). This is not expected because inetdev_destroy is called in NETDEV_UNREGISTER phase and packets should not be processed after dev_close_many() and synchronize_net(). Above fix is still required because inetdev_destroy can be called for other reasons. But it shows the real problem: backlog can keep packets for long time and they do not hold reference to device. Such packets are then delivered to upper levels at the same time when device is unregistered. Calling flush_backlog after NETDEV_UNREGISTER_FINAL still accounts all packets from backlog but before that some packets continue to be delivered to upper levels long after the synchronize_net call which is supposed to wait the last ones. Also, as Eric pointed out, processed packets, mostly from other devices, can continue to add new packets to backlog. Fix the problem by moving flush_backlog early, after the device driver is stopped and before the synchronize_net() call. Then use netif_running check to make sure we do not add more packets to backlog. We have to do it in enqueue_to_backlog context when the local IRQ is disabled. As result, after the flush_backlog and synchronize_net sequence all packets should be accounted. Thanks to Eric W. Biederman for the test script and his valuable feedback! Reported-by:
Vittorio Gambaletta <linuxbugs@vittgam.net> Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric Dumazet authored
[ Upstream commit d339727c ] User space can crash kernel with ip link add ifb10 numtxqueues 100000 type ifb We must replace a BUG_ON() by proper test and return -EINVAL for crazy values. Fixes: 60877a32 ("net: allow large number of tx queues") Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 15 Jun, 2015 1 commit
-
-
Vlad Yasevich authored
[ Upstream commit d66bf7dd ] The code in __netdev_upper_dev_link() has an over-stringent loop detection logic that actually prevents valid configurations from working correctly. In particular, the logic returns an error if an upper device is already in the list of all upper devices for a given dev. This particular check seems to be a overzealous as it disallows perfectly valid configurations. For example: # ip l a link eth0 name eth0.10 type vlan id 10 # ip l a dev br0 typ bridge # ip l s eth0.10 master br0 # ip l s eth0 master br0 <--- Will fail If you switch the last two commands (add eth0 first), then both will succeed. If after that, you remove eth0 and try to re-add it, it will fail! It appears to be enough to simply check adj_list to keeps things safe. I've tried stacking multiple devices multiple times in all different combinations, and either rx_handler registration prevented the stacking of the device linking cought the error. Signed-off-by:
Vladislav Yasevich <vyasevic@redhat.com> Acked-by:
Jiri Pirko <jiri@resnulli.us> Acked-by:
Veaceslav Falico <vfalico@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 27 Apr, 2015 3 commits
-
-
Jiri Pirko authored
[ Upstream commit 5968250c ] Use them to push skb->vlan_tci into the payload and avoid code duplication. Signed-off-by:
Jiri Pirko <jiri@resnulli.us> Acked-by:
Pravin B Shelar <pshelar@nicira.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jiri Pirko authored
[ Upstream commit 62749e2c ] Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by:
Jiri Pirko <jiri@resnulli.us> Acked-by:
Pravin B Shelar <pshelar@nicira.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
hannes@stressinduktion.org authored
[ Upstream commit f60e5990 ] We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket ontop of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 14 Mar, 2015 1 commit
-
-
Matthew Thode authored
[ Upstream commit a4176a93 ] colons are used as a separator in netdev device lookup in dev_ioctl.c Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME Signed-off-by:
Matthew Thode <mthode@mthode.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 27 Feb, 2015 1 commit
-
-
Eric Dumazet authored
[ Upstream commit ac64da0b ] softnet_data.input_pkt_queue is protected by a spinlock that we must hold when transferring packets from victim queue to an active one. This is because other cpus could still be trying to enqueue packets into victim queue. A second problem is that when we transfert the NAPI poll_list from victim to current cpu, we absolutely need to special case the percpu backlog, because we do not want to add complex locking to protect process_queue : Only owner cpu is allowed to manipulate it, unless cpu is offline. Based on initial patch from Prasad Sodagudi & Subash Abhinov Kasiviswanathan. This version is better because we do not slow down packet processing, only make migration safer. Reported-by:
Prasad Sodagudi <psodagud@codeaurora.org> Reported-by:
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 27 Jan, 2015 4 commits
-
-
Jesse Gross authored
[ Upstream commit 5f35227e ] GSO isn't the only offload feature with restrictions that potentially can't be expressed with the current features mechanism. Checksum is another although it's a general issue that could in theory apply to anything. Even if it may be possible to implement these restrictions in other ways, it can result in duplicate code or inefficient per-packet behavior. This generalizes ndo_gso_check so that drivers can remove any features that don't make sense for a given packet, similar to netif_skb_features(). It also converts existing driver restrictions to the new format, completing the work that was done to support tunnel protocols since the issues apply to checksums as well. By actually removing features from the set that are used to do offloading, it solves another problem with the existing interface. In these cases, GSO would run with the original set of features and not do anything because it appears that segmentation is not required. CC: Tom Herbert <therbert@google.com> CC: Joe Stringer <joestringer@nicira.com> CC: Eric Dumazet <edumazet@google.com> CC: Hayes Wang <hayeswang@realtek.com> Signed-off-by:
Jesse Gross <jesse@nicira.com> Acked-by:
Tom Herbert <therbert@google.com> Fixes: 04ffcb25 ("net: Add ndo_gso_check") Tested-by:
Hayes Wang <hayeswang@realtek.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jay Vosburgh authored
[ Upstream commit 2c26d34b ] When using VXLAN tunnels and a sky2 device, I have experienced checksum failures of the following type: [ 4297.761899] eth0: hw csum failure [...] [ 4297.765223] Call Trace: [ 4297.765224] <IRQ> [<ffffffff8172f026>] dump_stack+0x46/0x58 [ 4297.765235] [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50 [ 4297.765238] [<ffffffff8161c1a0>] ? skb_push+0x40/0x40 [ 4297.765240] [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0 [ 4297.765243] [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950 [ 4297.765246] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360 These are reliably reproduced in a network topology of: container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch When VXLAN encapsulated traffic is received from a similarly configured peer, the above warning is generated in the receive processing of the encapsulated packet. Note that the warning is associated with the container eth0. The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and because the packet is an encapsulated Ethernet frame, the checksum generated by the hardware includes the inner protocol and Ethernet headers. The receive code is careful to update the skb->csum, except in __dev_forward_skb, as called by dev_forward_skb. __dev_forward_skb calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN) to skip over the Ethernet header, but does not update skb->csum when doing so. This patch resolves the problem by adding a call to skb_postpull_rcsum to update the skb->csum after the call to eth_type_trans. Signed-off-by:
Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Toshiaki Makita authored
[ Upstream commit 796f2da8 ] When vlan tags are stacked, it is very likely that the outer tag is stored in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto. Currently netif_skb_features() first looks at skb->protocol even if there is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol encapsulated by the inner vlan instead of the inner vlan protocol. This allows GSO packets to be passed to HW and they end up being corrupted. Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.") Signed-off-by:
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Wang authored
[ Upstream commit af6dabc9 ] Commit cecda693 ("net: keep original skb which only needs header checking during software GSO") keeps the original skb for packets that only needs header check, but it doesn't drop the packet if software segmentation or header check were failed. Fixes cecda693 ("net: keep original skb which only needs header checking during software GSO") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Jason Wang <jasowang@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 27 Oct, 2014 1 commit
-
-
Eric Dumazet authored
Do not reuse skb if it was pfmemalloc tainted, otherwise future frame might be dropped anyway. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Roman Gushchin <klamm@yandex-team.ru> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 15 Oct, 2014 1 commit
-
-
Tom Herbert authored
Add ndo_gso_check which a device can define to indicate whether is is capable of doing GSO on a packet. This funciton would be called from the stack to determine whether software GSO is needed to be done. A driver should populate this function if it advertises GSO types for which there are combinations that it wouldn't be able to handle. For instance a device that performs UDP tunneling might only implement support for transparent Ethernet bridging type of inner packets or might have limitations on lengths of inner headers. Signed-off-by:
Tom Herbert <therbert@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 07 Oct, 2014 1 commit
-
-
Eric Dumazet authored
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 06 Oct, 2014 3 commits
-
-
Eric Dumazet authored
Marking this as static allows compiler to inline it. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Some TSO engines might have a too heavy setup cost, that impacts performance on hosts sending small bursts (2 MSS per packet). This patch adds a device gso_min_segs, allowing drivers to set a minimum segment size for TSO packets, according to the NIC performance. Tested on a mlx4 NIC, this allows to get a ~110% increase of throughput when sending 2 MSS per packet. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Its unfortunate we have to walk again skb list to find the tail after segmentation, even if data is probably hot in cpu caches. skb_segment() can store the tail of the list into segs->prev, and validate_xmit_skb_list() can immediately get the tail. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 03 Oct, 2014 1 commit
-
-
Eric Dumazet authored
Validation of skb can be pretty expensive : GSO segmentation and/or checksum computations. We can do this without holding qdisc lock, so that other cpus can queue additional packets. Trick is that requeued packets were already validated, so we carry a boolean so that sch_direct_xmit() can validate a fresh skb list, or directly use an old one. Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads host. Turning TSO on or off had no effect on throughput, only few more cpu cycles. Lock contention on qdisc lock disappeared. Same if disabling TX checksum offload. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 28 Sep, 2014 1 commit
-
-
Dan Williams authored
Per commit "77873803 net_dma: mark broken" net_dma is no longer used and there is no plan to fix it. This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards. Reverting the remainder of the net_dma induced changes is deferred to subsequent patches. Marked for stable due to Roman's report of a memory leak in dma_pin_iovec_pages(): https://lkml.org/lkml/2014/9/3/177 Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: David Whipple <whipple@securedatainnovations.ch> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: <stable@vger.kernel.org> Reported-by:
Roman Gushchin <klamm@yandex-team.ru> Acked-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- 26 Sep, 2014 2 commits
-
-
Joe Perches authored
No caller or macro uses the return value so make all the functions return void. Signed-off-by:
Joe Perches <joe@perches.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
The send_check logic was only interesting in cases of TCP offload and UDP UFO where the checksum needed to be initialized to the pseudo header checksum. Now we've moved that logic into the related gso_segment functions so gso_send_check is no longer needed. Signed-off-by:
Tom Herbert <therbert@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 22 Sep, 2014 1 commit
-
-
Jason Wang authored
Commit ce93718f ("net: Don't keep around original SKB when we software segment GSO frames") frees the original skb after software GSO even for dodgy gso skbs. This breaks the stream throughput from untrusted sources, since only header checking was done during software GSO instead of a true segmentation. This patch fixes this by freeing the original gso skb only when it was really segmented by software. Fixes ce93718f ("net: Don't keep around original SKB when we software segment GSO frames.") Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Jason Wang <jasowang@redhat.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 15 Sep, 2014 1 commit
-
-
Alexander Y. Fomichev authored
__netdev_adjacent_dev_insert may add adjust device of different net namespace, without proper check it leads to emergence of broken sysfs links from/to devices in another namespace. Fix: rewrite netdev_adjacent_is_neigh_list macro as a function, move net_eq check into netdev_adjacent_is_neigh_list. (thanks David) related to: 4c75431aSigned-off-by:
Alexander Fomichev <git.user@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 13 Sep, 2014 3 commits
-
-
WANG Cong authored
These code is now protected by rtnl lock, rcu read lock is useless now. Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 04 Sep, 2014 1 commit
-
-
Jesper Dangaard Brouer authored
In commit 50cbe9ab ("net: Validate xmit SKBs right when we pull them out of the qdisc") the validation code was moved out of dev_hard_start_xmit and into dequeue_skb. However this overlooked the fact that we do not always enqueue the skb onto a qdisc. First situation is if qdisc have flag TCQ_F_CAN_BYPASS and qdisc is empty. Second situation is if there is no qdisc on the device, which is a common case for software devices. Originally spotted and inital patch by Alexander Duyck. As a result Alex was seeing issues trying to connect to a vhost_net interface after commit 50cbe9ab was applied. Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in __dev_queue_xmit() when no qdisc. Also handle the error situation where dev_hard_start_xmit() could return a skb list, and does not return dev_xmit_complete(rc) and falls through to the kfree_skb(), in that situation it should call kfree_skb_list(). Fixes: 50cbe9ab ("net: Validate xmit SKBs right when we pull them out of the qdisc") Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 02 Sep, 2014 5 commits
-
-
Tom Herbert authored
This flag indicates that an invalid checksum was detected in the packet. __skb_mark_checksum_bad helper function was added to set this. Checksums can be marked bad from a driver or the GRO path (the latter is implemented in this patch). csum_bad is checked in __skb_checksum_validate_complete (i.e. calling that when ip_summed == CHECKSUM_NONE). csum_bad works in conjunction with ip_summed value. In the case that ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the first (or next) checksum encountered in the packet is bad. When ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY, csum_level == 1, and csum_bad is set-- then the third checksum in the packet is bad. In the normal path, the packet will be dropped when processing the protocol layer of the bad checksum: __skb_decr_checksum_unnecessary called twice for the good checksums changing ip_summed to CHECKSUM_NONE so that __skb_checksum_validate_complete is called to validate the third checksum and that will fail since csum_bad is set. Signed-off-by:
Tom Herbert <therbert@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Now fundamentally we can process lists of SKBs as cheaply as single packets. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Just maintain the list properly by returning the head of the remaining SKB list from dev_hard_start_xmit(). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
dev_hard_start_xmit() does two things, it first validates and canonicalizes the SKB, then it actually sends it. Make a set of helper functions for doing the first part. Signed-off-by:
David S. Miller <davem@davemloft.net>
-