- 25 Mar, 2018 18 commits
-
-
David S. Miller authored
Yonghong Song says:

====================
net: permit skb_segment on head_frag frag_list skb

One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to clsact
ingress, calls bpf_skb_change_proto to change protocol from ipv4 to
ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the
changed packet out.

  ...
  3665         while (pos < offset + len) {
  3666                 if (i >= nfrags) {
  3667                         BUG_ON(skb_headlen(list_skb));
  ...

The triggering input skb has the following properties:
  list_skb = skb->frag_list;
  skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb if its headlen
(list_skb->len - list_skb->data_len) is not 0.

Patch #1 provides a simple solution to avoid the BUG_ON: if
list_skb->head_frag is true, its page-backed frag will be processed
before list_skb->frags.

Patch #2 provides a test case in the test_bpf module which constructs
a skb and calls skb_segment() directly. The test case is able to
trigger the BUG_ON without Patch #1.

The patch has been tested in the following setup:
  ipv6_host <-> nat_server <-> ipv4_host
where nat_server has a bpf program doing ipv4<->ipv6 translation and
forwarding through the clsact hook using bpf_skb_change_proto.

Changelog:
v5 -> v6:
  . Added back missed BUG_ON(!nfrags) for the zero skb_headlen(skb)
    case, plus a couple of cosmetic changes, from Alexander.
v4 -> v5:
  . Replace local variable head_frag with a static inline function
    skb_head_frag_to_page_desc which gets the head_frag on demand.
    This makes the code more readable and also does not increase the
    stack size, from Alexander.
  . Remove the "if (nfrags)" guard for skb_orphan_frags and
    skb_zerocopy_clone, as I found that they can handle a zero-frag
    skb (with non-zero skb_headlen(skb)) properly.
  . Properly release the segment list from skb_segment() in the test,
    from Eric.
v3 -> v4:
  . Remove dynamic memory allocation and use rewinding for both index
    and frag to remove one branch in the fast path, from Alexander.
  . Fix a bunch of issues in the test_bpf skb_segment() test, including
    the proper way to allocate the skb, the proper function argument
    for skb_add_rx_frag, not freeing the skb, etc., from Eric.
v2 -> v3:
  . Use starting frag index -1 (instead of 0) to specially process the
    head_frag before the other frags in the skb, from Alexander Duyck.
v1 -> v2:
  . Removed a never-hit BUG_ON, spotted by Linyu Yuan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
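For illustration, a minimal sketch of a helper along the lines the changelog
describes (the helper name comes from the changelog above; the skb_frag_t
field names are assumptions based on kernels of that generation, not a quote
of the merged patch):

  #include <linux/skbuff.h>
  #include <linux/mm.h>

  /* Wrap the linear head of a head_frag list_skb in a page descriptor so
   * skb_segment() can consume it like any other frag. */
  static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb)
  {
          skb_frag_t head_frag;
          struct page *page;

          page = virt_to_head_page(frag_skb->head);
          head_frag.page.p = page;
          head_frag.page_offset = frag_skb->data -
                  (unsigned char *)page_address(page);
          head_frag.size = skb_headlen(frag_skb);
          return head_frag;
  }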
-
Yonghong Song authored
Without the previous commit, "modprobe test_bpf" will have the following
errors:

  ...
  [ 98.149165] ------------[ cut here ]------------
  [ 98.159362] kernel BUG at net/core/skbuff.c:3667!
  [ 98.169756] invalid opcode: 0000 [#1] SMP PTI
  [ 98.179370] Modules linked in:
  [ 98.179371]  test_bpf(+)
  ...

which triggers the bug the previous commit intends to fix.

The skbs are constructed to mimic what mlx5 may generate. The packet
size/header may not mimic real cases in production. But the processing
flow is similar.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yonghong Song authored
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to clsact
ingress, calls bpf_skb_change_proto to change protocol from ipv4 to
ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the
changed packet out.

  3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
  3473                             netdev_features_t features)
  3474 {
  3475         struct sk_buff *segs = NULL;
  3476         struct sk_buff *tail = NULL;
  ...
  3665         while (pos < offset + len) {
  3666                 if (i >= nfrags) {
  3667                         BUG_ON(skb_headlen(list_skb));
  3668
  3669                         i = 0;
  3670                         nfrags = skb_shinfo(list_skb)->nr_frags;
  3671                         frag = skb_shinfo(list_skb)->frags;
  3672                         frag_skb = list_skb;
  ...

call stack:
  ...
  #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
  #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
  #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
  #4 [ffff883ffef03668] die at ffffffff8101deb2
  #5 [ffff883ffef03698] do_trap at ffffffff8101a700
  #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
  #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
  #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
     [exception RIP: skb_segment+3044]
     RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
     RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
     RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
     RBP: ffff883ffef03928  R8:  0000000000002ce2  R9:  00000000000027da
     R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
     R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7

Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
David S. Miller authored
Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2018-03-23 This series contains updates to ixgbe and ixgbevf only. Paul adds status register reads to reduce a potential race condition where registers can read 0xFFFFFFFF during a PCI reset, which in turn causes the driver to remove the adapter. Then fixes an assignment operation with an "OR" operation. Shannon Nelson provides several IPsec offload cleanups to ixgbe, as well as a patch to enable TSO with IPsec offload. Tony provides the much anticipated XDP support for ixgbevf. Currently, pass, drop and XDP_TX actions are supported, as well as meta data and stats reporting. Björn Töpel tweaks the page counting for XDP_REDIRECT, since a page can have its reference count decreased via the xdp_do_redirect() call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Intiyaz Basha says: ==================== liquidio: Tx queue cleanup Moved some common functions to octeon_network.h. Removed some unwanted functions and checks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
For consistency renaming txqs_start to start_txqs Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
For consistency renaming txqs_stop to stop_txqs Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
For consistency renaming txqs_wake to wake_txqs Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Using skb_iq function for deriving queue from skb Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Removing one-line function wake_q Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Removing one-line function stop_q Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Removing checks for netif_is_multiqueue. A single-queue configuration will simply be a multiqueue netdev with one queue. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Removing start_txq function from VF and PF files Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Removing one-line function stop_txq Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moving common function skb_iq to octeon_network.h Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moving common function txqs_start to octeon_network.h Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moving common function txqs_wake to octeon_network.h Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moving common function txqs_stop to octeon_network.h Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Mar, 2018 3 commits
-
-
Davide Caratti authored
use u16 in place of __be16 to suppress the following sparse warnings:

  net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types)
  net/sched/act_vlan.c:150:26:    expected restricted __be16 [usertype] push_vid
  net/sched/act_vlan.c:150:26:    got unsigned short
  net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer
  net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types)
  net/sched/act_vlan.c:208:26:    expected unsigned short [unsigned] [usertype] tcfv_push_vid
  net/sched/act_vlan.c:208:26:    got restricted __be16 [usertype] push_vid

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
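A hedged illustration of the idea behind the fix (the helper below is
hypothetical, not the act_vlan code): netlink delivers the VLAN id in host
byte order via nla_get_u16(), so it is kept as a plain u16 and a __be16 is
only produced where wire byte order is genuinely needed.

  #include <linux/if_vlan.h>
  #include <net/netlink.h>

  /* Hypothetical helper: parse and range-check the VLAN id in host order. */
  static int parse_push_vid(const struct nlattr *attr, u16 *push_vid)
  {
          u16 vid = nla_get_u16(attr);

          if (vid >= VLAN_VID_MASK)       /* host-order compare, no sparse warning */
                  return -ERANGE;

          *push_vid = vid;
          return 0;
  }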
-
Davide Caratti authored
tcf_idr_cleanup() is no longer used, so remove it. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
In net commit 8175f7c4736f ("mlxsw: spectrum: Prevent duplicate mirrors") we prevented the user from mirroring more than once from a single binding point (port-direction pair). The fix was essentially reverted in a merge conflict resolution when net was merged into net-next. Restore it. Fixes: 03fe2deb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Mar, 2018 19 commits
-
-
Björn Töpel authored
The current page counting scheme assumes that the reference count cannot decrease until the received frame is sent to the upper layers of the networking stack. This assumption does not hold for the XDP_REDIRECT action, since a page (pointed out by the xdp_buff) can have its reference count decreased via the xdp_do_redirect call. To work around that, we now start off with a large page count and then don't allow a refcount less than two. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
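A hedged sketch of the page-recycling bias scheme this change builds on
(names and constants are illustrative, not the driver's exact code): the
driver takes a large up-front reference so that a reference dropped by
xdp_do_redirect() cannot free the page while the driver still owns it, and a
page is only recycled while nobody else holds a reference to it.

  #include <linux/mm.h>
  #include <linux/kernel.h>

  /* Take a large up-front reference ("bias") on a freshly mapped RX page. */
  static void rx_page_charge(struct page *page, unsigned short *pagecnt_bias)
  {
          page_ref_add(page, USHRT_MAX - 1);
          *pagecnt_bias = USHRT_MAX;
  }

  /* A page can be recycled only if no references beyond the bias remain. */
  static bool rx_page_reusable(struct page *page, unsigned short pagecnt_bias)
  {
          return page_ref_count(page) - pagecnt_bias <= 1;
  }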
-
Tony Nguyen authored
XDP stats are included in TX stats, however, they are not reported in TX queue stats since they are set up on different queues. Add reporting for XDP queue stats to provide consistency between the total stats and per queue stats. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Tony Nguyen authored
Add support for XDP meta data when using build skb. Based on commit 366a88fe ("bpf, ixgbe: add meta data support") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
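A hedged sketch of how meta data typically flows from the xdp_buff into the
skb when building the skb (the helper name is illustrative): data_meta starts
equal to data before the program runs; if the program grew a meta area with
bpf_xdp_adjust_meta(), its length is carried over to the skb.

  #include <linux/skbuff.h>
  #include <linux/filter.h>

  /* Copy over the length of any meta area the XDP program left in front
   * of the packet data. */
  static void apply_xdp_meta(struct sk_buff *skb, struct xdp_buff *xdp)
  {
          unsigned int metasize = xdp->data - xdp->data_meta;

          if (metasize)
                  skb_metadata_set(skb, metasize);
  }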
-
Tony Nguyen authored
Current XDP implementation hits the tail on every XDP_TX; change the driver to only hit the tail after packet processing is complete. Based on commit 7379f97a ("ixgbe: delay tail write to every 'n' packets") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
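A hedged sketch of the deferred doorbell (struct and names are illustrative,
not the driver's): instead of writing the TX tail register for every XDP_TX
frame, note that work was queued during the RX budget and ring the doorbell
once when processing is complete.

  #include <linux/io.h>

  struct xdp_tx_ring {
          void __iomem *tail;     /* mapped tail/doorbell register */
          u16 next_to_use;        /* next free descriptor index */
  };

  static void xdp_ring_doorbell(struct xdp_tx_ring *ring, bool xdp_xmit)
  {
          if (!xdp_xmit)
                  return;

          /* make queued descriptors visible to the device before the tail bump */
          wmb();
          writel(ring->next_to_use, ring->tail);
  }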
-
Tony Nguyen authored
This implements the XDP_TX action, modeled on the ixgbe implementation. However, instead of using the CPU id to determine which XDP queue to use, this uses the received RX queue index, which is similar to i40e. Doing this eliminates the restriction that ixgbe has, where the number of CPUs must not exceed the number of XDP queues. Also, based on the number of queues available, the number of TX queues may be reduced when an XDP program is loaded in order to accommodate the XDP queues. Based largely on commit 33fdc82f ("ixgbe: add support for XDP_TX action") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
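A hedged sketch of the ring selection (types and field names are
illustrative): the XDP TX ring is indexed by the RX queue the frame arrived
on rather than by smp_processor_id(), which is what lifts the CPU-count
restriction mentioned above.

  struct xdp_ring;                        /* opaque here */

  struct vf_adapter {
          struct xdp_ring **xdp_ring;     /* one XDP TX ring per RX queue */
  };

  static struct xdp_ring *select_xdp_ring(struct vf_adapter *adapter,
                                          unsigned int rx_queue_index)
  {
          return adapter->xdp_ring[rx_queue_index];
  }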
-
Tony Nguyen authored
Implement XDP_PASS and XDP_DROP based on the ixgbe implementation. Based largely on commit 92470808 ("ixgbe: add XDP support for pass and drop actions"). Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
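A hedged sketch of the pass/drop dispatch modeled on ixgbe (the helper name
is illustrative): run the attached program and treat anything other than
XDP_PASS as a drop.

  #include <linux/bpf.h>
  #include <linux/filter.h>

  static bool xdp_should_drop(struct bpf_prog *xdp_prog, struct xdp_buff *xdp)
  {
          u32 act;

          if (!xdp_prog)
                  return false;           /* no program attached: always pass */

          act = bpf_prog_run_xdp(xdp_prog, xdp);
          switch (act) {
          case XDP_PASS:
                  return false;
          default:
                  bpf_warn_invalid_xdp_action(act);
                  /* fall through */
          case XDP_ABORTED:
          case XDP_DROP:
                  return true;
          }
  }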
-
Shannon Nelson authored
Fix things up to support TSO offload in conjunction with IPsec hw offload. This raises throughput with IPsec offload on to nearly line rate. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
There is no need to calculate the trailer length if we're doing a GSO/TSO, as there is no trailer added to the packet data. Also, don't bother clearing the flags field as it was already cleared earlier. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
Since the ipsec data fields will be zero anyway in the non-ipsec case, we can remove the conditional jump. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
With the patch commit f8aa2696b4af ("esp: check the NETIF_F_HW_ESP_TX_CSUM bit before segmenting") we no longer need to protect ourself from checksum offload requests on IPsec packets, so we can remove the check in our .ndo_features_check callback. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Paul Greenwalt authored
Replaced an assignment operation with an OR operation. The variable assignment was overwriting the value read from the PHY register. The OR operation sets only the intended register bits. The bits that were being overwritten are reserved, so the assignment had no functional impact. Reported-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Paul Greenwalt authored
Add status register reads and delay between reads to ixgbe_check_remove. Registers can read 0xFFFFFFFF during PCI reset, which causes the driver to remove the adapter. The additional status register reads can reduce the chance of this race condition. If the status register is not 0xFFFFFFFF, then ixgbe_check_remove returns the value of the register being read. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
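A hedged sketch of the retry logic (names, delays, and retry counts are
illustrative, not the driver's exact values): when a register read returns
all-ones during a PCI reset, re-read the status register a few times with a
short delay before concluding the adapter has really been removed; if the
status recovers, re-read the originally requested register and return its
value.

  #include <linux/io.h>
  #include <linux/delay.h>

  #define FAILED_READ_REG 0xffffffffU

  static u32 check_remove(void __iomem *hw_addr, u32 status_reg, u32 reg)
  {
          u32 value;
          int i;

          for (i = 0; i < 3; i++) {
                  value = readl(hw_addr + status_reg);
                  if (value != FAILED_READ_REG)
                          break;
                  mdelay(3);
          }

          if (value == FAILED_READ_REG)
                  return FAILED_READ_REG;         /* adapter really looks removed */

          /* status recovered: re-read the register the caller asked for */
          return readl(hw_addr + reg);
  }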
-
Mathias Kresin authored
The phys embedded in v1.1 of the VR9 SoC use different phy ids. Add these phy ids so the driver can be used with this VR9 version as well. Signed-off-by: Mathias Kresin <dev@kresin.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mathias Kresin authored
The VR9 phy ids match only SoC version 1.2. Rename the macros so the names take this into account. Signed-off-by: Mathias Kresin <dev@kresin.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
kbuild test robot authored
Fixes: 2bfbd35d ("net: hns3: Changes required in PF mailbox to support VF reset") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jon Maloy says: ==================== tipc: introduce 128-bit auto-configurable node id We introduce a 128-bit free-format node identity as an alternative to the legacy <Zone.Cluster.Node> structured 32-bit node address. We also make configuration of this identity optional; if a bearer is enabled without a pre-configured node id it will be set automatically based on the used interface's MAC or IP address. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
Selecting and explicitly configuring a TIPC node identity may be unwanted in some cases. In this commit we introduce a default setting if the identity has not been set at the moment the first bearer is enabled. We do this by using a raw copy of a unique identifier from the used interface: MAC address in the case of an L2 bearer, IPv4/IPv6 address in the case of a UDP bearer. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
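A hedged sketch of the default-identity idea (constant and function names are
illustrative): copy the raw bytes of the interface address (the MAC for an L2
bearer, the IPv4/IPv6 address for a UDP bearer) into the 128-bit identity and
zero-fill the remainder.

  #include <linux/string.h>
  #include <linux/kernel.h>

  #define NODE_ID_LEN 16          /* 128-bit TIPC node identity */

  static void tipc_default_node_id(u8 *node_id, const u8 *if_addr,
                                   size_t addr_len)
  {
          memset(node_id, 0, NODE_ID_LEN);
          memcpy(node_id, if_addr, min_t(size_t, addr_len, NODE_ID_LEN));
  }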
-