- 01 Sep, 2017 15 commits
-
-
Ido Schimmel authored
When the abort mechanism is invoked, a default route directing packets to the CPU is programmed in all the virtual routers currently in use. This can result in packet loss if a new VRF is configured after the abort. Instead, upon abort, program the default route in all virtual routers, whether they are in use or not. The patch is directed at net-next since post-abort fixes aren't critical, and packet loss due to a missing default route will be insignificant compared to the packet loss caused by the CPU port policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
I relied on the fact that anycast routes use the loopback device as their nexthop device to trap packets hitting them to the CPU. After commit 4832c30d ("net: ipv6: put host and anycast routes on device with address") this is no longer the case and such routes are programmed with a forward action (note the 'offload' flag):

    anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium

This will prevent the router from locally receiving packets destined to the Subnet-Router anycast address. Fix this by specifically programming anycast routes with action trap, which results in the following output:

    anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium

Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Martin KaFai Lau says:

====================
bpf: Improve LRU map lookup performance

This patchset improves the lookup performance of the LRU map. Please see the individual patches for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
This patch writes 'node->ref = 1' only if node->ref is 0. The number of lookups/s for a ~1M-entry LRU map increased by ~30% (260097 to 343313). Writes of 'node->ref = 0' are not changed; in those cases, the same cache line has to be changed anyway.

First column: size of the LRU hash
Second column: number of lookups/s

Before:
> echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')"
1048577: 260097

After:
> echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')"
1048577: 343313

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
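A minimal sketch of the idea (illustrative names, not the exact kernel code): the reference bit is only written on the 0 -> 1 transition, so repeated lookups of a hot element stop dirtying its cache line.

    /* Illustrative sketch, not the exact kernel helper: mark an LRU node as
     * recently referenced. Testing before storing avoids re-dirtying the
     * cache line when the bit is already set. */
    struct lru_node {
        unsigned char ref;
        /* ... other per-node state ... */
    };

    static inline void lru_node_set_ref(struct lru_node *node)
    {
        /* before the patch: node->ref = 1;  (unconditional store) */
        if (!node->ref)
            node->ref = 1;  /* write only on the 0 -> 1 transition */
    }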
-
Martin KaFai Lau authored
Inline the LRU map lookup to save the cost of calling bpf_map_lookup_elem() and htab_lru_map_lookup_elem(). Different LRU hash sizes are tested. The benefit diminishes when cache misses start to dominate for the bigger LRU hashes. Considering the change is simple, it is still worth optimizing.

First column: size of the LRU hash
Second column: number of lookups/s

Before:
> for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done
513: 1132020
1025: 1056826
2049: 1007024
4097: 853298
8193: 742723
16385: 712600
32769: 688142
65537: 677028
131073: 619437
262145: 498770
524289: 316695
1048577: 260038

After:
> for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done
513: 1221851
1025: 1144695
2049: 1049902
4097: 884460
8193: 773731
16385: 729673
32769: 721989
65537: 715530
131073: 671665
262145: 516987
524289: 321125
1048577: 260048

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
Create a new case to test the LRU lookup performance. At the beginning, the LRU map is fully loaded (i.e. the number of keys is equal to map->max_entries). Lookups are done from key 0 through num_map_entries and then wrap around to 0 again. This patch also creates an anonymous struct to properly name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
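A simplified userspace analogue of that access pattern is sketched below; the real benchmark drives the lookups from the BPF program in map_perf_test_kern.c, and map_fd, nr_entries, the value type and the include path are placeholders/assumptions here.

    /* Simplified, hypothetical analogue of the benchmark's key pattern:
     * walk keys 0..nr_entries-1 and wrap around, against a pre-populated
     * LRU map referred to by map_fd. */
    #include <bpf/bpf.h>     /* bpf_map_lookup_elem() syscall wrapper */
    #include <stdint.h>

    static void lookup_loop(int map_fd, uint32_t nr_entries, uint64_t nr_lookups)
    {
        uint32_t key = 0;
        long value;

        for (uint64_t i = 0; i < nr_lookups; i++) {
            bpf_map_lookup_elem(map_fd, &key, &value);
            if (++key >= nr_entries)    /* wrap back to key 0 */
                key = 0;
        }
    }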
-
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
David S. Miller authored
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-09-01

This should be the last ipsec-next pull request for this release cycle:

1) Support netdevice ESP trailer removal when decryption is offloaded.
   From Yossi Kuperman.

2) Fix overwritten return value of copy_sec_ctx().

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
David Ahern says:

====================
bpf: Add option to set mark and priority in cgroup sock programs

Add option to set mark and priority in addition to bound device for newly created sockets. Also, allow the bpf programs to use the get_current_uid_gid helper, meaning socket marks, priority and device can be set based on the uid/gid of the running process. Sample programs are updated to demonstrate the new options.

v3
- no changes to Patches 1 and 2 which Alexei acked in previous versions
- dropped change related to recursive programs in a cgroup
- updated tests per dropped patch

v2
- added flag to control recursive behavior as requested by Alexei
- added comment to sock_filter_func_proto regarding use of get_current_uid_gid helper
- updated test programs for recursive option
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Update cgrp2 bpf sock tests to check that device, mark and priority can all be set on a socket via bpf programs attached to a cgroup. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Add option to dump socket settings. Will be used in the next patch to verify bpf programs are correctly setting mark, priority and device based on the cgroup attachment for the program run. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
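A hedged sketch of such a dump (not the actual test_cgrp2_sock code; it only assumes the standard getsockopt() interface): read back the mark, priority and bound device that a cgroup-attached program may have set at socket creation.

    /* Hedged sketch, not the actual sample code: read back per-socket
     * settings via getsockopt() to check what the cgroup BPF program set. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <net/if.h>

    static void dump_sock_settings(int fd)
    {
        int mark = 0, prio = 0;
        char dev[IFNAMSIZ] = "";
        socklen_t len;

        len = sizeof(mark);
        getsockopt(fd, SOL_SOCKET, SO_MARK, &mark, &len);
        len = sizeof(prio);
        getsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, &len);
        len = sizeof(dev);
        getsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, dev, &len);

        printf("mark 0x%x priority %d device %s\n", mark, prio, dev);
    }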
-
David Ahern authored
Add option to detach programs from a cgroup. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Update sock test to set mark and priority on socket create. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Allow BPF programs run on sock create to use the get_current_uid_gid helper. IPv4 and IPv6 sockets are created in a process context, so there is always a valid uid/gid. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
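A hedged sketch of how this helper can be combined with the new fields (this is not one of the sample programs from the series; the section name, uid value and the mark/priority/ifindex numbers are illustrative, and the mark/priority members of struct bpf_sock come from the related patch listed just below):

    /* Hedged sketch of a cgroup sock program; the policy and values are made up. */
    #include <uapi/linux/bpf.h>
    #include "bpf_helpers.h"

    SEC("cgroup/sock")
    int set_sock_attrs(struct bpf_sock *sk)
    {
        __u64 uid_gid = bpf_get_current_uid_gid();
        __u32 uid = (__u32)uid_gid;        /* uid lives in the lower 32 bits */

        if (uid == 1000) {                 /* example: tag one user's sockets */
            sk->mark = 0x2a;
            sk->priority = 6;
            sk->bound_dev_if = 2;          /* example ifindex */
        }
        return 1;                          /* 1 = allow the socket */
    }

    char _license[] SEC("license") = "GPL";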
-
David Ahern authored
Add socket mark and priority to the fields that can be set by an eBPF program when a socket is created. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 Aug, 2017 20 commits
-
-
David S. Miller authored
Jiri Pirko says:

====================
mlxsw: Add IPv6 host dpipe table

This patchset adds IPv6 host dpipe table support. This will provide the ability to observe the hardware offloaded IPv6 neighbors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Add support for controlling IPv6 neighbor counters via dpipe. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Add support for setting counters on IPv6 neighbors based on dpipe's host6 table counter status. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Add support for IPv6 host table dump. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Change the host entry filler helper to be applicable for both IPv4/6 addresses. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Add helper for accessing destination IP in case of IPv6 neighbor. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Add initial IPv6 host table support. The action behavior for both IPv4/6 tables is the same, thus the same action dump op is used. Neighbors with a link-local address are ignored. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
Neighbors with link-local addresses are not offloaded to the host table, yet they are maintained in the driver for adjacency table usage. When dumping the IPv6 host neighbors, these link-local neighbors should be ignored. This patch exports this helper for dpipe usage. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arkadi Sharshevsky authored
This will be used by the IPv6 host table, which will be introduced in the following patches. The fields in the header are added as they are needed. This header is global and can be reused by many drivers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Saves 4 bytes by replacing the following instructions:

    lea rax, [rsi + rdx * 8 + offsetof(...)]
    mov rax, qword ptr [rax]
    cmp rax, 0

with:

    mov rax, [rsi + rdx * 8 + offsetof(...)]
    test rax, rax

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Fix compilation error below:

$ make samples/bpf/
LLVM ERROR: 'xdp_redirect_dummy' label emitted multiple times to assembly file
make[1]: *** [samples/bpf/xdp_redirect_kern.o] Error 1
make: *** [samples/bpf/] Error 2

Fixes: 306da4e6 ("samples/bpf: xdp_redirect load XDP dummy prog on TX device")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rami Rosen authored
This patch fixes two trivial typos in net_device_ops documentation, related to ndo_xdp_flush callback. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrii authored
Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() in net/dccp/ipv6.c, similar to the handling in net/ipv6/tcp_ipv6.c. Signed-off-by: Andrii Vladyka <tulup@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This extends the bridge fdb table tracepoints to also cover learned fdb entries in the br_fdb_update path. Note that unlike the other tracepoints, I have moved this one to where the fdb is modified, because this is in the datapath and can generate a lot of noise in the trace output. br_fdb_update is also called from an added_by_user context in the NTF_USE case, which is already traced; hence the !added_by_user check. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cong Wang authored
TC filters, when used as classifiers, are bound to TC classes. However, there is a hidden difference when adding them in different orders:

1. If we add tc classes before their filters, everything is fine. Logically, the classes exist before we specify their IDs in filters; it is easy to bind them together, just as in the current code base.

2. If we add tc filters before the tc classes they bind to, we have to do a dynamic lookup in the fast path. What's worse, this happens every time, not just once, because on the fast path tcf_result is passed on the stack and there is no way to propagate it back to the one stored in the tc filters.

This hidden difference hurts performance silently if we have many tc classes in the hierarchy.

This patch intends to close this gap by doing the reverse binding when we create a new class: in this case we can actually search all the filters in its parent, match and fix up by classid. And because tcf_result is specific to each type of tc filter, we have to introduce a new op for each filter type to tell it how to bind the class.

Note, we still can NOT totally get rid of the class lookup in ->enqueue(), because the cgroup and flow filters have no way to determine the classid at setup time; they still have to go through a dynamic lookup.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Steffen Klassert authored
A recent commit added an output_mark. When copying this output_mark, the return value of copy_sec_ctx() is overwritten without being checked. Fix this by copying the output_mark before the security context. Fixes: 077fbac4 ("net: xfrm: support setting an output mark.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
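A self-contained illustration of the "overwritten return value" pattern (the helper names are stand-ins, not the real xfrm_user.c functions):

    /* Stand-in helpers: pretend the security-context copy can fail while the
     * output-mark copy succeeds. */
    #include <stdio.h>

    static int copy_sec_ctx_stub(void)     { return -1; }
    static int copy_output_mark_stub(void) { return 0;  }

    static int dump_buggy(void)
    {
        int err;

        err = copy_sec_ctx_stub();      /* failure ...                    */
        err = copy_output_mark_stub();  /* ... silently overwritten here  */
        return err;                     /* caller sees success            */
    }

    static int dump_fixed(void)
    {
        int err;

        err = copy_output_mark_stub();  /* copy the output mark first     */
        if (err)
            return err;
        return copy_sec_ctx_stub();     /* sec ctx last: its error propagates */
    }

    int main(void)
    {
        printf("buggy: %d, fixed: %d\n", dump_buggy(), dump_fixed());
        return 0;
    }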
-
Yossi Kuperman authored
In conjunction with crypto offload [1], removing the ESP trailer by hardware can potentially improve performance by avoiding (1) a cache miss incurred by reading the nexthdr field and (2) the necessity to calculate the csum value of the trailer in order to keep skb->csum valid. This patch introduces the changes to the xfrm stack and merely serves as infrastructure. A subsequent patch to the mlx5 driver will put this to good use.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.html

Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
David S. Miller authored
Saeed Mahameed says:

====================
mlx5-updates-2017-08-31 (GRE Offloads support)

This series provides support for MPLS RSS and for GRE TX offloads and RSS.

The first patch, from Gal and Ariel, provides the mlx5 driver support for the ConnectX capability to perform IP version identification and matching, in order to distinguish between IPv4 and IPv6 without the need to specify the encapsulation type, and thus perform RSS on MPLS automatically without specifying the MPLS ethertype. This patch will also serve for inner GRE IPv4/6 classification for inner GRE RSS.

The 2nd patch, from Gal, adds the TX offloads support for GRE tunneled packets, by reporting the needed netdev features.

The 3rd patch, from Gal, adds GRE inner RSS support by creating the needed device resources (steering tables/rules and traffic classifiers) to match GRE traffic and perform RSS hashing on the inner headers.

Improvement, testing 8 TCP streams bandwidth over GRE:
  System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
  NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  Before: 21.3 Gbps (Single RQ)
  Now:    90.5 Gbps (RSS spread on 8 RQs)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rick Farrington authored
Fix a crash in the Linux PF driver when the BARs have been cleared/de-programmed: fail early init (prior to mapping the BARs) if the BAR0 or BAR1 registers are zero. This situation can arise when the PF is added to a VM (PCI pass-through) and a PF FLR is then issued (in the VM). After this occurs, the BAR registers will be zero. If we then attempt to load the PF driver in the host (after the VM has been shut down), the host can reset. Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
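A hedged sketch of this kind of guard (paraphrased; not the actual liquidio code, and the helper name is made up):

    /* Hypothetical early-init check: read the raw BAR registers from config
     * space before ioremap and bail out if an FLR left them cleared. */
    #include <linux/pci.h>

    static int check_bars_programmed(struct pci_dev *pdev)
    {
        u32 bar0, bar1;

        pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &bar0);
        pci_read_config_dword(pdev, PCI_BASE_ADDRESS_1, &bar1);

        if (!bar0 || !bar1) {
            dev_err(&pdev->dev,
                    "BAR0/BAR1 not programmed (cleared by FLR?), aborting init\n");
            return -ENODEV;
        }
        return 0;
    }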
-
David Ahern authored
The IPv4 name uses "destination ip", as does the IPv6 patch set. Make the mac field naming consistent. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Aug, 2017 5 commits
-
-
Haiyang Zhang authored
There are two typos in the document, netvsc.txt, regarding UDP hashing level. This patch fixes them. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
xennet_start_xmit() might copy an skb with an inappropriate layout into a fresh one. The old skb is then freed, and at this point it is not a drop but a consume. The new skb will then be either consumed or dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
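A paraphrased sketch of the consume-vs-drop distinction (not the literal xen-netfront change; the helper is hypothetical): once the copy succeeds, freeing the original skb is a normal completion, so the "consume" variant keeps drop-monitor and the kfree_skb tracepoint free of false positives.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Hypothetical helper: re-lay-out an skb the backend cannot handle. */
    static struct sk_buff *relayout_skb(struct sk_buff *skb)
    {
        struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);

        if (!nskb)
            return NULL;               /* caller drops the original skb */

        dev_consume_skb_any(skb);      /* was dev_kfree_skb_any() before the fix */
        return nskb;
    }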
-
Gal Pressman authored
Introduce a new flow table and indirect TIRs which are used to hash the inner packet headers of GRE tunneled packets. When a GRE tunneled packet is received, the TTC flow table will match the new IPv4/6->GRE rules which will forward it to the inner TTC table. The inner TTC is similar to its counterpart outer TTC table, but matching the inner packet headers instead of the outer ones (and does not include the new IPv4/6->GRE rules). The new rules will not add steering hops since they are added to an already existing flow group which will be matched regardless of this patch. Non-GRE traffic will not be affected. The inner flow table will forward the packet to inner indirect TIRs which hash the inner packet and thus result in RSS for the tunneled packets.

Testing 8 TCP streams bandwidth over GRE:
  System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
  NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  Before: 21.3 Gbps (Single RQ)
  Now:    90.5 Gbps (RSS spread on 8 RQs)

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Gal Pressman authored
Add TX offloads support for GRE tunneled packets by reporting the needed netdev features. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Gal Pressman authored
This change adds the ability for flow steering to classify IPv4/6 packets with an MPLS tag (ethertype 0x8847 or 0x8848) as standard IP packets and hit the IPv4/6 classification steering rules. Since IP packets with an MPLS tag header carry the MPLS ethertype, they missed the IPv4/6 ethertype rule and ended up hitting the default filter, which forwards all such packets to the same single RQ (no RSS).

Since our device is able to look past the MPLS tag and identify the next protocol, we introduce this solution, which replaces ethertype matching with the device's capability to perform IP version identification and matching in order to distinguish between IPv4 and IPv6. Therefore, when the driver performs flow steering configuration on the device, it will use IP version matching in IP-classified rules instead of ethertype matching, which causes the relevant MPLS-tagged packets to hit these rules as well. If the device doesn't support IP version matching, the driver falls back to legacy ethertype matching in the steering, as before.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-