Commits · 637cd8c312d8caf234821fd37238b8f956d9ab13 · Kirill Smelkov / linux

01 Sep, 2017 10 commits

bpf: Add lru_hash_lookup performance test · 637cd8c3

Martin KaFai Lau authored Aug 31, 2017

Create a new case to test the LRU lookup performance.

At the beginning, the LRU map is fully loaded (i.e. the number of keys
is equal to map->max_entries).   The lookup is done through key 0
to num_map_entries and then repeats from 0 again.

This patch also creates an anonymous struct to properly
name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

637cd8c3

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 08daaec7

David S. Miller authored Sep 01, 2017

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-09-01

This should be the last ipsec-next pull request for this
release cycle:

1) Support netdevice ESP trailer removal when decryption
   is offloaded. From Yossi Kuperman.

2) Fix overwritten return value of copy_sec_ctx().

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

08daaec7

Merge branch 'bpf-Add-option-to-set-mark-and-priority-in-cgroup-sock-programs' · 8fd68207

David S. Miller authored Sep 01, 2017

David Ahern says:

====================
bpf: Add option to set mark and priority in cgroup sock programs

Add option to set mark and priority in addition to bound device for newly
created sockets. Also, allow the bpf programs to use the get_current_uid_gid
helper meaning socket marks, priority and device can be set based on the
uid/gid of the running process.

Sample programs are updated to demonstrate the new options.

v3
- no changes to Patches 1 and 2 which Alexei acked in previous versions
- dropped change related to recursive programs in a cgroup
- updated tests per dropped patch

v2
- added flag to control recursive behavior as requested by Alexei
- added comment to sock_filter_func_proto regarding use of
  get_current_uid_gid helper
- updated test programs for recursive option
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8fd68207

samples/bpf: Update cgroup socket examples to use uid gid helper · 0adc3dd9

David Ahern authored Aug 31, 2017

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0adc3dd9

samples/bpf: Update cgrp2 socket tests · 33aeb5e3

David Ahern authored Aug 31, 2017

Update cgrp2 bpf sock tests to check that device, mark and priority
can all be set on a socket via bpf programs attached to a cgroup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

33aeb5e3

samples/bpf: Add option to dump socket settings · f776d460

David Ahern authored Aug 31, 2017

Add option to dump socket settings. Will be used in the next patch
to verify bpf programs are correctly setting mark, priority and
device based on the cgroup attachment for the program run.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f776d460

samples/bpf: Add detach option to test_cgrp2_sock · 609b1c32

David Ahern authored Aug 31, 2017

Add option to detach programs from a cgroup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

609b1c32

samples/bpf: Update sock test to allow setting mark and priority · fa38aa17

David Ahern authored Aug 31, 2017

Update sock test to set mark and priority on socket create.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa38aa17

bpf: Allow cgroup sock filters to use get_current_uid_gid helper · ae2cf1c4

David Ahern authored Aug 31, 2017

Allow BPF programs run on sock create to use the get_current_uid_gid
helper. IPv4 and IPv6 sockets are created in a process context so
there is always a valid uid/gid
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae2cf1c4

bpf: Add mark and priority to sock options that can be set · 482dca93

David Ahern authored Aug 31, 2017

Add socket mark and priority to fields that can be set by
ebpf program when a socket is created.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

482dca93

31 Aug, 2017 20 commits

Merge branch 'mlxsw-Add-IPv6-host-dpipe-table' · e12f1a59

David S. Miller authored Aug 31, 2017

Jiri Pirko says:

====================
mlxsw: Add IPv6 host dpipe table

This patchset adds IPv6 host dpipe table support. This will provide the
ability to observe the hardware offloaded IPv6 neighbors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e12f1a59

mlxsw: spectrum_dpipe: Add support for controlling IPv6 neighbor counters · 0fb5fe3c

Arkadi Sharshevsky authored Aug 31, 2017

Add support for controlling IPv6 neighbor counters via dpipe.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fb5fe3c

mlxsw: spectrum_router: Add support for setting counters on IPv6 neighbors · 1ed5574c

Arkadi Sharshevsky authored Aug 31, 2017

Add support for setting counters on IPv6 neighbors based on dpipe's host6
table counter status.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ed5574c

mlxsw: spectrum_dpipe: Add support for IPv6 host table dump · 410774bd

Arkadi Sharshevsky authored Aug 31, 2017

Add support for IPv6 host table dump.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

410774bd

mlxsw: spectrum_dpipe: Make host entry fill handler more generic · 6049e539

Arkadi Sharshevsky authored Aug 31, 2017

Change the host entry filler helper to be applicable for both IPv4/6
addresses.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6049e539

mlxsw: spectrum_router: Add IPv6 neighbor access helper · 0250768c

Arkadi Sharshevsky authored Aug 31, 2017

Add helper for accessing destination IP in case of IPv6 neighbor.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0250768c

mlxsw: spectrum_dpipe: Add IPv6 host table initial support · 506f7dd5

Arkadi Sharshevsky authored Aug 31, 2017

Add IPv6 host table initial support. The action behavior for both IPv4/6
tables is the same, thus the same action dump op is used. Neighbors with
link local address are ignored.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

506f7dd5

mlxsw: spectrum_router: Export IPv6 link local address check helper · 1d1056d8

Arkadi Sharshevsky authored Aug 31, 2017

Neighbors with link local addresses are not offloaded to the host table,
yet, the are maintained in the driver for adjacency table usage. When
dumping the IPv6 host neighbors this link local neighbors should be
ignored. This patch exports this helper for dpipe usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d1056d8

devlink: Add IPv6 header for dpipe · 1797f5b3

Arkadi Sharshevsky authored Aug 31, 2017

This will be used by the IPv6 host table which will be introduced in the
following patches. The fields in the header are added per-use. This header
is global and can be reused by many drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1797f5b3

x86: bpf_jit: small optimization in emit_bpf_tail_call() · 84ccac6e

Eric Dumazet authored Aug 31, 2017

Saves 4 bytes replacing following instructions :

lea rax, [rsi + rdx * 8 + offsetof(...)]
mov rax, qword ptr [rax]
cmp rax, 0

by :

mov rax, [rsi + rdx * 8 + offsetof(...)]
test rax, rax
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

84ccac6e

samples/bpf: Fix compilation issue in redirect dummy program · 3edcf18e

Tariq Toukan authored Aug 31, 2017

Fix compilation error below:

$ make samples/bpf/

LLVM ERROR: 'xdp_redirect_dummy' label emitted multiple times to
assembly file
make[1]: *** [samples/bpf/xdp_redirect_kern.o] Error 1
make: *** [samples/bpf/] Error 2

Fixes: 306da4e6 ("samples/bpf: xdp_redirect load XDP dummy prog on TX device")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3edcf18e

net: fix two typos in net_device_ops documentation. · f16ded59

Rami Rosen authored Aug 31, 2017

This patch fixes two trivial typos in net_device_ops documentation,
related to ndo_xdp_flush callback.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f16ded59

net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() · 323fbd0e

Andrii authored Aug 31, 2017

Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() in net/dccp/ipv6.c,
similar
to the handling in net/ipv6/tcp_ipv6.c
Signed-off-by: Andrii Vladyka <tulup@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

323fbd0e

bridge: add tracepoint in br_fdb_update · e3cfddd5

Roopa Prabhu authored Aug 30, 2017

This extends bridge fdb table tracepoints to also cover
learned fdb entries in the br_fdb_update path. Note that
unlike other tracepoints I have moved this to when the fdb
is modified because this is in the datapath and can generate
a lot of noise in the trace output. br_fdb_update is also called
from added_by_user context in the NTF_USE case which is already
traced ..hence the !added_by_user check.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3cfddd5

net_sched: add reverse binding for tc class · 07d79fc7

Cong Wang authored Aug 30, 2017

TC filters when used as classifiers are bound to TC classes.
However, there is a hidden difference when adding them in different
orders:

1. If we add tc classes before its filters, everything is fine.
   Logically, the classes exist before we specify their ID's in
   filters, it is easy to bind them together, just as in the current
   code base.

2. If we add tc filters before the tc classes they bind, we have to
   do dynamic lookup in fast path. What's worse, this happens all
   the time not just once, because on fast path tcf_result is passed
   on stack, there is no way to propagate back to the one in tc filters.

This hidden difference hurts performance silently if we have many tc
classes in hierarchy.

This patch intends to close this gap by doing the reverse binding when
we create a new class, in this case we can actually search all the
filters in its parent, match and fixup by classid. And because
tcf_result is specific to each type of tc filter, we have to introduce
a new ops for each filter to tell how to bind the class.

Note, we still can NOT totally get rid of those class lookup in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, they still have to go through dynamic lookup.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07d79fc7

xfrm: Fix return value check of copy_sec_ctx. · 8598112d

Steffen Klassert authored Aug 31, 2017

A recent commit added an output_mark. When copying
this output_mark, the return value of copy_sec_ctx
is overwitten without a check. Fix this by copying
the output_mark before the security context.

Fixes: 077fbac4 ("net: xfrm: support setting an output mark.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

8598112d

xfrm: Add support for network devices capable of removing the ESP trailer · 47ebcc0b

Yossi Kuperman authored Aug 30, 2017

In conjunction with crypto offload [1], removing the ESP trailer by
hardware can potentially improve the performance by avoiding (1) a
cache miss incurred by reading the nexthdr field and (2) the necessity
to calculate the csum value of the trailer in order to keep skb->csum
valid.

This patch introduces the changes to the xfrm stack and merely serves
as an infrastructure. Subsequent patch to mlx5 driver will put this to
a good use.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.htmlSigned-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

47ebcc0b

Merge tag 'mlx5-GRE-Offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ea3100ab

David S. Miller authored Aug 30, 2017

Saeed Mahameed says:

====================
mlx5-updates-2017-08-31 (GRE Offloads support)

This series provides the support for MPLS RSS and GRE TX offloads and
RSS support.

The first patch from Gal and Ariel provides the mlx5 driver support for
ConnectX capability to perform IP version identification and matching in
order to distinguish between IPv4 and IPv6 without the need to specify the
encapsulation type, thus perform RSS in MPLS automatically without
specifying MPLS ethertyoe. This patch will also serve for inner GRE IPv4/6
classification for inner GRE RSS.

2nd patch from Gal, Adds the TX offloads support for GRE tunneled packets,
by reporting the needed netdev features.

3rd patch from Gal, Adds GRE inner RSS support by creating the needed device
resources (Steering Tables/rules and traffic classifiers) to Match GRE traffic
and perform RSS hashing on the inner headers.

Improvement:
Testing 8 TCP streams bandwidth over GRE:
    System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    Before: 21.3 Gbps (Single RQ)
    Now   : 90.5 Gbps (RSS spread on 8 RQs)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ea3100ab

liquidio: fix crash in presence of zeroed-out base address regs · acfb98b9

Rick Farrington authored Aug 30, 2017

Fix crash in linux PF driver when BARs have been cleared/de-programmed;
fail early init (prior to mapping BARs) if the BAR0 or
BAR1 registers are zero.

This situation can arise when the PF is added to a VM (PCI pass-through),
then a PF FLR is issued (in the VM).  After this occurs, the BAR registers
will be zero. If we attempt to load the PF driver in the host
(after VM has been shutdown), the host can reset.
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

acfb98b9

devlink: Maintain consistency in mac field name · 12bdc5e1

David Ahern authored Aug 30, 2017

IPv4 name uses "destination ip" as does the IPv6 patch set.
Make the mac field consistent.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12bdc5e1

30 Aug, 2017 10 commits

hv_netvsc: Fix typos in the document of UDP hashing · d35d6e92

Haiyang Zhang authored Aug 30, 2017

There are two typos in the document, netvsc.txt,
regarding UDP hashing level. This patch fixes them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d35d6e92

xen-netfront: be more drop monitor friendly · 62f3250f

Eric Dumazet authored Aug 30, 2017

xennet_start_xmit() might copy skb with inappropriate layout
into a fresh one.

Old skb is freed, and at this point it is not a drop, but
a consume. New skb will then be either consumed or dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62f3250f

net/mlx5e: Support RSS for GRE tunneled packets · 7b3722fa

Gal Pressman authored Aug 13, 2017

Introduce a new flow table and indirect TIRs which are used to hash the
inner packet headers of GRE tunneled packets.

When a GRE tunneled packet is received, the TTC flow table will match
the new IPv4/6->GRE rules which will forward it to the inner TTC table.
The inner TTC is similar to its counterpart outer TTC table, but
matching the inner packet headers instead of the outer ones (and does
not include the new IPv4/6->GRE rules).
The new rules will not add steering hops since they are added to an
already existing flow group which will be matched regardless of this
patch. Non GRE traffic will not be affected.

The inner flow table will forward the packet to inner indirect TIRs
which hash the inner packet and thus result in RSS for the tunneled
packets.

Testing 8 TCP streams bandwidth over GRE:
System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Before: 21.3 Gbps (Single RQ)
Now   : 90.5 Gbps (RSS spread on 8 RQs)
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

7b3722fa

net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels · 27299841

Gal Pressman authored Aug 13, 2017

Add TX offloads support for GRE tunneled packets by reporting the needed
netdev features.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

27299841

net/mlx5e: Use IP version matching to classify IP traffic · 888fcd9c

Gal Pressman authored Aug 15, 2017

This change adds the ability for flow steering to classify IPv4/6
packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets and hit IPv4/6 classification steering rules.

Since IP packets with MPLS tag header have MPLS ethertype, they
missed the IPv4/6 ethertype rule and ended up hitting the default
filter forwarding all the packets to the same single RQ (No RSS).

Since our device is able to look past the MPLS tag and identify the
next protocol we introduce this solution which replaces ethertype
matching by the device's capability to perform IP version
identification and matching in order to distinguish between IPv4 and
IPv6.
Therefore, when driver is performing flow steering configuration on the
device it will use IP version matching in IP classified rules instead
of ethertype matching which will cause relevant MPLS tagged packets to
hit this rule as well.

If the device doesn't support IP version matching the driver will fall back
to use legacy ethertype matching in the steering as before.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

888fcd9c

bpf: test_maps: fix typos, "conenct" and "listeen" · 90774a93

Colin Ian King authored Aug 30, 2017

Trivial fix to typos in printf error messages:
"conenct" -> "connect"
"listeen" -> "listen"

thanks to Daniel Borkmann for spotting one of these mistakes
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90774a93

qed: fix spelling mistake: "calescing" -> "coalescing" · 9e4a5613

Colin Ian King authored Aug 30, 2017

Trivial fix to spelling mistake in DP_NOTICE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e4a5613

net: hns3: Fixes the wrong IS_ERR check on the returned phydev value · 752b0694

Salil Mehta authored Aug 30, 2017

This patch removes the wrong check being done for the phy device being
returned by the mdiobus_get_phy() function. This function never returns
the error pointers.

Fixes: 256727da ("net: hns3: Add MDIO support to HNS3 Ethernet
Driver for hip08 SoC")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

752b0694

net: bcm63xx_enet: make bcm_enetsw_ethtool_ops const · dc8007e8

Bhumika Goyal authored Aug 30, 2017

Make this const as it is never modified.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc8007e8

ipv6: sr: fix get_srh() to comply with IPv6 standard "RFC 8200" · 5829d70b

Ahmed Abdelsalam authored Aug 30, 2017

IPv6 packet may carry more than one extension header, and IPv6 nodes must
accept and attempt to process extension headers in any order and occurring
any number of times in the same packet. Hence, there should be no
assumption that Segment Routing extension header is to appear immediately
after the IPv6 header.

Moreover, section 4.1 of RFC 8200 gives a recommendation on the order of
appearance of those extension headers within an IPv6 packet. According to
this recommendation, Segment Routing extension header should appear after
Hop-by-Hop and Destination Options headers (if they present).

This patch fixes the get_srh(), so it gets the segment routing header
regardless of its position in the chain of the extension headers in IPv6
packet, and makes sure that the IPv6 routing extension header is of Type 4.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Acked-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

5829d70b