Commits · b1b123cfb24b19a4e73e55011a5da58d9523b073 · Kirill Smelkov / linux

01 Sep, 2017 23 commits

net: mdio-mux: Remove unnecessary 'out of memory' message · b1b123cf

Corentin Labbe authored Sep 01, 2017

This patch fixes the checkpatch warning about an unnecessary 'out of memory'
message.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio-mux: Fix NULL Comparison style · 2d00cd85

Corentin Labbe authored Sep 01, 2017

This patch fixes the checkpatch warning about NULL comparison style.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvpp2-optional-PHYs-and-GoP-link-irq' · c5b2cef3

David S. Miller authored Sep 01, 2017

Antoine Tenart says:

====================
net: mvpp2: optional PHYs and GoP link irq

This series aims at making the driver work when no PHY is connected
between a port and the physical layer and not described as a fixed-phy.
This is useful for some use cases, such as when a switch is connected
directly to the serdes lanes. It can also be used for SFP ports on the
7k-db and 8k-db while waiting for the phylink support to land (which
should be part of another series).

This series makes the phy optional in the PPv2 driver, and then adds
the support for the GoP port link interrupt to handle link status
changes on such ports.

This was tested using the SFP ports on the 7k-db and 8k-db boards.

Since v1:
  - Now use phy_interface_mode_is_rgmii() in the GoP link patch.
  - Added one cosmetic patch to take advantage of phy_interface_mode_is_rgmii()
    in the whole PPv2 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation/bindings: net: marvell-pp2: add the link interrupt · db40b4d1

Antoine Tenart authored Sep 01, 2017

A link interrupt can be described. Document this valid interrupt name.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: use the GoP interrupt for link status changes · fd3651b2

Antoine Tenart authored Sep 01, 2017

This patch adds GoP link interrupt support for ports that are not
connected to a PHY. On such ports the phylib callback is never called,
so no link status management is done. This patch uses the GoP link
interrupt in such cases to still provide minimal link management. Without
this patch, ports not connected to a PHY cannot work.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: make the phy optional · 5997c86b

Antoine Tenart authored Sep 01, 2017

There is not necessarily a PHY between the GoP and the physical port.
However, the driver currently makes the "phy" property mandatory,
contrary to what is stated in the device tree bindings. This patch makes
the PHY optional, and aligns the PPv2 driver with its device tree
documentation. However, if a PHY is provided, the GoP link interrupt
won't be used.

With this patch switches directly connected to the serdes lanes and SFP
ports on the Armada 8040-db and Armada 7040-db can be used if the link
interrupt is described in the device tree.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
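
As a rough illustration of the selection rule described in this message, the following userspace sketch picks a link management method from two inputs. All names here (the enum, the function, the two flags) are hypothetical and are not taken from the driver; the sketch only restates the rule that a PHY, when present, takes precedence over the GoP link interrupt.

```c
#include <stdbool.h>

/* Hypothetical labels for how link state changes would be handled. */
enum sketch_link_mgmt {
	SKETCH_LINK_PHYLIB,	/* a PHY is described: phylib drives link state */
	SKETCH_LINK_GOP_IRQ,	/* no PHY, but a link interrupt is described */
	SKETCH_LINK_NONE,	/* neither is available */
};

/* A PHY, when provided, wins; the GoP link interrupt is only a fallback. */
static enum sketch_link_mgmt sketch_pick_link_mgmt(bool has_phy,
						   bool has_link_irq)
{
	if (has_phy)
		return SKETCH_LINK_PHYLIB;
	if (has_link_irq)
		return SKETCH_LINK_GOP_IRQ;
	return SKETCH_LINK_NONE;
}
```

With both a PHY and a link interrupt described, the sketch returns SKETCH_LINK_PHYLIB, matching the statement that the GoP link interrupt won't be used in that case.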

net: mvpp2: take advantage of the is_rgmii helper · 1df2270d

Antoine Tenart authored Sep 01, 2017

Convert all RGMII checks to use the phy_interface_mode_is_rgmii()
helper. This is a cosmetic patch.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
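
For readers unfamiliar with this kind of conversion, here is a minimal userspace sketch of what such a helper centralizes. The enum is a simplified stand-in for the kernel's phy_interface_t (its members and ordering are assumptions made for this example), and the sketch_ prefixed functions are illustrative, not the kernel definitions.

```c
#include <stdbool.h>

/* Simplified stand-in for phy_interface_t. The four RGMII variants are
 * kept contiguous so that a single range check covers all of them. */
enum sketch_phy_mode {
	SKETCH_MODE_SGMII,
	SKETCH_MODE_RGMII,
	SKETCH_MODE_RGMII_ID,
	SKETCH_MODE_RGMII_RXID,
	SKETCH_MODE_RGMII_TXID,
	SKETCH_MODE_10GKR,
};

/* Open-coded check of the kind a cosmetic conversion would replace. */
static bool sketch_is_rgmii_open_coded(enum sketch_phy_mode mode)
{
	return mode == SKETCH_MODE_RGMII ||
	       mode == SKETCH_MODE_RGMII_ID ||
	       mode == SKETCH_MODE_RGMII_RXID ||
	       mode == SKETCH_MODE_RGMII_TXID;
}

/* Helper-style check: one range comparison over the contiguous values. */
static bool sketch_mode_is_rgmii(enum sketch_phy_mode mode)
{
	return mode >= SKETCH_MODE_RGMII && mode <= SKETCH_MODE_RGMII_TXID;
}
```

Both functions return the same result for every mode in the sketch, which is what makes such a conversion purely cosmetic.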

Merge branch 'mlxsw-next-fixes' · 0d22a3cf

David S. Miller authored Sep 01, 2017

Jiri Pirko says:

====================
mlxsw: spectrum_router: Couple of fixes

Ido Schimmel (2):
  mlxsw: spectrum_router: Trap packets hitting anycast routes
  mlxsw: spectrum_router: Set abort trap in all virtual routers
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Set abort trap in all virtual routers · 241bc859

Ido Schimmel authored Sep 01, 2017

When the abort mechanism is invoked, a default route directing packets to
the CPU is programmed in all the virtual routers currently in use. This
can result in packet loss if a new VRF is configured afterwards.

Upon abort, program the default route in all virtual routers, whether
they are in use or not.

The patch is directed at net-next since post-abort fixes aren't critical
and packet loss due to a missing default route will be insignificant
compared to packet loss caused by the CPU port policer.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
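
The difference between the two behaviours can be sketched in userspace as follows. The structure, its fields and the function names are hypothetical and only model the idea from the message above; they are not the driver's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_VRS 4	/* hypothetical number of virtual routers */

struct sketch_vr {
	bool in_use;		/* currently bound to a VRF */
	bool has_default_trap;	/* default route to the CPU is programmed */
};

/* Before: only virtual routers in use at abort time are covered. */
static void sketch_abort_in_use_only(struct sketch_vr *vrs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vrs[i].in_use)
			vrs[i].has_default_trap = true;
}

/* After: every virtual router gets the default route, used or not. */
static void sketch_abort_all(struct sketch_vr *vrs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		vrs[i].has_default_trap = true;
}
```

In the first variant, a virtual router that only becomes used after the abort has no default route, which is the packet loss scenario the message describes.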

mlxsw: spectrum_router: Trap packets hitting anycast routes · d3b6d377

Ido Schimmel authored Sep 01, 2017

I relied on the fact that anycast routes use the loopback device as
their nexthop device to trap packets hitting them to the CPU.

After commit 4832c30d ("net: ipv6: put host and anycast routes on
device with address") this is no longer the case and such routes are
programmed with a forward action (note the 'offload' flag):

anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium

This will prevent the router from locally receiving packets destined to
the Subnet-Router anycast address.

Fix this by specifically programming anycast routes with action trap,
which results in the following output:

anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium

Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
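
A minimal sketch of the idea, assuming a simplified route model: the old decision keyed off the nexthop device, while the new one checks the route type explicitly. The flag value, structure and function names are hypothetical and do not come from the driver.

```c
#include <stdbool.h>

enum sketch_action { SKETCH_ACTION_FORWARD, SKETCH_ACTION_TRAP };

#define SKETCH_RT_ANYCAST 0x1u	/* hypothetical flag, not the kernel's value */

struct sketch_route {
	unsigned int flags;
	bool nexthop_is_loopback;
};

/* Old behaviour: trap only when the nexthop device is the loopback. */
static enum sketch_action sketch_action_old(const struct sketch_route *rt)
{
	return rt->nexthop_is_loopback ? SKETCH_ACTION_TRAP
				       : SKETCH_ACTION_FORWARD;
}

/* New behaviour: anycast routes are trapped whatever their nexthop is. */
static enum sketch_action sketch_action_new(const struct sketch_route *rt)
{
	if (rt->flags & SKETCH_RT_ANYCAST)
		return SKETCH_ACTION_TRAP;
	return sketch_action_old(rt);
}
```

For an anycast route whose nexthop device is a regular port, the old function yields a forward action and the new one yields a trap, mirroring the before and after outputs quoted in the message.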

Merge branch 'bpf-Improve-LRU-map-lookup-performance' · 843bd2b3

David S. Miller authored Sep 01, 2017

Martin KaFai Lau says:

====================
bpf: Improve LRU map lookup performance

This patchset improves the lookup performance of the LRU map.
Please see the individual patches for details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Only set node->ref = 1 if it has not been set · bb9b9f88

Martin KaFai Lau authored Aug 31, 2017

This patch writes 'node->ref = 1' only if node->ref is 0.
The number of lookups/s for a ~1M entries LRU map increased by
~30% (260097 to 343313).

Other writes of 'node->ref = 0' are not changed.  In those cases, the
same cache line has to be changed anyway.

First column: Size of the LRU hash
Second column: Number of lookups/s

Before:
> echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')"
1048577: 260097

After:
> echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')"
1048577: 343313
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
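
The check-before-write pattern described above can be sketched as below. The structure and function names are hypothetical; the presumed benefit is that a lookup on an already-referenced node only reads the field and does not dirty the node's cache line.

```c
/* Hypothetical stand-in for an LRU node with a reference bit. */
struct sketch_lru_node {
	int ref;
};

/* Unconditional store: writes the field on every lookup. */
static void sketch_set_ref_always(struct sketch_lru_node *node)
{
	node->ref = 1;
}

/* Store only when the bit is clear. Returns 1 if a write was issued,
 * 0 if the field was already set and was only read. */
static int sketch_set_ref_if_clear(struct sketch_lru_node *node)
{
	if (node->ref)
		return 0;
	node->ref = 1;
	return 1;
}
```

Both variants leave node->ref at 1; they differ only in whether repeated lookups of the same node keep writing to it.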

bpf: Inline LRU map lookup · cc555421

Martin KaFai Lau authored Aug 31, 2017

Inline the lru map lookup to save the cost in making calls to
bpf_map_lookup_elem() and htab_lru_map_lookup_elem().

Different LRU hash sizes are tested.  The benefit diminishes when
cache misses start to dominate in the bigger LRU hashes.
Considering the change is simple, it is still worth optimizing.

First column: Size of the LRU hash
Second column: Number of lookups/s

Before:
> for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done
513: 1132020
1025: 1056826
2049: 1007024
4097: 853298
8193: 742723
16385: 712600
32769: 688142
65537: 677028
131073: 619437
262145: 498770
524289: 316695
1048577: 260038

After:
> for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done
513: 1221851
1025: 1144695
2049: 1049902
4097: 884460
8193: 773731
16385: 729673
32769: 721989
65537: 715530
131073: 671665
262145: 516987
524289: 321125
1048577: 260048
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Add lru_hash_lookup performance test · 637cd8c3

Martin KaFai Lau authored Aug 31, 2017

Create a new case to test the LRU lookup performance.

At the beginning, the LRU map is fully loaded (i.e. the number of keys
is equal to map->max_entries). The lookup is done through key 0
to num_map_entries and then repeats from 0 again.

This patch also creates an anonymous struct to properly
name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 08daaec7

David S. Miller authored Sep 01, 2017

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-09-01

This should be the last ipsec-next pull request for this
release cycle:

1) Support netdevice ESP trailer removal when decryption
   is offloaded. From Yossi Kuperman.

2) Fix overwritten return value of copy_sec_ctx().

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-Add-option-to-set-mark-and-priority-in-cgroup-sock-programs' · 8fd68207

David S. Miller authored Sep 01, 2017

David Ahern says:

====================
bpf: Add option to set mark and priority in cgroup sock programs

Add an option to set mark and priority, in addition to the bound device, for
newly created sockets. Also, allow the bpf programs to use the
get_current_uid_gid helper, meaning socket mark, priority and device can be
set based on the uid/gid of the running process.

Sample programs are updated to demonstrate the new options.

v3
- no changes to Patches 1 and 2 which Alexei acked in previous versions
- dropped change related to recursive programs in a cgroup
- updated tests per dropped patch

v2
- added flag to control recursive behavior as requested by Alexei
- added comment to sock_filter_func_proto regarding use of
  get_current_uid_gid helper
- updated test programs for recursive option
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: Update cgroup socket examples to use uid gid helper · 0adc3dd9

David Ahern authored Aug 31, 2017

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: Update cgrp2 socket tests · 33aeb5e3

David Ahern authored Aug 31, 2017

Update cgrp2 bpf sock tests to check that device, mark and priority
can all be set on a socket via bpf programs attached to a cgroup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: Add option to dump socket settings · f776d460

David Ahern authored Aug 31, 2017

Add option to dump socket settings. Will be used in the next patch
to verify bpf programs are correctly setting mark, priority and
device based on the cgroup attachment for the program run.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: Add detach option to test_cgrp2_sock · 609b1c32

David Ahern authored Aug 31, 2017

Add option to detach programs from a cgroup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: Update sock test to allow setting mark and priority · fa38aa17

David Ahern authored Aug 31, 2017

Update sock test to set mark and priority on socket create.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Allow cgroup sock filters to use get_current_uid_gid helper · ae2cf1c4

David Ahern authored Aug 31, 2017

Allow BPF programs run on sock create to use the get_current_uid_gid
helper. IPv4 and IPv6 sockets are created in a process context so
there is always a valid uid/gid.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
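
To illustrate how a sock-create program could combine the helper with the settable fields, here is a userspace model. The struct is a mock, not the kernel's struct bpf_sock, and the policy, the mark and priority values and all sketch_ names are hypothetical. The one assumption about the real helper is that it packs its result as (gid << 32) | uid.

```c
#include <stdint.h>

/* Mock of the fields a sock-create program may set. */
struct sketch_sock {
	uint32_t bound_dev_if;
	uint32_t mark;
	uint32_t priority;
};

/* Unpack a value laid out as (gid << 32) | uid. */
static uint32_t sketch_uid(uint64_t uid_gid)
{
	return (uint32_t)uid_gid;
}

static uint32_t sketch_gid(uint64_t uid_gid)
{
	return (uint32_t)(uid_gid >> 32);
}

/* Hypothetical policy: tag sockets created by one particular uid.
 * Returns 1, modelling a program that always allows socket creation. */
static int sketch_sock_create(struct sketch_sock *sk, uint64_t uid_gid,
			      uint32_t wanted_uid)
{
	if (sketch_uid(uid_gid) == wanted_uid) {
		sk->mark = 123;		/* arbitrary example value */
		sk->priority = 5;	/* arbitrary example value */
	}
	return 1;
}
```

A real program would obtain uid_gid from the helper instead of a parameter; passing it in keeps the sketch testable outside the kernel.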

bpf: Add mark and priority to sock options that can be set · 482dca93

David Ahern authored Aug 31, 2017

Add socket mark and priority to the fields that can be set by
an eBPF program when a socket is created.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Aug, 2017 17 commits

Merge branch 'mlxsw-Add-IPv6-host-dpipe-table' · e12f1a59

David S. Miller authored Aug 31, 2017

Jiri Pirko says:

====================
mlxsw: Add IPv6 host dpipe table

This patchset adds IPv6 host dpipe table support. This will provide the
ability to observe the hardware offloaded IPv6 neighbors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_dpipe: Add support for controlling IPv6 neighbor counters · 0fb5fe3c

Arkadi Sharshevsky authored Aug 31, 2017

Add support for controlling IPv6 neighbor counters via dpipe.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add support for setting counters on IPv6 neighbors · 1ed5574c

Arkadi Sharshevsky authored Aug 31, 2017

Add support for setting counters on IPv6 neighbors based on dpipe's host6
table counter status.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_dpipe: Add support for IPv6 host table dump · 410774bd

Arkadi Sharshevsky authored Aug 31, 2017

Add support for IPv6 host table dump.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_dpipe: Make host entry fill handler more generic · 6049e539

Arkadi Sharshevsky authored Aug 31, 2017

Change the host entry filler helper to be applicable for both IPv4/6
addresses.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add IPv6 neighbor access helper · 0250768c

Arkadi Sharshevsky authored Aug 31, 2017

Add helper for accessing destination IP in case of IPv6 neighbor.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_dpipe: Add IPv6 host table initial support · 506f7dd5

Arkadi Sharshevsky authored Aug 31, 2017

Add IPv6 host table initial support. The action behavior for both IPv4/6
tables is the same, thus the same action dump op is used. Neighbors with
link local address are ignored.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Export IPv6 link local address check helper · 1d1056d8

Arkadi Sharshevsky authored Aug 31, 2017

Neighbors with link local addresses are not offloaded to the host table,
yet they are maintained in the driver for adjacency table usage. When
dumping the IPv6 host neighbors, these link local neighbors should be
ignored. This patch exports the link local address check helper for dpipe
usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add IPv6 header for dpipe · 1797f5b3

Arkadi Sharshevsky authored Aug 31, 2017

This will be used by the IPv6 host table which will be introduced in the
following patches. The fields in the header are added per-use. This header
is global and can be reused by many drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: bpf_jit: small optimization in emit_bpf_tail_call() · 84ccac6e

Eric Dumazet authored Aug 31, 2017

Save 4 bytes by replacing the following instructions:

lea rax, [rsi + rdx * 8 + offsetof(...)]
mov rax, qword ptr [rax]
cmp rax, 0

with:

mov rax, [rsi + rdx * 8 + offsetof(...)]
test rax, rax
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
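
The 4-byte figure can be checked by writing out the machine code for both sequences. The byte values below are my own x86-64 encoding of the listed instructions, assuming the offsetof(...) term is emitted as a 32-bit displacement; they are not copied from the patch, and the displacement value is a placeholder that does not affect the lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder 32-bit displacement standing in for offsetof(...). */
#define SKETCH_DISP32 0x10, 0x01, 0x00, 0x00

static const uint8_t sketch_before[] = {
	0x48, 0x8D, 0x84, 0xD6, SKETCH_DISP32,	/* lea rax, [rsi + rdx * 8 + disp32] */
	0x48, 0x8B, 0x00,			/* mov rax, qword ptr [rax] */
	0x48, 0x83, 0xF8, 0x00,			/* cmp rax, 0 */
};

static const uint8_t sketch_after[] = {
	0x48, 0x8B, 0x84, 0xD6, SKETCH_DISP32,	/* mov rax, [rsi + rdx * 8 + disp32] */
	0x48, 0x85, 0xC0,			/* test rax, rax */
};

/* Difference in encoded length between the two sequences. */
static size_t sketch_bytes_saved(void)
{
	return sizeof(sketch_before) - sizeof(sketch_after);
}
```

Folding the load into the addressing mode removes the 3-byte mov, and test rax, rax is one byte shorter than cmp rax, 0, which accounts for the saving.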

samples/bpf: Fix compilation issue in redirect dummy program · 3edcf18e

Tariq Toukan authored Aug 31, 2017

Fix compilation error below:

$ make samples/bpf/

LLVM ERROR: 'xdp_redirect_dummy' label emitted multiple times to
assembly file
make[1]: *** [samples/bpf/xdp_redirect_kern.o] Error 1
make: *** [samples/bpf/] Error 2

Fixes: 306da4e6 ("samples/bpf: xdp_redirect load XDP dummy prog on TX device")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix two typos in net_device_ops documentation. · f16ded59

Rami Rosen authored Aug 31, 2017

This patch fixes two trivial typos in net_device_ops documentation,
related to ndo_xdp_flush callback.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() · 323fbd0e

Andrii authored Aug 31, 2017

Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv() in net/dccp/ipv6.c,
similar to the handling in net/ipv6/tcp_ipv6.c.
Signed-off-by: Andrii Vladyka <tulup@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: add tracepoint in br_fdb_update · e3cfddd5

Roopa Prabhu authored Aug 30, 2017

This extends the bridge fdb table tracepoints to also cover
learned fdb entries in the br_fdb_update path. Note that,
unlike the other tracepoints, this one fires only when the fdb
is modified, because this is in the datapath and could otherwise
generate a lot of noise in the trace output. br_fdb_update is also
called from added_by_user context in the NTF_USE case, which is already
traced, hence the !added_by_user check.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: add reverse binding for tc class · 07d79fc7

Cong Wang authored Aug 30, 2017

TC filters when used as classifiers are bound to TC classes.
However, there is a hidden difference when adding them in different
orders:

1. If we add tc classes before their filters, everything is fine.
   Logically, the classes exist before we specify their IDs in
   filters, so it is easy to bind them together, just as in the current
   code base.

2. If we add tc filters before the tc classes they bind to, we have to
   do a dynamic lookup in the fast path. What's worse, this happens every
   time, not just once: on the fast path tcf_result is passed on the
   stack, so there is no way to propagate the result back to the one
   stored in the tc filters.

This hidden difference hurts performance silently if we have many tc
classes in hierarchy.

This patch intends to close this gap by doing the reverse binding when
we create a new class. In this case we can actually search all the
filters in its parent, then match and fix them up by classid. And because
tcf_result is specific to each type of tc filter, we have to introduce
a new op for each filter to tell how to bind the class.

Note, we still can NOT totally get rid of those class lookups in
->enqueue() because cgroup and flow filters have no way to determine
the classid at setup time, so they still have to go through dynamic lookup.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
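
The reverse binding step can be modelled in userspace roughly as follows. The structures and the function are hypothetical simplifications (a flat filter array instead of per-filter-type ops) meant only to show the match-and-fixup by classid at class creation time.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_class {
	uint32_t classid;
};

struct sketch_filter {
	uint32_t classid;		/* class the filter refers to */
	struct sketch_class *cl;	/* cached binding, NULL until resolved */
};

/* Called when a class is created: bind every still-unbound filter that
 * named this classid before the class existed. Returns the number of
 * filters fixed up. */
static size_t sketch_bind_new_class(struct sketch_filter *filters, size_t n,
				    struct sketch_class *cl)
{
	size_t bound = 0;

	for (size_t i = 0; i < n; i++) {
		if (filters[i].classid == cl->classid && !filters[i].cl) {
			filters[i].cl = cl;
			bound++;
		}
	}
	return bound;
}
```

Once a filter's cached pointer is set, the fast path in this model can use it directly instead of looking the class up by ID for every packet.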

xfrm: Fix return value check of copy_sec_ctx. · 8598112d

Steffen Klassert authored Aug 31, 2017

A recent commit added an output_mark. When copying
this output_mark, the return value of copy_sec_ctx
is overwritten without a check. Fix this by copying
the output_mark before the security context.

Fixes: 077fbac4 ("net: xfrm: support setting an output mark.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
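
The bug pattern and the reordering fix can be sketched with stub functions. Everything below is hypothetical userspace code (the state structure, the stubs and their return values) and only models the control flow described in the message, not the xfrm code itself.

```c
/* Hypothetical state: whether copying the security context fails, and
 * whether an output mark is present. */
struct sketch_state {
	int sec_ctx_err;	/* 0 on success, negative on failure */
	int has_output_mark;
};

static int sketch_copy_sec_ctx(const struct sketch_state *x)
{
	return x->sec_ctx_err;
}

static int sketch_put_output_mark(const struct sketch_state *x)
{
	(void)x;
	return 0;	/* always succeeds in this sketch */
}

/* Buggy order: a failure from the first call is silently overwritten. */
static int sketch_copy_buggy(const struct sketch_state *x)
{
	int ret = sketch_copy_sec_ctx(x);

	if (x->has_output_mark)
		ret = sketch_put_output_mark(x);
	return ret;
}

/* Fixed order: the output mark goes first and is checked, so the result
 * of the security context copy is what the caller finally sees. */
static int sketch_copy_fixed(const struct sketch_state *x)
{
	if (x->has_output_mark) {
		int ret = sketch_put_output_mark(x);

		if (ret)
			return ret;
	}
	return sketch_copy_sec_ctx(x);
}
```

With a failing security context copy and an output mark present, the buggy variant reports success while the fixed variant returns the error.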

xfrm: Add support for network devices capable of removing the ESP trailer · 47ebcc0b

Yossi Kuperman authored Aug 30, 2017

In conjunction with crypto offload [1], removing the ESP trailer by
hardware can potentially improve the performance by avoiding (1) a
cache miss incurred by reading the nexthdr field and (2) the necessity
to calculate the csum value of the trailer in order to keep skb->csum
valid.

This patch introduces the changes to the xfrm stack and merely serves
as an infrastructure. A subsequent patch to the mlx5 driver will put this
to good use.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.html
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
