Commits · 0d235c3fabb79bddc46527885985f0ae281a89f9 · nexedi / linux

24 Aug, 2017 33 commits

net/mlx5: Add hash table to search FTEs in a flow-group · 0d235c3f

Matan Barak authored May 28, 2017

When adding a flow table entry (fte) to a flow group (fg), we first
need to check whether this fte exist. In such a case we just merge
the destinations (if possible). Currently, this is done by traversing
the fte list available in a fg. This could take a lot of time when
using large flow groups. Speeding this up by using rhashtable, which
is much faster.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

0d235c3f

net/mlx5: Don't store reserved part in FTEs and FGs · 667cb65a

Matan Barak authored Aug 07, 2017

The current code stores fte_match_param in the software representation
of FTEs and FGs. fte_match_param contains a large reserved area at the
bottom of the struct. Since downstream patches are going to hash this
part, we would like to avoid doing so on a reserved part.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

667cb65a

net/mlx5: Convert linear search for free index to ida · 8ebabaa0

Matan Barak authored May 28, 2017

When allocating a flow table entry, we need to allocate a free index
in the flow group. Currently, this is done by traversing the existing
flow table entries in the flow group, until a free index is found.
Replacing this by using a ida, which allows us to find a free index
much faster.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

8ebabaa0

net/mlx5e: Fix wrong code indentation in conditional statement · 09800305

Gal Pressman authored Aug 22, 2017

Fix the following checkpatch warning in en_ethtool.c:
WARNING: suspect code indent for conditional statements (8, 9)
+	for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+	 strcpy(data + (idx++) * ETH_GSTRING_LEN,
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

09800305

net/mlx5: Remove a leftover unused variable · 07533c67

Gal Pressman authored Aug 21, 2017

mlx5_core_wq is no longer being used and should be removed
from the code.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

07533c67

net/mlx5: Add a blank line after declarations V2 · 3c004583

Saeed Mahameed authored Aug 24, 2017

The blank line should be after u32 val = ...
and not after __be32 __iomem *addr = ...

Fixes: ad5b39a9 ("net/mlx5: Add a blank line after declarations")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Joe Perches <joe@perches.com>

3c004583

bpf: netdev is never null in __dev_map_flush · a5e2da6e

Daniel Borkmann authored Aug 24, 2017

No need to test for it in fast-path, every dev in bpf_dtab_netdev
is guaranteed to be non-NULL, otherwise dev_map_update_elem() will
fail in the first place.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5e2da6e

bpf, doc: Add arm32 as arch supporting eBPF JIT · d2aaa3dc

Shubham Bansal authored Aug 23, 2017

As eBPF JIT support for arm32 was added recently with
commit 39c13c20, it seems appropriate to
add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well.
Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2aaa3dc

Merge branch 'bpf-verifier-fixes' · 81152518

David S. Miller authored Aug 23, 2017

Edward Cree says:

====================
bpf: verifier fixes

Fix a couple of bugs introduced in my recent verifier patches.
Patch #2 does slightly increase the insn count on bpf_lxc.o, but only by
 about a hundred insns (i.e. 0.2%).

v2: added test for write-marks bug (patch #1); reworded comment on
 propagate_liveness() for clarity.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

81152518

bpf/verifier: document liveness analysis · 8e9cd9ce

Edward Cree authored Aug 23, 2017

The liveness tracking algorithm is quite subtle; add comments to explain it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e9cd9ce

bpf/verifier: remove varlen_map_value_access flag · 1b688a19

Edward Cree authored Aug 23, 2017

The optimisation it does is broken when the 'new' register value has a
 variable offset and the 'old' was constant.  I broke it with my pointer
 types unification (see Fixes tag below), before which the 'new' value
 would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
 other changes in that patch mean that its original behaviour (ignore
 min/max values) cannot be restored.
Tests on a sample set of cilium programs show no change in count of
 processed instructions.

Fixes: f1174f77 ("bpf/verifier: rework value tracking")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b688a19

selftests/bpf: add a test for a pruning bug in the verifier · df20cb7e

Alexei Starovoitov authored Aug 23, 2017

The test makes a read through a map value pointer, then considers pruning
 a branch where the register holds an adjusted map value pointer.  It
 should not prune, but currently it does.
Signed-off-by: Alexei Starovoitov <ast@fb.com>
[ecree@solarflare.com: added test-name and patch description]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

df20cb7e

bpf/verifier: when pruning a branch, ignore its write marks · 63f45f84

Edward Cree authored Aug 23, 2017

The fact that writes occurred in reaching the continuation state does
 not screen off its reads from us, because we're not really its parent.
So detect 'not really the parent' in do_propagate_liveness, and ignore
 write marks in that case.

Fixes: dc503a8a ("bpf/verifier: track liveness for pruning")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

63f45f84

selftests/bpf: add a test for a bug in liveness-based pruning · d893dc26

Edward Cree authored Aug 23, 2017

Writes in straight-line code should not prevent reads from propagating
 along jumps.  With current verifier code, the jump from 3 to 5 does not
 add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
 the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.

Verifier output:
0: (61) r2 = *(u32 *)(r1 +0)
1: (35) if r2 >= 0x0 goto pc+1
 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
2: (b7) r0 = 0
3: (35) if r2 >= 0x0 goto pc+1
 R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
4: (b7) r0 = 0
5: (95) exit

from 3 to 5: safe

from 1 to 3: safe
processed 8 insns, stack depth 0
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d893dc26

gre: remove duplicated assignment of iph · 60890e04

Colin Ian King authored Aug 23, 2017

iph is being assigned the same value twice; remove the redundant
first assignment. (Thanks to Nikolay Aleksandrov for pointing out
that the first asssignment should be removed and not the second)

Fixes warning:
net/ipv4/ip_gre.c:265:2: warning: Value stored to 'iph' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60890e04

net: tipc: constify genl_ops · 042a9010

Arvind Yadav authored Aug 23, 2017

genl_ops are not supposed to change at runtime. All functions
working with genl_ops provided by <net/genetlink.h> work with
const genl_ops. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

042a9010

net: hinic: make functions set_ctrl0 and set_ctrl1 static · 5719e5eb

Colin Ian King authored Aug 23, 2017

The functions set_ctrl0 and set_ctrl1 are local to the source and do
not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'set_ctrl0' was not declared. Should it be static?
symbol 'set_ctrl1' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5719e5eb

net/sock: allow the user to set negative peek offset · 257a7303

Paolo Abeni authored Aug 23, 2017

This is necessary to allow the user to disable peeking with
offset once it's enabled.
Unix sockets already allow the above, with this patch we
permit it for udp[6] sockets, too.

Fixes: 627d2d6b ("udp: enable MSG_PEEK at non-zero offset")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

257a7303

Merge branch 'mlxsw-multichain-tc-offload' · 110d8465

David S. Miller authored Aug 23, 2017

Jiri Pirko says:

====================
mlxsw: spectrum: Introduce multichain TC offload

This patchset introduces offloading of rules added to chain with
non-zero index, which was previously forbidden. Also, goto_chain
termination action is offloaded allowing to jump to processing
of desired chain.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

110d8465

mlxsw: spectrum_flower: Offload goto_chain termination action · 0ede6ba2

Jiri Pirko authored Aug 23, 2017

If action is gact goto_chain, offload it to HW by jumping to another
ruleset.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ede6ba2

mlxsw: spectrum_acl: Provide helper to lookup ruleset · dbec8ee9

Jiri Pirko authored Aug 23, 2017

We need to lookup ruleset in order to offload goto_chain termination
action. This patch adds it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbec8ee9

mlxsw: spectrum_acl: Allow to get group_id value for a ruleset · 0ade3b64

Jiri Pirko authored Aug 23, 2017

For goto_chain action we need to know group_id of a ruleset to jump to.
Provide infrastructure in order to get it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ade3b64

net: sched: add couple of goto_chain helpers · e457d86a

Jiri Pirko authored Aug 23, 2017

Add helpers to find out if a gact instance is goto_chain termination
action and to get chain index.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e457d86a

mlxsw: spectrum: Offload multichain TC rules · 45b62742

Jiri Pirko authored Aug 23, 2017

Reflect chain index coming down from TC core and create a ruleset per
chain. Note that only chain 0, being the implicit chain, is bound to the
device for processing. The rest of chains have to be "jumped-to" by
actions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45b62742

Merge branch 'mvpp2-software-TSO-support' · ae99e188

David S. Miller authored Aug 23, 2017

Antoine Tenart says:

====================
net: mvpp2: software TSO support

This series adds the s/w TSO support in the PPv2 driver, in addition to
two cosmetic commits. As stated in patch 3/3:

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ae99e188

net: mvpp2: software tso support · 186cd4d4

Antoine Ténart authored Aug 23, 2017

The patch uses the tso API to implement the tso functionality in Marvell
PPv2 driver.

Using iperf and 10G ports, using TSO shows a significant performance
improvement by a factor 2 to reach around 9.5Gbps in TX; as well as a
significant CPU usage drop (from 25% to 15%).
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

186cd4d4

net: mvpp2: unify the txq size define use · 85affd7e

Antoine Ténart authored Aug 23, 2017

The txq size is defined by MVPP2_AGGR_TXQ_SIZE, which is sometime not
used directly but through variables. As it is a fixed value use the
define everywhere in the driver.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85affd7e

net: define the TSO header size in net/tso.h · f9cbe9a5

Antoine Ténart authored Aug 23, 2017

The TSO header size was defined in many drivers. Factorize the code and
define its size in net/tso.h.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9cbe9a5

ipv4: do metrics match when looking up and deleting a route · 5f9ae3d9

Xin Long authored Aug 23, 2017

Now when ipv4 route inserts a fib_info, it memcmp fib_metrics.
It means ipv4 route identifies one route also with metrics.

But when removing a route, it tries to find the route without
caring about the metrics. It will cause that the route with
right metrics can't be removed.

Thomas noticed this issue when doing the testing:

1. add:
   # ip route append 192.168.7.0/24 dev v window 1000
   # ip route append 192.168.7.0/24 dev v window 1001
   # ip route append 192.168.7.0/24 dev v window 1002
   # ip route append 192.168.7.0/24 dev v window 1003
2. delete:
   # ip route delete 192.168.7.0/24 dev v window 1002
3. show:
     192.168.7.0/24 proto boot scope link window 1001
     192.168.7.0/24 proto boot scope link window 1002
     192.168.7.0/24 proto boot scope link window 1003

The one with window 1002 wasn't deleted but the first one was.

This patch is to do metrics match when looking up and deleting
one route.
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f9ae3d9

Merge branch 'tcp-sw-rx-timestamps' · d260e9e6

David S. Miller authored Aug 23, 2017

Mike Maloney says:

====================
net: Add software rx timestamp for TCP.

Add software rx timestamps for TCP, and a test to ensure consistency of
behavior between IP, UDP, and TCP implementation.

Changes since v1:
  -Initialize tss->ts[1] to 0 if caller requested any timestamps.
  -Fix test case to validate that tss->ts[1] is zero.
  -Fix tests to actually use a raw socket.
  -Fix --tcp flag to work on the test.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d260e9e6

selftests/net: Add a test to validate behavior of rx timestamps · 16e78122

Mike Maloney authored Aug 22, 2017

Validate the behavior of the combination of various timestamp socket
options, and ensure consistency across ip, udp, and tcp.
Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16e78122

tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg · 98aaa913

Mike Maloney authored Aug 22, 2017

When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
timestamp corresponding to the highest sequence number data returned.

Previously the skb->tstamp is overwritten when a TCP packet is placed
in the out of order queue.  While the packet is in the ooo queue, save the
timestamp in the TCB_SKB_CB.  This space is shared with the gso_*
options which are only used on the tx path, and a previously unused 4
byte hole.

When skbs are coalesced either in the sk_receive_queue or the
out_of_order_queue always choose the timestamp of the appended skb to
maintain the invariant of returning the timestamp of the last byte in
the recvmsg buffer.
Signed-off-by: Mike Maloney <maloney@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98aaa913

liquidio: change manner of detecting whether or not NIC firmware is loaded · b2854772

Felix Manlunas authored Aug 22, 2017

In the NIC firmware, the 1-bit flag indicating "firmware is loaded" moved
from SLI_SCRATCH_1 to SLI_SCRATCH_2 (these are Octeon general-purpose
scratch registers).  Make the PF driver conform to this change.

Remove code that sets the "firmware is loaded" flag because it's now the
firmware's job to do that.

In the code that detects whether or not the firmware is loaded, don't just
rely on checking the "firmware is loaded" flag because that may cause a
rare false negative.  Add code that deduces whether or not the firmware is
loaded; that will never give a false negative.

Also bump up driver version to match newer NIC firmware.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2854772

23 Aug, 2017 4 commits

gre: fix goto statement typo · e3d0328c

William Tu authored Aug 22, 2017

Fix typo: pnet_tap_faied.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3d0328c

Merge branch 'bpf-minor-cleanups' · 572a5767

David S. Miller authored Aug 22, 2017

Daniel Borkmann says:

====================
Two minor BPF cleanups

Two minor cleanups on devmap and redirect I still had
in my queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

572a5767

bpf: minor cleanups for dev_map · af4d045c

Daniel Borkmann authored Aug 23, 2017

Some minor code cleanups, while going over it I also noticed that
we're accounting the bitmap only for one CPU currently, so fix that
up as well.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

af4d045c

bpf: misc xdp redirect cleanups · e4a8e817

Daniel Borkmann authored Aug 23, 2017

Few cleanups including: bpf_redirect_map() is really XDP only due to
the return code. Move it to a more appropriate location where we do
the XDP redirect handling and change it's name into bpf_xdp_redirect_map()
to make it consistent to the bpf_xdp_redirect() helper.

xdp_do_redirect_map() helper can be static since only used out of filter.c
file. Drop the goto in xdp_do_generic_redirect() and only return errors
directly. In xdp_do_flush_map() only clear ri->map_to_flush which is the
arg we're using in that function, ri->map is cleared earlier along with
ri->ifindex.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4a8e817

22 Aug, 2017 3 commits

bpf: fix map value attribute for hash of maps · cd36c3a2

Daniel Borkmann authored Aug 23, 2017

Currently, iproute2's BPF ELF loader works fine with array of maps
when retrieving the fd from a pinned node and doing a selfcheck
against the provided map attributes from the object file, but we
fail to do the same for hash of maps and thus refuse to get the
map from pinned node.

Reason is that when allocating hash of maps, fd_htab_map_alloc() will
set the value size to sizeof(void *), and any user space map creation
requests are forced to set 4 bytes as value size. Thus, selfcheck
will complain about exposed 8 bytes on 64 bit archs vs. 4 bytes from
object file as value size. Contract is that fdinfo or BPF_MAP_GET_FD_BY_ID
returns the value size used to create the map.

Fix it by handling it the same way as we do for array of maps, which
means that we leave value size at 4 bytes and in the allocation phase
round up value size to 8 bytes. alloc_htab_elem() needs an adjustment
in order to copy rounded up 8 bytes due to bpf_fd_htab_map_update_elem()
calling into htab_map_update_elem() with the pointer of the map
pointer as value. Unlike array of maps where we just xchg(), we're
using the generic htab_map_update_elem() callback also used from helper
calls, which published the key/value already on return, so we need
to ensure to memcpy() the right size.

Fixes: bcc6b1b7 ("bpf: Add hash of maps support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd36c3a2

MIPS,bpf: fix missing break in switch statement · 4a00aa05

Colin Ian King authored Aug 22, 2017

There is a missing break causing a fall-through and setting
ctx.use_bbit_insns to the wrong value. Fix this by adding the
missing break.

Detected with cppcheck:
"Variable 'ctx.use_bbit_insns' is reassigned a value before the old
one has been used. 'break;' missing?"

Fixes: 8d8d18c3 ("MIPS,bpf: Fix using smp_processor_id() in preemptible splat.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a00aa05

net: sched: use kvmalloc() for class hash tables · 9695fe6f

Eric Dumazet authored Aug 22, 2017

High order GFP_KERNEL allocations can stress the host badly.

Use modern kvmalloc_array()/kvfree() instead of custom
allocations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9695fe6f