Commits · bb23581b9b38703257acabd520aa5ebf1db008af · nexedi / linux

12 Apr, 2019 1 commit

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · bb23581b

David S. Miller authored Apr 11, 2019

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-12

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Improve BPF verifier scalability for large programs through two
   optimizations: i) remove verifier states that are not useful in pruning,
   ii) stop walking parentage chain once first LIVE_READ is seen. Combined
   gives approx 20x speedup. Increase limits for accepting large programs
   under root, and add various stress tests, from Alexei.

2) Implement global data support in BPF. This enables static global variables
   for .data, .rodata and .bss sections to be properly handled which allows
   for more natural program development. This also opens up the possibility
   to optimize program workflow by compiling ELFs only once and later only
   rewriting section data before reload, from Daniel and with test cases and
   libbpf refactoring from Joe.

3) Add config option to generate BTF type info for vmlinux as part of the
   kernel build process. DWARF debug info is converted via pahole to BTF.
   Latter relies on libbpf and makes use of BTF deduplication algorithm which
   results in 100x savings compared to DWARF data. Resulting .BTF section is
   typically about 2MB in size, from Andrii.

4) Add BPF verifier support for stack access with variable offset from
   helpers and add various test cases along with it, from Andrey.

5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
   so that L2 encapsulation can be used for tc tunnels, from Alan.

6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
   users can define a subset of allowed __sk_buff fields that get fed into
   the test program, from Stanislav.

7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
   various UBSAN warnings in bpftool, from Yonghong.

8) Generate a pkg-config file for libbpf, from Luca.

9) Dump program's BTF id in bpftool, from Prashant.

10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
    program, from Magnus.

11) kallsyms related fixes for the case when symbols are not present in
    BPF selftests and samples, from Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bb23581b

11 Apr, 2019 34 commits

bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN · 947e8b59

Stanislav Fomichev authored Apr 11, 2019

This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case
and be sure that nobody is erroneously setting ctx_{in,out}.

Fixes: b0b9395d ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

947e8b59

tools: add smp_* barrier variants to include infrastructure · 6b7a2114

Daniel Borkmann authored Apr 09, 2019

Add the definition for smp_rmb(), smp_wmb(), and smp_mb() to the
tools include infrastructure: this patch adds the implementation
for x86-64 and arm64, and have it fall back as currently is for
other archs which do not have it implemented at this point. The
x86-64 one uses lock + add combination for smp_mb() with address
below red zone.

This is on top of 09d62154 ("tools, perf: add and use optimized
ring_buffer_{read_head, write_tail} helpers"), which didn't touch
smp_* barrier implementations. Magnus recently rightfully reported
however that the latter on x86-64 still wrongly falls back to sfence,
lfence and mfence respectively, thus fix that for applications under
tools making use of these to avoid such ugly surprises. The main
header under tools (include/asm/barrier.h) will in that case not
select the fallback implementation.
Reported-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6b7a2114

Merge branch 'ipv6-Refactor-nexthop-selection-helpers-during-a-fib-lookup' · 78f07ada

David S. Miller authored Apr 11, 2019

David Ahern says:

====================
ipv6: Refactor nexthop selection helpers during a fib lookup

IPv6 has a fib6_nh embedded within each fib6_info and a separate
fib6_info for each path in a multipath route. A side effect is that
a fib6_info is passed all the way down the stack when selecting a path
on a fib lookup. Refactor the fib lookup functions and associated
helper functions to take a fib6_nh when appropriate to enable IPv6
to work with nexthop objects where the fib6_nh is not directly part
of a fib entry.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

78f07ada

ipv6: Refactor __ip6_route_redirect · 0b34eb00

David Ahern authored Apr 09, 2019

Move the nexthop evaluation of a fib entry to a helper that can be
leveraged for each fib6_nh in a multipath nexthop object.

In the move, 'continue' statements means the helper returns false
(loop should continue) and 'break' means return true (found the entry
of interest).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b34eb00

ipv6: Refactor rt6_device_match · 0c59d006

David Ahern authored Apr 09, 2019

Move the device and gateway checks in the fib6_next loop to a helper
that can be called per fib6_nh entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c59d006

ipv6: Move fib6_multipath_select down in ip6_pol_route · d83009d4

David Ahern authored Apr 09, 2019

Move the siblings and fib6_multipath_select after the null entry check
since a null entry can not have siblings.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d83009d4

ipv6: Be smarter with null_entry handling in ip6_pol_route_lookup · af52a52c

David Ahern authored Apr 09, 2019

Clean up the fib6_null_entry handling in ip6_pol_route_lookup.
rt6_device_match can return fib6_null_entry, but fib6_multipath_select
can not. Consolidate the fib6_null_entry handling and on the final
null_entry check set rt and goto out - no need to defer to a second
check after rt6_find_cached_rt.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af52a52c

ipv6: Refactor find_rr_leaf · 30c15f03

David Ahern authored Apr 09, 2019

find_rr_leaf has 3 loops over fib_entries calling find_match. The loops
are very similar with differences in start point and whether the metric
is evaluated:
    1. start at rr_head, no extra loop compare, check fib metric
    2. start at leaf, compare rt against rr_head, check metric
    3. start at cont (potential saved point from earlier loops), no
       extra loop compare, no metric check

Create 1 loop that is called 3 different times. This will make a
later change with multipath nexthop objects much simpler.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30c15f03

ipv6: Refactor find_match · 28679ed1

David Ahern authored Apr 09, 2019

find_match primarily needs a fib6_nh (and fib6_flags which it passes
through to rt6_score_route). Move fib6_check_expired up to the call
sites so find_match is only called for relevant entries. Remove the
match argument which is mostly a pass through and use the return
boolean to decide if match gets set in the call sites.

The end result is a helper that can be called per fib6_nh struct
which is needed once fib entries reference nexthop objects that
have more than one fib6_nh.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28679ed1

ipv6: Pass fib6_nh and flags to rt6_score_route · 702cea56

David Ahern authored Apr 09, 2019

rt6_score_route only needs the fib6_flags and nexthop data. Change
it accordingly. Allows re-use later for nexthop based fib6_nh.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

702cea56

ipv6: Change rt6_probe to take a fib6_nh · cc3a86c8

David Ahern authored Apr 09, 2019

rt6_probe sends probes for gateways in a nexthop. As such it really
depends on a fib6_nh, not a fib entry. Move last_probe to fib6_nh and
update rt6_probe to a fib6_nh struct.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc3a86c8

ipv6: Remove rt6_check_dev · 6e1809a5

David Ahern authored Apr 09, 2019

rt6_check_dev is a simpler helper with only 1 caller. Fold the code
into rt6_score_route.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e1809a5

ipv6: Only call rt6_check_neigh for nexthop with gateway · 1ba9a895

David Ahern authored Apr 09, 2019

Change rt6_check_neigh to take a fib6_nh instead of a fib entry.
Move the check on fib_flags and whether the nexthop has a gateway
up to the one caller.

Remove the inline from the definition as well. Not necessary.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ba9a895

dns: remove redundant zero length namelen check · 62720b12

Colin Ian King authored Apr 09, 2019

The zero namelen check is redundant as it has already been checked
for zero at the start of the function.  Remove the redundant check.

Addresses-Coverity: ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62720b12

Merge branch 'bpf-l2-encap' · 94c59aab

Daniel Borkmann authored Apr 11, 2019

Alan Maguire says:

====================
Extend bpf_skb_adjust_room growth to mark inner MAC header so
that L2 encapsulation can be used for tc tunnels.

Patch #1 extends the existing test_tc_tunnel to support UDP
encapsulation; later we want to be able to test MPLS over UDP
and MPLS over GRE encapsulation.

Patch #2 adds the BPF_F_ADJ_ROOM_ENCAP_L2(len) macro, which
allows specification of inner mac length.  Other approaches were
explored prior to taking this approach.  Specifically, I tried
automatically computing the inner mac length on the basis of the
specified flags (so inner maclen for GRE/IPv4 encap is the len_diff
specified to bpf_skb_adjust_room minus GRE + IPv4 header length
for example).  Problem with this is that we don't know for sure
what form of GRE/UDP header we have; is it a full GRE header,
or is it a FOU UDP header or generic UDP encap header? My fear
here was we'd end up with an explosion of flags.  The other approach
tried was to support inner L2 header marking as a separate room
adjustment, i.e. adjust for L3/L4 encap, then call
bpf_skb_adjust_room for L2 encap.  This can be made to work but
because it imposed an order on operations, felt a bit clunky.

Patch #3 syncs tools/ bpf.h.

Patch #4 extends the tests again to support MPLSoverGRE,
MPLSoverUDP, and transparent ethernet bridging (TEB) where
the inner L2 header is an ethernet header.  Testing of BPF
encap against tunnels is done for cases where configuration
of such tunnels is possible (MPLSoverGRE[6], MPLSoverUDP,
gre[6]tap), and skipped otherwise.  Testing of BPF encap/decap
is always carried out.

Changes since v2:
 - updated tools/testing/selftest/bpf/config with FOU/MPLS CONFIG
   variables (patches 1, 4)
 - reduced noise in patch 1 by avoiding unnecessary movement of code
 - eliminated inner_mac variable in bpf_skb_net_grow (patch 2)

Changes since v1:
 - fixed formatting of commit references.
 - BPF_F_ADJ_ROOM_FIXED_GSO flag enabled on all variants (patch 1)
 - fixed fou6 options for UDP encap; checksum errors observed were
   due to the fact fou6 tunnel was not set up with correct ipproto
   options (41 -6).  0 checksums work fine (patch 1)
 - added definitions for mask and shift used in setting L2 length
   (patch 2)
 - allow udp encap with fixed GSO (patch 2)
 - changed "elen" to "l2_len" to be more descriptive (patch 4)
====================
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

94c59aab

selftests_bpf: add L2 encap to test_tc_tunnel · 3ec61df8

Alan Maguire authored Apr 09, 2019

Update test_tc_tunnel to verify adding inner L2 header
encapsulation (an MPLS label or ethernet header) works.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3ec61df8

bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2 · 1db04c30

Alan Maguire authored Apr 09, 2019

Sync include/uapi/linux/bpf.h with tools/ equivalent to add
BPF_F_ADJ_ROOM_ENCAP_L2(len) macro.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1db04c30

bpf: add layer 2 encap support to bpf_skb_adjust_room · 58dfc900

Alan Maguire authored Apr 09, 2019

commit 868d5235 ("bpf: add bpf_skb_adjust_room encap flags")
introduced support to bpf_skb_adjust_room for GSO-friendly GRE
and UDP encapsulation.

For GSO to work for skbs, the inner headers (mac and network) need to
be marked.  For L3 encapsulation using bpf_skb_adjust_room, the mac
and network headers are identical.  Here we provide a way of specifying
the inner mac header length for cases where L2 encap is desired.  Such
an approach can support encapsulated ethernet headers, MPLS headers etc.
For example to convert from a packet of form [eth][ip][tcp] to
[eth][ip][udp][inner mac][ip][tcp], something like the following could
be done:

	headroom = sizeof(iph) + sizeof(struct udphdr) + inner_maclen;

	ret = bpf_skb_adjust_room(skb, headroom, BPF_ADJ_ROOM_MAC,
				  BPF_F_ADJ_ROOM_ENCAP_L4_UDP |
				  BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 |
				  BPF_F_ADJ_ROOM_ENCAP_L2(inner_maclen));
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

58dfc900

selftests_bpf: extend test_tc_tunnel for UDP encap · 166b5a7f

Alan Maguire authored Apr 09, 2019

commit 868d5235 ("bpf: add bpf_skb_adjust_room encap flags")
introduced support to bpf_skb_adjust_room for GSO-friendly GRE
and UDP encapsulation and later introduced associated test_tc_tunnel
tests. Here those tests are extended to cover UDP encapsulation also.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

166b5a7f

tipc: use standard write_lock & unlock functions when creating node · 909620ff

Jon Maloy authored Apr 11, 2019

In the function tipc_node_create() we protect the peer capability field
by using the node rw_lock. However, we access the lock directly instead
of using the dedicated functions for this, as we do everywhere else in
node.c. This cosmetic spot is fixed here.

Fixes: 40999f11 ("tipc: make link capability update thread safe")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

909620ff

bpf: fix missing bpf_check_uarg_tail_zero in BPF_PROG_TEST_RUN · c695865c

Stanislav Fomichev authored Apr 11, 2019

Commit b0b9395d ("bpf: support input __sk_buff context in
BPF_PROG_TEST_RUN") started using bpf_check_uarg_tail_zero in
BPF_PROG_TEST_RUN. However, bpf_check_uarg_tail_zero is not defined
for !CONFIG_BPF_SYSCALL:

net/bpf/test_run.c: In function ‘bpf_ctx_init’:
net/bpf/test_run.c:142:9: error: implicit declaration of function ‘bpf_check_uarg_tail_zero’ [-Werror=implicit-function-declaration]
   err = bpf_check_uarg_tail_zero(data_in, max_size, size);
         ^~~~~~~~~~~~~~~~~~~~~~~~

Let's not build net/bpf/test_run.c when CONFIG_BPF_SYSCALL is not set.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: b0b9395d ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c695865c

net: sched: flower: use correct ht function to prevent duplicates · 9e35552a

Vlad Buslov authored Apr 11, 2019

Implementation of function rhashtable_insert_fast() check if its internal
helper function __rhashtable_insert_fast() returns non-NULL pointer and
seemingly return -EEXIST in such case. However, since
__rhashtable_insert_fast() is called with NULL key pointer, it never
actually checks for duplicates, which means that -EEXIST is never returned
to the user. Use rhashtable_lookup_insert_fast() hash table API instead. In
order to verify that it works as expected and prevent the problem from
happening in future, extend tc-tests with new test that verifies that no
new filters with existing key can be inserted to flower classifier.

Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e35552a

netns: read NETNSA_NSID as s32 attribute in rtnl_net_getid() · ecce39ec

Guillaume Nault authored Apr 11, 2019

NETNSA_NSID is signed. Use nla_get_s32() to avoid confusion.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecce39ec

selftests: bpf: add selftest for __sk_buff context in BPF_PROG_TEST_RUN · 3daf8e70

Stanislav Fomichev authored Apr 09, 2019

Simple test that sets cb to {1,2,3,4,5} and priority to 6, runs bpf
program that fails if cb is not what we expect and increments cb[i] and
priority. When the test finishes, we check that cb is now {2,3,4,5,6}
and priority is 7.

We also test the sanity checks:
* ctx_in is provided, but ctx_size_in is zero (same for
  ctx_out/ctx_size_out)
* unexpected non-zero fields in __sk_buff return EINVAL
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3daf8e70

libbpf: add support for ctx_{size, }_{in, out} in BPF_PROG_TEST_RUN · 5e903c65

Stanislav Fomichev authored Apr 09, 2019

Support recently introduced input/output context for test runs.
We extend only bpf_prog_test_run_xattr. bpf_prog_test_run is
unextendable and left as is.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5e903c65

bpf: support input __sk_buff context in BPF_PROG_TEST_RUN · b0b9395d

Stanislav Fomichev authored Apr 09, 2019

Add new set of arguments to bpf_attr for BPF_PROG_TEST_RUN:
* ctx_in/ctx_size_in - input context
* ctx_out/ctx_size_out - output context

The intended use case is to pass some meta data to the test runs that
operate on skb (this has being brought up on recent LPC).

For programs that use bpf_prog_test_run_skb, support __sk_buff input and
output. Initially, from input __sk_buff, copy _only_ cb and priority into
skb, all other non-zero fields are prohibited (with EINVAL).
If the user has set ctx_out/ctx_size_out, copy the potentially modified
__sk_buff back to the userspace.

We require all fields of input __sk_buff except the ones we explicitly
support to be set to zero. The expectation is that in the future we might
add support for more fields and we want to fail explicitly if the user
runs the program on the kernel where we don't yet support them.

The API is intentionally vague (i.e. we don't explicitly add __sk_buff
to bpf_attr, but ctx_in) to potentially let other test_run types use
this interface in the future (this can be xdp_md for xdp types for
example).

v4:
  * don't copy more than allowed in bpf_ctx_init [Martin]

v3:
  * handle case where ctx_in is NULL, but ctx_out is not [Martin]
  * convert size==0 checks to ptr==NULL checks and add some extra ptr
    checks [Martin]

v2:
  * Addressed comments from Martin Lau
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b0b9395d

tools/bpftool: show btf id in program information · 569b0c77

Prashant Bhole authored Apr 10, 2019

Let's add a way to know whether a program has btf context.
Patch adds 'btf_id' in the output of program listing.
When btf_id is present, it means program has btf context.

Sample output:
user@test# bpftool prog list
25: xdp  name xdp_prog1  tag 539ec6ce11b52f98  gpl
	loaded_at 2019-04-10T11:44:20+0900  uid 0
	xlated 488B  not jited  memlock 4096B  map_ids 23
	btf_id 1
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

569b0c77

libbpf: Fix build with gcc-8 · d5adbdd7

Andrey Ignatov authored Apr 10, 2019

Reported in [1].

With gcc 8.3.0 the following error is issued:

  cc -Ibpf@sta -I. -I.. -I.././include -I.././include/uapi
  -fdiagnostics-color=always -fsanitize=address,undefined -fno-omit-frame-pointer
  -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fPIC -g -O2
  -Werror -Wall -Wno-pointer-arith -Wno-sign-compare  -MD -MQ
  'bpf@sta/src_libbpf.c.o' -MF 'bpf@sta/src_libbpf.c.o.d' -o
  'bpf@sta/src_libbpf.c.o' -c ../src/libbpf.c
  ../src/libbpf.c: In function 'bpf_object__elf_collect':
  ../src/libbpf.c:947:18: error: 'map_def_sz' may be used uninitialized in this
  function [-Werror=maybe-uninitialized]
     if (map_def_sz <= sizeof(struct bpf_map_def)) {
         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ../src/libbpf.c:827:18: note: 'map_def_sz' was declared here
    int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
                    ^~~~~~~~~~

According to [2] -Wmaybe-uninitialized is enabled by -Wall.
Same error is generated by clang's -Wconditional-uninitialized.

[1] https://github.com/libbpf/libbpf/pull/29#issuecomment-481902601
[2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

Fixes: d859900c ("bpf, libbpf: support global data/bss/rodata sections")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d5adbdd7

mailmap: add entry for email addresses · fa0dcb3f

Daniel Borkmann authored Apr 10, 2019

Redirect email addresses from git log to the mainly used ones
for Alexei and myself such that it is consistent with the ones
in MAINTAINERS file. Useful in particular when git mailmap is
enabled on broader scope, for example:

  $ git config --global log.mailmap true
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

fa0dcb3f

net: fou: remove redundant code in gue_udp_recv · 526bb57a

Lorenzo Bianconi authored Apr 09, 2019

Remove not useful protocol version check in gue_udp_recv since just
gue version 0 can hit that code. Moreover remove duplicated hdrlen
computation
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

526bb57a

fou: correct spelling of encapsulation · c9d52f21

Simon Horman authored Apr 09, 2019

Correct spelling of encapsulation.
Found by inspection.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9d52f21

Merge branch 'net-sched-taprio-fix-picos_per_byte-miscalculation' · b8c7e2c3

David S. Miller authored Apr 10, 2019

Leandro Dorileo says:

====================
net/sched: taprio: fix picos_per_byte miscalculation

This set fixes miscalculations based on invalid link speed values.

Changes in v6:
 + Avoid locking a spinlock while calling __ethtool_get_link_ksettings()
   (suggested by: Cong Wang);

Changes in v5:
 + Don't iterate over all the net_device maintained list (suggested by: Florian Fainelli);

Changes in v4:
 + converted pr_info calls to netdev_dbg (suggested by: Florian Fainelli);

Changes in v3:
 + yet pr_info() format warnings;

Changes in v2:
 + fixed pr_info() format both on cbs and taprio patches;
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b8c7e2c3

net/sched: cbs: fix port_rate miscalculation · e0a7683d

Leandro Dorileo authored Apr 08, 2019

The Credit Based Shaper heavily depends on link speed to calculate
the scheduling credits, we can't properly calculate the credits if the
device has failed to report the link speed.

In that case we can't dequeue packets assuming a wrong port rate that will
result into an inconsistent credit distribution.

This patch makes sure we fail to dequeue case:

1) __ethtool_get_link_ksettings() reports error or 2) the ethernet driver
failed to set the ksettings' speed value (setting link speed to
SPEED_UNKNOWN).

Additionally we properly re calculate the port rate whenever the link speed
is changed.

Fixes: 3d0bd028 ("net/sched: Add support for HW offloading for CBS")
Signed-off-by: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Reviewed-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0a7683d

net/sched: taprio: fix picos_per_byte miscalculation · 7b9eba7b

Leandro Dorileo authored Apr 08, 2019

The Time Aware Priority Scheduler is heavily dependent to link speed,
it relies on it to calculate transmission bytes per cycle, we can't
properly calculate the so called budget if the device has failed
to report the link speed.

In that case we can't dequeue packets assuming a wrong budget.
This patch makes sure we fail to dequeue case:

1) __ethtool_get_link_ksettings() reports error or 2) the ethernet
driver failed to set the ksettings' speed value (setting link speed
to SPEED_UNKNOWN).

Additionally we re calculate the budget whenever the link speed is
changed.

Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Reviewed-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b9eba7b

10 Apr, 2019 5 commits

net: strparser: fix comment · 93e21254

Jakub Kicinski authored Apr 10, 2019

Fix comment.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93e21254

ipv4: Handle RTA_GATEWAY set to 0 · d73f80f9

David Ahern authored Apr 10, 2019

Govindarajulu reported a regression with Network Manager which sends an
RTA_GATEWAY attribute with the address set to 0. Fixup the handling of
RTA_GATEWAY to only set fc_gw_family if the gateway address is actually
set.

Fixes: f35b794b ("ipv4: Prepare fib_config for IPv6 gateway")
Reported-by: Govindarajulu Varadarajan <govind.varadar@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d73f80f9

Merge branch 'net-sched-move-back-qlen-to-per-CPU-accounting' · 44b9b6ca

David S. Miller authored Apr 10, 2019

Paolo Abeni says:

====================
net: sched: move back qlen to per CPU accounting

The commit 46b1c18f ("net: sched: put back q.qlen into a single location")
introduced some measurable regression in the contended scenarios for
lock qdisc.

As Eric suggested we could replace q.qlen access with calls to qdisc_is_empty()
in the datapath and revert the above commit. The TC subsystem updates
qdisc->is_empty in a somewhat loose way: notably 'is_empty' is set only when
the qdisc dequeue() calls return a NULL ptr. That is, the invocation after
the last packet is dequeued.

The above is good enough for BYPASS implementation - the only downside is that
we end up avoiding the optimization for a very small time-frame - but will
break hard things when internal structures consistency for classful qdisc
relies on child qdisc_is_empty().

A more strict 'is_empty' update adds a relevant complexity to its life-cycle, so
this series takes a different approach: we allow lockless qdisc to switch from
per CPU accounting to global stats accounting when the NOLOCK bit is cleared.
Since most pieces of infrastructure are already in place, this requires very
little changes to the pfifo_fast qdisc, and any later NOLOCK qdisc can hook
there with little effort - no need to maintain two different implementations.

The first 2 patches removes direct qlen access from non core TC code, the 3rd
and 4th patches place and use the infrastructure to allow stats account
switching and the 5th patch is the actual revert.

 v1 -> v2:
  - fixed build issues
  - more descriptive commit message for patch 5/5
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

44b9b6ca

Revert: "net: sched: put back q.qlen into a single location" · 73eb628d

Paolo Abeni authored Apr 10, 2019

This revert commit 46b1c18f ("net: sched: put back q.qlen into
a single location").
After the previous patch, when a NOLOCK qdisc is enslaved to a
locking qdisc it switches to global stats accounting. As a consequence,
when a classful qdisc accesses directly a child qdisc's qlen, such
qdisc is not doing per CPU accounting and qlen value is consistent.

In the control path nobody uses directly qlen since commit
e5f0e8f8 ("net: sched: introduce and use qdisc tree flush/purge
helpers"), so we can remove the contented atomic ops from the
datapath.

v1 -> v2:
 - complete the qdisc_qstats_atomic_qlen_dec() ->
   qdisc_qstats_cpu_qlen_dec() replacement, fix build issue
 - more descriptive commit message
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73eb628d

net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too · 8a53e616

Paolo Abeni authored Apr 10, 2019

Since stats updating is always consistent with TCQ_F_CPUSTATS flag,
we can disable it at qdisc creation time flipping such bit.

In my experiments, if the NOLOCK flag is cleared, per CPU stats
accounting does not give any measurable performance gain, but it
waste some memory.

Let's clear TCQ_F_CPUSTATS together with NOLOCK, when enslaving
a NOLOCK qdisc to 'lock' one.

Use stats update helper inside pfifo_fast, to cope correctly with
TCQ_F_CPUSTATS flag change.

As a side effect, q.qlen value for any child qdiscs is always
consistent for all lock classfull qdiscs.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a53e616