Commits · db48814bd2833ee231f48bd2030082369b88f9ef · Kirill Smelkov / linux

17 Jun, 2019 5 commits

libbpf: identify maps by section index in addition to offset · db48814b

Andrii Nakryiko authored Jun 17, 2019

To support maps to be defined in multiple sections, it's important to
identify map not just by offset within its section, but section index as
well. This patch adds tracking of section index.

For global data, we record section index of corresponding
.data/.bss/.rodata ELF section for uniformity, and thus don't need
a special value of offset for those maps.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

db48814b

libbpf: refactor map initialization · bf829271

Andrii Nakryiko authored Jun 17, 2019

User and global data maps initialization has gotten pretty complicated
and unnecessarily convoluted. This patch splits out the logic for global
data map and user-defined map initialization. It also removes the
restriction of pre-calculating how many maps will be initialized,
instead allowing to keep adding new maps as they are discovered, which
will be used later for BTF-defined map definitions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bf829271

libbpf: streamline ELF parsing error-handling · 01b29d1d

Andrii Nakryiko authored Jun 17, 2019

Simplify ELF parsing logic by exiting early, as there is no common clean
up path to execute. That makes it unnecessary to track when err was set
and when it was cleared. It also reduces nesting in some places.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

01b29d1d

libbpf: extract BTF loading logic · 9c6660d0

Andrii Nakryiko authored Jun 17, 2019

As a preparation for adding BTF-based BPF map loading, extract .BTF and
.BTF.ext loading logic.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9c6660d0

libbpf: add common min/max macro to libbpf_internal.h · d7fe74f9

Andrii Nakryiko authored Jun 17, 2019

Multiple files in libbpf redefine their own definitions for min/max.
Let's define them in libbpf_internal.h and use those everywhere.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d7fe74f9

14 Jun, 2019 11 commits

bpf: Fix build error without CONFIG_INET · 7f94208c

YueHaibing authored Jun 12, 2019

If CONFIG_INET is not set, building fails:

kernel/bpf/verifier.o: In function `check_mem_access':
verifier.c: undefined reference to `bpf_xdp_sock_is_valid_access'
kernel/bpf/verifier.o: In function `convert_ctx_accesses':
verifier.c: undefined reference to `bpf_xdp_sock_convert_ctx_access'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: fada7fdc ("bpf: Allow bpf_map_lookup_elem() on an xskmap")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7f94208c

selftests/bpf: convert socket_cookie test to sk storage · 69d96519

Stanislav Fomichev authored Jun 12, 2019

This lets us test that both BPF_PROG_TYPE_CGROUP_SOCK_ADDR and
BPF_PROG_TYPE_SOCK_OPS can access underlying bpf_sock.

Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

69d96519

bpf/tools: sync bpf.h · cd17d777

Stanislav Fomichev authored Jun 12, 2019

Add sk to struct bpf_sock_addr and struct bpf_sock_ops.

Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cd17d777

bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type · 1314ef56

Stanislav Fomichev authored Jun 12, 2019

And let it use bpf_sk_storage_{get,delete} helpers to access socket
storage. Kernel context (struct bpf_sock_ops_kern) already has sk
member, so I just expose it to the BPF hooks. I use
PTR_TO_SOCKET_OR_NULL and return NULL in !is_fullsock case.

I also export bpf_tcp_sock to make it possible to access tcp socket stats.

Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1314ef56

bpf: export bpf_sock for BPF_PROG_TYPE_CGROUP_SOCK_ADDR prog type · fb85c4a7

Stanislav Fomichev authored Jun 12, 2019

And let it use bpf_sk_storage_{get,delete} helpers to access socket
storage. Kernel context (struct bpf_sock_addr_kern) already has sk
member, so I just expose it to the BPF hooks. Using PTR_TO_SOCKET
instead of PTR_TO_SOCK_COMMON should be safe because the hook is
called on bind/connect.

Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

fb85c4a7

bpf: Add test for SO_REUSEPORT_DETACH_BPF · d30bd78c

Martin KaFai Lau authored Jun 13, 2019

This patch adds a test for the new sockopt SO_REUSEPORT_DETACH_BPF.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d30bd78c

bpf: Sync asm-generic/socket.h to tools/ · 13a748ea

Martin KaFai Lau authored Jun 13, 2019

SO_DETACH_REUSEPORT_BPF is needed for the test in the next patch.
It is defined in the socket.h.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

13a748ea

bpf: net: Add SO_DETACH_REUSEPORT_BPF · 99f3a064

Martin KaFai Lau authored Jun 13, 2019

There is SO_ATTACH_REUSEPORT_[CE]BPF but there is no DETACH.
This patch adds SO_DETACH_REUSEPORT_BPF sockopt.  The same
sockopt can be used to undo both SO_ATTACH_REUSEPORT_[CE]BPF.

reseport_detach_prog() is added and it is mostly a mirror
of the existing reuseport_attach_prog().  The differences are,
it does not call reuseport_alloc() and returns -ENOENT when
there is no old prog.

Cc: Craig Gallek <kraig@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

99f3a064

libbpf: fix check for presence of associated BTF for map creation · e55d54f4

Andrii Nakryiko authored Jun 12, 2019

Kernel internally checks that either key or value type ID is specified,
before using btf_fd. Do the same in libbpf's map creation code for
determining when to retry map creation w/o BTF.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: fba01a06 ("libbpf: use negative fd to specify missing BTF")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e55d54f4

selftests/bpf: signedness bug in enable_all_controllers() · cabd3e88

Dan Carpenter authored Jun 13, 2019

The "len" variable needs to be signed for the error handling to work
properly.

Fixes: 596092ef ("selftests/bpf: enable all available cgroup v2 controllers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cabd3e88

samples/bpf: fix include path in Makefile · b552d33c

Prashant Bhole authored Jun 14, 2019

Recent commit included libbpf.h in selftests/bpf/bpf_util.h.
Since some samples use bpf_util.h and samples/bpf/Makefile doesn't
have libbpf.h path included, build was failing. Let's add the path
in samples/bpf/Makefile.
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b552d33c

12 Jun, 2019 1 commit

bpf: silence warning messages in core · aee450cb

Valdis Klētnieks authored Jun 06, 2019

Compiling kernel/bpf/core.c with W=1 causes a flood of warnings:

kernel/bpf/core.c:1198:65: warning: initialized field overwritten [-Woverride-init]
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~
kernel/bpf/core.c:1198:65: note: (near initialization for 'public_insntable[12]')
 1198 | #define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
      |                                                                 ^~~~
kernel/bpf/core.c:1087:2: note: in expansion of macro 'BPF_INSN_3_TBL'
 1087 |  INSN_3(ALU, ADD,  X),   \
      |  ^~~~~~
kernel/bpf/core.c:1202:3: note: in expansion of macro 'BPF_INSN_MAP'
 1202 |   BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
      |   ^~~~~~~~~~~~

98 copies of the above.

The attached patch silences the warnings, because we *know* we're overwriting
the default initializer. That leaves bpf/core.c with only 6 other warnings,
which become more visible in comparison.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

aee450cb

11 Jun, 2019 12 commits

Merge branch 'bpf-libbpf-num-cpus' · 5e2ac390

Daniel Borkmann authored Jun 11, 2019

Hechao Li says:

====================
Getting number of possible CPUs is commonly used for per-CPU BPF maps
and perf_event_maps. Add a new API libbpf_num_possible_cpus() that
helps user with per-CPU related operations and remove duplicate
implementations in bpftool and selftests.

v2: Save errno before calling pr_warning in case it is changed.
v3: Make sure libbpf_num_possible_cpus never returns 0 so that user only
    has to check if ret value < 0.
v4: Fix error code when reading 0 bytes from possible CPU file.
v5: Fix selftests compliation issue.
v6: Split commit to reuse libbpf_num_possible_cpus() into two commits:
    One commit to remove bpf_util.h from test BPF C programs.
    One commit to reuse libbpf_num_possible_cpus() in bpftools
    and bpf_util.h.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5e2ac390

bpf: use libbpf_num_possible_cpus internally · 4c587c19

Hechao Li authored Jun 10, 2019

Use the newly added bpf_num_possible_cpus() in bpftool and selftests
and remove duplicate implementations.
Signed-off-by: Hechao Li <hechaol@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4c587c19

selftests/bpf: remove bpf_util.h from BPF C progs · ebb88607

Hechao Li authored Jun 10, 2019

Though currently there is no problem including bpf_util.h in kernel
space BPF C programs, in next patch in this stack, I will reuse
libbpf_num_possible_cpus() in bpf_util.h thus include libbpf.h in it,
which will cause BPF C programs compile error. Therefore I will first
remove bpf_util.h from all test BPF programs.

This can also make it clear that bpf_util.h is a user-space utility
while bpf_helpers.h is a kernel space utility.
Signed-off-by: Hechao Li <hechaol@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ebb88607

bpf: add a new API libbpf_num_possible_cpus() · 6446b315

Hechao Li authored Jun 10, 2019

Adding a new API libbpf_num_possible_cpus() that helps user with
per-CPU map operations.
Signed-off-by: Hechao Li <hechaol@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

6446b315

selftests/bpf : clean up feature/ when make clean · 89cceaa9

Hechao Li authored Jun 10, 2019

An error "implicit declaration of function 'reallocarray'" can be thrown
with the following steps:

$ cd tools/testing/selftests/bpf
$ make clean && make CC=<Path to GCC 4.8.5>
$ make clean && make CC=<Path to GCC 7.x>

The cause is that the feature folder generated by GCC 4.8.5 is not
removed, leaving feature-reallocarray being 1, which causes reallocarray
not defined when re-compliing with GCC 7.x. This diff adds feature
folder to EXTRA_CLEAN to avoid this problem.

v2: Rephrase the commit message.
Signed-off-by: Hechao Li <hechaol@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

89cceaa9

selftests/bpf: fix constness of source arg for bpf helpers · c7cebffe

Andrii Nakryiko authored Jun 10, 2019

Fix signature of bpf_probe_read and bpf_probe_write_user to mark source
pointer as const. This causes warnings during compilation for
applications relying on those helpers.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c7cebffe

samples: bpf: don't run probes at the local make stage · 0ed3cc4a

Jakub Kicinski authored Jun 07, 2019

Quentin reports that commit 07c3bbdb ("samples: bpf: print
a warning about headers_install") is producing the false
positive when make is invoked locally, from the samples/bpf/
directory.

When make is run locally it hits the "all" target, which
will recursively invoke make through the full build system.

Speed up the "local" run which doesn't actually build anything,
and avoid false positives by skipping all the probes if not in
kbuild environment (cover both the new warning and the BTF
probes).
Reported-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0ed3cc4a

Merge branch 'xskmap-lookup' · ab1b3a95

Alexei Starovoitov authored Jun 10, 2019

Jonathan Lemon says:

====================
Currently, the AF_XDP code uses a separate map in order to
determine if an xsk is bound to a queue.  Have the xskmap
lookup return a XDP_SOCK pointer on the kernel side, which
the verifier uses to extract relevant values.

Patches:
 1 - adds XSK_SOCK type
 2 - sync bpf.h with tools
 3 - add tools selftest
 4 - update lib/bpf, removing qidconf

v4->v5:
 - xskmap lookup now returns XDP_SOCK type instead of pointer to element.
 - no changes lib/bpf/xsk.c

v3->v4:
 - Clarify error handling path.

v2->v3:
 - Use correct map type.
====================
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ab1b3a95

libbpf: remove qidconf and better support external bpf programs. · 10a13bb4

Jonathan Lemon authored Jun 06, 2019

Use the recent change to XSKMAP bpf_map_lookup_elem() to test if
there is a xsk present in the map instead of duplicating the work
with qidconf.

Fix things so callers using XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD
bypass any internal bpf maps, so xsk_socket__{create|delete} works
properly.

Clean up error handling path.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

10a13bb4

tools/bpf: Add bpf_map_lookup_elem selftest for xskmap · 940e7be3

Jonathan Lemon authored Jun 06, 2019

Check that bpf_map_lookup_elem lookup and structure
access operats correctly.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

940e7be3

bpf/tools: sync bpf.h · 91eda599

Jonathan Lemon authored Jun 06, 2019

Sync uapi/linux/bpf.h
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

91eda599

bpf: Allow bpf_map_lookup_elem() on an xskmap · fada7fdc

Jonathan Lemon authored Jun 06, 2019

Currently, the AF_XDP code uses a separate map in order to
determine if an xsk is bound to a queue.  Instead of doing this,
have bpf_map_lookup_elem() return a xdp_sock.

Rearrange some xdp_sock members to eliminate structure holes.

Remove selftest - will be added back in later patch.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fada7fdc

06 Jun, 2019 2 commits

bpf: allow CGROUP_SKB programs to use bpf_skb_cgroup_id() helper · 4ecabd55

Roman Gushchin authored Jun 06, 2019

Currently bpf_skb_cgroup_id() is not supported for CGROUP_SKB
programs. An attempt to load such a program generates an error
like this:

    libbpf:
    0: (b7) r6 = 0
    ...
    9: (85) call bpf_skb_cgroup_id#79
    unknown func bpf_skb_cgroup_id#79

There are no particular reasons for denying it, and we have some
use cases where it might be useful.

So let's add it to the list of allowed helpers.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4ecabd55

samples: bpf: print a warning about headers_install · 07c3bbdb

Jakub Kicinski authored Jun 05, 2019

It seems like periodically someone posts patches to "fix"
header includes.  The issue is that samples expect the
include path to have the uAPI headers (from usr/) first,
and then tools/ headers, so that locally installed uAPI
headers take precedence.  This means that if users didn't
run headers_install they will see all sort of strange
compilation errors, e.g.:

  HOSTCC  samples/bpf/test_lru_dist
  samples/bpf/test_lru_dist.c:39:8: error: redefinition of ‘struct list_head’
   struct list_head {
          ^~~~~~~~~
   In file included from samples/bpf/test_lru_dist.c:9:0:
   ../tools/include/linux/types.h:69:8: note: originally defined here
    struct list_head {
           ^~~~~~~~~

Try to detect this situation, and print a helpful warning.

v2: just use HOSTCC (Jiong).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

07c3bbdb

04 Jun, 2019 2 commits

bpf: remove redundant assignment to err · 6685699e

Colin Ian King authored Jun 04, 2019

The variable err is assigned with the value -EINVAL that is never
read and it is re-assigned a new value later on.  The assignment is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

6685699e

bpf: hbm: fix spelling mistake "notifcations" -> "notificiations" · 2ed99339

Colin Ian King authored Jun 03, 2019

There is a spelling mistake in the help information, fix this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2ed99339

01 Jun, 2019 4 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 0462eaac

David S. Miller authored May 31, 2019

Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-05-31

The following pull-request contains BPF updates for your *net-next* tree.

Lots of exciting new features in the first PR of this developement cycle!
The main changes are:

1) misc verifier improvements, from Alexei.

2) bpftool can now convert btf to valid C, from Andrii.

3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs.
   This feature greatly improves BPF speed on 32-bit architectures. From Jiong.

4) cgroups will now auto-detach bpf programs. This fixes issue of thousands
   bpf programs got stuck in dying cgroups. From Roman.

5) new bpf_send_signal() helper, from Yonghong.

6) cgroup inet skb programs can signal CN to the stack, from Lawrence.

7) miscellaneous cleanups, from many developers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0462eaac

selftests/bpf: measure RTT from xdp using xdping · cd538502

Alan Maguire authored May 31, 2019

xdping allows us to get latency estimates from XDP.  Output looks
like this:

./xdping -I eth4 192.168.55.8
Setting up XDP for eth4, please wait...
XDP setup disrupts network connectivity, hit Ctrl+C to quit

Normal ping RTT data
[Ignore final RTT; it is distorted by XDP using the reply]
PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data.
64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms
64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms
64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms

4 packets transmitted, 4 received, 0% packet loss, time 3079ms
rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms

XDP RTT data:
64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms
64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms
64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms
64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms

The xdping program loads the associated xdping_kern.o BPF program
and attaches it to the specified interface.  If run in client
mode (the default), it will add a map entry keyed by the
target IP address; this map will store RTT measurements, current
sequence number etc.  Finally in client mode the ping command
is executed, and the xdping BPF program will use the last ICMP
reply, reformulate it as an ICMP request with the next sequence
number and XDP_TX it.  After the reply to that request is received
we can measure RTT and repeat until the desired number of
measurements is made.  This is why the sequence numbers in the
normal ping are 1, 2, 3 and 8.  We XDP_TX a modified version
of ICMP reply 4 and keep doing this until we get the 4 replies
we need; hence the networking stack only sees reply 8, where
we have XDP_PASSed it upstream since we are done.

In server mode (-s), xdping simply takes ICMP requests and replies
to them in XDP rather than passing the request up to the networking
stack.  No map entry is required.

xdping can be run in native XDP mode (the default, or specified
via -N) or in skb mode (-S).

A test program test_xdping.sh exercises some of these options.

Note that native XDP does not seem to XDP_TX for veths, hence -N
is not tested.  Looking at the code, it looks like XDP_TX is
supported so I'm not sure if that's expected.  Running xdping in
native mode for ixgbe as both client and server works fine.

Changes since v4

- close fds on cleanup (Song Liu)

Changes since v3

- fixed seq to be __be16 (Song Liu)
- fixed fd checks in xdping.c (Song Liu)

Changes since v2

- updated commit message to explain why seq number of last
  ICMP reply is 8 not 4 (Song Liu)
- updated types of seq number, raddr and eliminated csum variable
  in xdpclient/xdpserver functions as it was not needed (Song Liu)
- added XDPING_DEFAULT_COUNT definition and usage specification of
  default/max counts (Song Liu)

Changes since v1
 - moved from RFC to PATCH
 - removed unused variable in ipv4_csum() (Song Liu)
 - refactored ICMP checks into icmp_check() function called by client
   and server programs and reworked client and server programs due
   to lack of shared code (Song Liu)
 - added checks to ensure that SKB and native mode are not requested
   together (Song Liu)
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cd538502

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 33aae282

David S. Miller authored May 31, 2019

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-05-31

This series contains updates to the iavf driver.

Nathan Chancellor converts the use of gnu_printf to printf.

Aleksandr modifies the driver to limit the number of RSS queues to the
number of online CPUs in order to avoid creating misconfigured RSS
queues.

Gustavo A. R. Silva converts a couple of instances where sizeof() can be
replaced with struct_size().

Alice makes the remaining changes to the iavf driver to cleanup all the
old "i40evf" references in the driver to iavf, including the file names
that still contained the old driver reference.  There was no functional
changes made, just cosmetic to reduce any confusion going forward now
that the iavf driver is the virtual function driver for both i40e and
ice drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

33aae282

bpf: doc: update answer for 32-bit subregister question · c231c22a

Jiong Wang authored May 30, 2019

There has been quite a few progress around the two steps mentioned in the
answer to the following question:

  Q: BPF 32-bit subregister requirements

This patch updates the answer to reflect what has been done.

v2:
 - Add missing full stop. (Song Liu)
 - Minor tweak on one sentence. (Song Liu)

v1:
 - Integrated rephrase from Quentin and Jakub
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c231c22a

31 May, 2019 3 commits

Merge branch 'map-charge-cleanup' · d168286d

Alexei Starovoitov authored May 31, 2019

Roman Gushchin says:

====================
During my work on memcg-based memory accounting for bpf maps
I've done some cleanups and refactorings of the existing
memlock rlimit-based code. It makes it more robust, unifies
size to pages conversion, size checks and corresponding error
codes. Also it adds coverage for cgroup local storage and
socket local storage maps.

It looks like some preliminary work on the mm side might be
required to start working on the memcg-based accounting,
so I'm sending these patches as a separate patchset.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d168286d

bpf: move memory size checks to bpf_map_charge_init() · c85d6913

Roman Gushchin authored May 29, 2019

Most bpf map types doing similar checks and bytes to pages
conversion during memory allocation and charging.

Let's unify these checks by moving them into bpf_map_charge_init().
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c85d6913

bpf: rework memlock-based memory accounting for maps · b936ca64

Roman Gushchin authored May 29, 2019

In order to unify the existing memlock charging code with the
memcg-based memory accounting, which will be added later, let's
rework the current scheme.

Currently the following design is used:
  1) .alloc() callback optionally checks if the allocation will likely
     succeed using bpf_map_precharge_memlock()
  2) .alloc() performs actual allocations
  3) .alloc() callback calculates map cost and sets map.memory.pages
  4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
     and performs actual charging; in case of failure the map is
     destroyed
  <map is in use>
  1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
     performs uncharge and releases the user
  2) .map_free() callback releases the memory

The scheme can be simplified and made more robust:
  1) .alloc() calculates map cost and calls bpf_map_charge_init()
  2) bpf_map_charge_init() sets map.memory.user and performs actual
    charge
  3) .alloc() performs actual allocations
  <map is in use>
  1) .map_free() callback releases the memory
  2) bpf_map_charge_finish() performs uncharge and releases the user

The new scheme also allows to reuse bpf_map_charge_init()/finish()
functions for memcg-based accounting. Because charges are performed
before actual allocations and uncharges after freeing the memory,
no bogus memory pressure can be created.

In cases when the map structure is not available (e.g. it's not
created yet, or is already destroyed), on-stack bpf_map_memory
structure is used. The charge can be transferred with the
bpf_map_charge_move() function.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b936ca64