Commits · 9f30cd568b392051e0dcfa69b7f8d6021b41ed2a · nexedi / linux

09 Aug, 2019 5 commits

Merge branch 'bpf-xdp-fwd-sample-improvements' · 9f30cd56

Daniel Borkmann authored Aug 09, 2019

Jesper Dangaard Brouer says:

====================
V3: Hopefully fixed all issues point out by Yonghong Song

V2: Addressed issues point out by Yonghong Song
 - Please ACK patch 2/3 again
 - Added ACKs and reviewed-by to other patches

This patchset is focused on improvements for XDP forwarding sample
named xdp_fwd, which leverage the existing FIB routing tables as
described in LPC2018[1] talk by David Ahern.

The primary motivation is to illustrate how Toke's recent work
improves usability of XDP_REDIRECT via lookups in devmap. The other
patches are to help users understand the sample.

I have more improvements to xdp_fwd, but those might requires changes
to libbpf.  Thus, sending these patches first as they are isolated.

[1] http://vger.kernel.org/lpc-networking2018.html#session-1
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9f30cd56

samples/bpf: xdp_fwd explain bpf_fib_lookup return codes · abcce733

Jesper Dangaard Brouer authored Aug 08, 2019

Make it clear that this XDP program depend on the network
stack to do the ARP resolution.  This is connected with the
BPF_FIB_LKUP_RET_NO_NEIGH return code from bpf_fib_lookup().

Another common mistake (seen via XDP-tutorial) is that users
don't realize that sysctl net.ipv{4,6}.conf.all.forwarding
setting is honored by bpf_fib_lookup.
Reported-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

abcce733

samples/bpf: make xdp_fwd more practically usable via devmap lookup · a32a32cb

Jesper Dangaard Brouer authored Aug 08, 2019

This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
that the chosen egress index should be checked for existence in the
devmap. This can now be done via taking advantage of Toke's work in
commit 0cdbb4b0 ("devmap: Allow map lookups from eBPF").

This change makes xdp_fwd more practically usable, as this allows for
a mixed environment, where IP-forwarding fallback to network stack, if
the egress device isn't configured to use XDP.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a32a32cb

samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports · 3783d437

Jesper Dangaard Brouer authored Aug 08, 2019

The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
which only have a single TX port. Change name to xdp_tx_ports
to make it more descriptive.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3783d437

xdp: xdp_umem: fix umem pages mapping for 32bits systems · d9973cec

Ivan Khoronzhuk authored Aug 08, 2019

Use kmap instead of page_address as it's not always in low memory.
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d9973cec

08 Aug, 2019 1 commit

tools/bpf: fix core_reloc.c compilation error · b7076592

Yonghong Song authored Aug 07, 2019

On my local machine, I have the following compilation errors:
=====
  In file included from prog_tests/core_reloc.c:3:0:
  ./progs/core_reloc_types.h:517:46: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘fancy_char_ptr_t’
 typedef const char * const volatile restrict fancy_char_ptr_t;
                                              ^
  ./progs/core_reloc_types.h:527:2: error: unknown type name ‘fancy_char_ptr_t’
    fancy_char_ptr_t d;
    ^
=====

I am using gcc 4.8.5. Later compilers may change their behavior not emitting the
error. Nevertheless, let us fix the issue. "restrict" can be tested
without typedef.

Fixes: 9654e2ae ("selftests/bpf: add CO-RE relocs modifiers/typedef tests")
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b7076592

07 Aug, 2019 19 commits

Merge branch 'compile-once-run-everywhere' · 726e333f

Alexei Starovoitov authored Aug 07, 2019

Andrii Nakryiko says:

====================
This patch set implements central part of CO-RE (Compile Once - Run
Everywhere, see [0] and [1] for slides and video): relocating fields offsets.
Most of the details are written down as comments to corresponding parts of the
code.

Patch #1 adds a bunch of commonly useful btf_xxx helpers to simplify working
with BTF types.
Patch #2 converts existing libbpf code to these new helpers and removes some
of pre-existing ones.
Patch #3 adds loading of .BTF.ext offset relocations section and macros to
work with its contents.
Patch #4 implements CO-RE relocations algorithm in libbpf.
Patch #5 introduced BPF_CORE_READ macro, hiding usage of Clang's
__builtin_preserve_access_index intrinsic that records offset relocation.
Patches #6-#14 adds selftests validating various parts of relocation handling,
type compatibility, etc.

For all tests to work, you'll need latest Clang/LLVM supporting
__builtin_preserve_access_index intrinsic, used for recording offset
relocations. Kernel on which selftests run should have BTF information built
in (CONFIG_DEBUG_INFO_BTF=y).

  [0] http://vger.kernel.org/bpfconf2019.html#session-2
  [1] http://vger.kernel.org/lpc-bpf2018.html#session-2

v5->v6:
- fix bad comment formatting for real (Alexei);

v4->v5:
- drop constness for btf_xxx() helpers, allowing to avoid type casts (Alexei);
- rebase on latest bpf-next, change test__printf back to printf;

v3->v4:
- added btf_xxx helpers (Alexei);
- switched libbpf code to new helpers;
- reduced amount of logging and simplified format in few places (Alexei);
- made flavor name parsing logic more strict (exactly three underscores);
- no uname() error checking (Alexei);
- updated misc tests to reflect latest Clang fixes (Yonghong);

v2->v3:
- enclose BPF_CORE_READ args in parens (Song);

v1->v2:
- add offsetofend(), fix btf_ext optional fields checks (Song);
- add bpf_core_dump_spec() for logging spec representation;
- move special first element processing out of the loop (Song);
- typo fixes (Song);
- drop BPF_ST | BPF_MEM insn relocation (Alexei);
- extracted BPF_CORE_READ into bpf_helpers (Alexei);
- added extra tests validating Clang capturing relocs correctly (Yonghong);
- switch core_relocs.c to use sub-tests;
- updated mods tests after Clang bug was fixed (Yonghong);
- fix bug enumerating candidate types;
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

726e333f

selftests/bpf: add CO-RE relocs misc tests · 29e1c668

Andrii Nakryiko authored Aug 07, 2019

Add tests validating few edge-cases of capturing offset relocations.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

29e1c668

selftests/bpf: add CO-RE relocs ints tests · c1f5e7dd

Andrii Nakryiko authored Aug 07, 2019

Add various tests validating handling compatible/incompatible integer
types.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c1f5e7dd

selftests/bpf: add CO-RE relocs ptr-as-array tests · d698f9db

Andrii Nakryiko authored Aug 07, 2019

Add test validating correct relocation handling for cases where pointer
to something is used as an array. E.g.:

  int *ptr = ...;
  int x = ptr[42];
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d698f9db

selftests/bpf: add CO-RE relocs modifiers/typedef tests · 9654e2ae

Andrii Nakryiko authored Aug 07, 2019

Add tests validating correct handling of various combinations of
typedefs and const/volatile/restrict modifiers.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9654e2ae

selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests · d9db3550

Andrii Nakryiko authored Aug 07, 2019

Test CO-RE relocation handling of ints, enums, pointers, func protos, etc.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d9db3550

selftests/bpf: add CO-RE relocs array tests · 20a9ad2e

Andrii Nakryiko authored Aug 07, 2019

Add tests for various array handling/relocation scenarios.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

20a9ad2e

selftests/bpf: add CO-RE relocs nesting tests · ec6438a9

Andrii Nakryiko authored Aug 07, 2019

Add a bunch of test validating correct handling of nested
structs/unions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ec6438a9

selftests/bpf: add CO-RE relocs struct flavors tests · 002d3afc

Andrii Nakryiko authored Aug 07, 2019

Add tests verifying that BPF program can use various struct/union
"flavors" to extract data from the same target struct/union.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

002d3afc

selftests/bpf: add CO-RE relocs testing setup · df36e621

Andrii Nakryiko authored Aug 07, 2019

Add CO-RE relocation test runner. Add one simple test validating that
libbpf's logic for searching for kernel image and loading BTF out of it
works.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

df36e621

selftests/bpf: add BPF_CORE_READ relocatable read macro · 2dc26d5a

Andrii Nakryiko authored Aug 07, 2019

Add BPF_CORE_READ macro used in tests to do bpf_core_read(), which
automatically captures offset relocation.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2dc26d5a

libbpf: implement BPF CO-RE offset relocation algorithm · ddc7c304

Andrii Nakryiko authored Aug 07, 2019

This patch implements the core logic for BPF CO-RE offsets relocations.
Every instruction that needs to be relocated has corresponding
bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
to match recorded "local" relocation spec against potentially many
compatible "target" types, creating corresponding spec. Details of the
algorithm are noted in corresponding comments in the code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ddc7c304

libbpf: add .BTF.ext offset relocation section loading · 4cedc0da

Andrii Nakryiko authored Aug 07, 2019

Add support for BPF CO-RE offset relocations. Add section/record
iteration macros for .BTF.ext. These macro are useful for iterating over
each .BTF.ext record, either for dumping out contents or later for BPF
CO-RE relocation handling.

To enable other parts of libbpf to work with .BTF.ext contents, moved
a bunch of type definitions into libbpf_internal.h.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4cedc0da

libbpf: convert libbpf code to use new btf helpers · b03bc685

Andrii Nakryiko authored Aug 07, 2019

Simplify code by relying on newly added BTF helper functions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b03bc685

libbpf: add helpers for working with BTF types · ef20a9b2

Andrii Nakryiko authored Aug 07, 2019

Add lots of frequently used helpers that simplify working with BTF
types.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ef20a9b2

Merge branch 'test_progs-stdio' · 682cdbdc

Alexei Starovoitov authored Aug 06, 2019

Stanislav Fomichev says:

====================
I was looking into converting test_sockops* to test_progs framework
and that requires using cgroup_helpers.c which rely on stdio/stderr.
Let's use open_memstream to override stdout into buffer during
subtests instead of custom test_{v,}printf wrappers. That lets
us continue to use stdio in the subtests and dump it on failure
if required.

That would also fix bpf_find_map which currently uses printf to
signal failure (missed during test_printf conversion).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

682cdbdc

selftests/bpf: test_progs: drop extra trailing tab · 16e910d4

Stanislav Fomichev authored Aug 06, 2019

Small (un)related cleanup.

Cc: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

16e910d4

selftests/bpf: test_progs: test__printf -> printf · 66bd2ec1

Stanislav Fomichev authored Aug 06, 2019

Now that test__printf is a simple wraper around printf, let's drop it
(and test__vprintf as well).

Cc: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

66bd2ec1

selftests/bpf: test_progs: switch to open_memstream · 946152b3

Stanislav Fomichev authored Aug 06, 2019

Use open_memstream to override stdout during test execution.
The copy of the original stdout is held in env.stdout and used
to print subtest info and dump failed log.

test_{v,}printf are now simple wrappers around stdout and will be
removed in the next patch.

v5:
* fix -v crash by always setting env.std{in,err} (Alexei Starovoitov)
* drop force_log check from stdio_hijack (Andrii Nakryiko)

v4:
* one field per line for stdout/stderr (Andrii Nakryiko)

v3:
* don't do strlen over log_buf, log_cnt has it already (Andrii Nakryiko)

v2:
* add ifdef __GLIBC__ around open_memstream (maybe pointless since
  we already depend on glibc for argp_parse)
* hijack stderr as well (Andrii Nakryiko)
* don't hijack for every test, do it once (Andrii Nakryiko)
* log_cap -> log_size (Andrii Nakryiko)
* do fseeko in a proper place (Andrii Nakryiko)
* check open_memstream returned value (Andrii Nakryiko)

Cc: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

946152b3

06 Aug, 2019 2 commits

selftests/bpf: add loop test 5 · 8c303960

Alexei Starovoitov authored Aug 02, 2019

Add a test with multiple exit conditions.
It's not an infinite loop only when the verifier can properly track
all math on variable 'i' through all possible ways of executing this loop.

barrier()s are needed to disable llvm optimization that combines multiple
branches into fewer branches.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>

8c303960

selftests/bpf: add loop test 4 · a78d0dbe

Alexei Starovoitov authored Aug 02, 2019

Add a test that returns a 'random' number between [0, 2^20)
If state pruning is not working correctly for loop body the number of
processed insns will be 2^20 * num_of_insns_in_loop_body and the program
will be rejected.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>

a78d0dbe

01 Aug, 2019 3 commits

Merge branch 'setsockopt-extra-mem' · 02bc2b64

Alexei Starovoitov authored Aug 01, 2019

Stanislav Fomichev says:

====================
Current setsockopt hook is limited to the size of the buffer that
user had supplied. Since we always allocate memory and copy the value
into kernel space, allocate just a little bit more in case BPF
program needs to override input data with a larger value.

The canonical example is TCP_CONGESTION socket option where
input buffer is a string and if user calls it with a short string,
BPF program has no way of extending it.

The tests are extended with TCP_CONGESTION use case.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

02bc2b64

selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use case · fd5ef31f

Stanislav Fomichev authored Jul 29, 2019

Ignore SOL_TCP:TCP_CONGESTION in getsockopt and always override
SOL_TCP:TCP_CONGESTION with "cubic" in setsockopt hook.

Call setsockopt(SOL_TCP, TCP_CONGESTION) with short optval ("nv")
to make sure BPF program has enough buffer space to replace it
with "cubic".
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fd5ef31f

bpf: always allocate at least 16 bytes for setsockopt hook · 9babe825

Stanislav Fomichev authored Jul 29, 2019

Since we always allocate memory, allocate just a little bit more
for the BPF program in case it need to override user input with
bigger value. The canonical example is TCP_CONGESTION where
input string might be too small to override (nv -> bbr or cubic).

16 bytes are chosen to match the size of TCP_CA_NAME_MAX and can
be extended in the future if needed.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9babe825

31 Jul, 2019 9 commits

tools: bpftool: add support for reporting the effective cgroup progs · a98bf573

Jakub Kicinski authored Jul 30, 2019

Takshak said in the original submission:

With different bpf attach_flags available to attach bpf programs specially
with BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI, the list of effective
bpf-programs available to any sub-cgroups really needs to be available for
easy debugging.

Using BPF_F_QUERY_EFFECTIVE flag, one can get the list of not only attached
bpf-programs to a cgroup but also the inherited ones from parent cgroup.

So a new option is introduced to use BPF_F_QUERY_EFFECTIVE query flag here
to list all the effective bpf-programs available for execution at a specified
cgroup.

Reused modified test program test_cgroup_attach from tools/testing/selftests/bpf:
  # ./test_cgroup_attach

With old bpftool:

 # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/
  ID       AttachType      AttachFlags     Name
  271      egress          multi           pkt_cntr_1
  272      egress          multi           pkt_cntr_2

Attached new program pkt_cntr_4 in cg2 gives following:

 # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2
  ID       AttachType      AttachFlags     Name
  273      egress          override        pkt_cntr_4

And with new "effective" option it shows all effective programs for cg2:

 # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 effective
  ID       AttachType      AttachFlags     Name
  273      egress          override        pkt_cntr_4
  271      egress          override        pkt_cntr_1
  272      egress          override        pkt_cntr_2

Compared to original submission use a local flag instead of global
option.

We need to clear query_flags on every command, in case batch mode
wants to use varying settings.

v2: (Takshak)
 - forbid duplicated flags;
 - fix cgroup path freeing.
Signed-off-by: Takshak Chahande <ctakshak@fb.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Takshak Chahande <ctakshak@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a98bf573

selftests/bpf: fix clearing buffered output between tests/subtests · bf8ff0f8

Andrii Nakryiko authored Jul 30, 2019

Clear buffered output once test or subtests finishes even if test was
successful. Not doing this leads to accumulation of output from previous
tests and on first failed tests lots of irrelevant output will be
dumped, greatly confusing things.

v1->v2: fix Fixes tag, add more context to patch

Fixes: 3a516a0a ("selftests/bpf: add sub-tests support for test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bf8ff0f8

Merge branch 'gen-syn-cookie' · 116e7dbe

Alexei Starovoitov authored Jul 30, 2019

Petar Penkov says:

====================
This patch series introduces a BPF helper function that allows generating SYN
cookies from BPF. Currently, this helper is enabled at both the TC hook and the
XDP hook.

The first two patches in the series add/modify several TCP helper functions to
allow for SKB-less operation, as is the case at the XDP hook.

The third patch introduces the bpf_tcp_gen_syncookie helper function which
generates a SYN cookie for either XDP or TC programs. The return value of
this function contains both the MSS value, encoded in the cookie, and the
cookie itself.

The last three patches sync tools/ and add a test.

Performance evaluation:
I sent 10Mpps to a fixed port on a host with 2 10G bonded Mellanox 4 NICs from
random IPv6 source addresses. Without XDP I observed 7.2Mpps (syn-acks) being
sent out if the IPv6 packets carry 20 bytes of TCP options or 7.6Mpps if they
carry no options. If I attached a simple program that checks if a packet is
IPv6/TCP/SYN, looks up the socket, issues a cookie, and sends it back out after
swapping src/dest, recomputing the checksum, and setting the ACK flag, I
observed 10Mpps being sent back out.

Changes since v1:
1/ Added performance numbers to the cover letter
2/ Patch 2: Refactored a bit to fix compilation issues
3/ Patch 3: Changed ENOTSUPP to EOPNOTSUPP at Toke's suggestion

Changes since RFC:
1/ Cookie is returned in host order at Alexei's suggestion
2/ If cookies are not enabled via a sysctl, the helper function returns
   -ENOENT instead of -EINVAL at Lorenz's suggestion
3/ Fixed documentation to properly reflect that MSS is 16 bits at
   Lorenz's suggestion
4/ BPF helper requires TCP length to match ->doff field, rather than to simply
   be no more than 20 bytes at Eric and Alexei's suggestion
5/ Packet type is looked up from the packet version field, rather than from the
   socket. v4 packets are rejected on v6-only sockets but should work with
   dual stack listeners at Eric's suggestion
6/ Removed unnecessary `net` argument from helper function in patch 2 at
   Lorenz's suggestion
7/ Changed test to only pass MSS option so we can convince the verifier that the
   memory access is not out of bounds

Note that 7/ below illustrates the verifier might need to be extended to allow
passing a variable tcph->doff to the helper function like below:

__u32 thlen = tcph->doff * 4;
if (thlen < sizeof(*tcph))
	return;
__s64 cookie = bpf_tcp_gen_syncookie(sk, ipv4h, 20, tcph, thlen);
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

116e7dbe

selftests/bpf: add test for bpf_tcp_gen_syncookie · 91bc3578

Petar Penkov authored Jul 29, 2019

Modify the existing bpf_tcp_check_syncookie test to also generate a
SYN cookie, pass the packet to the kernel, and verify that the two
cookies are the same (and both valid). Since cloned SKBs are skipped
during generic XDP, this test does not issue a SYN cookie when run in
XDP mode. We therefore only check that a valid SYN cookie was issued at
the TC hook.

Additionally, verify that the MSS for that SYN cookie is within
expected range.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

91bc3578

selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers · 637f71c0

Petar Penkov authored Jul 29, 2019

Expose bpf_tcp_gen_syncookie to selftests.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

637f71c0

bpf: sync bpf.h to tools/ · 3745ee18

Petar Penkov authored Jul 29, 2019

Sync updated documentation for bpf_redirect_map.

Sync the bpf_tcp_gen_syncookie helper function definition with the one
in tools/uapi.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3745ee18

bpf: add bpf_tcp_gen_syncookie helper · 70d66244

Petar Penkov authored Jul 29, 2019

This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

70d66244

tcp: add skb-less helpers to retrieve SYN cookie · 9349d600

Petar Penkov authored Jul 29, 2019

This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9349d600

tcp: tcp_syn_flood_action read port from socket · 96511278

Petar Penkov authored Jul 29, 2019

This allows us to call this function before an SKB has been
allocated.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

96511278

29 Jul, 2019 1 commit

Merge branch 'devmap_hash' · d3406913

Alexei Starovoitov authored Jul 29, 2019

Toke Høiland-Jørgensen says:

====================
This series adds a new map type, devmap_hash, that works like the existing
devmap type, but using a hash-based indexing scheme. This is useful for the use
case where a devmap is indexed by ifindex (for instance for use with the routing
table lookup helper). For this use case, the regular devmap needs to be sized
after the maximum ifindex number, not the number of devices in it. A hash-based
indexing scheme makes it possible to size the map after the number of devices it
should contain instead.

This was previously part of my patch series that also turned the regular
bpf_redirect() helper into a map-based one; for this series I just pulled out
the patches that introduced the new map type.

Changelog:

v5:

- Dynamically set the number of hash buckets by rounding up max_entries to the
  nearest power of two (mirroring the regular hashmap), as suggested by Jesper.

v4:

- Remove check_memlock parameter that was left over from an earlier patch
  series.
- Reorder struct members to avoid holes.

v3:

- Rework the split into different patches
- Use spin_lock_irqsave()
- Also add documentation and bash completion definitions for bpftool

v2:

- Split commit adding the new map type so uapi and tools changes are separate.

Changes to these patches since the previous series:

- Rebase on top of the other devmap changes (makes this one simpler!)
- Don't enforce key==val, but allow arbitrary indexes.
- Rename the type to devmap_hash to reflect the fact that it's just a hashmap now.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d3406913