Commits · 1110865273c1fe76853c5900b0a28214afc50b4c · Kirill Smelkov / linux

01 Aug, 2023 5 commits

Merge branch 'Remove unused fields in cpumap & devmap' · 11108652

Martin KaFai Lau authored Jul 31, 2023

Hou Tao says:

====================
Patchset "Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP
maps" [0] changed per-map flush list to global per-cpu flush list
for cpumap, devmap and xskmap, but it forgot to remove these unused
fields from cpumap and devmap. So just remove these unused fields.

Comments and suggestions are always welcome.

[0]: https://lore.kernel.org/bpf/20191219061006.21980-1-bjorn.topel@gmail.com
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

11108652

bpf, devmap: Remove unused dtab field from bpf_dtab_netdev · 1ea66e89

Hou Tao authored Jul 28, 2023

Commit 96360004 ("xdp: Make devmap flush_list common for all map
instances") removes the use of bpf_dtab_netdev::dtab in bq_enqueue(),
so just remove dtab from bpf_dtab_netdev.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230728014942.892272-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

1ea66e89

bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry · 2d20bfc3

Hou Tao authored Jul 28, 2023

Since commit cdfafe98 ("xdp: Make cpumap flush_list common for all
map instances"), cmap is no longer used, so just remove it.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230728014942.892272-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

2d20bfc3

netfilter: bpf: Only define get_proto_defrag_hook() if necessary · 81584c23

Daniel Xu authored Jul 31, 2023

Before, we were getting this warning:

net/netfilter/nf_bpf_link.c:32:1: warning: 'get_proto_defrag_hook' defined but not used [-Wunused-function]

Guard the definition with CONFIG_NF_DEFRAG_IPV[4|6].

Fixes: 91721c2d ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307291213.fZ0zDmoG-lkp@intel.com/
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/b128b6489f0066db32c4772ae4aaee1480495929.1690840454.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

81584c23

bpf: Fix an array-index-out-of-bounds issue in disasm.c · e99688eb

Yonghong Song authored Jul 31, 2023

syzbot reported an array-index-out-of-bounds when printing out bpf
insns. Further investigation shows the insn is illegal but
is printed out due to log level 1 or 2 before actual insn verification
in do_check().

This particular illegal insn is a MOVSX insn with offset value 2.
The legal offset value for MOVSX should be 8, 16 and 32.
The disasm sign-extension-size array index is calculated as
 (insn->off / 8) - 1
and offset value 2 gives an out-of-bound index -1.

Tighten the checking for MOVSX insn in disasm.c to avoid
array-index-out-of-bounds issue.
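
The index arithmetic described above can be illustrated with a small standalone sketch. Only the `(off / 8) - 1` formula and the legal offsets 8, 16 and 32 come from the commit message; the table contents and the helper names (`movsx_size_index`, `movsx_off_is_valid`, `movsx_size_name`) are hypothetical and are not copied from disasm.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sign-extension-size string table.
 * Slot 2 is left empty because no legal offset maps to it. */
static const char *const sx_size_name[4] = {
	[0] = "(s8)",
	[1] = "(s16)",
	[3] = "(s32)",
};

/* Index formula quoted in the commit message. */
int movsx_size_index(int16_t off)
{
	return (off / 8) - 1;
}

/* Only 8, 16 and 32 are legal MOVSX offsets. */
bool movsx_off_is_valid(int16_t off)
{
	return off == 8 || off == 16 || off == 32;
}

/* Returns a size string, or NULL when the offset must not be used as an index. */
const char *movsx_size_name(int16_t off)
{
	if (!movsx_off_is_valid(off))
		return NULL;
	return sx_size_name[movsx_size_index(off)];
}
```

With an offset of 2 the raw formula yields -1, so the sketch checks the offset before indexing instead of trusting the input.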

Reported-by: syzbot+3758842a6c01012aa73b@syzkaller.appspotmail.com
Fixes: f835bb62 ("bpf: Add kernel/bpftool asm support for new instructions")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230731204534.1975311-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e99688eb

31 Jul, 2023 1 commit

net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn · 74bdfab4

Lorenz Bauer authored Jul 31, 2023

There are already INDIRECT_CALLABLE_DECLARE in the hashtable
headers, no need to declare them again.

Fixes: 0f495f76 ("net: remove duplicate reuseport_lookup functions")
Suggested-by: Martin Lau <martin.lau@linux.dev>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230731-indir-call-v1-1-4cd0aeaee64f@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

74bdfab4

30 Jul, 2023 1 commit

docs/bpf: Fix malformed documentation · fb213ecb

Yonghong Song authored Jul 29, 2023

Two issues are fixed:
1. Malformed table due to newly-introduced BPF_MOVSX
2. Missing reference link for ``Sign-extension load operations``

Fixes: 245d4c40 ("docs/bpf: Add documentation for new instructions")
Cc: bpf@ietf.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307291840.Cqhj7uox-lkp@intel.com/
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230730004251.381307-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fb213ecb

28 Jul, 2023 29 commits

Merge branch 'support-defragmenting-ipv-4-6-packets-in-bpf' · eb03993a

Alexei Starovoitov authored Jul 28, 2023

Daniel Xu says:

====================
Support defragmenting IPv(4|6) packets in BPF

=== Context ===

In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:

1. Enforce policy on first fragment and accept all subsequent fragments.
   This works but may let in certain attacks or allow data exfiltration.

2. Enforce policy on first fragment and drop all subsequent fragments.
   This does not really work b/c some protocols may rely on
   fragmentation. For example, DNS may rely on oversized UDP packets for
   large responses.

So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:

    Middleboxes [...] should process IP fragments in a manner that is
    consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
    must maintain state in order to achieve this goal.

=== BPF related bits ===

Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.

The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.

=== Changelog ===

Changes from v5:

* Fix defrag disable codepaths

Changes from v4:

* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings

Changes from v3:

* Correctly initialize `addrlen` stack var for recvmsg()

Changes from v2:

* module_put() if ->enable() fails
* Fix CI build errors

Changes from v1:

* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active

[0]: https://datatracker.ietf.org/doc/html/rfc8900
====================

Link: https://lore.kernel.org/r/cover.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

eb03993a

bpf: selftests: Add defrag selftests · c313eae7

Daniel Xu authored Jul 21, 2023

These selftests test 2 major scenarios: that BPF-based defragmentation
can be done successfully, and that packet pointers are invalidated after
calls to the kfunc. The logic is similar for both ipv4 and ipv6.

In the first scenario, we create a UDP client and UDP echo server. The
server side is fairly straightforward: we attach the prog and simply
echo back the message.

On the client side, we send fragmented packets to the server and expect
the reassembled message back.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/33e40fdfddf43be93f2cb259303f132f46750953.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c313eae7

bpf: selftests: Support custom type and proto for client sockets · e15a2209

Daniel Xu authored Jul 21, 2023

Extend connect_to_fd_opts() to take optional type and protocol
parameters for the client socket. These parameters are useful when
opening a raw socket to send IP fragments.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/9067db539efdfd608aa86a2b143c521337c111fc.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e15a2209

bpf: selftests: Support not connecting client socket · 3495e89c

Daniel Xu authored Jul 21, 2023

For connectionless protocols or raw sockets we do not want to actually
connect() to the server.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/525c13d66dac2d640a1db922546842c051c6f2e6.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3495e89c

netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link · 91721c2d

Daniel Xu authored Jul 21, 2023

This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link is active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.
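
The ordering check described above might look roughly like the sketch below. In netfilter, hooks with lower priority values run first, so a prog that wants reassembled packets needs a priority strictly greater than that of the defrag hook. The constant values, the helper name `check_defrag_priority` and the `-ERANGE` return code are assumptions made for illustration and should be checked against the kernel headers and nf_bpf_link.c.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values, mirrored here only so the sketch is self-contained. */
#define NF_IP_PRI_CONNTRACK_DEFRAG	(-400)
#define BPF_F_NETFILTER_IP_DEFRAG	(1U << 0)

/* Hypothetical helper: reject a link that asks for defrag but would run
 * at or before the defrag hook (lower priority value means earlier). */
int check_defrag_priority(int32_t prio, uint32_t flags)
{
	if ((flags & BPF_F_NETFILTER_IP_DEFRAG) &&
	    prio <= NF_IP_PRI_CONNTRACK_DEFRAG)
		return -ERANGE;
	return 0;
}
```

A link created without the defrag flag is not constrained by this particular check.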

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

91721c2d

netfilter: defrag: Add glue hooks for enabling/disabling defrag · 9abddac5

Daniel Xu authored Jul 21, 2023

We want to be able to enable/disable IP packet defrag from core
bpf/netfilter code. In other words, execute code from core that could
possibly be built as a module.

To help avoid symbol resolution errors, use glue hooks that the modules
will register callbacks with during module init.
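
The glue-hook idea can be sketched in user space as a function-pointer table that core code consults and that a module fills in when it initialises. Every name below (`defrag_hook`, `defrag_hook_register`, `core_enable_defrag`, `mod_init`, and so on) is hypothetical, and the sketch leaves out the RCU protection and module reference counting that kernel code would need.

```c
#include <errno.h>
#include <stddef.h>

/* Callbacks a defrag "module" provides to core code. */
struct defrag_hook {
	int  (*enable)(void);
	void (*disable)(void);
};

/* Core side: stays NULL until a module registers itself. */
static const struct defrag_hook *registered_hook;

void defrag_hook_register(const struct defrag_hook *hook)
{
	registered_hook = hook;
}

void defrag_hook_unregister(void)
{
	registered_hook = NULL;
}

/* Core code never references module symbols directly, only the hook. */
int core_enable_defrag(void)
{
	if (!registered_hook)
		return -ENOENT;	/* real code could try to load the module here */
	return registered_hook->enable();
}

void core_disable_defrag(void)
{
	if (registered_hook)
		registered_hook->disable();
}

/* Module side: a user count stands in for enabling real defrag hooks. */
static int defrag_users;

static int mod_enable(void)
{
	defrag_users++;
	return 0;
}

static void mod_disable(void)
{
	defrag_users--;
}

static const struct defrag_hook mod_hook = {
	.enable  = mod_enable,
	.disable = mod_disable,
};

/* Called from the module's init path in this sketch. */
void mod_init(void)
{
	defrag_hook_register(&mod_hook);
}
```

Because the core only dereferences a pointer that is set at run time, it links cleanly whether or not the module is built in.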
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9abddac5

docs/bpf: Improve documentation for cpu=v4 instructions · ee932bf9

Yonghong Song authored Jul 28, 2023

Improve documentation for cpu=v4 instructions based on
David's suggestions.

Cc: bpf@ietf.org
Suggested-by: David Vernet <void@manifault.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728225105.919595-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ee932bf9

bpf: Non-atomically allocate freelist during prefill · d1a02358

YiFei Zhu authored Jul 28, 2023

In internal testing of test_maps, we sometimes observed failures like:
  test_maps: test_maps.c:173: void test_hashmap_percpu(unsigned int, void *):
    Assertion `bpf_map_update_elem(fd, &key, value, BPF_ANY) == 0' failed.
where the errno is ENOMEM. After some troubleshooting and enabling
the warnings, we saw:
  [   91.304708] percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left
  [   91.304716] CPU: 51 PID: 24145 Comm: test_maps Kdump: loaded Tainted: G                 N 6.1.38-smp-DEV #7
  [   91.304719] Hardware name: Google Astoria/astoria, BIOS 0.20230627.0-0 06/27/2023
  [   91.304721] Call Trace:
  [   91.304724]  <TASK>
  [   91.304730]  [<ffffffffa7ef83b9>] dump_stack_lvl+0x59/0x88
  [   91.304737]  [<ffffffffa7ef83f8>] dump_stack+0x10/0x18
  [   91.304738]  [<ffffffffa75caa0c>] pcpu_alloc+0x6fc/0x870
  [   91.304741]  [<ffffffffa75ca302>] __alloc_percpu_gfp+0x12/0x20
  [   91.304743]  [<ffffffffa756785e>] alloc_bulk+0xde/0x1e0
  [   91.304746]  [<ffffffffa7566c02>] bpf_mem_alloc_init+0xd2/0x2f0
  [   91.304747]  [<ffffffffa7547c69>] htab_map_alloc+0x479/0x650
  [   91.304750]  [<ffffffffa751d6e0>] map_create+0x140/0x2e0
  [   91.304752]  [<ffffffffa751d413>] __sys_bpf+0x5a3/0x6c0
  [   91.304753]  [<ffffffffa751c3ec>] __x64_sys_bpf+0x1c/0x30
  [   91.304754]  [<ffffffffa7ef847a>] do_syscall_64+0x5a/0x80
  [   91.304756]  [<ffffffffa800009b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

This makes sense, because in atomic context, percpu allocation would
not create new chunks; it would only create in non-atomic contexts.
And if during prefill all percpu chunks are full, -ENOMEM would
happen immediately upon next unit_alloc.

Prefill phase does not actually run in atomic context, so we can
use this fact to allocate non-atomically with GFP_KERNEL instead
of GFP_NOWAIT. This avoids the immediate -ENOMEM.

GFP_NOWAIT has to be used in unit_alloc when bpf program runs
in atomic context. Even if bpf program runs in non-atomic context,
in most cases, rcu read lock is enabled for the program so
GFP_NOWAIT is still needed. This is often also the case for
BPF_MAP_UPDATE_ELEM syscalls.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230728043359.3324347-1-zhuyifei@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d1a02358

selftests/bpf: Enable test test_progs-cpuv4 for gcc build kernel · a76584fc

Yonghong Song authored Jul 27, 2023

Currently, test_progs-cpuv4 is generated for a clang-built kernel
when bpf cpu=v4 is supported by the clang compiler.
Let us enable test_progs-cpuv4 for a gcc-built kernel as well.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728055745.2285202-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a76584fc

bpf: Fix compilation warning with -Wparentheses · 09fedc73

Yonghong Song authored Jul 27, 2023

The kernel test robot reported compilation warnings when -Wparentheses is
added to KBUILD_CFLAGS with gcc compiler. The following is the error message:

  .../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_reg_to_size_sx’:
  .../bpf-next/kernel/bpf/verifier.c:5901:14:
    error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses]
    if (s64_max >= 0 == s64_min >= 0) {
        ~~~~~~~~^~~~
  .../bpf-next/kernel/bpf/verifier.c: In function ‘coerce_subreg_to_size_sx’:
  .../bpf-next/kernel/bpf/verifier.c:5965:14:
    error: suggest parentheses around comparison in operand of ‘==’ [-Werror=parentheses]
    if (s32_min >= 0 == s32_max >= 0) {
        ~~~~~~~~^~~~

To fix the issue, add proper parentheses for the above '>=' condition
to silence the warning/error.
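
A minimal sketch of the parenthesized form follows. In C, relational operators such as `>=` bind more tightly than `==`, so adding the parentheses does not change the result; it only makes the grouping explicit. The function name `bounds_have_same_sign` is hypothetical and is not the verifier's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when both bounds are non-negative or both are negative. */
bool bounds_have_same_sign(int64_t smin, int64_t smax)
{
	return (smax >= 0) == (smin >= 0);
}
```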

I tried a few clang compilers like clang16 and clang18 and they do not emit
such warnings with -Wparentheses.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307281133.wi0c4SqG-lkp@intel.com/
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230728055740.2284534-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

09fedc73

Merge branch 'bpf-support-new-insns-from-cpu-v4' · f7e6bd33

Alexei Starovoitov authored Jul 27, 2023

Yonghong Song says:

====================
bpf: Support new insns from cpu v4

In previous discussion ([1]), it is agreed that we should introduce
cpu version 4 (llvm flag -mcpu=v4) which contains some instructions
which can simplify code, make code easier to understand, fix the
existing problem, or simply for feature completeness. More specifically,
the following new insns are proposed:
  . sign extended load
  . sign extended mov
  . bswap
  . signed div/mod
  . ja with 32-bit offset

This patch set added kernel support for insns proposed in [1] except
BPF_ST which already has full kernel support. Besides the above proposed
insns, LLVM will generate BPF_ST insn as well under -mcpu=v4.
The llvm patch ([2]) has been merged into llvm-project 'main' branch.

The patchset implements interpreter, jit and verifier support for these new
insns.

For this patch set, I tested cpu v2/v3/v4 and all selftests passed.
I also ran the selftests introduced in this patch set under two more
configurations besides normal jit testing (bpf_jit_enable = 1 and
bpf_jit_harden = 0):
  - bpf_jit_enable = 0
  - bpf_jit_enable = 1 and bpf_jit_harden = 1
and both runs passed.

  [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
  [2] https://reviews.llvm.org/D144829

Changelogs:
  v4 -> v5:
   . for v4, patch 8/17 missed in mailing list and patchwork, so resend.
   . rebase on top of master
  v3 -> v4:
   . some minor asm syntax adjustment based on llvm change.
   . add clang version and target arch guard for new tests
     so they can still compile with old llvm compilers.
   . some changes to the bpf doc.
  v2 -> v3:
   . add missed disasm change from v2.
   . handle signed load of ctx fields properly.
   . fix some interpreter sdiv/smod error when bpf_jit_enable = 0.
   . fix some verifier range bounding errors.
   . add more C tests.
  RFCv1 -> v2:
   . add more verifier supports for signed extend load and mov insns.
   . rename some insn names to be more consistent with intel practice.
   . add cpuv4 test runner for test progs.
   . add more unit and C tests.
   . add documentation.
====================

Link: https://lore.kernel.org/r/20230728011143.3710005-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f7e6bd33

docs/bpf: Add documentation for new instructions · 245d4c40

Yonghong Song authored Jul 27, 2023

Add documentation in instruction-set.rst for new instruction encoding
and their corresponding operations. Also removed the question
related to 'no BPF_SDIV' in bpf_design_QA.rst since we have
BPF_SDIV insn now.

Cc: bpf@ietf.org
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011342.3724411-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

245d4c40

selftests/bpf: Test ldsx with more complex cases · 0c606571

Yonghong Song authored Jul 27, 2023

The following ldsx cases are tested:
  - signed readonly map value
  - read/write map value
  - probed memory
  - not-narrowed ctx field access
  - narrowed ctx field access.

Without the previous proper verifier/jit handling, the test will fail.

If cpuv4 is not supported either by compiler or by jit,
the test will be skipped.

  # ./test_progs -t ldsx_insn
  #113/1   ldsx_insn/map_val and probed_memory:SKIP
  #113/2   ldsx_insn/ctx_member_sign_ext:SKIP
  #113/3   ldsx_insn/ctx_member_narrow_sign_ext:SKIP
  #113     ldsx_insn:SKIP
  Summary: 1/0 PASSED, 3 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011336.3723434-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0c606571

selftests/bpf: Add unit tests for new gotol insn · 613dad49

Yonghong Song authored Jul 27, 2023

Add unit tests for gotol insn.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011329.3721881-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

613dad49

selftests/bpf: Add unit tests for new sdiv/smod insns · de1c2680

Yonghong Song authored Jul 27, 2023

Add unit tests for sdiv/smod insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011321.3720500-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

de1c2680

selftests/bpf: Add unit tests for new bswap insns · 79dbabc1

Yonghong Song authored Jul 27, 2023

Add unit tests for bswap insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011314.3720109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

79dbabc1

selftests/bpf: Add unit tests for new sign-extension mov insns · f02ec3ff

Yonghong Song authored Jul 27, 2023

Add unit tests for movsx insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011309.3719295-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f02ec3ff

selftests/bpf: Add unit tests for new sign-extension load insns · 147c8f44

Yonghong Song authored Jul 27, 2023

Add unit tests for new ldsx insns. The test includes sign-extension
with a single value or with a value range.

If cpuv4 is not supported due to
  (1) older compiler, e.g., less than clang version 18, or
  (2) test runner test_progs and test_progs-no_alu32 which tests
      cpu v2 and v3, or
  (3) non-x86_64 arch not supporting new insns in jit yet,
a dummy program is added with below output:
  #318/1   verifier_ldsx/cpuv4 is not supported by compiler or jit, use a dummy test:OK
  #318     verifier_ldsx:OK
to indicate the test passed with a dummy test instead of actually
testing cpuv4. I am using a dummy prog to avoid changing the
verifier testing infrastructure. Once clang 18 is widely available
and other architectures support cpuv4, at least for CI run,
the dummy program can be removed.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011304.3719139-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

147c8f44

selftests/bpf: Add a cpuv4 test runner for cpu=v4 testing · a5d0c26a

Yonghong Song authored Jul 27, 2023

Similar to no-alu32 runner, if clang compiler supports -mcpu=v4,
a cpuv4 runner is created to test bpf programs compiled with
-mcpu=v4.

The following are some num-of-insn statistics for each of the newer
instructions based on existing selftests, excluding subsequent
cpuv4 insn specific tests.

   insn pattern                # of instructions
   reg = (s8)reg               4
   reg = (s16)reg              4
   reg = (s32)reg              144
   reg = *(s8 *)(reg + off)    13
   reg = *(s16 *)(reg + off)   14
   reg = *(s32 *)(reg + off)   15215
   reg = bswap16 reg           142
   reg = bswap32 reg           38
   reg = bswap64 reg           14
   reg s/= reg                 0
   reg s%= reg                 0
   gotol <offset>              58

Note that in the llvm -mcpu=v4 implementation, the compiler is a little
bit conservative about generating the 'gotol' insn (32-bit branch offset)
as it does not count the number of insns precisely (e.g., some insns are
debug insns, etc.). The newer 'gotol' insn should have verification
states comparable to the old 'goto' insn.

With current patch set, all selftests passed with -mcpu=v4
when running test_progs-cpuv4 binary. The -mcpu=v3 and -mcpu=v2 run
are also successful.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011250.3718252-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a5d0c26a

selftests/bpf: Fix a test_verifier failure · 86180493

Yonghong Song authored Jul 27, 2023

The following test_verifier subtest failed due to
new encoding for BSWAP.

  $ ./test_verifier
  ...
  #99/u invalid 64-bit BPF_END FAIL
  Unexpected success to load!
  verification time 215 usec
  stack depth 0
  processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  #99/p invalid 64-bit BPF_END FAIL
  Unexpected success to load!
  verification time 198 usec
  stack depth 0
  processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Tighten the test so it still reports a failure.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011244.3717464-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

86180493

bpf: Add kernel/bpftool asm support for new instructions · f835bb62

Yonghong Song authored Jun 28, 2023

Add asm support for new instructions so kernel verifier and bpftool
xlated insn dumps can have proper asm syntax for new instructions.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f835bb62

bpf: Support new 32bit offset jmp instruction · 4cd58e9a

Yonghong Song authored Jul 27, 2023

Add interpreter/jit/verifier support for 32bit offset jmp instruction.
If a conditional jmp instruction needs more than 16bit offset,
it can be simulated with a conditional jmp + a 32bit jmp insn.
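
The range limitation behind this can be shown with a short sketch. It assumes branch distances are counted in instructions and that a far conditional branch is lowered to two instructions (for example, a short conditional jump paired with a 32-bit jump); the helper names and the exact lowering are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a branch distance fits the signed 16-bit offset field. */
bool fits_in_off16(int64_t distance)
{
	return distance >= INT16_MIN && distance <= INT16_MAX;
}

/* Instructions needed for a conditional branch in this sketch:
 * one when the target is near, two when a 32-bit jump must be paired. */
int cond_branch_insn_count(int64_t distance)
{
	return fits_in_off16(distance) ? 1 : 2;
}
```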
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011231.3716103-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4cd58e9a

bpf: Fix jit blinding with new sdiv/smov insns · 7058e3a3

Yonghong Song authored Jul 27, 2023

Handle new insns properly in bpf_jit_blind_insn() function.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

7058e3a3

bpf: Support new signed div/mod instructions. · ec0e2da9

Yonghong Song authored Jul 27, 2023

Add interpreter/jit support for new signed div/mod insns.
The new signed div/mod instructions are encoded with
unsigned div/mod instructions plus insn->off == 1.
Also add basic verifier support to ensure new insns get
accepted.
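
As a rough user-space model of the encoding, the sketch below dispatches on the `off` field: 0 selects the existing unsigned operation and 1 selects the signed one. The function names are hypothetical. The handling of a zero divisor and of `INT64_MIN / -1` is this sketch's own choice, made to keep the C code free of undefined behaviour, and is not a statement about what the interpreter or JIT does.

```c
#include <stdint.h>

/* Model of a 64-bit div: off == 1 selects the signed variant. */
uint64_t model_div64(uint64_t dst, uint64_t src, int16_t off)
{
	if (src == 0)
		return 0;		/* sketch's choice for a zero divisor */
	if (off == 1) {
		/* Casts assume the usual two's complement conversion. */
		int64_t d = (int64_t)dst, s = (int64_t)src;

		if (d == INT64_MIN && s == -1)
			return (uint64_t)INT64_MIN;	/* avoid C overflow */
		return (uint64_t)(d / s);
	}
	return dst / src;
}

/* Model of a 64-bit mod: off == 1 selects the signed variant. */
uint64_t model_mod64(uint64_t dst, uint64_t src, int16_t off)
{
	if (src == 0)
		return dst;		/* sketch's choice for a zero divisor */
	if (off == 1) {
		int64_t d = (int64_t)dst, s = (int64_t)src;

		if (d == INT64_MIN && s == -1)
			return 0;	/* avoid C overflow */
		return (uint64_t)(d % s);
	}
	return dst % src;
}
```

The same bit pattern therefore gives different results depending on `off`: -8 divided by 2 is -4 when signed, but a large positive quotient when the operands are treated as unsigned.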
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ec0e2da9

bpf: Support new unconditional bswap instruction · 0845c3db

Yonghong Song authored Jul 27, 2023

The existing 'be' and 'le' insns do a conditional bswap
depending on host endianness. This patch implements
unconditional bswap insns.
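
The difference between the two behaviours can be sketched in portable C. The helper names (`bswap32`, `host_is_little_endian`, `to_be32`) are hypothetical; `to_be32` models a "convert to big-endian" operation that swaps only on little-endian hosts, while `bswap32` always swaps.

```c
#include <stdint.h>

/* Unconditional swap: reverses the byte order on every host. */
uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Run-time probe of the host byte order. */
int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/* Conditional swap: a no-op when the host is already big-endian. */
uint32_t to_be32(uint32_t v)
{
	return host_is_little_endian() ? bswap32(v) : v;
}
```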
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0845c3db

bpf: Handle sign-extenstin ctx member accesses · 1f1e864b

Yonghong Song authored Jul 27, 2023

Currently, if user accesses a ctx member with signed types,
the compiler will generate an unsigned load followed by
necessary left and right shifts.

With the introduction of sign-extension load, compiler may
just emit a ldsx insn instead. Let us do a final movsx sign
extension to the final unsigned ctx load result to
satisfy original sign extension requirement.
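
The two code shapes mentioned above produce the same value, which the following sketch tries to show. The function names are hypothetical; `bits` is assumed to be 8, 16 or 32, and the right shift of a negative value is assumed to be arithmetic, as it is on mainstream compilers.

```c
#include <stdint.h>

/* Older pattern: zero-extending load, then shift left and shift right. */
int64_t sext_by_shifts(uint64_t loaded, unsigned int bits)
{
	unsigned int shift = 64 - bits;

	return (int64_t)(loaded << shift) >> shift;
}

/* Direct sign extension, as a sign-extending load or mov would produce. */
int64_t sext_direct(uint64_t loaded, unsigned int bits)
{
	switch (bits) {
	case 8:
		return (int8_t)loaded;
	case 16:
		return (int16_t)loaded;
	default:
		return (int32_t)loaded;	/* bits == 32 in this sketch */
	}
}
```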
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011207.3712528-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1f1e864b

bpf: Support new sign-extension mov insns · 8100928c

Yonghong Song authored Jul 27, 2023

Add interpreter/jit support for new sign-extension mov insns.
The original 'MOV' insn is extended to support reg-to-reg
signed version for both ALU and ALU64 operations. For ALU mode,
the insn->off value of 8 or 16 indicates sign-extension
from 8- or 16-bit value to 32-bit value. For ALU64 mode,
the insn->off value of 8/16/32 indicates sign-extension
from 8-, 16- or 32-bit value to 64-bit value.
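
A small model of these semantics, under the assumption that `off == 0` means a plain move and that 32-bit ALU results are zero-extended into the upper half of the register, could look like this. The function names `movsx64` and `movsx32` are hypothetical.

```c
#include <stdint.h>

/* ALU64 model: off of 8, 16 or 32 sign-extends to 64 bits. */
int64_t movsx64(uint64_t src, int off)
{
	/* The narrowing casts assume two's complement wrap-around. */
	switch (off) {
	case 8:
		return (int8_t)src;
	case 16:
		return (int16_t)src;
	case 32:
		return (int32_t)src;
	default:
		return (int64_t)src;	/* off == 0: plain move */
	}
}

/* ALU model: off of 8 or 16 sign-extends to 32 bits; the upper
 * 32 bits of the 64-bit register end up as zero. */
uint64_t movsx32(uint64_t src, int off)
{
	int32_t v;

	switch (off) {
	case 8:
		v = (int8_t)src;
		break;
	case 16:
		v = (int16_t)src;
		break;
	default:
		v = (int32_t)src;	/* off == 0: plain 32-bit move */
		break;
	}
	return (uint32_t)v;
}
```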
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8100928c

bpf: Support new sign-extension load insns · 1f9a1ea8

Yonghong Song authored Jul 27, 2023

Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In the verifier, besides
deducing proper bounds for the dst_reg, probed memory access
is also properly handled.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1f9a1ea8

bpf, docs: fix BPF_NEG entry in instruction-set.rst · 10d78a66

Jose E. Marchesi authored Jul 26, 2023

This patch fixes the documentation of the BPF_NEG instruction to
denote that it does not use the source register operand.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Acked-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230726092543.6362-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

10d78a66

26 Jul, 2023 1 commit

bpf: work around -Wuninitialized warning · 63e2da3b

Arnd Bergmann authored Jul 25, 2023

Splitting these out into separate helper functions means that we
actually pass an uninitialized variable into another function call
if dec_active() happens to not be inlined, and CONFIG_PREEMPT_RT
is disabled:

kernel/bpf/memalloc.c: In function 'add_obj_to_free_list':
kernel/bpf/memalloc.c:200:9: error: 'flags' is used uninitialized [-Werror=uninitialized]
  200 |         dec_active(c, flags);

Avoid this by passing the flags by reference, so they either get
initialized and dereferenced through a pointer, or the pointer never
gets accessed at all.
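
The pass-by-reference pattern can be modelled in user space as follows. Everything here is a hypothetical stand-in: `rt_enabled` plays the role of CONFIG_PREEMPT_RT, `fake_irq_state` replaces the real interrupt state, and the helpers only imitate the shape of inc_active()/dec_active().

```c
#include <stdbool.h>

static bool rt_enabled;				/* stands in for CONFIG_PREEMPT_RT */
static unsigned long fake_irq_state = 0x200;	/* pretend saved IRQ state */
static int active;

/* Writes through the pointer only when the RT path is taken. */
static void inc_active(unsigned long *flags)
{
	if (rt_enabled)
		*flags = fake_irq_state;	/* models local_irq_save(*flags) */
	active++;
}

/* Reads through the pointer only when the RT path is taken. */
static void dec_active(unsigned long *flags)
{
	active--;
	if (rt_enabled)
		fake_irq_state = *flags;	/* models local_irq_restore(*flags) */
}

/* The caller hands over an address, never an uninitialized value. */
int with_active_section(void)
{
	unsigned long flags;	/* intentionally left uninitialized */
	int seen;

	inc_active(&flags);
	seen = active;
	dec_active(&flags);
	return seen;
}
```

When `rt_enabled` is false the pointer is never dereferenced, so no uninitialized value is read or passed by value.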

Fixes: 18e027b1 ("bpf: Factor out inc/dec of active flag into helpers.")
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230725202653.2905259-1-arnd@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

63e2da3b

25 Jul, 2023 3 commits

selftests/xsk: Fix spelling mistake "querrying" -> "querying" · 13fd5e14

Colin Ian King authored Jul 20, 2023

There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230720104815.123146-1-colin.i.king@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

13fd5e14

Merge branch 'Add SO_REUSEPORT support for TC bpf_sk_assign' · 36397a18

Martin KaFai Lau authored Jul 24, 2023

Lorenz Bauer says:

====================
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assign.

I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.

Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
---
Changes in v6:
- Reject unhashed UDP sockets in bpf_sk_assign to avoid ref leak
- Link to v5: https://lore.kernel.org/r/20230613-so-reuseport-v5-0-f6686a0dbce0@isovalent.com

Changes in v5:
- Drop reuse_sk == sk check in inet[6]_steal_sock (Kuniyuki)
- Link to v4: https://lore.kernel.org/r/20230613-so-reuseport-v4-0-4ece76708bba@isovalent.com

Changes in v4:
- WARN_ON_ONCE if reuseport socket is refcounted (Kuniyuki)
- Use inet[6]_ehashfn_t to shorten function declarations (Kuniyuki)
- Shuffle documentation patch around (Kuniyuki)
- Update commit message to explain why IPv6 needs EXPORT_SYMBOL
- Link to v3: https://lore.kernel.org/r/20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com

Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent.com

Changes in v2:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)

---
Daniel Borkmann (1):
      selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

36397a18

selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper · 22408d58

Daniel Borkmann authored Jul 20, 2023

We use two programs to check that the new reuseport logic is executed
appropriately.

The first is a TC clsact program which bpf_sk_assigns
the skb to a UDP or TCP socket created by user space. Since the test
communicates via lo we see both directions of packets in the eBPF.
Traffic ingressing to the reuseport socket is identified by looking
at the destination port. For TCP, we additionally need to make sure
that we only assign the initial SYN packets towards our listening
socket. The network stack then creates a request socket which
transitions to ESTABLISHED after the 3WHS.

The second is a reuseport program which shares the fact that
it has been executed with user space. This tells us that the delayed
lookup mechanism is working.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-8-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

22408d58