Commits · 5ed88f81511ce695692f0510ab3ca17eee68eff6 · Kirill Smelkov / linux

26 Oct, 2022 21 commits

selftests/bpf: Panic on hard/soft lockup · 5ed88f81

Daniel Müller authored Oct 25, 2022

When running tests, we should probably accept any help we can get when
it comes to detecting issues early or making them more debuggable. We
have seen a few cases where a test_progs_noalu32 run, for example,
encountered a soft lockup and stopped making progress. It was only
interrupted once we hit the overall test timeout [0]. We can not and do
not want to necessarily rely on test timeouts, because those rely on
infrastructure provided by the environment we run in (and which is not
present in tools/testing/selftests/bpf/vmtest.sh, for example).
To that end, let's enable panics on soft as well as hard lockups to fail
fast should we encounter one. That's happening in the configuration
indented to be used for selftests (including when using vmtest.sh or
when running in BPF CI).

[0] https://github.com/kernel-patches/bpf/runs/7844499997Signed-off-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20221025231546.811766-1-deso@posteo.netSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5ed88f81

Merge branch 'bpf: Implement cgroup local storage available to non-cgroup-attached bpf progs' · a48b4bf9

Alexei Starovoitov authored Oct 25, 2022

Yonghong Song says:

====================

There already exists a local storage implementation for cgroup-attached
bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.

This patch implemented new cgroup local storage available to
non-cgroup-attached bpf programs. In the patch series, Patches 1 and 2
are preparation patches. Patch 3 implemented new cgroup local storage
kernel support. Patches 4 and 5 implemented libbpf and bpftool support.
Patches 6-8 fixed one existing test and added four new tests to validate
kernel/libbpf implementations. Patch 9 added documentation for new
BPF_MAP_TYPE_CGRP_STORAGE map type and comparison of the old and new
cgroup local storage maps.

Changelogs:
  v5 -> v6:
    . fix selftest test_libbpf_str/bpf_map_type_str due to marking
      BPF_MAP_TYPE_CGROUP_STORAGE as deprecated.
    . add cgrp_local_storage test in s390x denylist since the test
      has some fentry/fexit programs.
  v4 -> v5:
    . additional refactoring in patch 2
    . fix the call site for bpf_cgrp_storage_free() in kernel/cgroup/cgroup.c.
    . add a test for progs attaching to cgroups
    . add a negative test (the helper key is a task instead of expected cgroup)
    . some spelling fixes
  v3 -> v4:
    . fix a config guarding problem in kernel/cgroup/cgroup.c when
      cgrp_storage is deleted (CONFIG_CGROUP_BPF => CONFIG_BPF_SYSCALL).
    . rename selftest from cgroup_local_storage.c to cgrp_local_storage.c
      so the name can better align with map name.
    . fix a few misspellings.
  v2 -> v3:
    . fix a config caused kernel test complaint.
    . better description/comments in uapi bpf.h and bpf_cgrp_storage.c.
    . factor code for better resue for map_alloc/map_free.
    . improved explanation in map documentation.
  v1 -> v2:
    . change map name from BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE to
      BPF_MAP_TYPE_CGRP_STORAGE.
    . removed support of sleepable programs.
    . changed the place of freeing cgrp local storage from put_css_set_locked()
      to css_free_rwork_fn().
    . added map documentation.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a48b4bf9

docs/bpf: Add documentation for new cgroup local storage · d4319801

Yonghong Song authored Oct 25, 2022

Add some descriptions and examples for BPF_MAP_TYPE_CGRP_STORAGE.
Also illustate the major difference between BPF_MAP_TYPE_CGRP_STORAGE
and BPF_MAP_TYPE_CGROUP_STORAGE and recommend to use
BPF_MAP_TYPE_CGRP_STORAGE instead of BPF_MAP_TYPE_CGROUP_STORAGE
in the end.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042922.676383-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d4319801

selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x · 0a1b69d1

Yonghong Song authored Oct 25, 2022

Test cgrp_local_storage have some programs utilizing trampoline.
Arch s390x does not support trampoline so add the test to
the corresponding DENYLIST file.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042917.675685-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0a1b69d1

selftests/bpf: Add selftests for new cgroup local storage · 12bb6ca4

Yonghong Song authored Oct 25, 2022

Add four tests for new cgroup local storage, (1) testing bpf program helpers
and user space map APIs, (2) testing recursive fentry triggering won't deadlock,
(3) testing progs attached to cgroups, and (4) a negative test if the
bpf_cgrp_storage_get() helper key is not a cgroup btf id.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042911.675546-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

12bb6ca4

selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str · fd4ca6c1

Yonghong Song authored Oct 25, 2022

Previous bpf patch made a change to uapi bpf.h like
  @@ -922,7 +922,14 @@ enum bpf_map_type {
        BPF_MAP_TYPE_SOCKHASH,
  -     BPF_MAP_TYPE_CGROUP_STORAGE,
  +     BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
  +     BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
        BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
where BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED and BPF_MAP_TYPE_CGROUP_STORAGE
have the same enum value. This will cause selftest test_libbpf_str/bpf_map_type_str
failing. This patch fixed the issue by avoid the check for
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED in the test.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042906.674830-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fd4ca6c1

bpftool: Support new cgroup local storage · f7f0f165

Yonghong Song authored Oct 25, 2022

Add support for new cgroup local storage
Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042901.674177-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

f7f0f165

libbpf: Support new cgroup local storage · 4fe64af2

Yonghong Song authored Oct 25, 2022

Add support for new cgroup local storage.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042856.673989-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4fe64af2

bpf: Implement cgroup storage available to non-cgroup-attached bpf progs · c4bcfb38

Yonghong Song authored Oct 25, 2022

Similar to sk/inode/task storage, implement similar cgroup local storage.

There already exists a local storage implementation for cgroup-attached
bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.

The life-cycle of storage is managed with the life-cycle of the
cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
with a call to bpf_cgrp_storage_free() when cgroup itself
is deleted.

The userspace map operations can be done by using a cgroup fd as a key
passed to the lookup, update and delete operations.

Typically, the following code is used to get the current cgroup:
    struct task_struct *task = bpf_get_current_task_btf();
    ... task->cgroups->dfl_cgrp ...
and in structure task_struct definition:
    struct task_struct {
        ....
        struct css_set __rcu            *cgroups;
        ....
    }
With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
So the current implementation only supports non-sleepable program and supporting
sleepable program will be the next step together with adding rcu_read_lock
protection for rcu tagged structures.

Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
for cgroup storage available to non-cgroup-attached bpf programs. The old
cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
functionality. While old cgroup storage pre-allocates storage memory, the new
mechanism can also pre-allocate with a user space bpf_map_update_elem() call
to avoid potential run-time memory allocation failure.
Therefore, the new cgroup storage can provide all functionality w.r.t.
the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
be deprecated since the new one can provide the same functionality.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c4bcfb38

bpf: Refactor some inode/task/sk storage functions for reuse · c83597fa

Yonghong Song authored Oct 25, 2022

Refactor codes so that inode/task/sk storage implementation
can maximally share the same code. I also added some comments
in new function bpf_local_storage_unlink_nolock() to make
codes easy to understand. There is no functionality change.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042845.672944-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c83597fa

bpf: Make struct cgroup btf id global · 5e67b8ef

Yonghong Song authored Oct 25, 2022

Make struct cgroup btf id global so later patch can reuse
the same btf id.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042840.672602-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5e67b8ef

Merge branch 'bpf: Avoid unnecessary deadlock detection and failure in task storage' · fdf45787

Alexei Starovoitov authored Oct 25, 2022

Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

The commit bc235cdb ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
added deadlock detection to avoid a tracing program from recurring
on the bpf_task_storage_{get,delete}() helpers.  These helpers acquire
a spin lock and it will lead to deadlock.

It is unnecessary for the bpf_lsm and bpf_iter programs which do
not recur.  The situation is the same as the existing
bpf_pid_task_storage_{lookup,delete}_elem() which are
used in the syscall and they also do not have deadlock detection.

This set is to add new bpf_task_storage_{get,delete}() helper proto
without the deadlock detection.  The set also removes the prog->active
check from the bpf_lsm and bpf_iter program.  Please see the individual
patch for details.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fdf45787

selftests/bpf: Tracing prog can still do lookup under busy lock · 387b5321

Martin KaFai Lau authored Oct 25, 2022

This patch modifies the task_ls_recursion test to check that
the first bpf_task_storage_get(&map_a, ...) in BPF_PROG(on_update)
can still do the lockless lookup even it cannot acquire the percpu
busy lock. If the lookup succeeds, it will increment the value
by 1 and the value in the task storage map_a will become 200+1=201.
After that, BPF_PROG(on_update) tries to delete from map_a and
should get -EBUSY because it cannot acquire the percpu busy lock
after finding the data.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-10-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

387b5321

selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection · 0334b4d8

Martin KaFai Lau authored Oct 25, 2022

This patch adds a test to check for deadlock failure
in bpf_task_storage_{get,delete} when called by a sleepable bpf_lsm prog.
It also checks if the prog_info.recursion_misses is non zero.

The test starts with 32 threads and they are affinitized to one cpu.
In my qemu setup, with CONFIG_PREEMPT=y, I can reproduce it within
one second if it is run without the previous patches of this set.

Here is the test error message before adding the no deadlock detection
version of the bpf_task_storage_{get,delete}:

test_nodeadlock:FAIL:bpf_task_storage_get busy unexpected bpf_task_storage_get busy: actual 2 != expected 0
test_nodeadlock:FAIL:bpf_task_storage_delete busy unexpected bpf_task_storage_delete busy: actual 2 != expected 0
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-9-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0334b4d8

bpf: Add new bpf_task_storage_delete proto with no deadlock detection · 8a7dac37

Martin KaFai Lau authored Oct 25, 2022

The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_delete_elem()
which is called from the syscall map_delete_elem. It does not need
deadlock detection. Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_delete() helper.

This patch adds bpf_task_storage_delete proto that does not do deadlock
detection. It will be used by bpf_lsm and bpf_iter program.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-8-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

8a7dac37

bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check · fda64ae0

Martin KaFai Lau authored Oct 25, 2022

Similar to the earlier change in bpf_task_storage_get_recur.
This patch changes bpf_task_storage_delete_recur such that it
does the lookup first. It only returns -EBUSY if it needs to
take the spinlock to do the deletion when potential deadlock
is detected.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-7-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fda64ae0

bpf: Add new bpf_task_storage_get proto with no deadlock detection · 4279adb0

Martin KaFai Lau authored Oct 25, 2022

The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_lookup_elem()
which is called from the syscall map_lookup_elem. It does not need
deadlock detection. Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_get() helper.

This patch adds bpf_task_storage_get proto that does not do deadlock
detection. It will be used by bpf_lsm and bpf_iter programs.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-6-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4279adb0

bpf: Avoid taking spinlock in bpf_task_storage_get if potential deadlock is detected · e8b02296

Martin KaFai Lau authored Oct 25, 2022

bpf_task_storage_get() does a lookup and optionally inserts
new data if BPF_LOCAL_STORAGE_GET_F_CREATE is present.

During lookup, it will cache the lookup result and caching requires to
acquire a spinlock. When potential deadlock is detected (by the
bpf_task_storage_busy pcpu-counter added in
commit bc235cdb ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")),
the current behavior is returning NULL immediately to avoid deadlock. It is
too pessimistic. This patch will go ahead to do a lookup (which is a
lockless operation) but it will avoid caching it in order to avoid
acquiring the spinlock.

When lookup fails to find the data and BPF_LOCAL_STORAGE_GET_F_CREATE
is set, an insertion is needed and this requires acquiring a spinlock.
This patch will still return NULL when a potential deadlock is detected.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-5-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e8b02296

bpf: Refactor the core bpf_task_storage_get logic into a new function · 6d65500c

Martin KaFai Lau authored Oct 25, 2022

This patch creates a new function __bpf_task_storage_get() and
moves the core logic of the existing bpf_task_storage_get()
into this new function. This new function will be shared
by another new helper proto in the latter patch.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-4-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6d65500c

bpf: Append _recur naming to the bpf_task_storage helper proto · 0593dd34

Martin KaFai Lau authored Oct 25, 2022

This patch adds the "_recur" naming to the bpf_task_storage_{get,delete}
proto.  In a latter patch, they will only be used by the tracing
programs that requires a deadlock detection because a tracing
prog may use bpf_task_storage_{get,delete} recursively and cause a
deadlock.

Another following patch will add a different helper proto for the non
tracing programs because they do not need the deadlock prevention.
This patch does this rename to prepare for this future proto
additions.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-3-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0593dd34

bpf: Remove prog->active check for bpf_lsm and bpf_iter · 271de525

Martin KaFai Lau authored Oct 25, 2022

The commit 64696c40 ("bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline")
removed prog->active check for struct_ops prog.  The bpf_lsm
and bpf_iter is also using trampoline.  Like struct_ops, the bpf_lsm
and bpf_iter have fixed hooks for the prog to attach.  The
kernel does not call the same hook in a recursive way.
This patch also removes the prog->active check for
bpf_lsm and bpf_iter.

A later patch has a test to reproduce the recursion issue
for a sleepable bpf_lsm program.

This patch appends the '_recur' naming to the existing
enter and exit functions that track the prog->active counter.
New __bpf_prog_{enter,exit}[_sleepable] function are
added to skip the prog->active tracking. The '_struct_ops'
version is also removed.

It also moves the decision on picking the enter and exit function to
the new bpf_trampoline_{enter,exit}().  It returns the '_recur' ones
for all tracing progs to use.  For bpf_lsm, bpf_iter,
struct_ops (no prog->active tracking after 64696c40), and
bpf_lsm_cgroup (no prog->active tracking after 69fd337a),
it will return the functions that don't track the prog->active.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-2-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

271de525

25 Oct, 2022 19 commits

libbpf: Btf dedup identical struct test needs check for nested structs/arrays · f3c51fe0

Alan Maguire authored Oct 24, 2022

When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module.  After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs.  However
it turns out sometimes that type ids of identical struct members
can also differ, even when the containing structs are still identical.

To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:

__u8       __pkt_type_offset[0]; /*   128     0 */

Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions.  However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member.   A similar problem (though not
observed) could occur for struct-in-struct.

The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.

The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.

For testing use latest pahole or apply [1], otherwise dedups
can fail for the reasons described there.

Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.

Fixes: efdd3eb8 ("libbpf: Accommodate DWARF/compiler bug with duplicated structs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com

[1] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire

f3c51fe0

Merge branch 'bpf: Fixes for kprobe multi on kernel modules' · 31af1aa0

Alexei Starovoitov authored Oct 25, 2022

Jiri Olsa says:

====================

hi,
Martynas reported kprobe _multi link does not resolve symbols
from kernel modules, which attach by address works.

In addition while fixing that I realized we do not take module
reference if the module has kprobe_multi link on top of it and
can be removed.

There's mo crash related to this, it will silently disappear from
ftrace tables, while kprobe_multi link stays up with no data.

This patchset has fixes for both issues.

v3 changes:
  - reorder fields in struct bpf_kprobe_multi_link [Andrii]
  - added ack [Andrii]

v2 changes:
  - added acks (Song)
  - added comment to kallsyms_callback (Song)
  - change module_callback realloc logic (Andrii)
  - get rid of macros in tests (Andrii)

thanks,
jirka
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

31af1aa0