Commits · 4fe64af23c12ae9f2c1f0ca0d556d9cb8f088dfb · Kirill Smelkov / linux

26 Oct, 2022 14 commits

libbpf: Support new cgroup local storage · 4fe64af2

Yonghong Song authored Oct 25, 2022

Add support for new cgroup local storage.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042856.673989-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4fe64af2

bpf: Implement cgroup storage available to non-cgroup-attached bpf progs · c4bcfb38

Yonghong Song authored Oct 25, 2022

Similar to sk/inode/task storage, implement similar cgroup local storage.

There already exists a local storage implementation for cgroup-attached
bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.

The life-cycle of storage is managed with the life-cycle of the
cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
with a call to bpf_cgrp_storage_free() when cgroup itself
is deleted.

The userspace map operations can be done by using a cgroup fd as a key
passed to the lookup, update and delete operations.

Typically, the following code is used to get the current cgroup:
    struct task_struct *task = bpf_get_current_task_btf();
    ... task->cgroups->dfl_cgrp ...
and in structure task_struct definition:
    struct task_struct {
        ....
        struct css_set __rcu            *cgroups;
        ....
    }
With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
So the current implementation only supports non-sleepable program and supporting
sleepable program will be the next step together with adding rcu_read_lock
protection for rcu tagged structures.

Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
for cgroup storage available to non-cgroup-attached bpf programs. The old
cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
functionality. While old cgroup storage pre-allocates storage memory, the new
mechanism can also pre-allocate with a user space bpf_map_update_elem() call
to avoid potential run-time memory allocation failure.
Therefore, the new cgroup storage can provide all functionality w.r.t.
the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
be deprecated since the new one can provide the same functionality.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c4bcfb38

bpf: Refactor some inode/task/sk storage functions for reuse · c83597fa

Yonghong Song authored Oct 25, 2022

Refactor codes so that inode/task/sk storage implementation
can maximally share the same code. I also added some comments
in new function bpf_local_storage_unlink_nolock() to make
codes easy to understand. There is no functionality change.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042845.672944-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c83597fa

bpf: Make struct cgroup btf id global · 5e67b8ef

Yonghong Song authored Oct 25, 2022

Make struct cgroup btf id global so later patch can reuse
the same btf id.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042840.672602-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5e67b8ef

Merge branch 'bpf: Avoid unnecessary deadlock detection and failure in task storage' · fdf45787

Alexei Starovoitov authored Oct 25, 2022

Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

The commit bc235cdb ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
added deadlock detection to avoid a tracing program from recurring
on the bpf_task_storage_{get,delete}() helpers.  These helpers acquire
a spin lock and it will lead to deadlock.

It is unnecessary for the bpf_lsm and bpf_iter programs which do
not recur.  The situation is the same as the existing
bpf_pid_task_storage_{lookup,delete}_elem() which are
used in the syscall and they also do not have deadlock detection.

This set is to add new bpf_task_storage_{get,delete}() helper proto
without the deadlock detection.  The set also removes the prog->active
check from the bpf_lsm and bpf_iter program.  Please see the individual
patch for details.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fdf45787

selftests/bpf: Tracing prog can still do lookup under busy lock · 387b5321

Martin KaFai Lau authored Oct 25, 2022

This patch modifies the task_ls_recursion test to check that
the first bpf_task_storage_get(&map_a, ...) in BPF_PROG(on_update)
can still do the lockless lookup even it cannot acquire the percpu
busy lock. If the lookup succeeds, it will increment the value
by 1 and the value in the task storage map_a will become 200+1=201.
After that, BPF_PROG(on_update) tries to delete from map_a and
should get -EBUSY because it cannot acquire the percpu busy lock
after finding the data.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-10-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

387b5321

selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection · 0334b4d8

Martin KaFai Lau authored Oct 25, 2022

This patch adds a test to check for deadlock failure
in bpf_task_storage_{get,delete} when called by a sleepable bpf_lsm prog.
It also checks if the prog_info.recursion_misses is non zero.

The test starts with 32 threads and they are affinitized to one cpu.
In my qemu setup, with CONFIG_PREEMPT=y, I can reproduce it within
one second if it is run without the previous patches of this set.

Here is the test error message before adding the no deadlock detection
version of the bpf_task_storage_{get,delete}:

test_nodeadlock:FAIL:bpf_task_storage_get busy unexpected bpf_task_storage_get busy: actual 2 != expected 0
test_nodeadlock:FAIL:bpf_task_storage_delete busy unexpected bpf_task_storage_delete busy: actual 2 != expected 0
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-9-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0334b4d8

bpf: Add new bpf_task_storage_delete proto with no deadlock detection · 8a7dac37

Martin KaFai Lau authored Oct 25, 2022

The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_delete_elem()
which is called from the syscall map_delete_elem. It does not need
deadlock detection. Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_delete() helper.

This patch adds bpf_task_storage_delete proto that does not do deadlock
detection. It will be used by bpf_lsm and bpf_iter program.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-8-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

8a7dac37

bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check · fda64ae0

Martin KaFai Lau authored Oct 25, 2022

Similar to the earlier change in bpf_task_storage_get_recur.
This patch changes bpf_task_storage_delete_recur such that it
does the lookup first. It only returns -EBUSY if it needs to
take the spinlock to do the deletion when potential deadlock
is detected.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-7-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fda64ae0

bpf: Add new bpf_task_storage_get proto with no deadlock detection · 4279adb0

Martin KaFai Lau authored Oct 25, 2022

The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_lookup_elem()
which is called from the syscall map_lookup_elem. It does not need
deadlock detection. Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_get() helper.

This patch adds bpf_task_storage_get proto that does not do deadlock
detection. It will be used by bpf_lsm and bpf_iter programs.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-6-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4279adb0

bpf: Avoid taking spinlock in bpf_task_storage_get if potential deadlock is detected · e8b02296

Martin KaFai Lau authored Oct 25, 2022

bpf_task_storage_get() does a lookup and optionally inserts
new data if BPF_LOCAL_STORAGE_GET_F_CREATE is present.

During lookup, it will cache the lookup result and caching requires to
acquire a spinlock. When potential deadlock is detected (by the
bpf_task_storage_busy pcpu-counter added in
commit bc235cdb ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")),
the current behavior is returning NULL immediately to avoid deadlock. It is
too pessimistic. This patch will go ahead to do a lookup (which is a
lockless operation) but it will avoid caching it in order to avoid
acquiring the spinlock.

When lookup fails to find the data and BPF_LOCAL_STORAGE_GET_F_CREATE
is set, an insertion is needed and this requires acquiring a spinlock.
This patch will still return NULL when a potential deadlock is detected.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-5-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e8b02296

bpf: Refactor the core bpf_task_storage_get logic into a new function · 6d65500c

Martin KaFai Lau authored Oct 25, 2022

This patch creates a new function __bpf_task_storage_get() and
moves the core logic of the existing bpf_task_storage_get()
into this new function. This new function will be shared
by another new helper proto in the latter patch.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-4-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6d65500c

bpf: Append _recur naming to the bpf_task_storage helper proto · 0593dd34

Martin KaFai Lau authored Oct 25, 2022

This patch adds the "_recur" naming to the bpf_task_storage_{get,delete}
proto.  In a latter patch, they will only be used by the tracing
programs that requires a deadlock detection because a tracing
prog may use bpf_task_storage_{get,delete} recursively and cause a
deadlock.

Another following patch will add a different helper proto for the non
tracing programs because they do not need the deadlock prevention.
This patch does this rename to prepare for this future proto
additions.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-3-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0593dd34

bpf: Remove prog->active check for bpf_lsm and bpf_iter · 271de525

Martin KaFai Lau authored Oct 25, 2022

The commit 64696c40 ("bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline")
removed prog->active check for struct_ops prog.  The bpf_lsm
and bpf_iter is also using trampoline.  Like struct_ops, the bpf_lsm
and bpf_iter have fixed hooks for the prog to attach.  The
kernel does not call the same hook in a recursive way.
This patch also removes the prog->active check for
bpf_lsm and bpf_iter.

A later patch has a test to reproduce the recursion issue
for a sleepable bpf_lsm program.

This patch appends the '_recur' naming to the existing
enter and exit functions that track the prog->active counter.
New __bpf_prog_{enter,exit}[_sleepable] function are
added to skip the prog->active tracking. The '_struct_ops'
version is also removed.

It also moves the decision on picking the enter and exit function to
the new bpf_trampoline_{enter,exit}().  It returns the '_recur' ones
for all tracing progs to use.  For bpf_lsm, bpf_iter,
struct_ops (no prog->active tracking after 64696c40), and
bpf_lsm_cgroup (no prog->active tracking after 69fd337a),
it will return the functions that don't track the prog->active.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-2-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

271de525

25 Oct, 2022 19 commits

libbpf: Btf dedup identical struct test needs check for nested structs/arrays · f3c51fe0

Alan Maguire authored Oct 24, 2022

When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module.  After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs.  However
it turns out sometimes that type ids of identical struct members
can also differ, even when the containing structs are still identical.

To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:

__u8       __pkt_type_offset[0]; /*   128     0 */

Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions.  However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member.   A similar problem (though not
observed) could occur for struct-in-struct.

The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.

The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.

For testing use latest pahole or apply [1], otherwise dedups
can fail for the reasons described there.

Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.

Fixes: efdd3eb8 ("libbpf: Accommodate DWARF/compiler bug with duplicated structs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com

[1] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire

f3c51fe0

Merge branch 'bpf: Fixes for kprobe multi on kernel modules' · 31af1aa0

Alexei Starovoitov authored Oct 25, 2022

Jiri Olsa says:

====================

hi,
Martynas reported kprobe _multi link does not resolve symbols
from kernel modules, which attach by address works.

In addition while fixing that I realized we do not take module
reference if the module has kprobe_multi link on top of it and
can be removed.

There's mo crash related to this, it will silently disappear from
ftrace tables, while kprobe_multi link stays up with no data.

This patchset has fixes for both issues.

v3 changes:
  - reorder fields in struct bpf_kprobe_multi_link [Andrii]
  - added ack [Andrii]

v2 changes:
  - added acks (Song)
  - added comment to kallsyms_callback (Song)
  - change module_callback realloc logic (Andrii)
  - get rid of macros in tests (Andrii)

thanks,
jirka
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

31af1aa0