Commit 2d141236 authored by Alexei Starovoitov's avatar Alexei Starovoitov

Merge branch 'Document some recent core kfunc additions'

David Vernet says:

====================

A series of recent patch sets introduced kfuncs that allowed struct
task_struct and struct cgroup objects to be used as kptrs. These were
introduced in [0], [1], and [2].

[0]: https://lore.kernel.org/lkml/20221120051004.3605026-1-void@manifault.com/
[1]: https://lore.kernel.org/lkml/20221122145300.251210-2-void@manifault.com/T/
[2]: https://lore.kernel.org/lkml/20221122055458.173143-1-void@manifault.com/

These are "core" kfuncs, in that they may be used by a wide variety of
possible BPF tracepoint or struct_ops programs, and are defined in
kernel/bpf/helpers.c. Even though as kfuncs they have no ABI stability
guarantees, they should still be properly documented. This patch set
adds that documentation.

Some other kfuncs were added recently as well, such as
bpf_rcu_read_lock() and bpf_rcu_read_unlock(). Those could and should be
added to this "Core kfuncs" section as well in subsequent patch sets.

Note that this patch set does not contain documentation for
bpf_task_acquire_not_zero(), or bpf_task_kptr_get(). As discussed in
[3], those kfuncs currently always return NULL pending resolution on how
to properly protect their arguments using RCU.

[3]: https://lore.kernel.org/all/20221206210538.597606-1-void@manifault.com/
---
Changelog:
v2 -> v3:
- Don't document bpf_task_kptr_get(), and instead provide a more
  substantive example for bpf_cgroup_kptr_get().
- Further clarify expected behavior of bpf_task_from_pid() in comments
  (Alexei)

v1 -> v2:
- Expand comment to specify that a map holds a reference to a task kptr
  if we don't end up releasing it (Alexei)
- Just read task->pid instead of using a probed read (Alexei)
====================
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
parents 26c386ec 36aa10ff
......@@ -222,3 +222,201 @@ type. An example is shown below::
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
}
late_initcall(init_subsystem);
3. Core kfuncs
==============
The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.
3.1 struct task_struct * kfuncs
-------------------------------
There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:
.. kernel-doc:: kernel/bpf/helpers.c
:identifiers: bpf_task_acquire bpf_task_release
These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:
.. code-block:: c
/**
* A trivial example tracepoint program that shows how to
* acquire and release a struct task_struct * pointer.
*/
SEC("tp_btf/task_newtask")
int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
{
struct task_struct *acquired;
acquired = bpf_task_acquire(task);
/*
* In a typical program you'd do something like store
* the task in a map, and the map will automatically
* release it later. Here, we release it manually.
*/
bpf_task_release(acquired);
return 0;
}
----
A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().
.. kernel-doc:: kernel/bpf/helpers.c
:identifiers: bpf_task_from_pid
Here is an example of it being used:
.. code-block:: c
SEC("tp_btf/task_newtask")
int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
{
struct task_struct *lookup;
lookup = bpf_task_from_pid(task->pid);
if (!lookup)
/* A task should always be found, as %task is a tracepoint arg. */
return -ENOENT;
if (lookup->pid != task->pid) {
/* bpf_task_from_pid() looks up the task via its
* globally-unique pid from the init_pid_ns. Thus,
* the pid of the lookup task should always be the
* same as the input task.
*/
bpf_task_release(lookup);
return -EINVAL;
}
/* bpf_task_from_pid() returns an acquired reference,
* so it must be dropped before returning from the
* tracepoint handler.
*/
bpf_task_release(lookup);
return 0;
}
3.2 struct cgroup * kfuncs
--------------------------
``struct cgroup *`` objects also have acquire and release functions:
.. kernel-doc:: kernel/bpf/helpers.c
:identifiers: bpf_cgroup_acquire bpf_cgroup_release
These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.
----
You may also acquire a reference to a ``struct cgroup`` kptr that's already
stored in a map using bpf_cgroup_kptr_get():
.. kernel-doc:: kernel/bpf/helpers.c
:identifiers: bpf_cgroup_kptr_get
Here's an example of how it can be used:
.. code-block:: c
/* struct containing the struct task_struct kptr which is actually stored in the map. */
struct __cgroups_kfunc_map_value {
struct cgroup __kptr_ref * cgroup;
};
/* The map containing struct __cgroups_kfunc_map_value entries. */
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, int);
__type(value, struct __cgroups_kfunc_map_value);
__uint(max_entries, 1);
} __cgroups_kfunc_map SEC(".maps");
/* ... */
/**
* A simple example tracepoint program showing how a
* struct cgroup kptr that is stored in a map can
* be acquired using the bpf_cgroup_kptr_get() kfunc.
*/
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
{
struct cgroup *kptr;
struct __cgroups_kfunc_map_value *v;
s32 id = cgrp->self.id;
/* Assume a cgroup kptr was previously stored in the map. */
v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
if (!v)
return -ENOENT;
/* Acquire a reference to the cgroup kptr that's already stored in the map. */
kptr = bpf_cgroup_kptr_get(&v->cgroup);
if (!kptr)
/* If no cgroup was present in the map, it's because
* we're racing with another CPU that removed it with
* bpf_kptr_xchg() between the bpf_map_lookup_elem()
* above, and our call to bpf_cgroup_kptr_get().
* bpf_cgroup_kptr_get() internally safely handles this
* race, and will return NULL if the task is no longer
* present in the map by the time we invoke the kfunc.
*/
return -EBUSY;
/* Free the reference we just took above. Note that the
* original struct cgroup kptr is still in the map. It will
* be freed either at a later time if another context deletes
* it from the map, or automatically by the BPF subsystem if
* it's still present when the map is destroyed.
*/
bpf_cgroup_release(kptr);
return 0;
}
----
Another kfunc available for interacting with ``struct cgroup *`` objects is
bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
and return it as a cgroup kptr.
.. kernel-doc:: kernel/bpf/helpers.c
:identifiers: bpf_cgroup_ancestor
Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in
the verifier. bpf_cgroup_ancestor() can be used as follows:
.. code-block:: c
/**
* Simple tracepoint example that illustrates how a cgroup's
* ancestor can be accessed using bpf_cgroup_ancestor().
*/
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
{
struct cgroup *parent;
/* The parent cgroup resides at the level before the current cgroup's level. */
parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
if (!parent)
return -ENOENT;
bpf_printk("Parent id is %d", parent->self.id);
/* Return the parent cgroup that was acquired above. */
bpf_cgroup_release(parent);
return 0;
}
......@@ -1904,7 +1904,7 @@ struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
}
/**
* bpf_task_release - Release the reference acquired on a struct task_struct *.
* bpf_task_release - Release the reference acquired on a task.
* @p: The task on which a reference is being released.
*/
void bpf_task_release(struct task_struct *p)
......@@ -1960,7 +1960,7 @@ struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp)
}
/**
* bpf_cgroup_release - Release the reference acquired on a struct cgroup *.
* bpf_cgroup_release - Release the reference acquired on a cgroup.
* If this kfunc is invoked in an RCU read region, the cgroup is guaranteed to
* not be freed until the current grace period has ended, even if its refcount
* drops to 0.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment