Commit 4269f69b authored by Daniel Borkmann

Merge branch 'bpf-doc-improvements'

Andrii Nakryiko says:

====================
A bunch of BPF-related docs typo, wording and formatting fixes.

v1->v2:
- split off non-documentation changes into separate patchset
====================
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
parents 4b911304 46604676
@@ -36,27 +36,27 @@ consideration important quirks of other architectures) and
 defines calling convention that is compatible with C calling
 convention of the linux kernel on those architectures.
-Q: can multiple return values be supported in the future?
+Q: Can multiple return values be supported in the future?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: NO. BPF allows only register R0 to be used as return value.
-Q: can more than 5 function arguments be supported in the future?
+Q: Can more than 5 function arguments be supported in the future?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: NO. BPF calling convention only allows registers R1-R5 to be used
 as arguments. BPF is not a standalone instruction set.
 (unlike x64 ISA that allows msft, cdecl and other conventions)
-Q: can BPF programs access instruction pointer or return address?
+Q: Can BPF programs access instruction pointer or return address?
 -----------------------------------------------------------------
 A: NO.
-Q: can BPF programs access stack pointer ?
+Q: Can BPF programs access stack pointer ?
 ------------------------------------------
 A: NO.
 Only frame pointer (register R10) is accessible.
 From compiler point of view it's necessary to have stack pointer.
-For example LLVM defines register R11 as stack pointer in its
+For example, LLVM defines register R11 as stack pointer in its
 BPF backend, but it makes sure that generated code never uses it.
 Q: Does C-calling convention diminishes possible use cases?
@@ -66,8 +66,8 @@ A: YES.
 BPF design forces addition of major functionality in the form
 of kernel helper functions and kernel objects like BPF maps with
 seamless interoperability between them. It lets kernel call into
-BPF programs and programs call kernel helpers with zero overhead.
-As all of them were native C code. That is particularly the case
+BPF programs and programs call kernel helpers with zero overhead,
+as all of them were native C code. That is particularly the case
 for JITed BPF programs that are indistinguishable from
 native kernel C code.
@@ -75,9 +75,9 @@ Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
 ------------------------------------------------------------------------
 A: Soft yes.
-At least for now until BPF core has support for
+At least for now, until BPF core has support for
 bpf-to-bpf calls, indirect calls, loops, global variables,
-jump tables, read only sections and all other normal constructs
+jump tables, read-only sections, and all other normal constructs
 that C code can produce.
 Q: Can loops be supported in a safe way?
@@ -109,16 +109,16 @@ For example why BPF_JNE and other compare and jumps are not cpu-like?
 A: This was necessary to avoid introducing flags into ISA which are
 impossible to make generic and efficient across CPU architectures.
-Q: why BPF_DIV instruction doesn't map to x64 div?
+Q: Why BPF_DIV instruction doesn't map to x64 div?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: Because if we picked one-to-one relationship to x64 it would have made
 it more complicated to support on arm64 and other archs. Also it
 needs div-by-zero runtime check.
-Q: why there is no BPF_SDIV for signed divide operation?
+Q: Why there is no BPF_SDIV for signed divide operation?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: Because it would be rarely used. llvm errors in such case and
-prints a suggestion to use unsigned divide instead
+prints a suggestion to use unsigned divide instead.
 Q: Why BPF has implicit prologue and epilogue?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
@@ -5,43 +5,35 @@ BPF Type Format (BTF)
 1. Introduction
 ***************
-BTF (BPF Type Format) is the meta data format which
-encodes the debug info related to BPF program/map.
-The name BTF was used initially to describe
-data types. The BTF was later extended to include
-function info for defined subroutines, and line info
-for source/line information.
-The debug info is used for map pretty print, function
-signature, etc. The function signature enables better
-bpf program/function kernel symbol.
-The line info helps generate
-source annotated translated byte code, jited code
-and verifier log.
+BTF (BPF Type Format) is the metadata format which encodes the debug info
+related to BPF program/map. The name BTF was used initially to describe data
+types. The BTF was later extended to include function info for defined
+subroutines, and line info for source/line information.
+The debug info is used for map pretty print, function signature, etc. The
+function signature enables better bpf program/function kernel symbol. The line
+info helps generate source annotated translated byte code, jited code and
+verifier log.
 The BTF specification contains two parts,
 * BTF kernel API
 * BTF ELF file format
-The kernel API is the contract between
-user space and kernel. The kernel verifies
-the BTF info before using it.
-The ELF file format is a user space contract
-between ELF file and libbpf loader.
+The kernel API is the contract between user space and kernel. The kernel
+verifies the BTF info before using it. The ELF file format is a user space
+contract between ELF file and libbpf loader.
-The type and string sections are part of the
-BTF kernel API, describing the debug info
-(mostly types related) referenced by the bpf program.
-These two sections are discussed in
-details in :ref:`BTF_Type_String`.
+The type and string sections are part of the BTF kernel API, describing the
+debug info (mostly types related) referenced by the bpf program. These two
+sections are discussed in details in :ref:`BTF_Type_String`.
 .. _BTF_Type_String:
 2. BTF Type and String Encoding
 *******************************
-The file ``include/uapi/linux/btf.h`` provides high
-level definition on how types/strings are encoded.
+The file ``include/uapi/linux/btf.h`` provides high-level definition of how
+types/strings are encoded.
 The beginning of data blob must be::
@@ -59,25 +51,23 @@ The beginning of data blob must be::
 };
 The magic is ``0xeB9F``, which has different encoding for big and little
-endian system, and can be used to test whether BTF is generated for
-big or little endian target.
-The btf_header is designed to be extensible with hdr_len equal to
-``sizeof(struct btf_header)`` when the data blob is generated.
+endian systems, and can be used to test whether BTF is generated for big- or
+little-endian target. The ``btf_header`` is designed to be extensible with
+``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is
+generated.
 2.1 String Encoding
 ===================
-The first string in the string section must be a null string.
-The rest of string table is a concatenation of other null-treminated
-strings.
+The first string in the string section must be a null string. The rest of
+string table is a concatenation of other null-terminated strings.
 2.2 Type Encoding
 =================
-The type id ``0`` is reserved for ``void`` type.
-The type section is parsed sequentially and the type id is assigned to
-each recognized type starting from id ``1``.
-Currently, the following types are supported::
+The type id ``0`` is reserved for ``void`` type. The type section is parsed
+sequentially and type id is assigned to each recognized type starting from id
+``1``. Currently, the following types are supported::
 #define BTF_KIND_INT 1 /* Integer */
 #define BTF_KIND_PTR 2 /* Pointer */
@@ -122,9 +112,9 @@ Each type contains the following common data::
 };
 };
-For certain kinds, the common data are followed by kind specific data.
-The ``name_off`` in ``struct btf_type`` specifies the offset in the string table.
-The following details encoding of each kind.
+For certain kinds, the common data are followed by kind-specific data. The
+``name_off`` in ``struct btf_type`` specifies the offset in the string table.
+The following sections detail encoding of each kind.
 2.2.1 BTF_KIND_INT
 ~~~~~~~~~~~~~~~~~~
@@ -136,7 +126,7 @@ The following details encoding of each kind.
 * ``info.vlen``: 0
 * ``size``: the size of the int type in bytes.
-``btf_type`` is followed by a ``u32`` with following bits arrangement::
+``btf_type`` is followed by a ``u32`` with the following bits arrangement::
 #define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
 #define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
@@ -148,39 +138,33 @@ The ``BTF_INT_ENCODING`` has the following attributes::
 #define BTF_INT_CHAR (1 << 1)
 #define BTF_INT_BOOL (1 << 2)
-The ``BTF_INT_ENCODING()`` provides extra information, signness,
-char, or bool, for the int type. The char and bool encoding
-are mostly useful for pretty print. At most one encoding can
-be specified for the int type.
-The ``BTF_INT_BITS()`` specifies the number of actual bits held by
-this int type. For example, a 4-bit bitfield encodes
-``BTF_INT_BITS()`` equals to 4. The ``btf_type.size * 8``
-must be equal to or greater than ``BTF_INT_BITS()`` for the type.
-The maximum value of ``BTF_INT_BITS()`` is 128.
-The ``BTF_INT_OFFSET()`` specifies the starting bit offset to
-calculate values for this int. For example, a bitfield struct
-member has
-* btf member bit offset 100 from the start of the structure,
-* btf member pointing to an int type,
-* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
-Then in the struct memory layout, this member will occupy
-``4`` bits starting from bits ``100 + 2 = 102``.
-Alternatively, the bitfield struct member can be the following to
-access the same bits as the above:
+The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or
+bool, for the int type. The char and bool encoding are mostly useful for
+pretty print. At most one encoding can be specified for the int type.
+The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int
+type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equals to 4.
+The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()``
+for the type. The maximum value of ``BTF_INT_BITS()`` is 128.
+The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
+for this int. For example, a bitfield struct member has: * btf member bit
+offset 100 from the start of the structure, * btf member pointing to an int
+type, * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
+Then in the struct memory layout, this member will occupy ``4`` bits starting
+from bits ``100 + 2 = 102``.
+Alternatively, the bitfield struct member can be the following to access the
+same bits as the above:
 * btf member bit offset 102,
 * btf member pointing to an int type,
 * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
-The original intention of ``BTF_INT_OFFSET()`` is to provide
-flexibility of bitfield encoding.
-Currently, both llvm and pahole generates ``BTF_INT_OFFSET() = 0``
-for all int types.
+The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of
+bitfield encoding. Currently, both llvm and pahole generate
+``BTF_INT_OFFSET() = 0`` for all int types.
 2.2.2 BTF_KIND_PTR
 ~~~~~~~~~~~~~~~~~~
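The int encoding rules and the bitfield example above (member bit offset 100 plus BTF_INT_OFFSET() = 2 gives bit 102) can be checked with a short C sketch. BTF_INT_ENCODING() and BTF_INT_OFFSET() follow the definitions shown in this diff; BTF_INT_BITS() and BTF_INT_SIGNED are not shown here and are taken, as an assumption, from include/uapi/linux/btf.h. btf_int_info(), btf_int_valid() and btf_bitfield_start() are hypothetical helpers, and btf_int_valid() checks only the constraints stated in the text, not everything the kernel verifier enforces.

```c
#include <stdint.h>

/* Layout of the u32 that follows btf_type for BTF_KIND_INT. */
#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(VAL)   (((VAL) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(VAL)     ((VAL) & 0x000000ff) /* assumed from uapi btf.h */

#define BTF_INT_SIGNED (1 << 0) /* assumed from uapi btf.h */
#define BTF_INT_CHAR   (1 << 1)
#define BTF_INT_BOOL   (1 << 2)

/* Hypothetical helper: pack encoding, offset and bits into one u32. */
static uint32_t btf_int_info(uint32_t encoding, uint32_t offset, uint32_t bits)
{
    return (encoding << 24) | (offset << 16) | bits;
}

/* Hypothetical helper applying only the rules in the text:
 * size * 8 >= BTF_INT_BITS(), BTF_INT_BITS() <= 128, and at most one
 * encoding flag. Returns 1 if the int type passes these checks. */
static int btf_int_valid(uint32_t size, uint32_t int_info)
{
    uint32_t enc = BTF_INT_ENCODING(int_info);
    uint32_t bits = BTF_INT_BITS(int_info);

    if (bits > 128 || (uint64_t)size * 8 < bits)
        return 0;
    return (enc & (enc - 1)) == 0; /* zero or exactly one flag set */
}

/* Hypothetical helper: first bit of a bitfield member, counted from the
 * start of the struct, is the member bit offset plus BTF_INT_OFFSET(). */
static uint32_t btf_bitfield_start(uint32_t member_bit_off, uint32_t int_info)
{
    return member_bit_off + BTF_INT_OFFSET(int_info);
}
```

Both encodings from the text land on the same bit: btf_bitfield_start(100, info with offset 2) and btf_bitfield_start(102, info with offset 0) each give 102.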
@@ -204,7 +188,7 @@ No additional type data follow ``btf_type``.
 * ``info.vlen``: 0
 * ``size/type``: 0, not used
-btf_type is followed by one "struct btf_array"::
+``btf_type`` is followed by one ``struct btf_array``::
 struct btf_array {
 __u32 type;
@@ -217,27 +201,25 @@ The ``struct btf_array`` encoding:
 * ``index_type``: the index type
 * ``nelems``: the number of elements for this array (``0`` is also allowed).
-The ``index_type`` can be any regular int types
-(u8, u16, u32, u64, unsigned __int128).
-The original design of including ``index_type`` follows dwarf
-which has a ``index_type`` for its array type.
+The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``,
+``u64``, ``unsigned __int128``). The original design of including
+``index_type`` follows DWARF, which has an ``index_type`` for its array type.
 Currently in BTF, beyond type verification, the ``index_type`` is not used.
 The ``struct btf_array`` allows chaining through element type to represent
-multiple dimensional arrays. For example, ``int a[5][6]``, the following
-type system illustrates the chaining:
+multidimensional arrays. For example, for ``int a[5][6]``, the following type
+information illustrates the chaining:
 * [1]: int
 * [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6``
 * [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5``
-Currently, both pahole and llvm collapse multiple dimensional array
-into one dimensional array, e.g., ``a[5][6]``, the btf_array.nelems
-equal to ``30``. This is because the original use case is map pretty
-print where the whole array is dumped out so one dimensional array
-is enough. As more BTF usage is explored, pahole and llvm can be
-changed to generate proper chained representation for
-multiple dimensional arrays.
+Currently, both pahole and llvm collapse multidimensional array into
+one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is
+equal to ``30``. This is because the original use case is map pretty print
+where the whole array is dumped out so one-dimensional array is enough. As
+more BTF usage is explored, pahole and llvm can be changed to generate proper
+chained representation for multidimensional arrays.
 2.2.4 BTF_KIND_STRUCT
 ~~~~~~~~~~~~~~~~~~~~~
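Following a chain of btf_array records, as in the int a[5][6] example, amounts to multiplying nelems until a non-array type is reached. The sketch below uses a hypothetical, simplified type table (struct toy_type, indexed by type id) instead of the real struct btf_type layout. Both the chained form and the collapsed one-dimensional form that pahole and llvm emit yield 30 elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified type table: index == type id (0 is void). */
enum toy_kind { TOY_VOID, TOY_INT, TOY_ARRAY };

struct toy_type {
    enum toy_kind kind;
    uint32_t elem_type; /* models btf_array.type for TOY_ARRAY */
    uint32_t nelems;    /* models btf_array.nelems for TOY_ARRAY */
};

/* Multiplies nelems along the chain that starts at type id `id` and stops
 * at the first non-array type. The hop limit guards against a malformed,
 * cyclic table. */
static uint64_t toy_array_total_elems(const struct toy_type *types,
                                      size_t ntypes, uint32_t id)
{
    uint64_t total = 1;

    for (size_t hops = 0;
         hops < ntypes && id < ntypes && types[id].kind == TOY_ARRAY;
         hops++) {
        total *= types[id].nelems;
        id = types[id].elem_type;
    }
    return total;
}
```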
@@ -264,28 +246,26 @@ multiple dimensional arrays.
 * ``type``: the member type
 * ``offset``: <see below>
-If the type info ``kind_flag`` is not set, the offset contains
-only bit offset of the member. Note that the base type of the
-bitfield can only be int or enum type. If the bitfield size
-is 32, the base type can be either int or enum type.
-If the bitfield size is not 32, the base type must be int,
-and int type ``BTF_INT_BITS()`` encodes the bitfield size.
+If the type info ``kind_flag`` is not set, the offset contains only bit offset
+of the member. Note that the base type of the bitfield can only be int or enum
+type. If the bitfield size is 32, the base type can be either int or enum
+type. If the bitfield size is not 32, the base type must be int, and int type
+``BTF_INT_BITS()`` encodes the bitfield size.
-If the ``kind_flag`` is set, the ``btf_member.offset``
-contains both member bitfield size and bit offset. The
-bitfield size and bit offset are calculated as below.::
+If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member
+bitfield size and bit offset. The bitfield size and bit offset are calculated
+as below.::
 #define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
 #define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff)
-In this case, if the base type is an int type, it must
-be a regular int type:
+In this case, if the base type is an int type, it must be a regular int type:
 * ``BTF_INT_OFFSET()`` must be 0.
 * ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.
-The following kernel patch introduced ``kind_flag`` and
-explained why both modes exist:
+The following kernel patch introduced ``kind_flag`` and explained why both
+modes exist:
 https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3
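With kind_flag set, a member's bitfield size and bit offset share one 32-bit value. The sketch below uses the two macros from this hunk. btf_member_offset_kflag(), btf_member_decode() and btf_regular_int_bits() are hypothetical helpers, and the kind_flag == 0 branch deliberately does not look up the member's int type, so it reports a bitfield size of 0 there.

```c
#include <stdint.h>

#define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val)    ((val) & 0xffffff)

/* Hypothetical helper: build btf_member.offset for kind_flag == 1. */
static uint32_t btf_member_offset_kflag(uint32_t bitfield_size, uint32_t bit_offset)
{
    return (bitfield_size << 24) | (bit_offset & 0xffffff);
}

/* Hypothetical helper: split btf_member.offset according to kind_flag.
 * With kind_flag == 0 the whole value is the bit offset; the bitfield
 * size would come from the int type's BTF_INT_BITS(), which is not
 * modeled here, so 0 is reported. */
static void btf_member_decode(int kind_flag, uint32_t offset,
                              uint32_t *bitfield_size, uint32_t *bit_offset)
{
    if (kind_flag) {
        *bitfield_size = BTF_MEMBER_BITFIELD_SIZE(offset);
        *bit_offset = BTF_MEMBER_BIT_OFFSET(offset);
    } else {
        *bitfield_size = 0;
        *bit_offset = offset;
    }
}

/* Regular-int rule for kind_flag == 1: bits must be {1,2,4,8,16} * 8. */
static int btf_regular_int_bits(uint32_t bits)
{
    return bits == 8 || bits == 16 || bits == 32 || bits == 64 || bits == 128;
}
```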
@@ -382,11 +362,11 @@ No additional type data follow ``btf_type``.
 No additional type data follow ``btf_type``.
-A BTF_KIND_FUNC defines, not a type, but a subprogram (function) whose
-signature is defined by ``type``. The subprogram is thus an instance of
-that type. The BTF_KIND_FUNC may in turn be referenced by a func_info in
-the :ref:`BTF_Ext_Section` (ELF) or in the arguments to
-:ref:`BPF_Prog_Load` (ABI).
+A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose
+signature is defined by ``type``. The subprogram is thus an instance of that
+type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
+:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
+(ABI).
 2.2.13 BTF_KIND_FUNC_PROTO
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -405,13 +385,13 @@ the :ref:`BTF_Ext_Section` (ELF) or in the arguments to
 __u32 type;
 };
-If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type,
-then ``btf_param.name_off`` must point to a valid C identifier
-except for the possible last argument representing the variable
-argument. The btf_param.type refers to parameter type.
+If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type, then
+``btf_param.name_off`` must point to a valid C identifier except for the
+possible last argument representing the variable argument. The btf_param.type
+refers to parameter type.
-If the function has variable arguments, the last parameter
-is encoded with ``name_off = 0`` and ``type = 0``.
+If the function has variable arguments, the last parameter is encoded with
+``name_off = 0`` and ``type = 0``.
 3. BTF Kernel API
 *****************
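The parameter rules above reduce to two small checks: the variadic marker (name_off = 0 and type = 0 on the last parameter), and the requirement that every other parameter of a prototype referenced by a BTF_KIND_FUNC names a valid C identifier. This is a sketch under stated assumptions: struct btf_param uses the name_off and type fields from the text, with the field order taken from include/uapi/linux/btf.h; the helper names are hypothetical; and string offsets are assumed to have already been bounds-checked against the string section.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Fields as in the text; field order assumed from uapi btf.h. */
struct btf_param {
    uint32_t name_off;
    uint32_t type;
};

/* Hypothetical helper: 1 if the last parameter is the variadic marker. */
static int btf_proto_is_variadic(const struct btf_param *params, size_t vlen)
{
    return vlen > 0 && params[vlen - 1].name_off == 0 &&
           params[vlen - 1].type == 0;
}

/* Hypothetical helper: [A-Za-z_][A-Za-z0-9_]* */
static int btf_valid_ident(const char *s)
{
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;
    for (s++; *s; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;
    return 1;
}

/* Hypothetical helper: every parameter except a trailing variadic marker
 * must name a valid C identifier. Offsets into strs are assumed valid. */
static int btf_func_proto_names_ok(const struct btf_param *p, size_t vlen,
                                   const char *strs)
{
    for (size_t i = 0; i < vlen; i++) {
        if (i == vlen - 1 && p[i].name_off == 0 && p[i].type == 0)
            return 1; /* variadic marker needs no name */
        if (!btf_valid_ident(strs + p[i].name_off))
            return 0;
    }
    return 1;
}
```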
@@ -459,10 +439,9 @@ The workflow typically looks like:
 3.1 BPF_BTF_LOAD
 ================
-Load a blob of BTF data into kernel. A blob of data
-described in :ref:`BTF_Type_String`
-can be directly loaded into the kernel.
-A ``btf_fd`` returns to userspace.
+Load a blob of BTF data into kernel. A blob of data, described in
+:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
+is returned to a userspace.
 3.2 BPF_MAP_CREATE
 ==================
@@ -484,18 +463,18 @@ In libbpf, the map can be defined with extra annotation like below:
 };
 BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);
-Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name,
-key and value types for the map.
-During ELF parsing, libbpf is able to extract key/value type_id's
-and assigned them to BPF_MAP_CREATE attributes automatically.
+Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and
+value types for the map. During ELF parsing, libbpf is able to extract
+key/value type_id's and assign them to BPF_MAP_CREATE attributes
+automatically.
 .. _BPF_Prog_Load:
 3.3 BPF_PROG_LOAD
 =================
-During prog_load, func_info and line_info can be passed to kernel with
-proper values for the following attributes:
+During prog_load, func_info and line_info can be passed to kernel with proper
+values for the following attributes:
 ::
 __u32 insn_cnt;
@@ -522,9 +501,9 @@ The func_info and line_info are an array of below, respectively.::
 __u32 line_col; /* line number and column number */
 };
-func_info_rec_size is the size of each func_info record, and line_info_rec_size
-is the size of each line_info record. Passing the record size to kernel make
-it possible to extend the record itself in the future.
+func_info_rec_size is the size of each func_info record, and
+line_info_rec_size is the size of each line_info record. Passing the record
+size to kernel make it possible to extend the record itself in the future.
 Below are requirements for func_info:
 * func_info[0].insn_off must be 0.
@@ -532,7 +511,7 @@ Below are requirements for func_info:
 bpf func boundaries.
 Below are requirements for line_info:
-* the first insn in each func must points to a line_info record.
+* the first insn in each func must have a line_info record pointing to it.
 * the line_info insn_off is in strictly increasing order.
 For line_info, the line number and column number are defined as below:
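The func_info and line_info requirements listed in these hunks can be checked on plain arrays of insn_off values. This is a hypothetical sketch: it models only the insn_off field of each record, assumes the function start offsets are already sorted in ascending order, and does not cover the remaining kernel checks (such as matching bpf func boundaries).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: func_info[0].insn_off must be 0. */
static int func_info_starts_at_zero(const uint32_t *func_offs, size_t nfunc)
{
    return nfunc > 0 && func_offs[0] == 0;
}

/* Hypothetical helper for the line_info rules:
 *  - line_info insn_off values are strictly increasing;
 *  - each function's first insn (func_offs[i]) has a line_info record.
 * func_offs is assumed sorted, so one merge-style pass suffices. */
static int line_info_ok(const uint32_t *func_offs, size_t nfunc,
                        const uint32_t *line_offs, size_t nline)
{
    for (size_t i = 1; i < nline; i++)
        if (line_offs[i] <= line_offs[i - 1])
            return 0;

    size_t j = 0;
    for (size_t i = 0; i < nfunc; i++) {
        while (j < nline && line_offs[j] < func_offs[i])
            j++;
        if (j == nline || line_offs[j] != func_offs[i])
            return 0; /* no record at this function's first insn */
    }
    return 1;
}
```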
@@ -543,40 +522,38 @@ For line_info, the line number and column number are defined as below:
 3.4 BPF_{PROG,MAP}_GET_NEXT_ID
-In kernel, every loaded program, map or btf has a unique id.
-The id won't change during the life time of the program, map or btf.
+In kernel, every loaded program, map or btf has a unique id. The id won't
+change during the lifetime of a program, map, or btf.
-The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID
-returns all id's, one for each command, to user space, for bpf
-program or maps,
-so the inspection tool can inspect all programs and maps.
+The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID returns all id's, one for
+each command, to user space, for bpf program or maps, respectively, so an
+inspection tool can inspect all programs and maps.
 3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
-The introspection tool cannot use id to get details about program or maps.
-A file descriptor needs to be obtained first for reference counting purpose.
+An introspection tool cannot use id to get details about program or maps.
+A file descriptor needs to be obtained first for reference-counting purpose.
 3.6 BPF_OBJ_GET_INFO_BY_FD
 ==========================
-Once a program/map fd is acquired, the introspection tool can
-get the detailed information from kernel about this fd,
-some of which is btf related. For example,
-``bpf_map_info`` returns ``btf_id``, key/value type id.
-``bpf_prog_info`` returns ``btf_id``, func_info and line info
-for translated bpf byte codes, and jited_line_info.
+Once a program/map fd is acquired, an introspection tool can get the detailed
+information from kernel about this fd, some of which are BTF-related. For
+example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids.
+``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated
+bpf byte codes, and jited_line_info.
 3.7 BPF_BTF_GET_FD_BY_ID
 ========================
-With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``,
-bpf syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd.
-Then, with command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally
-loaded into the kernel with BPF_BTF_LOAD, can be retrieved.
+With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf
+syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with
+command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the
+kernel with BPF_BTF_LOAD, can be retrieved.
-With the btf blob, ``bpf_map_info`` and ``bpf_prog_info``, the introspection
-tool has full btf knowledge and is able to pretty print map key/values,
-dump func signatures, dump line info along with byte/jit codes.
+With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection
+tool has full btf knowledge and is able to pretty print map key/values, dump
+func signatures and line info, along with byte/jit codes.
 4. ELF File Format Interface
 ****************************
@@ -584,19 +561,19 @@ dump func signatures, dump line info along with byte/jit codes.
 4.1 .BTF section
 ================
-The .BTF section contains type and string data. The format of this section
-is same as the one describe in :ref:`BTF_Type_String`.
+The .BTF section contains type and string data. The format of this section is
+same as the one describe in :ref:`BTF_Type_String`.
 .. _BTF_Ext_Section:
 4.2 .BTF.ext section
 ====================
-The .BTF.ext section encodes func_info and line_info which
-needs loader manipulation before loading into the kernel.
+The .BTF.ext section encodes func_info and line_info which needs loader
+manipulation before loading into the kernel.
-The specification for .BTF.ext section is defined at
-``tools/lib/bpf/btf.h`` and ``tools/lib/bpf/btf.c``.
+The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h``
+and ``tools/lib/bpf/btf.c``.
 The current header of .BTF.ext section::
@@ -613,9 +590,9 @@ The current header of .BTF.ext section::
 __u32 line_info_len;
 };
-It is very similar to .BTF section. Instead of type/string section,
-it contains func_info and line_info section. See :ref:`BPF_Prog_Load`
-for details about func_info and line_info record format.
+It is very similar to .BTF section. Instead of type/string section, it
+contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details
+about func_info and line_info record format.
 The func_info is organized as below.::
@@ -624,9 +601,9 @@ The func_info is organized as below.::
 btf_ext_info_sec for section #2 /* func_info for section #2 */
 ...
-``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure
-when .BTF.ext is generated. btf_ext_info_sec, defined below, is
-the func_info for each specific ELF section.::
+``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure when
+.BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a collection of
+func_info for each specific ELF section.::
 struct btf_ext_info_sec {
 __u32 sec_name_off; /* offset to section name */
@@ -644,14 +621,14 @@ The line_info is organized as below.::
 btf_ext_info_sec for section #2 /* line_info for section #2 */
 ...
-``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure
-when .BTF.ext is generated.
+``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when
+.BTF.ext is generated.
 The interpretation of ``bpf_func_info->insn_off`` and
-``bpf_line_info->insn_off`` is different between kernel API and ELF API.
-For kernel API, the ``insn_off`` is the instruction offset in the unit
-of ``struct bpf_insn``. For ELF API, the ``insn_off`` is the byte offset
-from the beginning of section (``btf_ext_info_sec->sec_name_off``).
+``bpf_line_info->insn_off`` is different between kernel API and ELF API. For
+kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct
+bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the
+beginning of section (``btf_ext_info_sec->sec_name_off``).
 5. Using BTF
 ************
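The docs text above says ELF `insn_off` is a byte offset, while the kernel expects an index in units of `struct bpf_insn`. That instruction is 8 bytes, so the conversion is essentially a division by 8. The following is a minimal sketch. `insn_mirror`, `elf_to_kernel_insn_off` and the `sec_insn_base` parameter are hypothetical names. Adding `sec_insn_base` is an assumption about a loader that places a section's code at some instruction index other than 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct bpf_insn: opcode, two 4-bit registers,
 * 16-bit offset, 32-bit immediate. */
struct insn_mirror {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

static_assert(sizeof(struct insn_mirror) == 8, "BPF instructions are 8 bytes");

/*
 * Convert an ELF-side insn_off (bytes from the start of the ELF section)
 * into a kernel-side insn_off (index in units of struct bpf_insn).
 * sec_insn_base is where the loader put the section's first instruction
 * (assumed 0 for a single-section program). Returns -1 if the byte
 * offset is not instruction aligned.
 */
static int64_t elf_to_kernel_insn_off(uint32_t byte_off, uint32_t sec_insn_base)
{
    const uint32_t insn_size = (uint32_t)sizeof(struct insn_mirror);

    if (byte_off % insn_size != 0)
        return -1;
    return (int64_t)sec_insn_base + byte_off / insn_size;
}
```

For example, a func_info record with ELF `insn_off` 16 in a section loaded at instruction 0 would map to kernel `insn_off` 2.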
@@ -659,10 +636,9 @@ from the beginning of section (``btf_ext_info_sec->sec_name_off``).
 5.1 bpftool map pretty print
 ============================
-With BTF, the map key/value can be printed based on fields rather than
-simply raw bytes. This is especially
-valuable for large structure or if you data structure
-has bitfields. For example, for the following map,::
+With BTF, the map key/value can be printed based on fields rather than simply
+raw bytes. This is especially valuable for large structure or if your data
+structure has bitfields. For example, for the following map,::
 enum A { A1, A2, A3, A4, A5 };
 typedef enum A ___A;
@@ -702,9 +678,9 @@ bpftool is able to pretty print like below:
 5.2 bpftool prog dump
 =====================
-The following is an example to show func_info and line_info
-can help prog dump with better kernel symbol name, function prototype
-and line information.::
+The following is an example showing how func_info and line_info can help prog
+dump with better kernel symbol names, function prototypes and line
+information.::
 $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
 [...]
@@ -733,10 +709,11 @@ and line information.::
 ; counts = bpf_map_lookup_elem(&btf_map, &key);
 [...]
-5.3 verifier log
+5.3 Verifier Log
 ================
-The following is an example how line_info can help verifier failure debug.::
+The following is an example of how line_info can help debugging verification
+failure.::
 /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
  * is modified as below.
@@ -765,8 +742,8 @@ You need latest pahole
 https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
-or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't support .BTF.ext
-and btf BTF_KIND_FUNC type yet. For example,::
+or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't
+support .BTF.ext and btf BTF_KIND_FUNC type yet. For example,::
 -bash-4.4$ cat t.c
 struct t {
@@ -783,8 +760,9 @@ and btf BTF_KIND_FUNC type yet. For example,::
 c type_id=2 bitfield_size=2 bits_offset=5
 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
-The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target only.
-The assembly code (-S) is able to show the BTF encoding in assembly format.::
+The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target
+only. The assembly code (-S) is able to show the BTF encoding in assembly
+format.::
 -bash-4.4$ cat t2.c
 typedef int __int32;
@@ -867,4 +845,4 @@ The assembly code (-S) is able to show the BTF encoding in assembly format.::
 7. Testing
 **********
-Kernel bpf selftest `test_btf.c` provides extensive set of BTF related tests.
+Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
@@ -829,7 +829,7 @@ tracing filters may do to maintain counters of events, for example. Register R9
 is not used by socket filters either, but more complex filters may be running
 out of registers and would have to resort to spill/fill to stack.
-Internal BPF can used as generic assembler for last step performance
+Internal BPF can be used as a generic assembler for last step performance
 optimizations, socket filters and seccomp are using it as assembler. Tracing
 filters may use it as assembler to generate code from kernel. In kernel usage
 may not be bounded by security considerations, since generated internal BPF code
......
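The filter.txt hunk describes internal BPF being used as a generic assembler. As a rough sketch of what that looks like from C, the code below builds 8-byte instructions for `r0 = 1; r1 = 41; r0 += r1; exit`. The opcode constants use the values defined in `<linux/bpf.h>` and `<linux/bpf_common.h>`. The helpers `mov64_imm`, `alu64_add_reg` and `exit_insn` are hypothetical stand-ins for the kernel's own `BPF_MOV64_IMM`, `BPF_ALU64_REG` and `BPF_EXIT_INSN` macros. The code only assembles the array; it does not load or run it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode fields, same values as <linux/bpf.h> and <linux/bpf_common.h>. */
#define BPF_JMP   0x05 /* instruction class: jumps */
#define BPF_ALU64 0x07 /* instruction class: 64-bit ALU */
#define BPF_K     0x00 /* source operand: 32-bit immediate */
#define BPF_X     0x08 /* source operand: register */
#define BPF_ADD   0x00
#define BPF_MOV   0xb0
#define BPF_EXIT  0x90

/* Local mirror of struct bpf_insn (8 bytes). */
struct insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

static_assert(sizeof(struct insn) == 8, "BPF instructions are 8 bytes");

/* dst = imm (64-bit move of an immediate). */
static struct insn mov64_imm(uint8_t dst, int32_t imm)
{
    struct insn i = { .code = BPF_ALU64 | BPF_MOV | BPF_K,
                      .dst_reg = dst, .imm = imm };
    return i;
}

/* dst += src (64-bit add of two registers). */
static struct insn alu64_add_reg(uint8_t dst, uint8_t src)
{
    struct insn i = { .code = BPF_ALU64 | BPF_ADD | BPF_X,
                      .dst_reg = dst, .src_reg = src };
    return i;
}

/* Return from the program with the value in r0. */
static struct insn exit_insn(void)
{
    struct insn i = { .code = BPF_JMP | BPF_EXIT };
    return i;
}

/* Assemble "r0 = 1; r1 = 41; r0 += r1; exit" into out; returns the count. */
static size_t assemble_demo(struct insn out[4])
{
    out[0] = mov64_imm(0, 1);
    out[1] = mov64_imm(1, 41);
    out[2] = alu64_add_reg(0, 1);
    out[3] = exit_insn();
    return 4;
}
```

The resulting opcodes are 0xb7 for the immediate moves, 0x0f for the register add and 0x95 for exit. A generator inside the kernel would emit these same fixed-size records, which is why no separate assembler syntax is needed.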