- 05 Apr, 2022 8 commits
-
-
Alexei Starovoitov authored
Andrii Nakryiko says:

====================

Add libbpf support for USDT (User Statically-Defined Tracing) probes. USDTs are an important part of the tracing and BPF ecosystem, widely used in mission-critical production applications for observability, performance analysis, and debugging.

And while USDTs themselves are a pretty complicated abstraction built on top of uprobes, for end-users USDT is as natural a primitive as uprobes themselves. And thus it's important for libbpf to provide the best possible user experience when it comes to building tracing applications relying on USDTs.

USDTs historically presented a lot of challenges for libbpf's no compilation-on-the-fly general approach to BPF tracing. BCC utilizes the power of on-the-fly source code generation and compilation using its embedded Clang toolchain, which was impractical for the more lightweight and thus more rigid libbpf-based approach. But still, with enough diligence and BPF cookies it's possible to implement USDT support that feels as natural as tracing any uprobe.

This patch set is the culmination of such an effort to add libbpf USDT support following the spirit and philosophy of BPF CO-RE (even though it's not inherently relying on BPF CO-RE much, see patch #1 for some notes regarding this). Each respective patch has enough details and explanations, so I won't go into details here.

In the end, I think the overall usability of libbpf's USDT support *exceeds* the status quo set by BCC due to the elimination of awkward runtime USDT supporting code generation. It also exceeds BCC's capabilities due to the use of BPF cookie. This eliminates the need to determine a USDT call site (and thus specifics about how exactly to fetch arguments) based on its *absolute IP address*, which is impossible with shared libraries if no PID is specified (as we then just *can't* know the absolute IP at which the shared library is loaded, because it might be different for each process). With BPF cookie this is not a problem as we record the "call site ID" directly in a BPF cookie value. This makes it possible to do system-wide tracing of a USDT defined in a shared library. Think about tracing some USDT in libc across any process in the system, both running at the time of attachment and all the new processes started *afterwards*. This is a very powerful capability that allows more efficient observability and tracing tooling.

Once this functionality lands, the plan is to extend libbpf-bootstrap ([0]) with a USDT example. It will also become possible to start converting BCC tools that rely on USDTs to their libbpf-based counterparts ([1]).

It's worth noting that a preliminary version of this code is currently used and tested in production code running a fleet-wide observability toolkit.

Libbpf functionality is broken down into 5 mostly logically independent parts, for ease of reviewing:
- patch #1 adds BPF-side implementation;
- patch #2 adds user-space APIs and wires bpf_link for USDTs;
- patch #3 adds the most mundane pieces: handling ELF, parsing USDT notes, dealing with memory segments, relative vs absolute addresses, etc;
- patch #4 adds internal ID allocation and setting up/tearing down of BPF-side state (spec and IP-to-ID mapping);
- patch #5 implements x86/x86-64-specific logic of parsing USDT argument specifications;
- patch #6 adds testing of various basic aspects of handling of USDT;
- patch #7 extends the set of tests with more combinations of semaphore, executable vs shared library, and PID filter options.

[0] https://github.com/libbpf/libbpf-bootstrap
[1] https://github.com/iovisor/bcc/tree/master/libbpf-tools

v2->v3:
- fix typos, leave link to systemtap doc, acks, etc (Dave);
- include sys/sdt.h to avoid extra system-wide package dependencies;

v1->v2:
- huge high-level comment describing how all the moving parts fit together (Alan, Alexei);
- switched from `__hidden __weak` to `static inline __noinline` for now, as there is a bug in BPF linker breaking final BPF object file due to invalid .BTF.ext data; I want to fix it separately at which point I'll switch back to __hidden __weak again. The fix isn't trivial, so I don't want to block on that. Same for the __weak variable lookup bug that Hengqi reported.
- various fixes and improvements, addressing other feedback (Alan, Hengqi);

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Hengqi Chen <hengqi.chen@gmail.com>

====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Extend urandom_read helper binary to include USDTs of 4 combinations: semaphore/semaphoreless (refcounted and non-refcounted) and based in executable or shared library. We also extend urandom_read with the ability to report its own PID to the parent process and wait for the parent process to ready itself for tracing urandom_read. We utilize popen() and underlying pipe properties for proper signaling.

Once urandom_read is ready, we add a few tests to validate that libbpf's USDT attachment handles all the above combinations of semaphore (or lack of it) and static or shared library USDTs. Also, we validate that libbpf handles shared libraries both with PID filter and without one (i.e., -1 for PID argument).

Having the shared library case tested with and without PID is important because internal logic differs on kernels that don't support BPF cookies. On such older kernels, attaching to USDTs in shared libraries without specifying a concrete PID doesn't work in principle, because it's impossible to determine the shared library's load address to derive absolute IPs for uprobe attachments. Without absolute IPs, it's impossible to perform correct lookup of the USDT spec based on the uprobe's absolute IP (the only kind available from BPF at runtime). This is not a problem on newer kernels with BPF cookie as we don't need IP-to-ID lookup because the BPF cookie value *is* the spec ID.

So having those two situations as separate subtests is good because libbpf CI is able to test latest selftests against old kernels (e.g., 4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment for old kernels, but will still leave the PID-specific one enabled to validate this legacy logic is working correctly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-8-andrii@kernel.org
-
Andrii Nakryiko authored
Add semaphore-based USDT to test_progs itself and write basic tests to validate both auto-attachment and manual attachment logic, as well as BPF-side functionality.

Also add subtests to validate that libbpf properly deduplicates USDT specs and handles spec overflow situations correctly, as well as proper "rollback" of partially-attached multi-spec USDT.

BPF-side of selftest intentionally consists of two files to validate that usdt.bpf.h header can be included from multiple source code files that are subsequently linked into final BPF object file without causing any symbol duplication or other issues. We are validating that __weak maps and bpf_usdt_xxx() API functions defined in usdt.bpf.h do work as intended.

USDT selftests utilize sys/sdt.h header that on Ubuntu systems comes from systemtap-sdt-devel package. But to simplify everyone's life, including CI but especially casual contributors to bpf/bpf-next that are trying to build selftests, I've checked in sys/sdt.h header from [0] directly. This way it will work on all architectures and distros without having to figure it out for every relevant combination and adding any extra implicit package dependencies.

[0] https://sourceware.org/git?p=systemtap.git;a=blob_plain;f=includes/sys/sdt.h;h=ca0162b4dc57520b96638c8ae79ad547eb1dd3a1;hb=HEAD

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-7-andrii@kernel.org
-
Andrii Nakryiko authored
Add x86/x86_64-specific USDT argument specification parsing. Each architecture will require their own logic, as all this is arch-specific assembly-based notation. Architectures that libbpf doesn't support for USDTs will pr_warn() with specific error and return -ENOTSUP. We use sscanf() as a very powerful and easy to use string parser. Those spaces in sscanf's format string mean "skip any whitespaces", which is pretty nifty (and somewhat little known) feature. All this was tested on little-endian architecture, so bit shifts are probably off on big-endian, which our CI will hopefully prove. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-6-andrii@kernel.org
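The whitespace-skipping behaviour of sscanf() mentioned above can be illustrated with a minimal sketch. It parses only the register-dereference form of an x86-64 argument spec (for example `-4@-1204(%rbp)`). The struct, the function name, and the exact format string are assumptions made for this illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch; not libbpf's internal struct. */
struct arg_spec_sketch {
	int size;     /* argument size in bytes; a negative value means signed */
	long offset;  /* displacement added to the register value */
	char reg[16]; /* register name without the leading '%' */
};

/*
 * Parse a spec of the form "<size>@<offset>(%<reg>)".
 * Each space in the format string matches any amount of whitespace,
 * including none, so "-4@-8(%rbp)" and " -4 @ -8 (%rbp)" both match.
 * Returns the number of characters consumed, or -1 on mismatch.
 */
static int parse_reg_deref(const char *s, struct arg_spec_sketch *out)
{
	int len = 0;
	int n;

	assert(s && out);
	memset(out, 0, sizeof(*out));
	n = sscanf(s, " %d @ %ld ( %%%15[^)] ) %n",
		   &out->size, &out->offset, out->reg, &len);
	/* %n is stored only if the closing ')' was matched, so len stays 0
	 * for a truncated spec even though three fields were converted. */
	if (n != 3 || len == 0)
		return -1;
	return len;
}
```

Checking the `%n` result in addition to the return value is a design choice of this sketch: sscanf() reports only completed conversions, so a missing closing parenthesis would otherwise go unnoticed.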
-
Andrii Nakryiko authored
Last part of architecture-agnostic user-space USDT handling logic is to set up BPF spec and, optionally, IP-to-ID maps from user-space. usdt_manager performs a compact spec ID allocation to utilize fixed-sized BPF maps as efficiently as possible. We also use hashmap to deduplicate USDT arg spec strings and map identical strings to single USDT spec, minimizing the necessary BPF map size. usdt_manager supports arbitrary sequences of attachment and detachment, both of the same USDT and multiple different USDTs and internally maintains a free list of unused spec IDs. bpf_link_usdt's logic is extended with proper setup and teardown of this spec ID free list and supporting BPF maps. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-5-andrii@kernel.org
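The compact ID allocation with a free list described above could look roughly like the following sketch. The capacity, type, and function names are hypothetical and chosen only for illustration; the real usdt_manager also deduplicates spec strings through a hashmap, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPEC_CNT 4 /* hypothetical capacity of the fixed-size spec map */

struct spec_id_alloc {
	int next_free_id;           /* lowest ID that has never been handed out */
	int free_ids[MAX_SPEC_CNT]; /* stack of IDs released by detached USDTs */
	size_t free_cnt;
};

/* Return an ID in [0, MAX_SPEC_CNT), or -1 if every ID is in use. */
static int spec_id_get(struct spec_id_alloc *a)
{
	assert(a);
	/* Reuse released IDs first so the used ID range stays compact. */
	if (a->free_cnt > 0)
		return a->free_ids[--a->free_cnt];
	if (a->next_free_id < MAX_SPEC_CNT)
		return a->next_free_id++;
	return -1;
}

/* Return a previously allocated ID to the free list. */
static void spec_id_put(struct spec_id_alloc *a, int id)
{
	assert(a && id >= 0 && id < a->next_free_id);
	assert(a->free_cnt < MAX_SPEC_CNT);
	a->free_ids[a->free_cnt++] = id;
}
```

Preferring released IDs over fresh ones keeps arbitrary attach/detach sequences within the fixed map size as long as the number of simultaneously live specs does not exceed the capacity.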
-
Andrii Nakryiko authored
Implement architecture-agnostic parts of USDT parsing logic. The code is the documentation in this case, it's futile to try to succinctly describe how USDT parsing is done in any sort of concreteness. But still, USDTs are recorded in special ELF notes section (.note.stapsdt), where each USDT call site is described separately. Along with USDT provider and USDT name, each such note contains USDT argument specification, which uses assembly-like syntax to describe how to fetch value of USDT argument. USDT arg spec could be just a constant, or a register, or a register dereference (most common cases in x86_64), but it technically can be much more complicated cases, like offset relative to global symbol and stuff like that. One of the later patches will implement most common subset of this for x86 and x86-64 architectures, which seems to handle a lot of real-world production application. USDT arg spec contains a compact encoding allowing usdt.bpf.h from previous patch to handle the above 3 cases. Instead of recording which register might be needed, we encode register's offset within struct pt_regs to simplify BPF-side implementation. USDT argument can be of different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle this, libbpf pre-calculates necessary bit shifts to do proper casting and sign-extension in a short sequences of left and right shifts. The rest is in the code with sometimes extensive comments and references to external "documentation" for USDTs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-4-andrii@kernel.org
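The shift-based casting mentioned above can be sketched in plain C as follows. The function name and signature are assumptions for this illustration. The sketch also assumes that right-shifting a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined but which holds for GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Narrow a raw 64-bit value to an argument of `size` bytes and widen it
 * back to 64 bits, sign-extending if the argument is signed.
 */
static uint64_t cast_usdt_arg(uint64_t raw, int size, bool is_signed)
{
	int shift;

	assert(size == 1 || size == 2 || size == 4 || size == 8);
	shift = 64 - size * 8; /* can be pre-calculated once per argument */

	raw <<= shift; /* drop the bits above the argument's width */
	if (is_signed)
		/* arithmetic right shift copies the sign bit back down */
		return (uint64_t)((int64_t)raw >> shift);
	return raw >> shift; /* logical right shift zero-extends */
}
```

For a 4-byte signed argument the shift is 32, so a register holding `0x00000000FFFFFFFC` yields -4 as a 64-bit value, while the unsigned interpretation yields `0xFFFFFFFC`.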
-
Andrii Nakryiko authored
Wire up libbpf USDT support APIs without yet implementing all the nitty-gritty details of USDT discovery, spec parsing, and BPF map initialization.

User-visible user-space API is simple and is conceptually very similar to uprobe API.

bpf_program__attach_usdt() API allows programmatically attaching a given BPF program to a USDT, specified through binary path (executable or shared lib), USDT provider and name. Also, just like in the uprobe case, a PID filter is specified (0 - self, -1 - any process, or specific PID). Optionally, a USDT cookie value can be specified. Such a single API invocation will try to discover the given USDT in the specified binary and will use (potentially many) BPF uprobes to attach this program in correct locations.

Just like any bpf_program__attach_xxx() API, a bpf_link is returned that represents this attachment. It is a virtual BPF link that doesn't have a direct kernel object, as it can consist of multiple underlying BPF uprobe links. As such, attachment is not an atomic operation and there can be a brief moment when some USDT call sites are attached while others are still in the process of attaching. This should be taken into consideration by the user. But bpf_program__attach_usdt() guarantees that in the case of success all USDT call sites are successfully attached, or all the successful attachments will be detached as soon as some USDT call site fails to be attached. So, in theory, there could be cases of a failed bpf_program__attach_usdt() call which did trigger a few USDT program invocations. This is unavoidable due to the multi-uprobe nature of USDT and has to be handled by the user, if it's important to create an illusion of atomicity.

USDT BPF programs themselves are marked in BPF source code as either SEC("usdt"), in which case they won't be auto-attached through skeleton's <skel>__attach() method, or they can have a full definition, which follows the spirit of fully-specified uprobes: SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's attach method will attempt auto-attachment. Similarly, generic bpf_program__attach() will have enough information to go off of for parameterless attachment.

USDT BPF programs are actually uprobes, and as such for the kernel they are marked as BPF_PROG_TYPE_KPROBE.

Another part of this patch is USDT-related feature probing:
- BPF cookie support detection from user-space;
- detection of kernel support for auto-refcounting of USDT semaphore.

The latter is optional. If the kernel doesn't support such a feature and the USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf detects that the USDT requires setting semaphores and the kernel doesn't support this, libbpf errors out with an explicit pr_warn() message. Libbpf doesn't support poking the process's memory directly to increment the semaphore value, like BCC does on legacy kernels, due to inherent raciness and danger of such process memory manipulation. Libbpf lets the kernel take care of this properly or gives up.

Logistically, all the extra USDT-related infrastructure of libbpf is put into a separate usdt.c file and abstracted behind struct usdt_manager. Each bpf_object has a lazily-initialized usdt_manager pointer, which is only instantiated if USDT programs are attempted to be attached. Closing the BPF object frees up usdt_manager resources. usdt_manager keeps track of USDT spec ID assignment and a few other small things.

Subsequent patches will fill out remaining missing pieces of USDT initialization and setup logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org
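The "all call sites attached, or everything rolled back" guarantee described in this commit can be pictured with a small generic sketch. Everything here is hypothetical: the callback types, the fake attach/detach helpers, and the error value stand in for the real per-call-site uprobe attachment and are not libbpf APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call-site callbacks standing in for uprobe attach/detach. */
typedef int (*attach_fn)(size_t site_idx, void *ctx);
typedef void (*detach_fn)(size_t site_idx, void *ctx);

/*
 * Attach n call sites in order. If any attach fails, detach the ones that
 * already succeeded (in reverse order) and return the error.
 */
static int attach_all_or_rollback(size_t n, attach_fn attach,
				  detach_fn detach, void *ctx)
{
	size_t i;
	int err;

	assert(attach && detach);
	for (i = 0; i < n; i++) {
		err = attach(i, ctx);
		if (err) {
			while (i-- > 0)
				detach(i, ctx);
			return err;
		}
	}
	return 0;
}

/* Fake call-site state used only to exercise the sketch. */
struct fake_sites {
	int attached[8];
	size_t fail_at; /* index at which fake_attach() reports failure */
};

static int fake_attach(size_t idx, void *ctx)
{
	struct fake_sites *f = ctx;

	if (idx == f->fail_at)
		return -1;
	f->attached[idx] = 1;
	return 0;
}

static void fake_detach(size_t idx, void *ctx)
{
	struct fake_sites *f = ctx;

	f->attached[idx] = 0;
}
```

Between the first successful attach and the rollback, already-attached sites can still fire, which is the non-atomicity the commit message warns about.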
-
Andrii Nakryiko authored
Add BPF-side implementation of libbpf-provided USDT support. This consists of a single header library, usdt.bpf.h, which is meant to be used from user's BPF-side source code. This header is added to the list of installed libbpf headers, alongside bpf_helpers.h and others.

BPF-side implementation consists of two BPF maps:
- spec map, which contains "a USDT spec" which encodes information necessary to be able to fetch USDT arguments and other information (argument count, user-provided cookie value, etc) at runtime;
- IP-to-spec-ID map, which is only used on kernels that don't support the BPF cookie feature. It allows looking up the spec ID based on the place in the user application that triggers the USDT program.

These maps have default sizes, 256 and 1024, which are chosen conservatively to not waste a lot of space while still handling a lot of common cases. But there could be cases when the user application needs to either trace a lot of different USDTs, or USDTs are heavily inlined and their arguments are located in a lot of differing locations. For such cases it might be necessary to size those maps up, which libbpf allows to do by overriding the BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.

It is an important aspect to keep in mind. A single USDT (user-space equivalent of kernel tracepoint) can have multiple USDT "call sites". That is, a single logical USDT is triggered from multiple places in the user application. This can happen due to function inlining. Each such inlined instance of USDT invocation can have its own unique USDT argument specification (instructions about the location of the value of each of the USDT arguments). So while USDT looks very similar to a usual uprobe or kernel tracepoint, under the hood it's actually a collection of uprobes, each potentially needing a different spec to know how to fetch arguments.

User-visible API consists of three helper functions:
- bpf_usdt_arg_cnt(), which returns the number of arguments of the current USDT;
- bpf_usdt_arg(), which reads the value of the specified USDT argument (by its zero-indexed position) and returns it as a 64-bit value;
- bpf_usdt_cookie(), which functions like BPF cookie for USDT programs; this is necessary as libbpf doesn't allow specifying the actual BPF cookie and utilizes it internally for the USDT support implementation.

Each bpf_usdt_xxx() API expects a struct pt_regs * context, passed into the BPF program. On kernels that don't support BPF cookie it is used to fetch the absolute IP address of the underlying uprobe.

usdt.bpf.h also provides the BPF_USDT() macro, which functions like BPF_PROG() and BPF_KPROBE() and allows a much more user-friendly way to get access to USDT arguments, if the USDT definition is static and known to the user. It is expected that the majority of use cases won't have to use bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover all their needs.

Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to detect kernel support for BPF cookie. If the BPF CO-RE dependency is undesirable, the user application can redefine BPF_USDT_HAS_BPF_COOKIE to either a boolean constant (or equivalently zero and non-zero), or even point it to its own .rodata variable that can be specified from the user's application user-space code. It is important that BPF_USDT_HAS_BPF_COOKIE is known to the BPF verifier as a static value (thus .rodata and not just .data), as otherwise BPF code will still contain the bpf_get_attach_cookie() BPF helper call and will fail validation at runtime, if not dead-code eliminated.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-2-andrii@kernel.org
-
- 04 Apr, 2022 19 commits
-
-
Ilya Leoshkevich authored
attach_probe selftest fails on Debian-based distros with `failed to resolve full path for 'libc.so.6'`. The reason is that these distros embraced multiarch to the point where even for the "main" architecture they store libc in /lib/<triple>. This is configured in /etc/ld.so.conf and in theory it's possible to replicate the loader's parsing and processing logic in libbpf, however a much simpler solution is to just enumerate the known library paths. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220404225020.51029-1-iii@linux.ibm.com
-
Ilya Leoshkevich authored
attach_probe selftest fails on aarch64 with `failed to create kprobe 'sys_nanosleep+0x0' perf event: No such file or directory`. This is because, like on several other architectures, nanosleep has a prefix. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220404142101.27900-1-iii@linux.ibm.com
-
Andrii Nakryiko authored
Milan Landaverde says:

====================

With the addition of the syscall prog type we should now be able to see feature probe info for that prog type:

    $ bpftool feature probe kernel
    ...
    eBPF program_type syscall is available
    ...
    eBPF helpers supported for program type syscall:
    ...
    - bpf_sys_bpf
    - bpf_sys_close

And for the link types, their names should aid in the output.

Before:

    $ bpftool link show
    50: type 7 prog 5042
        bpf_cookie 0
        pids vfsstat(394433)

After:

    $ bpftool link show
    57: perf_event prog 5058
        bpf_cookie 0
        pids vfsstat(394725)

====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Milan Landaverde authored
Previously [1], we were using bpf_probe_prog_type which returned a bool, but the new libbpf_probe_bpf_prog_type can return a negative error code on failure. With this change, bpftool declares a program type as not available when the probe fails.

[1] https://lore.kernel.org/bpf/20220202225916.3313522-3-andrii@kernel.org/

Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220331154555.422506-4-milan@mdaverde.com
-
Milan Landaverde authored
Display the link type names in bpftool link show output.

Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220331154555.422506-3-milan@mdaverde.com
-
Milan Landaverde authored
In addition to displaying the program type in bpftool prog show this enables us to be able to query bpf_prog_type_syscall availability through feature probe as well as see which helpers are available in those programs (such as bpf_sys_bpf and bpf_sys_close) Signed-off-by: Milan Landaverde <milan@mdaverde.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220331154555.422506-2-milan@mdaverde.com
-
Quentin Monnet authored
The script for checking that various lists of types in bpftool remain in sync with the UAPI BPF header uses a regex to parse enum bpf_prog_type. If this enum contains a set of values different from the list of program types in bpftool, it complains. This script should have reported the addition, some time ago, of the new BPF_PROG_TYPE_SYSCALL, which was not reported to bpftool's program types list. It failed to do so, because it failed to parse that new type from the enum. This is because the new value, in the BPF header, has an explicative comment on the same line, and the regex does not support that. Let's update the script to support parsing enum values when they have comments on the same line. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220404140944.64744-1-quentin@isovalent.com
-
Alexander Lobakin authored
Users of the xdp_sample_user infra should be explicitly linked with the standard math library (`-lm`). Otherwise, the following happens:

    /usr/bin/ld: xdp_sample_user.c:(.text+0x59fc): undefined reference to `ceil'
    /usr/bin/ld: xdp_sample_user.c:(.text+0x5a0d): undefined reference to `ceil'
    /usr/bin/ld: xdp_sample_user.c:(.text+0x5adc): undefined reference to `floor'
    /usr/bin/ld: xdp_sample_user.c:(.text+0x5b01): undefined reference to `ceil'
    /usr/bin/ld: xdp_sample_user.c:(.text+0x5c1e): undefined reference to `floor'
    /usr/bin/ld: xdp_sample_user.c:(.text+0x5c43): undefined reference to `ceil'
    [...]

That happened previously, so there's a block of linkage flags in the Makefile. xdp_router_ipv4 has been transferred to this infra quite recently, but hasn't been added to it. Fix.

Fixes: 85bf1f51 ("samples: bpf: Convert xdp_router_ipv4 to XDP samples helper")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404115451.1116478-1-alexandr.lobakin@intel.com
-
Song Chen authored
At the end of the test, we already print out

    prog <prog number>: map ids <...> <...>

The value is the number read from the kernel through the BPF map. Additionally printing out

    verify map:<map id> val:<...>

will help users understand that the program ran successfully.

Signed-off-by: Song Chen <chensong_2000@189.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648889828-12417-1-git-send-email-chensong_2000@189.cn
-
Yuntao Wang authored
Since core relos is an optional part of the .BTF.ext ELF section, we should skip parsing it instead of returning -EINVAL if header size is less than offsetofend(struct btf_ext_header, core_relo_len). Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220404005320.1723055-1-ytcoode@gmail.com
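The size check described above can be sketched as follows. The struct mirrors the layout of the .BTF.ext header as I understand it, but it is redeclared here under a hypothetical name so the example is self-contained, and has_core_relos() is a name invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the .BTF.ext header layout used in this sketch. */
struct btf_ext_header_sketch {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint32_t hdr_len;
	/* all offsets are in bytes relative to the end of this header */
	uint32_t func_info_off;
	uint32_t func_info_len;
	uint32_t line_info_off;
	uint32_t line_info_len;
	/* optional part of the header */
	uint32_t core_relo_off;
	uint32_t core_relo_len;
};

/* Offset of the first byte past MEMBER within TYPE. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/*
 * Return 1 if the header is long enough to carry CO-RE relocation fields,
 * 0 if that optional part is absent and parsing it should be skipped.
 */
static int has_core_relos(const struct btf_ext_header_sketch *hdr)
{
	assert(hdr);
	return hdr->hdr_len >=
	       offsetofend(struct btf_ext_header_sketch, core_relo_len);
}
```

A caller would treat a 0 result as "nothing to parse" and return success, rather than rejecting the section with -EINVAL.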
-
Andrii Nakryiko authored
Alan Maguire says:

====================

This patch series focuses on supporting name-based attach - similar to that supported for kprobes - for uprobe BPF programs. Currently attach for such probes is done by determining the offset manually, so the aim is to try and mimic the simplicity of kprobe attach, making use of uprobe opts to specify a name string.

Patch 1 supports expansion of the binary_path argument used for bpf_program__attach_uprobe_opts(), allowing it to determine paths for programs and shared objects automatically, allowing for specification of "libc.so.6" rather than the full path "/usr/lib64/libc.so.6".

Patch 2 adds the "func_name" option to allow uprobe attach by name; the mechanics are described there.

Having name-based support allows us to support auto-attach for uprobes; patch 3 adds auto-attach support while attempting to handle backwards-compatibility issues that arise. The format supported is

    u[ret]probe/binary_path:[raw_offset|function[+offset]]

For example, to attach to libc malloc:

    SEC("uprobe//usr/lib64/libc.so.6:malloc")

..or, making use of the path computation mechanisms introduced in patch 1

    SEC("uprobe/libc.so.6:malloc")

Finally patch 4 adds tests to the attach_probe selftests covering attach by name, with patch 5 covering skeleton auto-attach.

Changes since v4 [1]:
- replaced strtok_r() usage with copying segments from static char *; avoids unneeded string allocation (Andrii, patch 1)
- switched to using access() instead of stat() when checking path-resolved binary (Andrii, patch 1)
- removed computation of .plt offset for instrumenting shared library calls within binaries. Firstly it proved too brittle, and secondly it was somewhat unintuitive in that this form of instrumentation did not support function+offset as the "local function in binary" and "shared library function in shared library" cases did. We can still instrument library calls, just need to do it in the library .so (patch 2)
- added binary path logging in cases where it was missing (Andrii, patch 2)
- avoid strlen() calculation in checking name match (Andrii, patch 2)
- reword comments for func_name option (Andrii, patch 2)
- tightened SEC() name validation to support "u[ret]probe" and fail on other permutations that do not support auto-attach (i.e. do not have the u[ret]probe/binary_path:func format) (Andrii, patch 3)
- fixed selftests to fail independently rather than skip remainder on failure (Andrii, patches 4,5)

Changes since v3 [2]:
- reworked variable naming to fit better with libbpf conventions (Andrii, patch 2)
- use quoted binary path in log messages (Andrii, patch 2)
- added path determination mechanisms using LD_LIBRARY_PATH/PATH and standard locations (patch 1, Andrii)
- changed section lookup to be type+name (if name is specified) to simplify use cases (patch 2, Andrii)
- fixed .plt lookup scheme to match symbol table entries with .plt index via the .rela.plt table; also fix the incorrect assumption that the code in the .plt that does library linking is the same size as .plt entries (it just happens to be on x86_64)
- aligned with pluggable section support such that uprobe SEC() names that do not conform to auto-attach format do not cause skeleton load failure (patch 3, Andrii)
- no longer need to look up absolute path to libraries used by test_progs since we have mechanism to determine path automatically
- replaced CHECK()s with ASSERT*()s for attach_probe test (Andrii, patch 4)
- added auto-attach selftests also (Andrii, patch 5)

Changes since RFC [3]:
- used "long" for addresses instead of ssize_t (Andrii, patch 1).
- used gelf_ interfaces to avoid assumptions about 64-bit binaries (Andrii, patch 1)
- clarified string matching in symbol table lookups (Andrii, patch 1)
- added support for specification of shared object functions in a non-shared object binary. This approach instruments the Procedure Linking Table (PLT) - malloc@PLT.
- changed logic in symbol search to check dynamic symbol table first, then fall back to symbol table (Andrii, patch 1).
- modified auto-attach string to require "/" separator prior to path prefix i.e. uprobe//path/to/binary (Andrii, patch 2)
- modified auto-attach string to use ':' separator (Andrii, patch 2)
- modified auto-attach to support raw offset (Andrii, patch 2)
- modified skeleton attach to interpret -ESRCH errors as a non-fatal "unable to auto-attach" (Andrii suggested -EOPNOTSUPP but my concern was it might collide with other instances where that value is returned and reflects a failure to attach a to-be-expected attachment rather than skip a program that does not present an auto-attachable section name. Admittedly -EOPNOTSUPP seems a more natural value here).
- moved library path retrieval code to trace_helpers (Andrii, patch 3)

[1] https://lore.kernel.org/bpf/1647000658-16149-1-git-send-email-alan.maguire@oracle.com/
[2] https://lore.kernel.org/bpf/1643645554-28723-1-git-send-email-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/1642678950-19584-1-git-send-email-alan.maguire@oracle.com/

====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Alan Maguire authored
Add tests that verify auto-attach works for function entry/return for local functions in program and library functions in a library.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-6-git-send-email-alan.maguire@oracle.com
-
Alan Maguire authored
add tests that verify attaching by name for 1. local functions in a program 2. library functions in a shared object ...succeed for uprobe and uretprobes using new "func_name" option for bpf_program__attach_uprobe_opts(). Also verify auto-attach works where uprobe, path to binary and function name are specified, but fails with -EOPNOTSUPP with a SEC name that does not specify binary path/function. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1648654000-21758-5-git-send-email-alan.maguire@oracle.com
-
Alan Maguire authored
Now that u[ret]probes can use name-based specification, it makes sense to add support for auto-attach based on SEC() definition. The format proposed is

    SEC("u[ret]probe/binary:[raw_offset|[function_name[+offset]]]")

For example, to trace malloc() in libc:

    SEC("uprobe/libc.so.6:malloc")

...or to trace function foo2 in /usr/bin/foo:

    SEC("uprobe//usr/bin/foo:foo2")

Auto-attach is done for all tasks (pid -1). The binary can be an absolute path or simply a program/library name; in the latter case, we use PATH/LD_LIBRARY_PATH to resolve the full path, falling back to standard locations (/usr/bin:/usr/sbin or /usr/lib64:/usr/lib) if the file is not found via environment-variable specified locations.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-4-git-send-email-alan.maguire@oracle.com
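To make the section-name format more concrete, here is a minimal sketch that splits such a name into probe type, binary, function, and optional offset. The struct, function name, buffer sizes, and format string are assumptions for this illustration. It handles only the function[+offset] form and does not distinguish a raw offset from a function name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse result for this sketch. */
struct uprobe_sec_sketch {
	char type[16];    /* "uprobe" or "uretprobe" */
	char binary[256]; /* absolute path or bare program/library name */
	char func[128];   /* function name */
	long offset;      /* optional "+offset", 0 if absent */
};

/*
 * Split "u[ret]probe/<binary>:<function>[+<offset>]".
 * Returns 0 on success, -1 if the name is not in auto-attachable form.
 */
static int parse_uprobe_sec(const char *sec, struct uprobe_sec_sketch *out)
{
	int n;

	assert(sec && out);
	memset(out, 0, sizeof(*out));
	/* The first '/' ends the probe type; everything up to ':' is the
	 * binary, so "uprobe//usr/bin/foo:foo2" yields "/usr/bin/foo". */
	n = sscanf(sec, "%15[^/]/%255[^:]:%127[a-zA-Z0-9_.]+%li",
		   out->type, out->binary, out->func, &out->offset);
	if (n != 3 && n != 4)
		return -1;
	if (strcmp(out->type, "uprobe") != 0 &&
	    strcmp(out->type, "uretprobe") != 0)
		return -1;
	return 0;
}
```

Because the type is terminated by the first `/`, an absolute binary path naturally produces the double slash seen in `uprobe//usr/bin/foo:foo2`.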
-
Alan Maguire authored
kprobe attach is name-based, using lookups of kallsyms to translate a function name to an address. Currently uprobe attach is done via an offset value as described in [1]. Extend uprobe opts for attach to include a function name which can then be converted into a uprobe-friendly offset. The calculation is done in several steps: 1. First, determine the symbol address using libelf; this gives us the offset as reported by objdump 2. If the function is a shared library function - and the binary provided is a shared library - no further work is required; the address found is the required address 3. Finally, if the function is local, subtract the base address associated with the object, retrieved from ELF program headers. The resultant value is then added to the func_offset value passed in to specify the uprobe attach address. So specifying a func_offset of 0 along with a function name "printf" will attach to printf entry. The modes of operation supported are then 1. to attach to a local function in a binary; function "foo1" in "/usr/bin/foo" 2. to attach to a shared library function in a shared library - function "malloc" in libc. [1] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1648654000-21758-3-git-send-email-alan.maguire@oracle.com
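The arithmetic in the three steps above can be sketched as follows. This only mirrors the description in the commit message; it is not the libbpf code, and `uprobe_attach_offset()` together with its parameters is a hypothetical helper. The symbol and base addresses would come from libelf and the ELF program headers, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * sym_addr:      symbol address as reported by the ELF symbol table
 * is_shared_lib: true when the function lives in the shared library given
 * base_addr:     base address from the program headers (local functions)
 * func_offset:   caller-supplied offset within the function
 */
static uint64_t uprobe_attach_offset(uint64_t sym_addr, bool is_shared_lib,
				     uint64_t base_addr, uint64_t func_offset)
{
	uint64_t off = sym_addr;

	if (!is_shared_lib) {
		/* Step 3: local function, make the address base-relative. */
		assert(sym_addr >= base_addr);
		off -= base_addr;
	}
	/* Step 2 needs no adjustment: the address is already the offset. */
	return off + func_offset;
}
```

For a local function at a made-up address 0x401136 in an object based at 0x400000, a func_offset of 0 gives an attach offset of 0x1136, which is the function entry.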
-
Alan Maguire authored
bpf_program__attach_uprobe_opts() requires a binary_path argument specifying binary to instrument. Supporting simply specifying "libc.so.6" or "foo" should be possible too. Library search checks LD_LIBRARY_PATH, then /usr/lib64, /usr/lib. This allows users to run BPF programs prefixed with LD_LIBRARY_PATH=/path2/lib while still searching standard locations. Similarly for non .so files, we check PATH and /usr/bin, /usr/sbin. Path determination will be useful for auto-attach of BPF uprobe programs using SEC() definition. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1648654000-21758-2-git-send-email-alan.maguire@oracle.com
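The search order described above can be sketched in a few lines of POSIX C. This is an illustration under assumptions, not the libbpf implementation: `search_dirs()` and `resolve_path()` are hypothetical names, and treating any name containing ".so" as a library is a simplification chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try each ':'-separated directory in dirs; on success buf holds the path. */
static int search_dirs(const char *dirs, const char *name,
		       char *buf, size_t buf_sz)
{
	const char *s = dirs;

	while (s && *s) {
		const char *e = strchr(s, ':');
		size_t len = e ? (size_t)(e - s) : strlen(s);

		if (len > 0) {
			int n = snprintf(buf, buf_sz, "%.*s/%s",
					 (int)len, s, name);

			if (n > 0 && (size_t)n < buf_sz &&
			    access(buf, R_OK) == 0)
				return 0;
		}
		s = e ? e + 1 : NULL;
	}
	return -1;
}

/* Environment variable first, then the standard locations. */
static int resolve_path(const char *name, char *buf, size_t buf_sz)
{
	bool is_lib;
	const char *env;

	assert(name && buf && buf_sz > 0);

	/* A name containing '/' is taken as a path and used unchanged. */
	if (strchr(name, '/')) {
		int n = snprintf(buf, buf_sz, "%s", name);

		return (n > 0 && (size_t)n < buf_sz) ? 0 : -1;
	}

	is_lib = strstr(name, ".so") != NULL; /* simplifying assumption */
	env = getenv(is_lib ? "LD_LIBRARY_PATH" : "PATH");
	if (env && search_dirs(env, name, buf, buf_sz) == 0)
		return 0;
	return search_dirs(is_lib ? "/usr/lib64:/usr/lib"
				  : "/usr/bin:/usr/sbin",
			   name, buf, buf_sz);
}
```

Because the environment variable is consulted before the fallback list, running with LD_LIBRARY_PATH=/path2/lib prefers a library found there while still reaching /usr/lib64 and /usr/lib when it is absent.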
-
Lorenzo Bianconi authored
Rely on the libbpf skeleton facility and other utilities provided by XDP sample helpers in xdp_router_ipv4 sample. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/7f4d98ee2c13c04d5eb924eebf79ced32fee8418.1647414711.git.lorenzo@kernel.org
-
Haiyue Wang authored
The commit 8fd88691 ("bpf: Add BTF_KIND_FLOAT to uapi") has extended the BTF kind bitfield from 4 to 5 bits, so correct the comment. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220403115327.205964-1-haiyue.wang@intel.com
-
Yuntao Wang authored
Currently, when we run test_progs with just executable file name, for example 'PATH=. test_progs-no_alu32', cd_flavor_subdir() will not check if test_progs is running as a flavored test runner and switch into corresponding sub-directory. This will cause test_progs-no_alu32 executed by the 'PATH=. test_progs-no_alu32' command to run in the wrong directory and load the wrong BPF objects. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220403135245.1713283-1-ytcoode@gmail.com
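A sketch of the kind of flavor extraction that handles both "some/path/test_progs-flavor" and a bare "test_progs-flavor" is shown below. `flavor_of()` is a hypothetical helper written for illustration; the real cd_flavor_subdir() additionally changes into the flavor directory, which is omitted here.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the flavor suffix of an executable name, or NULL if there is none.
 * The directory part, if any, is skipped first so that a '-' in a directory
 * name is not mistaken for the flavor separator.
 */
static const char *flavor_of(const char *exec_name)
{
	const char *base, *dash;

	assert(exec_name);

	/* No '/' at all means the whole string is the file name. */
	base = strrchr(exec_name, '/');
	base = base ? base + 1 : exec_name;

	dash = strrchr(base, '-');
	return dash ? dash + 1 : NULL;
}
```

The important case is the one without any '/': the name is still inspected for a "-flavor" suffix instead of being skipped.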
-
- 03 Apr, 2022 3 commits
-
-
Haowen Bai authored
Return boolean values ("true" or "false") instead of 1 or 0 from bool functions. This fixes the following warnings from coccicheck: ./tools/testing/selftests/bpf/progs/test_xdp_noinline.c:567:9-10: WARNING: return of 0/1 in function 'get_packet_dst' with return type bool ./tools/testing/selftests/bpf/progs/test_l4lb_noinline.c:221:9-10: WARNING: return of 0/1 in function 'get_packet_dst' with return type bool Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1648779354-14700-1-git-send-email-baihaowen@meizu.com
-
Nikolay Borisov authored
Commit 6521f891 ("namei: prepare for idmapped mounts") changed vfs_link's prototype, but the kprobe definition in the profiler selftest wasn't updated in turn. The result is that all arguments after the first are now stored in different registers. This means that the selftest has been broken ever since. Fix it by updating the kprobe definition accordingly. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220331140949.1410056-1-nborisov@suse.com
-
Jakob Koschel authored
To move the list iterator variable into the list_for_each_entry_*() macro in the future, use of the list iterator variable after the loop body should be avoided. To *never* use the list iterator variable after the loop, it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need for the found variable (existed & supported); simply checking whether the variable was set determines if the break/goto was hit. [1] https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220331091929.647057-1-jakobkoschel@gmail.com
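The pattern described above can be illustrated in userspace C. This sketch uses a plain singly linked list rather than the kernel's list_for_each_entry_*() macros, and `struct item` and `find_item()` are made-up names; the point is only that the loop cursor is never read after the loop.

```c
#include <assert.h>
#include <stddef.h>

struct item {
	int key;
	struct item *next;
};

/* Return the first item with the given key, or NULL if none matches. */
static struct item *find_item(struct item *head, int key)
{
	struct item *iter, *found = NULL;

	for (iter = head; iter; iter = iter->next) {
		if (iter->key == key) {
			found = iter; /* set only when the break is taken */
			break;
		}
	}

	/*
	 * iter is not touched past this point. Whether the break was hit
	 * is answered by found alone, so no separate boolean is needed.
	 */
	return found;
}
```

A caller checks the returned pointer against NULL, which replaces the earlier test of a boolean flag.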
-
- 01 Apr, 2022 4 commits
-
-
Yauheni Kaliuta authored
The test fails: # ./test_offload.py [...] Test bpftool bound info reporting (own ns)... FAIL: 3 BPF maps loaded, expected 2 File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 1177, in <module> check_dev_info(False, "") File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 645, in check_dev_info maps = bpftool_map_list(expected=2, ns=ns) File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 190, in bpftool_map_list fail(True, "%d BPF maps loaded, expected %d" % File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 86, in fail tb = "".join(traceback.extract_stack().format()) Some base maps do not have names and they cannot be added due to compatibility with older kernels, see [0]. So, just skip the unnamed maps. [0] https://lore.kernel.org/bpf/CAEf4BzY66WPKQbDe74AKZ6nFtZjq5e+G3Ji2egcVytB9R6_sGQ@mail.gmail.com/ Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220329081100.9705-1-ykaliuta@redhat.com
-
Yuntao Wang authored
The attr->value_size is already assigned to smap->map.value_size in bpf_map_init_from_attr(), there is no need to do it again in stack_map_alloc(). Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/bpf/20220323073626.958652-1-ytcoode@gmail.com
-
Eyal Birger authored
Was never used in bpf_sk_assign_test(), and was removed from handle_{tcp,udp}() in commit 0b9ad56b ("selftests/bpf: Use SOCKMAP for server sockets in bpf_sk_assign test"). Fixes: 0b9ad56b ("selftests/bpf: Use SOCKMAP for server sockets in bpf_sk_assign test") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220329154914.3718658-1-eyal.birger@gmail.com
-
Jiapeng Chong authored
Clean the following coccicheck warning: ./kernel/trace/bpf_trace.c:2263:34-35: WARNING opportunity for swap(). ./kernel/trace/bpf_trace.c:2264:40-41: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220322062149.109180-1-jiapeng.chong@linux.alibaba.com
-
- 31 Mar, 2022 6 commits
-
-
Xu Kuohai authored
Add test case to ensure that the caller and callee's fp offsets are correct during tail call (mainly asserting for arm64 JIT). Tested on both big-endian and little-endian arm64 qemu, result: test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed] test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-6-xukuohai@huawei.com
-
Xu Kuohai authored
This patch adds tests to verify the behavior of BPF_LDX/BPF_STX + BPF_B/BPF_H/BPF_W/BPF_DW with negative offset, small positive offset, large positive offset, and misaligned offset. Tested on both big-endian and little-endian arm64 qemu, result: test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed] test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed] test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-5-xukuohai@huawei.com
-
Xu Kuohai authored
The BPF STX/LDX instruction uses an offset relative to the FP to address stack space. Since the BPF_FP is located at the top of the frame, the offset is usually a negative number. However, the arm64 str/ldr immediate instruction requires that the offset be a positive number. Therefore, this patch tries to convert the offsets. The method is to first find the negative offset furthest from the FP. Then add it to the FP to calculate a bottom position, called FPB, and adjust the offsets in other STX/LDX instructions relative to FPB. FPB is saved using the callee-saved register x27 of arm64 which is not used yet. Before adjusting the offset, the patch checks every instruction to ensure that the FP does not change at run-time. If the FP may change, no offset is adjusted. For example, for the following bpftrace command: bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }' Without this patch, jited code(fragment): 0: bti c 4: stp x29, x30, [sp, #-16]! 8: mov x29, sp c: stp x19, x20, [sp, #-16]! 10: stp x21, x22, [sp, #-16]! 14: stp x25, x26, [sp, #-16]! 
18: mov x25, sp 1c: mov x26, #0x0 // #0 20: bti j 24: sub sp, sp, #0x90 28: add x19, x0, #0x0 2c: mov x0, #0x0 // #0 30: mov x10, #0xffffffffffffff78 // #-136 34: str x0, [x25, x10] 38: mov x10, #0xffffffffffffff80 // #-128 3c: str x0, [x25, x10] 40: mov x10, #0xffffffffffffff88 // #-120 44: str x0, [x25, x10] 48: mov x10, #0xffffffffffffff90 // #-112 4c: str x0, [x25, x10] 50: mov x10, #0xffffffffffffff98 // #-104 54: str x0, [x25, x10] 58: mov x10, #0xffffffffffffffa0 // #-96 5c: str x0, [x25, x10] 60: mov x10, #0xffffffffffffffa8 // #-88 64: str x0, [x25, x10] 68: mov x10, #0xffffffffffffffb0 // #-80 6c: str x0, [x25, x10] 70: mov x10, #0xffffffffffffffb8 // #-72 74: str x0, [x25, x10] 78: mov x10, #0xffffffffffffffc0 // #-64 7c: str x0, [x25, x10] 80: mov x10, #0xffffffffffffffc8 // #-56 84: str x0, [x25, x10] 88: mov x10, #0xffffffffffffffd0 // #-48 8c: str x0, [x25, x10] 90: mov x10, #0xffffffffffffffd8 // #-40 94: str x0, [x25, x10] 98: mov x10, #0xffffffffffffffe0 // #-32 9c: str x0, [x25, x10] a0: mov x10, #0xffffffffffffffe8 // #-24 a4: str x0, [x25, x10] a8: mov x10, #0xfffffffffffffff0 // #-16 ac: str x0, [x25, x10] b0: mov x10, #0xfffffffffffffff8 // #-8 b4: str x0, [x25, x10] b8: mov x10, #0x8 // #8 bc: ldr x2, [x19, x10] [...] With this patch, jited code(fragment): 0: bti c 4: stp x29, x30, [sp, #-16]! 8: mov x29, sp c: stp x19, x20, [sp, #-16]! 10: stp x21, x22, [sp, #-16]! 14: stp x25, x26, [sp, #-16]! 18: stp x27, x28, [sp, #-16]! 
1c: mov x25, sp 20: sub x27, x25, #0x88 24: mov x26, #0x0 // #0 28: bti j 2c: sub sp, sp, #0x90 30: add x19, x0, #0x0 34: mov x0, #0x0 // #0 38: str x0, [x27] 3c: str x0, [x27, #8] 40: str x0, [x27, #16] 44: str x0, [x27, #24] 48: str x0, [x27, #32] 4c: str x0, [x27, #40] 50: str x0, [x27, #48] 54: str x0, [x27, #56] 58: str x0, [x27, #64] 5c: str x0, [x27, #72] 60: str x0, [x27, #80] 64: str x0, [x27, #88] 68: str x0, [x27, #96] 6c: str x0, [x27, #104] 70: str x0, [x27, #112] 74: str x0, [x27, #120] 78: str x0, [x27, #128] 7c: ldr x2, [x19, #8] [...] Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
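The offset conversion described in this commit message can be sketched as plain arithmetic. This is an illustration only: `find_fpb_offset()` and `fpb_relative()` are hypothetical names, and the sketch leaves out everything else the JIT does, including the check that the FP does not change at run-time.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Find the FP-relative offset furthest below the FP, i.e. the most
 * negative one. FPB is then FP plus this value. Returns 0 when no
 * offset is negative.
 */
static int find_fpb_offset(const int *offs, size_t n)
{
	int min_off = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (offs[i] < min_off)
			min_off = offs[i];
	return min_off;
}

/* Rewrite an FP-relative offset as an FPB-relative one. */
static int fpb_relative(int fp_off, int fpb_off)
{
	return fp_off - fpb_off;
}
```

In the dump above the stores span FP-136 to FP-8, so FPB is FP-0x88 (the "sub x27, x25, #0x88" instruction), and the stores become [x27] through [x27, #128], all non-negative.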
-
Xu Kuohai authored
The current BPF store/load instruction is translated by the JIT into two instructions. The first instruction moves the immediate offset into a temporary register. The second instruction uses this temporary register to do the real store/load. In fact, arm64 supports addressing with immediate offsets. So this patch introduces an optimization that uses the arm64 str/ldr instruction with immediate offset when the offset fits. Example of generated instruction for r2 = *(u64 *)(r1 + 0): without optimization: mov x10, 0 ldr x1, [x0, x10] with optimization: ldr x1, [x0, 0] If the offset is negative, or is not aligned correctly, or exceeds the max value, fall back to the use of the temporary register. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-3-xukuohai@huawei.com
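The three fallback conditions listed above (negative, misaligned, too large) can be written as one predicate. This is a sketch, not the JIT's code: `fits_unsigned_imm()` is a hypothetical name, and the limit assumes the arm64 unsigned-offset form, whose 12-bit immediate is scaled by the access size.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * off:  byte offset from the base register
 * size: access size in bytes (1, 2, 4 or 8)
 *
 * Returns true when off can be encoded directly as a scaled 12-bit
 * unsigned immediate, false when a temporary register is needed.
 */
static bool fits_unsigned_imm(int off, int size)
{
	assert(size == 1 || size == 2 || size == 4 || size == 8);

	if (off < 0)
		return false; /* negative offsets are not encodable */
	if (off % size)
		return false; /* must be a multiple of the access size */
	return off / size <= 4095; /* 12-bit scaled immediate */
}
```

For an 8-byte load, offsets 0 and 8 fit, while -8 and 4 do not and would take the mov-plus-register path shown in the "without optimization" example.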
-
Xu Kuohai authored
This patch introduces ldr/str with immediate offset support to simplify the JIT implementation of BPF LDX/STX instructions on arm64. Although arm64 ldr/str immediate is available in pre-index, post-index and unsigned offset forms, the unsigned offset form is sufficient for BPF, so this patch only adds this type. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-2-xukuohai@huawei.com
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds authored
Pull more networking updates from Jakub Kicinski: "Networking fixes and rethook patches. Features: - kprobes: rethook: x86: replace kretprobe trampoline with rethook Current release - regressions: - sfc: avoid null-deref on systems without NUMA awareness in the new queue sizing code Current release - new code bugs: - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when interface is down Previous releases - always broken: - openvswitch: correct neighbor discovery target mask field in the flow dump - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak - rxrpc: fix call timer start racing with call destruction - rxrpc: fix null-deref when security type is rxrpc_no_security - can: fix UAF bugs around echo skbs in multiple drivers Misc: - docs: move netdev-FAQ to the 'process' section of the documentation" * tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices openvswitch: Add recirc_id to recirc warning rxrpc: fix some null-ptr-deref bugs in server_key.c rxrpc: Fix call timer start racing with call destruction net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware net: hns3: fix the concurrency between functions reading debugfs docs: netdev: move the netdev-FAQ to the process pages docs: netdev: broaden the new vs old code formatting guidelines docs: netdev: call out the merge window in tag checking docs: netdev: add missing back ticks docs: netdev: make the testing requirement more stringent docs: netdev: add a question about re-posting frequency docs: netdev: rephrase the 'should I update patchwork' question docs: netdev: rephrase the 'Under review' question docs: netdev: shorten the name and mention msgid for patch status docs: netdev: note that RFC postings are allowed any time docs: netdev: turn the net-next closed into a Warning docs: netdev: 
move the patch marking section up docs: netdev: minor reword docs: netdev: replace references to old archives ...
-