- 08 Oct, 2018 15 commits
-
-
Quentin Monnet authored
Now that there is at least one driver supporting BPF-to-BPF function calls, lift the restriction, in the verifier, on hardware offload of eBPF programs containing such calls. But prevent jit_subprogs(), still in the verifier, from being run for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
Mark instructions that use pointers to areas in the stack outside of the current stack frame, and process them accordingly in mem_op_stack(). This way, we also support BPF-to-BPF calls where the caller passes a pointer to data in its own stack frame to the callee (typically, when the caller passes an address to one of its local variables located in the stack, as an argument). Thanks to Jakub and Jiong for figuring out how to deal with this case, I just had to turn their email discussion into this patch. Suggested-by: Jiong Wang <jiong.wang@netronome.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
When pre-processing the instructions, it is trivial to detect what subprograms are using R6, R7, R8 or R9 as destination registers. If a subprogram uses none of those, then we do not need to jump to the subroutines dedicated to saving and restoring callee-saved registers in its prologue and epilogue. This patch introduces detection of callee-saved registers in subprograms and prevents the JIT from adding calls to those subroutines whenever we can: we save some instructions in the translated program, and some time at runtime on BPF-to-BPF calls and returns. If no subprogram needs to save those registers, we can avoid appending the subroutines at the end of the program. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
On performing a BPF-to-BPF call, we first jump to a subroutine that pushes callee-saved registers (R6~R9) to the stack, and from there we goes to the start of the callee next. In order to do so, the caller must pass to the subroutine the address of the NFP instruction to jump to at the end of that subroutine. This cannot be reliably implemented when translated the caller, as we do not always know the start offset of the callee yet. This patch implement the required fixup step for passing the start offset in the callee via the register used by the subroutine to hold its return address. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
Relocation for targets of BPF-to-BPF calls are required at the end of translation. Update the nfp_fixup_branches() function in that regard. When checking that the last instruction of each bloc is a branch, we must account for the length of the instructions required to pop the return address from the stack. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
Offloaded programs using BPF-to-BPF calls use the stack to store the return address when calling into a subprogram. Callees also need some space to save eBPF registers R6 to R9. And contrarily to kernel verifier, we align stack frames on 64 bytes (and not 32). Account for all this when checking the stack size limit before JIT-ing the program. This means we have to recompute maximum stack usage for the program, we cannot get the value from the kernel. In addition to adapting the checks on stack usage, move them to the finalize() callback, now that we have it and because such checks are part of the verification step rather than translation. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
This is the main patch for the logics of BPF-to-BPF calls in the nfp driver. The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were used to call helpers and exit from the program, respectively; make them usable for calling into, or returning from, a BPF subprogram as well. For all calls, push the return address as well as the callee-saved registers (R6 to R9) to the stack, and pop them upon returning from the calls. In order to limit the overhead in terms of instruction number, this is done through dedicated subroutines. Jumping to the callee actually consists in jumping to the subroutine, that "returns" to the callee: this will require some fixup for passing the address in a later patch. Similarly, returning consists in jumping to the subroutine, which pops registers and then return directly to the caller (but no fixup is needed here). Return to the caller is performed with the RTN instruction newly added to the JIT. For the few steps where we need to know what subprogram an instruction belongs to, the struct nfp_insn_meta is extended with a new subprog_idx field. Note that checks on the available stack size, to take into account the additional requirements associated to BPF-to-BPF calls (storing R6-R9 and return addresses), are added in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will require additional processing before translation starts, in order to record and mark jump destinations. We also mark the instructions where each subprogram begins. This will be used in a following commit to determine where to add prologues for subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
The checks related to eBPF helper calls are performed each time the nfp driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks are not relevant for BPF-to-BPF call (same instruction code, different value in source register), so just skip the checks for such calls. While at it, rename the function that runs those checks to make it clear they apply to _helper_ calls only. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
In order to support BPF-to-BPF calls in offloaded programs, the nfp driver must collect information about the distinct subprograms: namely, the number of subprograms composing the complete program and the stack depth of those subprograms. The latter in particular is non-trivial to collect, so we copy those elements from the kernel verifier via the newly added post-verification hook. The struct nfp_prog is extended to store this information. Stack depths are stored in an array of dedicated structs. Subprogram start indexes are not collected. Instead, meta instructions associated to the start of a subprogram will be marked with a flag in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
In preparation for support for BPF to BPF calls in offloaded programs, rename the "stack_depth" field of the struct nfp_prog as "stack_frame_depth". This is to make it clear that the field refers to the maximum size of the current stack frame (as opposed to the maximum size of the whole stack memory). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Quentin Monnet authored
In preparation for BPF-to-BPF calls in offloaded programs, add a new function attribute to the struct bpf_prog_offload_ops so that drivers supporting eBPF offload can hook at the end of program verification, and potentially extract information collected by the verifier. Implement a minimal callback (returning 0) in the drivers providing the structs, namely netdevsim and nfp. This will be useful in the nfp driver, in later commits, to extract the number of subprograms as well as the stack depth for those subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Arthur Fabre authored
bpf_asm and the other classic BPF tools support jump conditions comparing register A to register X, in addition to comparing register A with constant K. Only the latter was documented in filter.txt, add two new addressing modes that describe the former. Signed-off-by: Arthur Fabre <arthur@arthurfabre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Alexei Starovoitov authored
libbpf is maturing as a library and gaining features that no other bpf libraries support (BPF Type Format, bpf to bpf calls, etc) Many Apache2 licensed projects (like bcc, bpftrace, gobpf, cilium, etc) would like to use libbpf, but cannot do this yet, since Apache Foundation explicitly states that LGPL is incompatible with Apache2. Hence let's relicense libbpf as dual license LGPL-2.1 or BSD-2-Clause, since BSD-2 is compatible with Apache2. Dual LGPL or Apache2 is invalid combination. Fix license mistake in Makefile as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Beckett <david.beckett@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Wang Nan <wangnan0@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Björn Töpel authored
The AF_XDP socket struct can exist in three different, implicit states: setup, bound and released. Setup is prior the socket has been bound to a device. Bound is when the socket is active for receive and send. Released is when the process/userspace side of the socket is released, but the sock object is still lingering, e.g. when there is a reference to the socket in an XSKMAP after process termination. The Rx fast-path code uses the "dev" member of struct xdp_sock to check whether a socket is bound or relased, and the Tx code uses the struct xdp_umem "xsk_list" member in conjunction with "dev" to determine the state of a socket. However, the transition from bound to released did not tear the socket down in correct order. On the Rx side "dev" was cleared after synchronize_net() making the synchronization useless. On the Tx side, the internal queues were destroyed prior removing them from the "xsk_list". This commit corrects the cleanup order, and by doing so xdp_del_sk_umem() can be simplified and one synchronize_net() can be removed. Fixes: 965a9909 ("xsk: add support for bind for Rx") Fixes: ac98d8aa ("xsk: wire upp Tx zero-copy functions") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 05 Oct, 2018 6 commits
-
-
Daniel Borkmann authored
Magnus Karlsson says: ==================== Previously, the xsk code did not record which umem was bound to a specific queue id. This was not required if all drivers were zero-copy enabled as this had to be recorded in the driver anyway. So if a user tried to bind two umems to the same queue, the driver would say no. But if copy-mode was first enabled and then zero-copy mode (or the reverse order), we mistakenly enabled both of them on the same umem leading to buggy behavior. The main culprit for this is that we did not store the association of umem to queue id in the copy case and only relied on the driver reporting this. As this relation was not stored in the driver for copy mode (it does not rely on the AF_XDP NDOs), this obviously could not work. This patch fixes the problem by always recording the umem to queue id relationship in the netdev_queue and netdev_rx_queue structs. This way we always know what kind of umem has been bound to a queue id and can act appropriately at bind time. To make the bind semantics consistent with ethtool queue manipulations and to facilitate the implementation of drivers, we also forbid decreasing the number of queues/channels with ethtool if there is an active AF_XDP socket in the set of queues that are disabled. Jakub, please take a look at your patches. The last one I had to change slightly to make it fit with the new interface xdp_get_umem_from_qid(). An added bonus with this function is that we, in the future, can also use it from the driver to get a umem, thus simplifying driver implementations (and later remove the umem from the NDO completely). Björn will mail patches, at a later point in time, using this in the i40e and ixgbe drivers, that removes a good chunk of code from the ZC implementations. I also made your code aware of Tx queues. If we create a socket that only has a Tx queue, then the queue id will refer to a Tx queue id only and could be larger than the available amount of Rx queues. Please take a look at it. Differences against v1: * Included patches from Jakub that forbids decreasing the number of active queues if a queue to be deactivated has an AF_XDP socket. These have been adapted somewhat to the new interfaces in patch 2. * Removed redundant check against real_num_[rt]x_queue in xsk_bind * Only need to test against real_num_[rt]x_queues in xdp_clear_umem_at_qid. Patch 1: Introduces a umem reference in the netdev_rx_queue and netdev_queue structs. Patch 2: Records which queue_id is bound to which umem and make sure that you cannot bind two different umems to the same queue_id. Patch 3: Pre patch to ethtool_set_channels. Patch 4: Forbid decreasing the number of active queues if a deactivated queue has an AF_XDP socket. Patch 5: Simplify xdp_clear_umem_at_qid now when ethtool cannot deactivate the queue id we are running on. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Magnus Karlsson authored
As we now do not allow ethtool to deactivate the queue id we are running an AF_XDP socket on, we can simplify the implementation of xdp_clear_umem_at_qid(). Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
We already check the RSS indirection table does not use queues which would be disabled by channel reconfiguration. Make sure user does not try to disable queues which have a UMEM and zero-copy AF_XDP socket installed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Jakub Kicinski authored
ethtool_set_channels() validates the config against driver's max settings. It retrieves the current config and stores it in a variable called max. This was okay when only max settings were accessed but we will soon want to access current settings as well, so calling the entire structure max makes the code less readable. While at it drop unnecessary parenthesis. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Magnus Karlsson authored
Previously, the xsk code did not record which umem was bound to a specific queue id. This was not required if all drivers were zero-copy enabled as this had to be recorded in the driver anyway. So if a user tried to bind two umems to the same queue, the driver would say no. But if copy-mode was first enabled and then zero-copy mode (or the reverse order), we mistakenly enabled both of them on the same umem leading to buggy behavior. The main culprit for this is that we did not store the association of umem to queue id in the copy case and only relied on the driver reporting this. As this relation was not stored in the driver for copy mode (it does not rely on the AF_XDP NDOs), this obviously could not work. This patch fixes the problem by always recording the umem to queue id relationship in the netdev_queue and netdev_rx_queue structs. This way we always know what kind of umem has been bound to a queue id and can act appropriately at bind time. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Magnus Karlsson authored
These references to the umem will be used to store information on what kind of AF_XDP umem that is bound to a queue id, if any. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 04 Oct, 2018 10 commits
-
-
Konrad Djimeli authored
Fix a simple typo: Completetion -> Completion Signed-off-by: Konrad Djimeli <kdjimeli@igalia.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Bo YU authored
There is a warning when compiling bpf sample programs in sample/bpf: make -C /home/foo/bpf/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/home/foo/bpf/samples/bpf/../../ O= HOSTCC /home/foo/bpf/samples/bpf/tracex3_user.o /home/foo/bpf/samples/bpf/tracex3_user.c:20:0: warning: "ARRAY_SIZE" redefined #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) In file included from /home/foo/bpf/samples/bpf/tracex3_user.c:18:0: ./tools/testing/selftests/bpf/bpf_util.h:48:0: note: this is the location of the previous definition # define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) Signed-off-by: Bo YU <tsu.yubo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Daniel Borkmann authored
Andrey Ignatov says: ==================== This patch set renames a few interfaces in libbpf, mostly netlink related, so that all symbols provided by the library have only three possible prefixes: % nm -D tools/lib/bpf/libbpf.so | \ awk '$2 == "T" {sub(/[_\(].*/, "", $3); if ($3) print $3}' | \ sort | \ uniq -c 91 bpf 8 btf 14 libbpf libbpf is used more and more outside kernel tree. That means the library should follow good practices in library design and implementation to play well with third party code that uses it. One of such practices is to have a common prefix (or a few) for every interface, function or data structure, library provides. It helps to avoid name conflicts with other libraries and keeps API/ABI consistent. Inconsistent names in libbpf already cause problems in real life. E.g. an application can't use both libbpf and libnl due to conflicting symbols (specifically nla_parse, nla_parse_nested and a few others). Some of problematic global symbols are not part of ABI and can be restricted from export with either visibility attribute/pragma or export map (what is useful by itself and can be done in addition). That won't solve the problem for those that are part of ABI though. Also export restrictions would help only in DSO case. If third party application links libbpf statically it won't help, and people do it (e.g. Facebook links most of libraries statically, including libbpf). libbpf already uses the following prefixes for its interfaces: * bpf_ for bpf system call wrappers, program/map/elf-object abstractions and a few other things; * btf_ for BTF related API; * libbpf_ for everything else. The patch adds libbpf_ prefix to interfaces that use none of mentioned above prefixes and don't fit well into the first two categories. Long term benefits of having common prefix should outweigh possible inconvenience of changing API for those functions now. Patches 2-4 add libbpf_ prefix to libbpf interfaces: separate patch per header. Other patches are simple improvements in API. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Andrey Ignatov authored
Make bpf_program__load consistent with other interfaces: use __u32 instead of u32. That in turn fixes build of samples: In file included from ./samples/bpf/trace_output_user.c:21:0: ./tools/lib/bpf/libbpf.h:132:9: error: unknown type name ‘u32’ u32 kern_version); ^ Fixes: commit 29cd77f4 ("libbpf: Support loading individual progs") Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Andrey Ignatov authored
Rename include guards to have consistent names "__LIBBPF_<header_name>". Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Andrey Ignatov authored
libbpf is used more and more outside kernel tree. That means the library should follow good practices in library design and implementation to play well with third party code that uses it. One of such practices is to have a common prefix (or a few) for every interface, function or data structure, library provides. I helps to avoid name conflicts with other libraries and keeps API consistent. Inconsistent names in libbpf already cause problems in real life. E.g. an application can't use both libbpf and libnl due to conflicting symbols. Having common prefix will help to fix current and avoid future problems. libbpf already uses the following prefixes for its interfaces: * bpf_ for bpf system call wrappers, program/map/elf-object abstractions and a few other things; * btf_ for BTF related API; * libbpf_ for everything else. The patch renames function in str_error.h to have libbpf_ prefix since it misses one and doesn't fit well into the first two categories. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Andrey Ignatov authored
libbpf is used more and more outside kernel tree. That means the library should follow good practices in library design and implementation to play well with third party code that uses it. One of such practices is to have a common prefix (or a few) for every interface, function or data structure, library provides. I helps to avoid name conflicts with other libraries and keeps API consistent. Inconsistent names in libbpf already cause problems in real life. E.g. an application can't use both libbpf and libnl due to conflicting symbols. Having common prefix will help to fix current and avoid future problems. libbpf already uses the following prefixes for its interfaces: * bpf_ for bpf system call wrappers, program/map/elf-object abstractions and a few other things; * btf_ for BTF related API; * libbpf_ for everything else. The patch adds libbpf_ prefix to interfaces in nlattr.h that use none of mentioned above prefixes and doesn't fit well into the first two categories. Since affected part of API is used in bpftool, the patch applies corresponding change to bpftool as well. Having it in a separate patch will cause a state of tree where bpftool is broken what may not be a good idea. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Andrey Ignatov authored
libbpf is used more and more outside kernel tree. That means the library should follow good practices in library design and implementation to play well with third party code that uses it. One of such practices is to have a common prefix (or a few) for every interface, function or data structure, library provides. I helps to avoid name conflicts with other libraries and keeps API consistent. Inconsistent names in libbpf already cause problems in real life. E.g. an application can't use both libbpf and libnl due to conflicting symbols. Having common prefix will help to fix current and avoid future problems. libbpf already uses the following prefixes for its interfaces: * bpf_ for bpf system call wrappers, program/map/elf-object abstractions and a few other things; * btf_ for BTF related API; * libbpf_ for everything else. The patch adds libbpf_ prefix to functions and typedef in libbpf.h that use none of mentioned above prefixes and doesn't fit well into the first two categories. Since affected part of API is used in bpftool, the patch applies corresponding change to bpftool as well. Having it in a separate patch will cause a state of tree where bpftool is broken what may not be a good idea. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Andrey Ignatov authored
This typedef is used only by implementation in netlink.c. Nothing uses it in public API. Move it to netlink.c. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
Stephen Rothwell reports the following link failure with IPv6 as module: x86_64-linux-gnu-ld: net/core/filter.o: in function `sk_lookup': (.text+0x19219): undefined reference to `__udp6_lib_lookup' Fix the build by only enabling the IPv6 socket lookup if IPv6 support is compiled into the kernel. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 03 Oct, 2018 9 commits
-
-
Daniel Borkmann authored
Joe Stringer says: ==================== This series proposes a new helper for the BPF API which allows BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. The series is structured roughly into: * Misc refactor * Add the socket pointer type * Add reference tracking to ensure that socket references are freed * Extend the BPF API to add sk_lookup_xxx() / sk_release() functions * Add tests/documentation The helper proposed in this series includes a parameter for a tuple which must be filled in by the caller to determine the socket to look up. The simplest case would be filling with the contents of the packet, ie mapping the packet's 5-tuple into the parameter. In common cases, it may alternatively be useful to reverse the direction of the tuple and perform a lookup, to find the socket that initiates this connection; and if the BPF program ever performs a form of IP address translation, it may further be useful to be able to look up arbitrary tuples that are not based upon the packet, but instead based on state held in BPF maps or hardcoded in the BPF program. Currently, access into the socket's fields are limited to those which are otherwise already accessible, and are restricted to read-only access. Changes since v3: * New patch: "bpf: Reuse canonical string formatter for ctx errs" * Add PTR_TO_SOCKET to is_ctx_reg(). * Add a few new checks to prevent mixing of socket/non-socket pointers. * Swap order of checks in sock_filter_is_valid_access(). * Prefix register spill macros with "bpf_". * Add acks from previous round * Rebase Changes since v2: * New patch: "selftests/bpf: Generalize dummy program types". This enables adding verifier tests for socket lookup with tail calls. * Define the semantics of the new helpers more clearly in uAPI header. * Fix release of caller_net when netns is not specified. * Use skb->sk to find caller net when skb->dev is unavailable. * Fix build with !CONFIG_NET. * Replace ptr_id defensive coding when releasing reference state with an internal error (-EFAULT). * Remove flags argument to sk_release(). * Add several new assembly tests suggested by Daniel. * Add a few new C tests. * Fix typo in verifier error message. Changes since v1: * Limit netns_id field to 32 bits * Reuse reg_type_mismatch() in more places * Reduce the number of passes at convert_ctx_access() * Replace ptr_id defensive coding when releasing reference state with an internal error (-EFAULT) * Rework 'struct bpf_sock_tuple' to allow passing a packet pointer * Allow direct packet access from helper * Fix compile error with CONFIG_IPV6 enabled * Improve commit messages Changes since RFC: * Split up sk_lookup() into sk_lookup_tcp(), sk_lookup_udp(). * Only take references on the socket when necessary. * Make sk_release() only free the socket reference in this case. * Fix some runtime reference leaks: * Disallow BPF_LD_[ABS|IND] instructions while holding a reference. * Disallow bpf_tail_call() while holding a reference. * Prevent the same instruction being used for reference and other pointer type. * Simplify locating copies of a reference during helper calls by caching the pointer id from the caller. * Fix kbuild compilation warnings with particular configs. * Improve code comments describing the new verifier pieces. * Tested by Nitin ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
Document the new pointer types in the verifier and how the pointer ID tracking works to ensure that references which are taken are later released. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
Add some tests that demonstrate and test the balanced lookup/free nature of socket lookup. Section names that start with "fail" represent programs that are expected to fail verification; all others should succeed. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
Allow the individual program load to be invoked. This will help with testing, where a single ELF may contain several sections, some of which denote subprograms that are expected to fail verification, along with some which are expected to pass verification. By allowing programs to be iterated and individually loaded, each program can be independently checked against its expected verification result. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
reference tracking: leak potential reference reference tracking: leak potential reference on stack reference tracking: leak potential reference on stack 2 reference tracking: zero potential reference reference tracking: copy and zero potential references reference tracking: release reference without check reference tracking: release reference reference tracking: release reference twice reference tracking: release reference twice inside branch reference tracking: alloc, check, free in one subbranch reference tracking: alloc, check, free in both subbranches reference tracking in call: free reference in subprog reference tracking in call: free reference in subprog and outside reference tracking in call: alloc & leak reference in subprog reference tracking in call: alloc in subprog, release outside reference tracking in call: sk_ptr leak into caller stack reference tracking in call: sk_ptr spill into caller stack reference tracking: allow LD_ABS reference tracking: forbid LD_ABS while holding reference reference tracking: allow LD_IND reference tracking: forbid LD_IND while holding reference reference tracking: check reference or tail call reference tracking: release reference then tail call reference tracking: leak possible reference over tail call reference tracking: leak checked reference over tail call reference tracking: mangle and release sock_or_null reference tracking: mangle and release sock reference tracking: access member reference tracking: write to member reference tracking: invalid 64-bit access of member reference tracking: access after release reference tracking: direct access for lookup unpriv: spill/fill of different pointers stx - ctx and sock unpriv: spill/fill of different pointers stx - leak sock unpriv: spill/fill of different pointers stx - sock and ctx (read) unpriv: spill/fill of different pointers stx - sock and ctx (write) Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
Don't hardcode the dummy program types to SOCKET_FILTER type, as this prevents testing bpf_tail_call in conjunction with other program types. Instead, use the program type specified in the test case. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and bpf_sk_lookup_udp() which allows BPF programs to find out if there is a socket listening on this host, and returns a socket pointer which the BPF program can then access to determine, for instance, whether to forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the socket, so when a BPF program makes use of this function, it must subsequently pass the returned pointer into the newly added sk_release() to return the reference. By way of example, the following pseudocode would filter inbound connections at XDP if there is no corresponding service listening for the traffic: struct bpf_sock_tuple tuple; struct bpf_sock_ops *sk; populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0); if (!sk) { // Couldn't find a socket listening for this traffic. Drop. return TC_ACT_SHOT; } bpf_sk_release(sk, 0); return TC_ACT_OK; Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
Allow helper functions to acquire a reference and return it into a register. Specific pointer types such as the PTR_TO_SOCKET will implicitly represent such a reference. The verifier must ensure that these references are released exactly once in each path through the program. To achieve this, this commit assigns an id to the pointer and tracks it in the 'bpf_func_state', then when the function or program exits, verifies that all of the acquired references have been freed. When the pointer is passed to a function that frees the reference, it is removed from the 'bpf_func_state` and all existing copies of the pointer in registers are marked invalid. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joe Stringer authored
An upcoming commit will need very similar copy/realloc boilerplate, so refactor the existing stack copy/realloc functions into macros to simplify it. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-