- 11 Jul, 2022 8 commits
-
-
Xu Kuohai authored
This is arm64 version of commit fec56f58 ("bpf: Introduce BPF trampoline"). A bpf trampoline converts native calling convention to bpf calling convention and is used to implement various bpf features, such as fentry, fexit, fmod_ret and struct_ops. This patch does essentially the same thing that bpf trampoline does on x86. Tested on Raspberry Pi 4B and qemu: #18 /1 bpf_tcp_ca/dctcp:OK #18 /2 bpf_tcp_ca/cubic:OK #18 /3 bpf_tcp_ca/invalid_license:OK #18 /4 bpf_tcp_ca/dctcp_fallback:OK #18 /5 bpf_tcp_ca/rel_setsockopt:OK #18 bpf_tcp_ca:OK #51 /1 dummy_st_ops/dummy_st_ops_attach:OK #51 /2 dummy_st_ops/dummy_init_ret_value:OK #51 /3 dummy_st_ops/dummy_init_ptr_arg:OK #51 /4 dummy_st_ops/dummy_multiple_args:OK #51 dummy_st_ops:OK #57 /1 fexit_bpf2bpf/target_no_callees:OK #57 /2 fexit_bpf2bpf/target_yes_callees:OK #57 /3 fexit_bpf2bpf/func_replace:OK #57 /4 fexit_bpf2bpf/func_replace_verify:OK #57 /5 fexit_bpf2bpf/func_sockmap_update:OK #57 /6 fexit_bpf2bpf/func_replace_return_code:OK #57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK #57 /8 fexit_bpf2bpf/func_replace_multi:OK #57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK #57 fexit_bpf2bpf:OK #237 xdp_bpf2bpf:OK Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-5-xukuohai@huawei.com
-
Xu Kuohai authored
Implement bpf_arch_text_poke() for arm64, so bpf prog or bpf trampoline can be patched with it. When the target address is NULL, the original instruction is patched to a NOP. When the target address and the source address are within the branch range, the original instruction is patched to a bl instruction to the target address directly. To support attaching bpf trampoline to both regular kernel function and bpf prog, we follow the ftrace patchsite way for bpf prog. That is, two instructions are inserted at the beginning of bpf prog, the first one saves the return address to x9, and the second is a nop which will be patched to a bl instruction when a bpf trampoline is attached. However, when a bpf trampoline is attached to bpf prog, the distance between target address and source address may exceed 128MB, the maximum branch range, because bpf trampoline and bpf prog are allocated separately with vmalloc. So long jump should be handled. When a bpf prog is constructed, a plt pointing to empty trampoline dummy_tramp is placed at the end: bpf_prog: mov x9, lr nop // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target This is also the state when no trampoline is attached. When a short-jump bpf trampoline is attached, the patchsite is patched to a bl instruction to the trampoline directly: bpf_prog: mov x9, lr bl <short-jump bpf trampoline address> // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target When a long-jump bpf trampoline is attached, the plt target is filled with the trampoline address and the patchsite is patched to a bl instruction to the plt: bpf_prog: mov x9, lr bl plt // patchsite ... ret plt: ldr x10, target br x10 target: .quad <long-jump bpf trampoline address> dummy_tramp is used to prevent another CPU from jumping to an unknown location during the patching process, making the patching process easier. The patching process is as follows: 1. when neither the old address or the new address is a long jump, the patchsite is replaced with a bl to the new address, or nop if the new address is NULL; 2. when the old address is not long jump but the new one is, the branch target address is written to plt first, then the patchsite is replaced with a bl instruction to the plt; 3. when the old address is long jump but the new one is not, the address of dummy_tramp is written to plt first, then the patchsite is replaced with a bl to the new address, or a nop if the new address is NULL; 4. when both the old address and the new address are long jump, the new address is written to plt and the patchsite is not changed. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
-
Xu Kuohai authored
Add LDR (literal) instruction to load data from address relative to PC. This instruction will be used to implement long jump from bpf prog to bpf trampoline in the follow-up patch. The instruction encoding: 3 2 2 2 0 0 0 7 6 4 5 0 +-----+-------+---+-----+-------------------------------------+--------+ | 0 x | 0 1 1 | 0 | 0 0 | imm19 | Rt | +-----+-------+---+-----+-------------------------------------+--------+ for 32-bit, variant x == 0; for 64-bit, x == 1. branch_imm_common() is used to check the distance between pc and target address, since it's reused by this patch and LDR (literal) is not a branch instruction, rename it to label_imm_common(). Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
-
Xu Kuohai authored
Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags() to check the input flags. This check is architecture independent. So, to be consistent with x86, arm64 should also do this check before generating bpf trampoline. However, the BPF_TRAMP_F_XXX flags are not used by user code and the flags argument is almost constant at compile time, so this run time check is a bit redundant. Remove is_valid_bpf_tramp_flags() and add some comments to the usage of BPF_TRAMP_F_XXX flags, as suggested by Alexei. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com
-
Liu Jian authored
In sk_psock_skb_ingress_enqueue function, if the linear area + nr_frags + frag_list of the SKB has NR_MSG_FRAG_IDS blocks in total, skb_to_sgvec will return NR_MSG_FRAG_IDS, then msg->sg.end will be set to NR_MSG_FRAG_IDS, and in addition, (NR_MSG_FRAG_IDS - 1) is set to the last SG of msg. Recv the msg in sk_msg_recvmsg, when i is (NR_MSG_FRAG_IDS - 1), the sk_msg_iter_var_next(i) will change i to 0 (not NR_MSG_FRAG_IDS), the judgment condition "msg_rx->sg.start==msg_rx->sg.end" and "i != msg_rx->sg.end" can not work. As a result, the processed msg cannot be deleted from ingress_msg list. But the length of all the sge of the msg has changed to 0. Then the next recvmsg syscall will process the msg repeatedly, because the length of sge is 0, the -EFAULT error is always returned. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220628123616.186950-1-liujian56@huawei.com
-
Jilin Yuan authored
Delete the redundant word 'test'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jilin Yuan authored
Delete the redundant word 'driver'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
sewookseo authored
If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Jul, 2022 22 commits
-
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Linus Walleij authored
If there is a MAC address specified in the device tree, then use it. This is already perfectly legal to specify in accordance with the generic ethernet-controller.yaml schema. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Walleij authored
If the firmware does not provide a MAC address to the driver, fall back to generating a random MAC address. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We want to kfree(table) if @table has been kmalloced, ie for non initial network namespace. Fixes: 849d5aa3 ("af_unix: Do not call kmemdup() for init_net's sysctl table.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Mat Martineau says: ==================== mptcp: Self test improvements and a header tweak Patch 1 moves a definition to a header so it can be used in a struct declaration. Patch 2 adjusts a time threshold for a selftest that runs much slower on debug kernels (and even more on slow CI infrastructure), to reduce spurious failures. Patches 3 & 4 improve userspace PM test coverage. Patches 5 & 6 clean up output from a test script and selftest helper tool. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
The usage header of pm_nl_ctl command doesn't match with the context. So this patch adds the missing userspace PM keywords 'ann', 'rem', 'csf', 'dsf', 'events' and 'listen' in it. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
There're some 'Terminated' messages in the output of userspace pm tests script after killing './pm_nl_ctl events' processes: Created network namespaces ns1, ns2 [OK] ./userspace_pm.sh: line 166: 13735 Terminated ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1 ./userspace_pm.sh: line 172: 13737 Terminated ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1 Established IPv4 MPTCP Connection ns2 => ns1 [OK] ./userspace_pm.sh: line 166: 13753 Terminated ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1 ./userspace_pm.sh: line 172: 13755 Terminated ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1 Established IPv6 MPTCP Connection ns2 => ns1 [OK] ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token [OK] This patch adds a helper kill_wait(), in it using 'wait $pid 2>/dev/null' commands after 'kill $pid' to avoid printing out these Terminated messages. Use this helper instead of using 'kill $pid'. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds userspace pm subflow tests support for mptcp_join.sh script. Add userspace pm create subflow and destroy test cases in userspace_tests(). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds userspace pm tests support for mptcp_join.sh script. Add userspace pm add_addr and rm_addr test cases in userspace_tests(). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The mentioned test measures the transfer run-time to verify that the user-space program is able to use the full aggregate B/W. Even on (virtual) link-speed-bound tests, debug kernel can slow down the transfer enough to cause sporadic test failures. Instead of unconditionally raising the maximum allowed run-time, tweak when the running kernel is a debug one, and use some simple/ rough heuristic to guess such scenarios. Note: this intentionally avoids looking for /boot/config-<version> as the latter file is not always available in our reference CI environments. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to include/net/mptcp.h. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Yingliang authored
bcm63xx_enetsw_driver and bcm63xx_enet_driver are only used in bcm63xx_enet.c now, change them to static. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220707135801.1483941-1-yangyingliang@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
When we are operating in SGMII inband mode, it implies that there is a PHY connected, and the ethtool advertisement for autoneg applies to the PHY, not the SGMII link. When in 1000base-X mode, then this applies to the 802.3z link and needs to be applied to the PCS. Fix this. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1o9Ng2-005Qbe-3H@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Antoine Tenart authored
A description is missing for the net.core.high_order_alloc_disable option in admin-guide/sysctl/net.rst ; add it. The above sysctl option was introduced by commit ce27ec60 ("net: add high_order_alloc_disable sysctl/static key"). Thanks to Eric for running again the benchmark cited in the above commit, showing this knob is now mostly of historical importance. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220707080245.180525-1-atenart@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Justin Stitt authored
When building with Clang we encounter this warning: | net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short' | but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat] | _leave(" = %d [set %hx]", ret, y); y is a u32 but the format specifier is `%hx`. Going from unsigned int to short int results in a loss of data. This is surely not intended behavior. If it is intended, the warning should be suppressed through other means. This patch should get us closer to the goal of enabling the -Wformat flag for Clang builds. Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== tls: pad strparser, internal header, decrypt_ctx etc. A grab bag of non-functional refactoring to make the series which will let us decrypt into a fresh skb smaller. Patches in this series are not strictly required to get the decryption into a fresh skb going, they are more in the "things which had been annoying me for a while" category. ==================== Link: https://lore.kernel.org/r/20220708010314.1451462-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
tls_wait_data() sets the return code as an output parameter and always returns ctx->recv_pkt on success. Return the error code directly and let the caller read the skb from the context. Use positive return code to indicate ctx->recv_pkt is ready. While touching the definition of the function rename it. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
include/net/tls.h is getting a little long, and is probably hard for driver authors to navigate. Split out the internals into a header which will live under net/tls/. While at it move some static inlines with a single user into the source files, add a few tls_ prefixes and fix spelling of 'proccess'. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jump to the free() call, instead of having to remember to free the memory in multiple places. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
The max size of iv + aad + tail is 22B. That's smaller than a single sg entry (32B). Don't bother with the memory packing, just create a struct which holds the max size of those members. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
AAD size is either 5 or 13. Really no point complicating the code for the 8B of difference. This will also let us turn the chunked up buffer into a sane struct. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
sk_skb_cb lives within skb->cb[]. skb->cb[] straddles 2 cache lines, each containing 24B of data. The first cache line does not contain much interesting information for users of strparser, so pad things a little. Previously strp_msg->full_len would live in the first cache line and strp_msg->offset in the second. We need to reorder the 8 byte temp_reg with struct tls_msg to prevent a 4B hole which would push the struct over 48B. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 08 Jul, 2022 10 commits
-
-
Maxim Mikityanskiy authored
When CONFIG_NF_CONNTRACK=m, struct bpf_ct_opts and enum member BPF_F_CURRENT_NETNS are not exposed. This commit allows building the xdp_synproxy selftest in such cases. Note that nf_conntrack must be loaded before running the test if it's compiled as a module. This commit also allows this selftest to be successfully compiled when CONFIG_NF_CONNTRACK is disabled. One unused local variable of type struct bpf_ct_opts is also removed. Fixes: fb5cd0ce ("selftests/bpf: Add selftests for raw syncookie helpers") Reported-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220708130319.1016294-1-maximmi@nvidia.com
-
Daniel Müller authored
This change addresses a comment made earlier [0] about a missing return of an error when __bpf_core_types_match is invoked from bpf_core_composites_match, which could have let to us erroneously ignoring errors. Regarding the typedef name check pointed out in the same context, it is not actually an issue, because callers of the function perform a name check for the root type anyway. To make that more obvious, let's add comments to the function (similar to what we have for bpf_core_types_are_compat, which is called in pretty much the same context). [0]: https://lore.kernel.org/bpf/165708121449.4919.13204634393477172905.git-patchwork-notify@kernel.org/T/#m55141e8f8cfd2e8d97e65328fa04852870d01af6Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220707211931.3415440-1-deso@posteo.net
-
James Hilliard authored
It seems the gcc preprocessor breaks with pragmas when surrounding __attribute__. Disable these pragmas on GCC due to upstream bugs see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400 Fixes errors like: error: expected identifier or '(' before '#pragma' 106 | SEC("cgroup/bind6") | ^~~ error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma' 114 | char _license[] SEC("license") = "GPL"; | ^~~ Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220706111839.1247911-1-james.hilliard1@gmail.com
-
Stanislav Fomichev authored
Syzkaller reports the following crash: RIP: 0010:check_return_code kernel/bpf/verifier.c:10575 [inline] RIP: 0010:do_check kernel/bpf/verifier.c:12346 [inline] RIP: 0010:do_check_common+0xb3d2/0xd250 kernel/bpf/verifier.c:14610 With the following reproducer: bpf$PROG_LOAD_XDP(0x5, &(0x7f00000004c0)={0xd, 0x3, &(0x7f0000000000)=ANY=[@ANYBLOB="1800000000000019000000000000000095"], &(0x7f0000000300)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80) Because we don't enforce expected_attach_type for XDP programs, we end up in hitting 'if (prog->expected_attach_type == BPF_LSM_CGROUP' part in check_return_code and follow up with testing `prog->aux->attach_func_proto->type`, but `prog->aux->attach_func_proto` is NULL. Add explicit prog_type check for the "Note, BPF_LSM_CGROUP that attach ..." condition. Also, don't skip return code check for LSM/STRUCT_OPS. The above actually brings an issue with existing selftest which tries to return EPERM from void inet_csk_clone. Fix the test (and move called_socket_clone to make sure it's not incremented in case of an error) and add a new one to explicitly verify this condition. Fixes: 69fd337a ("bpf: per-cgroup lsm flavor") Reported-by: syzbot+5cc0730bd4b4d2c5f152@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220708175000.2603078-1-sdf@google.com
-
Sieng-Piaw Liew authored
napi_build_skb() reuses NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. Use napi_consume_skb() to feed the cache with skbuff_heads of completed Tx, so it's never empty. The budget parameter is added to indicate NAPI context, as a value of zero can be passed in the case of netpoll. Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP allocates 'fast clones' skbs for packets in tx queues. Currently, __alloc_skb() initializes the companion fclone field to SKB_FCLONE_CLONE, and leaves other fields untouched. It makes sense to defer this init much later in skb_clone(), because all fclone fields are copied and hot in cpu caches at that time. This removes one cache line miss in __alloc_skb(), cost seen on an host with 256 cpus all competing on memory accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Kelam authored
Current implementation is such that driver first resets the existing PFC config before applying new pfc configuration. This creates a problem like once PF or VFs requests PFC config previous pfc config by other PFVfs is getting reset. This patch fixes the problem by removing unnecessary resetting of PFC config. Also configure Pause quanta value to smaller as current value is too high. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Müller authored
This change adds a type based test involving the restrict type qualifier to the BPF selftests. On the btfgen path, this will verify that bpftool correctly handles the corresponding RESTRICT BTF kind. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220706212855.1700615-3-deso@posteo.net
-
Daniel Müller authored
This change adjusts bpftool's type marking logic, as used in conjunction with TYPE_EXISTS relocations, to correctly recognize and handle the RESTRICT BTF kind. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220623212205.2805002-1-deso@posteo.net/T/#m4c75205145701762a4b398e0cdb911d5b5305ffc Link: https://lore.kernel.org/bpf/20220706212855.1700615-2-deso@posteo.net
-
Maciej Fijalkowski authored
Lukas reported that after commit f3660063 ("libbpf: move xsk.{c,h} into selftests/bpf") MAINTAINERS file needed an update. In the meantime, Magnus removed AF_XDP samples in commit cfb5a2db ("bpf, samples: Remove AF_XDP samples"), but selftests part still misses its entry in MAINTAINERS. Now that xdpxceiver became xskxceiver, tools/testing/selftests/bpf/*xsk* will match all of the files related to AF_XDP testing (test_xsk.sh, xskxceiver, xsk_prereqs.sh, xsk.{c,h}). Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707111613.49031-3-maciej.fijalkowski@intel.com
-