- 05 May, 2023 12 commits
-
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== As more and more real-world BPF programs become more complex and increasingly use subprograms (both static and global), scalar precision tracking and its (previously weak) support for BPF subprograms (and callbacks as a special case of that) is becoming more and more of an issue and limitation. Couple that with increasing reliance on state equivalence (BPF open-coded iterators have a hard requirement for state equivalence to converge and successfully validate loops), and it becomes pretty critical to address this limitation and make precision tracking universally supported for BPF programs of any complexity and composition. This patch set teaches the BPF verifier to support SCALAR precision backpropagation across multiple frames (for subprogram calls and callback simulations) and addresses most practical situations (SCALAR stack loads/stores using registers other than r10 being the last remaining limitation, though thankfully rarely used in practice). The main logic is explained in detail in patch #8. The rest are preliminary preparations, refactorings, clean ups, and fixes. See respective patches for details. Patch #8 also has a veristat comparison of results for selftests, Cilium, and some of Meta's production BPF programs before and after these changes.

v2->v3:
- drop bitcnt and ifs from bt_xxx() helpers (Alexei);

v1->v2:
- addressed review feedback from Alexei, adjusted commit messages, comments, added verbose(), WARN_ONCE(), etc;
- re-ran all the tests and veristat on selftests, cilium, and meta-internal code: no new changes and no kernel warnings.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Now that precision propagation is fully supported in the presence of subprogs, there is no need to work around the iter test. Revert the original workaround. This reverts be7dbd27 ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests"). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add a bunch of tests validating the verifier's precision backpropagation logic in the presence of subprog calls and/or callback-calling helpers/kfuncs. We validate the following conditions:
- subprog_result_precise: static subprog r0 result precision handling;
- global_subprog_result_precise: global subprog r0 precision shortcutting, similar to BPF helper handling;
- callback_result_precise: similarly, r0 marking precise for callback-calling helpers;
- parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global: propagation of precision for callee-saved registers bypassing static/global subprogs;
- parent_callee_saved_reg_precise_with_callback: same as above, but in the presence of a callback-calling helper;
- parent_stack_slot_precise, parent_stack_slot_precise_global: similar to above, but instead propagating precision of a stack slot (spilled SCALAR reg);
- parent_stack_slot_precise_with_callback: same as above, but in the presence of a callback-calling helper;
- subprog_arg_precise: propagation of precision of a static subprog's input argument back to the caller;
- subprog_spill_into_parent_stack_slot_precise: negative test validating that the verifier currently can't support backtracking of stack access with a non-r10 register; we validate that we fall back to forcing precision for all SCALARs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add support for precision backtracking in the presence of subprogram frames in jump history. This means supporting a few different kinds of subprogram invocation situations, each requiring slightly different handling in the precision backtracking logic:
- static subprogram calls;
- global subprogram calls;
- callback-calling helpers/kfuncs.

For each of those we need to handle a few precision propagation cases:
- what to do with precision of subprog returns (r0);
- what to do with precision of input arguments;
- for all of them, callee-saved registers in the caller function should be propagated ignoring the subprog/callback part of jump history.

N.B. Async callback-calling helpers (currently only bpf_timer_set_callback()) are transparent to all this because they set up a separate async callback environment and thus the callback's history is not shared with the main program's history. So as far as the changes in this commit go, such a helper is just a regular helper.

Let's look at all these situations in more detail. Let's start with a static subprogram being called, using an excerpt of a simple main program and its static subprog, indenting the subprog's frame slightly to make everything clear.

frame 0                frame 1            precision set
=======                =======            =============
 9: r6 = 456;
10: r1 = 123;                             fr0: r6
11: call pc+10;                           fr0: r1, r6
                       22: r0 = r1;       fr0: r6; fr1: r1
                       23: exit           fr0: r6; fr1: r0
12: r1 = <map_pointer>                    fr0: r0, r6
13: r1 += r0;                             fr0: r0, r6
14: r1 += r6;                             fr0: r6
15: exit

As can be seen above, the main function is passing 123 as a single argument to an identity (`return x;`) subprog. The returned value is used to adjust a map pointer offset, which forces r0 to be marked as precise. Then instruction #14 does the same for callee-saved r6, which will have to be backtracked all the way to instruction #9. For brevity, precision sets for instructions #13 and #14 are combined in the diagram above.

First, for subprog calls, r0 returned from the subprog (in frame 0) has to go into the subprog's frame 1, and should be cleared from frame 0. So we go back into the subprog's frame knowing we need to mark r0 precise. We then see that insn #22 sets r0 from r1, so now we care about marking r1 precise. When we pop up from the subprog's frame back into the caller at insn #11 we keep r1, as it's an argument-passing register, so we eventually find `10: r1 = 123;` and satisfy the precision propagation chain for insn #13.

This example demonstrates two sets of rules:
- r0 returned after a subprog call has to be moved into the subprog's r0 set;
- *static* subprog arguments (r1-r5) are moved back to the caller's precision set.

Let's look at what happens with callee-saved precision propagation. Insn #14 marks r6 as precise. When we get into the subprog's frame, we keep r6 in frame 0's precision set *only*. The subprog itself has its own set of independent r6-r10 registers and is not affected. When we eventually make our way out of the subprog frame, we keep r6 in the precision set until we reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is perhaps the simplest aspect: it always stays in its original frame.

That's pretty much all we have to do to support precision propagation across *static subprog* invocation. Let's look at what happens when we have a global subprog invocation.

frame 0                frame 1            precision set
=======                =======            =============
 9: r6 = 456;
10: r1 = 123;                             fr0: r6
11: call pc+10; # global subprog          fr0: r6
12: r1 = <map_pointer>                    fr0: r0, r6
13: r1 += r0;                             fr0: r0, r6
14: r1 += r6;                             fr0: r6
15: exit

Starting from insn #13, r0 has to be precise. We backtrack all the way to insn #11 (call pc+10) and see that the subprog is global, so it was already validated in isolation. As opposed to a static subprog, a global subprog always returns an unknown scalar r0, so that satisfies precision propagation and we drop r0 from the precision set. We are done for insn #13.

Now for insn #14. r6 is in the precision set, and we backtrack to `call pc+10;`. Here we need to recognize that this is effectively both the exit from and the entry to the global subprog, which means we stay in the caller's frame. So we carry on with r6 still in the precision set, until we satisfy it at insn #9. The only hard part with global subprogs is just knowing when it's a global func.

Lastly, callback-calling helpers and kfuncs do simulate subprog calls, so jump history will have subprog instructions in between the caller program's instructions, but the rules of propagating r0 and r1-r5 differ, because we don't actually directly call the callback. We actually call a helper/kfunc, which at runtime will call the subprog, so the only difference from normal helper/kfunc handling is that we need to make sure to skip the callback-simulating part of jump history.

Let's look at an example to make this clearer.

frame 0                frame 1            precision set
=======                =======            =============
 8: r6 = 456;
 9: r1 = 123;                             fr0: r6
10: r2 = &callback;                       fr0: r6
11: call bpf_loop;                        fr0: r6
                       22: r0 = r1;       fr0: r6  fr1:
                       23: exit           fr0: r6  fr1:
12: r1 = <map_pointer>                    fr0: r0, r6
13: r1 += r0;                             fr0: r0, r6
14: r1 += r6;                             fr0: r6
15: exit

Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit` we see that this isn't actually a static subprog call (it's a `call bpf_loop;` helper call instead). So we clear r0 from the precision set.

For a callee-saved register, there is no difference: it stays in frame 0's precision set, we go through insns #22 and #23, ignoring them until we get back to caller frame 0, eventually satisfying precision backtrack logic at insn #8 (`r6 = 456;`).

Assuming the callback needed to set r0 as precise at insn #23, we'd backtrack to insn #22, switching from r0 to r1, and then at the point when we pop back to frame 0 at insn #11, we'll clear r1-r5 from the precision set, as we don't really do a subprog call directly, so there is no input-argument precision propagation.

That's pretty much it. With these changes, it seems like the only still-unsupported situation for precision backpropagation is the case when the program is accessing the stack through registers other than r10. This is still left as an unsupported (though rare) case for now.

As for results. For selftests, a few positive changes for bigger programs, cls_redirect in the dynptr variant benefiting the most:

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
File                                      Program        Insns (A)  Insns (B)  Insns    (DIFF)
----------------------------------------  -------------  ---------  ---------  ----------------
pyperf600_bpf_loop.bpf.linked1.o          on_event            2060       2002     -58 (-2.82%)
test_cls_redirect_dynptr.bpf.linked1.o    cls_redirect       15660       2914  -12746 (-81.39%)
test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect       61620      59088   -2532 (-4.11%)
xdp_synproxy_kern.bpf.linked1.o           syncookie_tc      109980      86278  -23702 (-21.55%)
xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp      97716      85147  -12569 (-12.86%)

Cilium programs don't really regress. They don't use subprogs and are mostly unaffected, but some other fixes and improvements could have changed something. This doesn't appear to be the case:

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
File           Program                         Insns (A)  Insns (B)  Insns (DIFF)
-------------  ------------------------------  ---------  ---------  ------------
bpf_host.o     tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
bpf_xdp.o      tail_handle_nat_fwd_ipv6            12475      12504  +29 (+0.23%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6       6363       6371   +8 (+0.13%)

Looking at (somewhat anonymized) Meta production programs, we see mostly insignificant variation in the number of instructions, with one program (syar_bind6_protect6) benefiting the most at -17%.

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
Program                   Insns (A)  Insns (B)  Insns    (DIFF)
------------------------  ---------  ---------  ----------------
on_request_context_event        597        585     -12 (-2.01%)
read_async_py_stack           43789      43657    -132 (-0.30%)
read_sync_py_stack            35041      37599   +2558 (+7.30%)
rrm_usdt                        946        940      -6 (-0.63%)
sysarmor_inet6_bind           28863      28249    -614 (-2.13%)
sysarmor_inet_bind            28845      28240    -605 (-2.10%)
syar_bind4_protect4          154145     147640   -6505 (-4.22%)
syar_bind6_protect6          165242     137088  -28154 (-17.04%)
syar_task_exit_setgid         21289      19720   -1569 (-7.37%)
syar_task_exit_setuid         21290      19721   -1569 (-7.37%)
do_uprobe                     19967      19413    -554 (-2.77%)
tw_twfw_ingress              215877     204833  -11044 (-5.12%)
tw_twfw_tc_in                215877     204833  -11044 (-5.12%)

But checking duration (wall clock) differences, that is, the actual time taken by the verifier to validate programs, we see sometimes dramatic improvements, all the way to about a 16x improvement:

[vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
Program                      Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
---------------------------  -----------------  -----------------  --------------------
tw_twfw_ingress                        4488374             272836    -4215538 (-93.92%)
tw_twfw_tc_in                          4339111             268175    -4070936 (-93.82%)
tw_twfw_egress                         3521816             270751    -3251065 (-92.31%)
tw_twfw_tc_eg                          3472878             284294    -3188584 (-91.81%)
balancer_ingress                        343119             291391      -51728 (-15.08%)
syar_bind6_protect6                      78992              64782      -14210 (-17.99%)
ttls_tc_ingress                          11739               8176       -3563 (-30.35%)
kprobe__security_inode_link              13864              11341       -2523 (-18.20%)
read_sync_py_stack                       21927              19442       -2485 (-11.33%)
read_async_py_stack                      30444              28136       -2308 (-7.58%)
syar_task_exit_setuid                    10256               8440       -1816 (-17.71%)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
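To make the static-subprog excerpt above concrete, here is a minimal BPF C sketch (illustrative only, not taken from the patch set; all names are made up): a scalar returned from an identity subprog and a callee-saved scalar are both used as variable offsets, which forces exactly the r0 and r6 precision backpropagation walked through above.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char buf[512] SEC(".data");

/* "identity" subprog: corresponds to `22: r0 = r1; 23: exit` */
static __noinline int identity(int x)
{
	return x;
}

SEC("tc")
int prog(struct __sk_buff *skb)
{
	int a = 456;           /* typically lands in callee-saved r6 */
	int b = identity(123); /* r0 precision must cross the call   */

	/* Variable-offset accesses force both scalars to be precise,
	 * triggering backtracking through the subprog's jump history. */
	return buf[b & 0xff] + buf[a & 0xff];
}

char _license[] SEC("license") = "GPL";
```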
-
Andrii Nakryiko authored
When precision backtracking bails out due to some unsupported sequence of instructions (e.g., stack access through a register other than r10), we need to mark all SCALAR registers as precise to be safe. Currently, though, we mark SCALARs precise only starting from the state in which we detected the unsupported condition, which could be one of the parent states of the actual current state. This will leave some registers potentially not marked as precise, even though they should be. So make sure we start marking scalars as precise from the current state (env->cur_state).

Further, we don't currently detect the situation when we end up with some stack slots marked as needing precision, but we have run out of available states to find the instructions that populate those stack slots. This is akin to the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be handled similarly by falling back to marking all SCALARs precise. Add this check when we run out of states.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Fix propagate_precision() logic to perform propagation of all necessary registers and stack slots across all active frames *in one batch step*. Doing this for each register/slot in each individual frame is wasteful, but the main problem is that backtracking of instructions in any frame except the deepest one just doesn't work. This is due to the backtracking logic relying on jump history, and available jump history always starts (or ends, depending how you view it) in the current frame. So, if prog A (frame #0) called subprog B (frame #1) and we need to propagate precision of, say, register R6 (callee-saved) within frame #0, we actually don't even know where the jump history that corresponds to prog A even starts. We'd need to skip the subprog part of jump history first to be able to do this.

Luckily, with struct backtrack_state and __mark_chain_precision() handling bitmask tracking/propagation across all active frames at the same time (added in the previous patch), propagate_precision() can be both fixed and sped up by setting all the necessary bits across all frames and then performing one __mark_chain_precision() pass. This makes it unnecessary to skip subprog parts of jump history.

We also improve logging along the way, to clearly specify which registers' and slots' precision markings are propagated within which frame. Each frame will have a dedicated line, and all registers and stack slots from that frame will be reported in a format similar to precision backtrack regs/stack logging. E.g.:

  frame 1: propagating r1,r2,r3,fp-8,fp-16
  frame 0: propagating r3,r9,fp-120

Fixes: 529409ea ("bpf: propagate precision across all frames, not just the last one")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Teach __mark_chain_precision logic to maintain register/stack masks across all active frames when going from child state to parent state. Currently this should be mostly a no-op, as precision backtracking usually bails out when encountering subprog entry/exit. It's not very apparent from the diff due to increased indentation, but the logic remains the same, except everything is done on a specific `fr` frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(), where the frame index is passed explicitly, instead of using the current frame number. We also adjust logging to emit the affected frame number. And we also add better logging of human-readable register and stack slot masks, similar to the previous patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add a helper to format register and stack masks in a more human-readable format. Adjust logging a bit during backtrack propagation, and especially during the forced precision fallback logic, to make it clearer what's going on (with log_level=2, of course), and also start reporting the affected frame depth. This is in preparation for having more than one active frame later, when precision propagation between subprog calls is added.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add struct backtrack_state and a straightforward API around it to keep track of the register and stack masks used and maintained during the precision backtracking process. Having this logic separate allows us to keep the high-level backtracking algorithm cleaner, and it also sets us up to cleanly keep track of register and stack masks per frame, allowing (with some further logic adjustments) precision backpropagation across multiple frames (i.e., subprog calls).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
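A sketch of the idea (field and helper names follow the description here and may not match the patch exactly): one register bitmask and one stack-slot bitmask per active frame, with small accessors so the backtracking loop never manipulates masks directly.

```c
struct backtrack_state {
	struct bpf_verifier_env *env;
	u32 frame;                        /* deepest (current) frame */
	u32 reg_masks[MAX_CALL_FRAMES];   /* one bit per r0..r10     */
	u64 stack_masks[MAX_CALL_FRAMES]; /* one bit per 8-byte slot */
};

/* Frame-explicit accessors of the kind the patch describes: */
static void bt_set_frame_reg(struct backtrack_state *bt, u32 frame, u32 reg)
{
	bt->reg_masks[frame] |= 1 << reg;
}

static bool bt_is_frame_reg_set(struct backtrack_state *bt, u32 frame, u32 reg)
{
	return bt->reg_masks[frame] & (1 << reg);
}
```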
-
Andrii Nakryiko authored
When handling instructions that read register slots, mark the relevant stack slots as scratched so that the verifier log contains those slots' states, in addition to the currently emitted registers with stack slot offsets.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Sometimes during debugging it's important that a BPF program is loaded with the BPF_F_TEST_STATE_FREQ flag set to force the verifier to do frequent state checkpointing. Teach veristat to do this when the -t ("test state") flag is specified.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
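For loaders other than veristat, the same effect can be achieved with libbpf's per-program flags API; a minimal sketch (function name and error handling are this example's own):

```c
#include <bpf/libbpf.h>
#include <linux/bpf.h>

/* Open an object file and set BPF_F_TEST_STATE_FREQ on every program
 * before loading, forcing frequent verifier state checkpointing. */
int load_with_test_state_freq(const char *path)
{
	struct bpf_object *obj = bpf_object__open(path);
	struct bpf_program *prog;

	if (!obj)
		return -1;

	bpf_object__for_each_program(prog, obj)
		bpf_program__set_flags(prog,
			bpf_program__flags(prog) | BPF_F_TEST_STATE_FREQ);

	return bpf_object__load(obj);
}
```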
-
Kenjiro Nakayama authored
To make comments about arc and riscv arch in bpf_tracing.h accurate, this patch fixes the comment about arc and adds the comment for riscv. Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230504035443.427927-1-nakayamakenjiro@gmail.com
-
- 02 May, 2023 2 commits
-
-
Kui-Feng Lee authored
Only print the warning message if you are writing to "/proc/sys/kernel/unprivileged_bpf_disabled". The kernel may print an annoying warning when you read "/proc/sys/kernel/unprivileged_bpf_disabled", saying

  WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!

However, this message is only meaningful when the feature is being disabled or enabled, not when the current value is merely read.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230502181418.308479-1-kuifeng@meta.com
-
Yonghong Song authored
In one of our internal tests, we found a case where:
- the uapi struct bpf_tcp_sock is in vmlinux.h, where vmlinux.h is not generated from the testing kernel;
- struct bpf_tcp_sock is not in the vmlinux BTF.

The above combination caused a bpf load failure, as the following memory access

  struct bpf_tcp_sock *tcp_sock = ...;
  ... tcp_sock->snd_cwnd ...

needs a CO-RE relocation, but the relocation cannot be resolved since the kernel BTF does not have the corresponding type. Similar to other previous cases (nf_conn___init, tcp6_sock, mptcp_sock, etc.), add the type to vmlinux BTF with the BTF_TYPE_EMIT macro.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230502180543.1832140-1-yhs@fb.com
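For reference, the BTF_TYPE_EMIT trick is tiny: it references the type in always-built code without emitting any instructions, which is enough for the full type definition to land in DWARF and hence in vmlinux BTF. Roughly (paraphrased from the kernel's definition):

```c
/* A cast of a null pointer to the target type, cast to void, forces
 * the compiler to keep the full type in debug info without generating
 * any code. */
#define BTF_TYPE_EMIT(type) ((void)(type *)0)

/* Then, somewhere in unconditionally compiled BPF core code: */
BTF_TYPE_EMIT(struct bpf_tcp_sock);
```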
-
- 01 May, 2023 4 commits
-
-
Andrii Nakryiko authored
Stephen Veiss says: ==================== BPF selftests have ALLOWLIST and DENYLIST files, used to control which tests are run in CI. These files are currently parsed by a shell script. [1] This patchset allows those files to be specified directly on the test_progs command line (eg, as -a @ALLOWLIST). This also fixes a bug in the existing test filter code causing unnecessary duplicate top-level test filter entries to be created. [1] https://github.com/kernel-patches/vmtest/blob/57feb460047b69f891cf4afe3cc860794a2ced17/ci/vmtest/run_selftests.sh#L21-L27 --- v2: - error handling style changes per reviewer comments - fdopen return value checking in test_parse_test_list_file v1: https://lore.kernel.org/bpf/20230425225401.1075796-1-sveiss@meta.com/ ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Stephen Veiss authored
Improve test selection logic when using -a/-b/-d/-t options. The list of tests to include or exclude can now be read from a file, specified as @<filename>. The file contains one name (or wildcard pattern) per line, and comments beginning with # are ignored. These options can be passed multiple times to read more than one file. Signed-off-by: Stephen Veiss <sveiss@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230427225333.3506052-3-sveiss@meta.com
-
Stephen Veiss authored
Split the logic to insert new tests into test filter sets out from parse_test_list. Fix the subtest insertion logic to reuse an existing top-level test filter, which prevents the creation of duplicate top-level test filters each with a single subtest. Signed-off-by: Stephen Veiss <sveiss@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230427225333.3506052-2-sveiss@meta.com
-
Martin KaFai Lau authored
The btf_dump/struct_data selftest is failing with:

  [...]
  test_btf_dump_struct_data:FAIL:unexpected return value dumping fs_context unexpected unexpected return value dumping fs_context: actual -7 != expected 264
  [...]

The reason is in btf_dump_type_data_check_overflow(). It does not use BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead, it is using the enum size, which is 4. It had been working until the recent commit 4e04143c ("fs_context: drop the unused lsm_flags member") removed an integer member, which also removed the 4 bytes of padding at the end of fs_context. Missing this 4-byte padding exposed this bug. In particular, when btf_dump_type_data_check_overflow() reaches the member 'phase', -E2BIG is returned. The fix is to pass bit_sz to btf_dump_type_data_check_overflow(), which then does a different size check when bit_sz is not zero.

The current fs_context:

  [3600] ENUM 'fs_context_purpose' encoding=UNSIGNED size=4 vlen=3
	'FS_CONTEXT_FOR_MOUNT' val=0
	'FS_CONTEXT_FOR_SUBMOUNT' val=1
	'FS_CONTEXT_FOR_RECONFIGURE' val=2
  [3601] ENUM 'fs_context_phase' encoding=UNSIGNED size=4 vlen=7
	'FS_CONTEXT_CREATE_PARAMS' val=0
	'FS_CONTEXT_CREATING' val=1
	'FS_CONTEXT_AWAITING_MOUNT' val=2
	'FS_CONTEXT_AWAITING_RECONF' val=3
	'FS_CONTEXT_RECONF_PARAMS' val=4
	'FS_CONTEXT_RECONFIGURING' val=5
	'FS_CONTEXT_FAILED' val=6
  [3602] STRUCT 'fs_context' size=264 vlen=21
	'ops' type_id=3603 bits_offset=0
	'uapi_mutex' type_id=235 bits_offset=64
	'fs_type' type_id=872 bits_offset=1216
	'fs_private' type_id=21 bits_offset=1280
	'sget_key' type_id=21 bits_offset=1344
	'root' type_id=781 bits_offset=1408
	'user_ns' type_id=251 bits_offset=1472
	'net_ns' type_id=984 bits_offset=1536
	'cred' type_id=1785 bits_offset=1600
	'log' type_id=3621 bits_offset=1664
	'source' type_id=42 bits_offset=1792
	'security' type_id=21 bits_offset=1856
	's_fs_info' type_id=21 bits_offset=1920
	'sb_flags' type_id=20 bits_offset=1984
	'sb_flags_mask' type_id=20 bits_offset=2016
	's_iflags' type_id=20 bits_offset=2048
	'purpose' type_id=3600 bits_offset=2080 bitfield_size=8
	'phase' type_id=3601 bits_offset=2088 bitfield_size=8
	'need_free' type_id=67 bits_offset=2096 bitfield_size=1
	'global' type_id=67 bits_offset=2097 bitfield_size=1
	'oldapi' type_id=67 bits_offset=2098 bitfield_size=1

Fixes: 920d16af ("libbpf: BTF dumper support for typed data")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230428013638.1581263-1-martin.lau@linux.dev
-
- 28 Apr, 2023 1 commit
-
-
Martin KaFai Lau authored
It is reported that the fexit_sleep test never returns on aarch64, and the remaining tests cannot start. Put this test into DENYLIST.aarch64 for now so that other tests can continue to run in the CI. Acked-by: Manu Bretelle <chantr4@gmail.com> Reported-by: Manu Bretelle <chantra@meta.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 27 Apr, 2023 18 commits
-
-
Yonghong Song authored
The selftest test_global_funcs/global_func1 failed with the latest clang17. The reason is the upstream ArgumentPromotionPass ([1]), which may manipulate static function parameters and cause inlining although the function is marked as noinline.

The original code:

  static __attribute__ ((noinline)) int f0(int var, struct __sk_buff *skb)
  {
	return skb->len;
  }

  __attribute__ ((noinline))
  int f1(struct __sk_buff *skb)
  {
	...
	return f0(0, skb) + skb->len;
  }
  ...
  SEC("tc")
  __failure __msg("combined stack size of 4 calls is 544")
  int global_func1(struct __sk_buff *skb)
  {
	return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
  }

After ArgumentPromotionPass, the code is translated to

  static __attribute__ ((noinline)) int f0(int var, int skb_len)
  {
	return skb_len;
  }

  __attribute__ ((noinline))
  int f1(struct __sk_buff *skb)
  {
	...
	return f0(0, skb->len) + skb->len;
  }
  ...
  SEC("tc")
  __failure __msg("combined stack size of 4 calls is 544")
  int global_func1(struct __sk_buff *skb)
  {
	return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4);
  }

And the later llvm InstCombine phase recognized that f0() simply returns the value of its second argument, removed f0() completely, and the final code looks like:

  __attribute__ ((noinline))
  int f1(struct __sk_buff *skb)
  {
	...
	return skb->len + skb->len;
  }
  ...
  SEC("tc")
  __failure __msg("combined stack size of 4 calls is 544")
  int global_func1(struct __sk_buff *skb)
  {
	return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4);
  }

If f0() is not inlined, the verification will fail with stack size 544 for a particular call chain. With f0() inlined, the maximum stack size is 512, which is within the limit. Let us add an `asm volatile ("")` in f0() to prevent ArgumentPromotionPass from hoisting the code to its caller, and this fixed the test failure.

[1] https://reviews.llvm.org/D148269

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230425174744.1758515-1-yhs@fb.com
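The fix in context looks roughly like this (a sketch; the empty asm statement is opaque to LLVM's interprocedural passes, so the parameters cannot be promoted and the call stays out-of-line):

```c
static __attribute__ ((noinline)) int f0(int var, struct __sk_buff *skb)
{
	/* Opaque to ArgumentPromotion/inlining: keeps f0() intact so
	 * the "combined stack size" failure stays reproducible. */
	asm volatile ("");

	return skb->len;
}
```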
-
Xueming Feng authored
When using `bpftool map dump` with map_of_maps, it is usually more convenient to show the inner map id instead of the raw value. We are changing the plain print behavior to show inner_map_id instead of the hex value, which helps with quick lookup of the inner map via `bpftool map dump id <inner_map_id>`. To avoid disrupting scripted behavior, we add a new `inner_map_id` field to the json output instead of replacing value.

plain print:
```
$ bpftool map dump id 138

Without Patch:
key: fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 27 16 06 00
value: 8b 00 00 00
Found 1 element

With Patch:
key: fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 27 16 06 00
inner_map_id: 139
Found 1 element
```

json print:
```
$ bpftool -p map dump id 567

Without Patch:
[{
    "key": ["0xc0","0x00","0x02","0x05","0x27","0x16","0x06","0x00" ],
    "value": ["0x38","0x02","0x00","0x00" ]
  }
]

With Patch:
[{
    "key": ["0xc0","0x00","0x02","0x05","0x27","0x16","0x06","0x00" ],
    "value": ["0x38","0x02","0x00","0x00" ],
    "inner_map_id": 568
  }
]
```

Signed-off-by: Xueming Feng <kuro@kuroa.me>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230427120313.43574-1-kuro@kuroa.me
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Kal Conley authored
Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an active DMA mapping. pool->dma_pages needs to be read anyway to access the map so this compiles to more efficient code. Signed-off-by: Kal Conley <kal.conley@dectris.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com
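A hedged sketch of the change (struct and function names abbreviated to the two relevant fields; not the literal diff):

```c
#include <linux/types.h>

struct xsk_buff_pool_view {	/* illustrative subset of xsk_buff_pool */
	dma_addr_t *dma_pages;
	u32 dma_pages_cnt;
};

static inline bool pool_dma_mapped(const struct xsk_buff_pool_view *pool)
{
	/* was: return pool->dma_pages_cnt;
	 * The pointer must be loaded anyway to use the mapping, so
	 * testing it directly saves a separate load of the counter. */
	return pool->dma_pages;
}
```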
-
Florent Revest authored
Now that ftrace supports direct call on arm64, BPF tracing programs work on that architecture. This fixes the vast majority of BPF selftests except for: - multi_kprobe programs which require fprobe, not available on arm64 yet - tracing_struct which requires trampoline support to access struct args This patch updates the list of BPF selftests which are known to fail so the BPF CI can validate the tests which pass now. Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230427143207.635263-1-revest@chromium.org
-
Jesper Dangaard Brouer authored
To correlate the hardware RX timestamp with something, add tracking of two software timestamps, both using clock source CLOCK_TAI (see the description in man clock_gettime(2)). XDP metadata is extended with xdp_timestamp for capturing when XDP received the packet, populated with the BPF helper bpf_ktime_get_tai_ns(). I could not find a BPF helper for getting CLOCK_REALTIME, which would have been preferred. In userspace, when AF_XDP sees the packet, another software timestamp is recorded via clock_gettime(), also with clock source CLOCK_TAI. Example output shortly after loading the igc driver:

  poll: 1 (0) skip=1 fail=0 redir=2
  xsk_ring_cons__peek: 1
  0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000
  rx_hash: 0x82A96531 with RSS type:0x1
  rx_timestamp: 1681740540304898909 (sec:1681740540.3049)
  XDP RX-time:  1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec)
  AF_XDP time:  1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec)
  0x12557a8: complete idx=9 addr=9000

The first observation is the 37 sec difference between the RX HW and XDP timestamps, which indicates the hardware clock source is likely CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised with a 37 sec offset. The 93 usec (microsec) difference between XDP and AF_XDP userspace is the userspace wakeup time. On this hardware it was caused by CPU idle sleep states, which can be reduced by tuning /dev/cpu_dma_latency. View the current requested/allowed latency bound via:

  hexdump --format '"%d\n"' /dev/cpu_dma_latency

More explanation of the output, and how this can be used to identify clock drift for the HW clock, can be seen here[1]:

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182466298.616355.2544377890818617459.stgit@firesoul
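A sketch of the BPF-program side (the rx-timestamp kfunc belongs to the kernel's XDP metadata infrastructure; the metadata struct layout is this example's own assumption, standing in for the selftest's convention):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
					 __u64 *timestamp) __ksym;

struct xdp_meta {		/* assumed layout for the example */
	__u64 rx_timestamp;	/* HW time from the NIC           */
	__u64 xdp_timestamp;	/* SW time when XDP saw the pkt   */
};

SEC("xdp")
int rx(struct xdp_md *ctx)
{
	struct xdp_meta *meta;
	void *data;

	/* Reserve metadata space just in front of the packet */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
		return XDP_PASS;

	meta = (void *)(long)ctx->data_meta;
	data = (void *)(long)ctx->data;
	if ((void *)(meta + 1) > data)
		return XDP_PASS;

	bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
	meta->xdp_timestamp = bpf_ktime_get_tai_ns();

	return XDP_PASS; /* or redirect to the AF_XDP socket */
}

char _license[] SEC("license") = "GPL";
```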
-
Jesper Dangaard Brouer authored
The NIC hardware RX timestamping mechanism adds an optional tailored header before the MAC header containing the packet reception time. It is optional depending on the RX descriptor TSIP status bit (IGC_RXDADV_STAT_TSIP). In case this bit is set, the driver adjusts the packet data start offset and extracts the timestamp. The timestamp needs to be extracted before invoking the XDP bpf_prog, because this area just before the packet is also accessible by XDP via the data_meta context pointer (and the helper bpf_xdp_adjust_meta). Thus, an XDP bpf_prog can potentially overwrite it and corrupt data that we want to extract with the new kfunc for reading the timestamp.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182465791.616355.2583922957423587914.stgit@firesoul
-
Jesper Dangaard Brouer authored
This implements the XDP hints kfunc for RX-hash (xmo_rx_hash). The HW RSS hash type is handled via a mapping table. This igc driver (default config) does L3 hashing for UDP packets (it excludes UDP src/dest ports from the hash calculation), meaning the RSS hash type is L3-based. Tested that the igc_rss_type_num for UDP is either IGC_RSS_TYPE_HASH_IPV4 or IGC_RSS_TYPE_HASH_IPV6. This patch also updates the AF_XDP zero-copy function igc_clean_rx_irq_zc() to use the xdp_buff wrapper struct igc_xdp_buff.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182465285.616355.2701740913376314790.stgit@firesoul
-
Jesper Dangaard Brouer authored
Driver-specific metadata for XDP-hints kfuncs is propagated by tail-extending struct xdp_buff with a locally scoped driver struct. Zero-copy AF_XDP/XSK does similar tricks via struct xdp_buff_xsk. This xdp_buff_xsk struct contains a CB area (24 bytes) that the locally scoped driver struct can extend into. The XSK_CHECK_PRIV_TYPE define catches size violations at build time. The changes needed for AF_XDP zero-copy in igc_clean_rx_irq_zc() are done in the next patch, because the member rx_desc isn't available at this point.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182464779.616355.3761989884165609387.stgit@firesoul
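The pattern, sketched (the field set follows the series' description and may not match the driver exactly):

```c
struct igc_xdp_buff {
	struct xdp_buff xdp;		/* must be the first member     */
	union igc_adv_rx_desc *rx_desc;	/* for RX-hash/timestamp kfuncs */
	ktime_t rx_ts;			/* extracted HW RX timestamp    */
};

/* For AF_XDP zero-copy the wrapper must fit in xdp_buff_xsk's 24-byte
 * cb area; this turns a size violation into a build failure: */
XSK_CHECK_PRIV_TYPE(struct igc_xdp_buff);
```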
-
Jesper Dangaard Brouer authored
When the function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a ("igc: Add transmit and receive fastpath and interrupt handlers"), the hardware wasn't configured to provide RSS hash, thus it made sense to not enable the net_device NETIF_F_RXHASH feature bit. The NIC hardware was configured to enable RSS hash info in v5.2 via commit 2121c271 ("igc: Add multiple receive queues control supporting"), but it forgot to set the NETIF_F_RXHASH feature bit.

The original implementation of igc_rx_hash() didn't extract the associated pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of this patch are about extracting the RSS type from the hardware and mapping this to enum pkt_hash_types. This was based on the Foxville i225 software user manual rev-1.3.1 and tested on an Intel Ethernet Controller I225-LM (rev 03).

For UDP it's worth noting that RSS (type) hashing has been disabled both for IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP), because hardware RSS doesn't handle fragmented packets well when enabled (it can cause out-of-order delivery). This results in PKT_HASH_TYPE_L3 for UDP packets, and the hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4 has the effect that the netstack will do a software-based hash calculation, calling into flow_dissect, but only when code calls skb_get_hash(), which doesn't necessarily happen for local delivery. For QA verification testing I wrote a small bpftrace prog: [0]

[0] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/monitor_skb_hash_on_dev.bt

Fixes: 2121c271 ("igc: Add multiple receive queues control supporting")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/bpf/168182464270.616355.11391652654430626584.stgit@firesoul
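A sketch of the mapping (enum names follow the description above; numeric values and the full hardware table are in the patch). UDP types deliberately map to an L3 hash because hardware UDP hashing stays disabled:

```c
/* Names/values illustrative; see the patch for the real table. */
enum igc_rss_type_num {
	IGC_RSS_TYPE_NO_HASH,
	IGC_RSS_TYPE_HASH_TCP_IPV4,
	IGC_RSS_TYPE_HASH_IPV4,
	IGC_RSS_TYPE_HASH_TCP_IPV6,
	IGC_RSS_TYPE_HASH_IPV6,
	/* ... UDP types exist in HW but UDP hashing is kept disabled */
};

static enum pkt_hash_types igc_rss_type_to_pkt_hash(u32 rss_type)
{
	switch (rss_type) {
	case IGC_RSS_TYPE_HASH_TCP_IPV4:
	case IGC_RSS_TYPE_HASH_TCP_IPV6:
		return PKT_HASH_TYPE_L4;	/* ports included in hash */
	case IGC_RSS_TYPE_HASH_IPV4:
	case IGC_RSS_TYPE_HASH_IPV6:
		return PKT_HASH_TYPE_L3;	/* addresses only         */
	default:
		return PKT_HASH_TYPE_L2;	/* no usable RSS hash     */
	}
}
```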
-
Kui-Feng Lee authored
A new link type, BPF_LINK_TYPE_STRUCT_OPS, was added to attach struct_ops to links. (226bc6ae) It would be helpful for users to know which map is associated with the link. The assumption was that every link is associated with a BPF program, but this does not hold true for struct_ops. It would be better to display map_id instead of prog_id for struct_ops links. However, some tools may rely on the old assumption and need a prog_id. The discussion on the mailing list suggests that tools should parse JSON format. We will maintain the existing JSON format by adding a map_id without removing prog_id. As for plain text format, we will remove prog_id from the header line and add a map_id for struct_ops links. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230421214131.352662-1-kuifeng@meta.com
-
Joe Stringer authored
Extend the bpf hashmap docs to include a brief description of the internals of the LRU map type (setting appropriate API expectations), including the original commit message from Martin and a variant on the graph that I had presented during my Linux Plumbers Conference 2022 talk on "Pressure feedback for LRU map types"[0]. The node names in the dot file correspond roughly to the functions where the logic for those decisions or steps is defined, to help curious developers to cross-reference and update this logic if the details of the LRU implementation ever differ from this description. [0] https://lpc.events/event/16/contributions/1368/ Signed-off-by: Joe Stringer <joe@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230422172054.3355436-2-joe@isovalent.com
-
Joe Stringer authored
Depending on the map type and flags for LRU, different properties are global or percpu. Add a table to describe these. Signed-off-by: Joe Stringer <joe@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230422172054.3355436-1-joe@isovalent.com
-
Daniel Borkmann authored
Add a test case to check for precision marking of safe paths. Ensure that the verifier will not prematurely prune scalars contributing to registers needing precision. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Joanne Koong authored
Add various tests for the added dynptr convenience helpers. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230420071414.570108-6-joannelkoong@gmail.com
-
Joanne Koong authored
The cloned dynptr will point to the same data as its parent dynptr, with the same type, offset, size and read-only properties. Any writes to a dynptr will be reflected across all instances (by 'instance', this means any dynptrs that point to the same underlying data). Please note that data slice and dynptr invalidations will affect all instances as well. For example, if bpf_dynptr_write() is called on an skb-type dynptr, all data slices of dynptr instances to that skb will be invalidated as well (eg data slices of any clones, parents, grandparents, ...). Another example is if a ringbuf dynptr is submitted, any instance of that dynptr will be invalidated. Changing the view of the dynptr (eg advancing the offset or trimming the size) will only affect that dynptr and not affect any other instances. One example use case where cloning may be helpful is for hashing or iterating through dynptr data. Cloning will allow the user to maintain the original view of the dynptr for future use, while also allowing views to smaller subsets of the data after the offset is advanced or the size is trimmed. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230420071414.570108-5-joannelkoong@gmail.com
-
Joanne Koong authored
bpf_dynptr_size returns the number of usable bytes in a dynptr. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230420071414.570108-4-joannelkoong@gmail.com
-
Joanne Koong authored
bpf_dynptr_is_null returns true if the dynptr is null / invalid (determined by whether ptr->data is NULL), else false if the dynptr is a valid dynptr. bpf_dynptr_is_rdonly returns true if the dynptr is read-only, else false if the dynptr is read-writable. If the dynptr is null / invalid, false is returned by default. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20230420071414.570108-3-joannelkoong@gmail.com
-
Joanne Koong authored
Add a new kfunc

  int bpf_dynptr_adjust(struct bpf_dynptr_kern *ptr, u32 start, u32 end);

which adjusts the dynptr to reflect the new [start, end) interval. In particular, it advances the offset of the dynptr by "start" bytes, and if end is less than the size of the dynptr, then this will trim the dynptr accordingly. Adjusting the dynptr interval may be useful in certain situations. For example, when hashing, which takes in generic dynptrs, if the dynptr points to a struct but only a certain memory region inside the struct should be hashed, adjust can be used to narrow in on the specific region to hash.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230420071414.570108-2-joannelkoong@gmail.com
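Taken together, the kfuncs from this series compose naturally; a usage sketch (kfunc prototypes as described in these commits, declared as ksyms on the BPF side; the program and map names are this example's own):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

extern int bpf_dynptr_adjust(const struct bpf_dynptr *p,
			     __u32 start, __u32 end) __ksym;
extern __u32 bpf_dynptr_size(const struct bpf_dynptr *p) __ksym;
extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 4096);
} rb SEC(".maps");

SEC("tp/syscalls/sys_enter_nanosleep")
int narrow_and_read(void *ctx)
{
	struct bpf_dynptr dptr;
	__u8 middle[8];

	bpf_ringbuf_reserve_dynptr(&rb, 64, 0, &dptr);
	if (bpf_dynptr_is_null(&dptr))	/* reservation failed */
		goto out;

	/* Narrow the view to bytes [16, 24) of the reservation ... */
	if (!bpf_dynptr_adjust(&dptr, 16, 24) &&
	    /* ... after which the usable size is exactly 8 bytes */
	    bpf_dynptr_size(&dptr) == 8)
		bpf_dynptr_read(middle, sizeof(middle), &dptr, 0, 0);
out:
	bpf_ringbuf_discard_dynptr(&dptr, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";
```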
-
- 26 Apr, 2023 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Linus Torvalds authored
Pull networking updates from Paolo Abeni:

"Core:
 - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances
 - Reduce compound page head access for zero-copy data transfers
 - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible
 - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance
 - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking
 - Add lockless accesses annotation to sk_err[_soft]
 - Optimize again the skb struct layout
 - Extends the skb drop reasons to make it usable by multiple subsystems
 - Better const qualifier awareness for socket casts

BPF:
 - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses
 - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward
 - Add more precise memory usage reporting for all BPF map types
 - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params
 - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton
 - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities
 - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc
 - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps
 - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps
 - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree
 - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them
 - Add ARM32 USDT support to libbpf
 - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations

Protocols:
 - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address
 - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
 - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf
 - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to node failures
 - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers
 - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction
 - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore
 - WiFi:
   - reduced neighbor report (RNR) handling for AP mode
   - HW timestamping support
   - support for randomized auth/deauth TA for PASN privacy
   - per-link debugfs for multi-link
   - TC offload support for mac80211 drivers
   - mac80211 mesh fast-xmit and fast-rx support
   - enable Wi-Fi 7 (EHT) mesh support

Netfilter:
 - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged
 - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIG TCP support
 - The iptables 32bit compat interface isn't compiled in by default anymore
 - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used
 - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device

Driver API:
 - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time
 - Move Multicast DB netlink handlers to core, allowing devices other than bridge to use them
 - Allow the page_pool to directly recycle the pages from safely localized NAPI
 - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization
 - Add YNL support for user headers and struct attrs
 - Add partial YNL specification for devlink
 - Add partial YNL specification for ethtool
 - Add tc-mqprio and tc-taprio support for preemptible traffic classes
 - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device
 - Add basic LED support for switch/phy
 - Add NAPI documentation, stop relying on external links
 - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space
 - Add transceiver support and improve the error messages for CAN-FD controllers

New hardware / drivers:
 - Ethernet:
   - AMD/Pensando core device support
   - MediaTek MT7981 SoC
   - MediaTek MT7988 SoC
   - Broadcom BCM53134 embedded switch
   - Texas Instruments CPSW9G ethernet switch
   - Qualcomm EMAC3 DWMAC ethernet
   - StarFive JH7110 SoC
   - NXP CBTX ethernet PHY
 - WiFi:
   - Apple M1 Pro/Max devices
   - RealTek rtl8710bu/rtl8188gu
   - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
 - Bluetooth:
   - Realtek RTL8821CS, RTL8851B, RTL8852BS
   - Mediatek MT7663, MT7922
   - NXP w8997
   - Actions Semi ATS2851
   - QTI WCN6855
   - Marvell 88W8997
 - Can:
   - STMicroelectronics bxcan stm32f429

Drivers:
 - Ethernet NICs:
   - Intel (1G, igc):
     - add tracking and reporting of QBV config errors
     - add support for configuring max SDU for each Tx queue
   - Intel (100G, ice):
     - refactor mailbox overflow detection to support Scalable IOV
     - GNSS interface optimization
   - Intel (i40e):
     - support XDP multi-buffer
   - nVidia/Mellanox:
     - add the support for linux bridge multicast offload
     - enable TC offload for egress and ingress MACVLAN over bond
     - add support for VxLAN GBP encap/decap flows offload
     - extend packet offload to fully support libreswan
     - support tunnel mode in mlx5 IPsec packet offload
     - extend XDP multi-buffer support
     - support MACsec VLAN offload
     - add support for dynamic msix vectors allocation
     - drop RX page_cache and fully use page_pool
     - implement thermal zone to report NIC temperature
   - Netronome/Corigine:
     - add support for multi-zone conntrack offload
   - Solarflare/Xilinx:
     - support offloading TC VLAN push/pop actions to the MAE
     - support TC decap rules
     - support unicast PTP
 - Other NICs:
   - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC
   - RealTek (r8169): refactor to address ASPM issues during NAPI poll
   - Micrel (lan8841): add support for PTP_PF_PEROUT
   - Cadence (macb): enable PTP unicast
   - Engleder (tsnep): add XDP socket zero-copy support
   - virtio-net: implement exact header length guest feature
   - veth: add page_pool support for page recycling
   - vxlan: add MDB data path support
   - gve: add XDP support for GQI-QPL format
   - geneve: accept every ethertype
   - macvlan: allow some packets to bypass broadcast queue
   - mana: add support for jumbo frame
 - Ethernet high-speed switches:
   - Microchip (sparx5): Add support for TC flower templates
 - Ethernet embedded switches:
   - Broadcom (b54):
     - configure 6318 and 63268 RGMII ports
   - Marvell (mv88e6xxx):
     - faster C45 bus scan
   - Microchip:
     - lan966x:
       - add support for IS1 VCAP
       - better TX/RX from/to CPU performances
     - ksz9477: add ETS Qdisc support
     - ksz8: enhance static MAC table operations and error handling
     - sama7g5: add PTP capability
   - NXP (ocelot):
     - add support for external ports
     - add support for preemptible traffic classes
   - Texas Instruments:
     - add CPSWxG SGMII support for J7200 and J721E
 - Intel WiFi (iwlwifi):
   - preparation for Wi-Fi 7 EHT and multi-link support
   - EHT (Wi-Fi 7) sniffer support
   - hardware timestamping support for some devices/firmwares
   - TX beacon protection on newer hardware
 - Qualcomm 802.11ax WiFi (ath11k):
   - MU-MIMO parameters support
   - ack signal support for management packets
 - RealTek WiFi (rtw88):
   - SDIO bus support
   - better support for some SDIO devices (e.g. MAC address from efuse)
 - RealTek WiFi (rtw89):
   - HW scan support for 8852b
   - better support for 6 GHz scanning
   - support for various newer firmware APIs
   - framework firmware backwards compatibility
 - MediaTek WiFi (mt76):
   - P2P support
   - mesh A-MSDU support
   - EHT (Wi-Fi 7) support
   - coredump support"

* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
  net: phy: hide the PHYLIB_LEDS knob
  net: phy: marvell-88x2222: remove unnecessary (void*) conversions
  tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
  net: amd: Fix link leak when verifying config failed
  net: phy: marvell: Fix inconsistent indenting in led_blink_set
  lan966x: Don't use xdp_frame when action is XDP_TX
  tsnep: Add XDP socket zero-copy TX support
  tsnep: Add XDP socket zero-copy RX support
  tsnep: Move skb receive action to separate function
  tsnep: Add functions for queue enable/disable
  tsnep: Rework TX/RX queue initialization
  tsnep: Replace modulo operation with mask
  net: phy: dp83867: Add led_brightness_set support
  net: phy: Fix reading LED reg property
  drivers: nfc: nfcsim: remove return value check of `dev_dir`
  net: phy: dp83867: Remove unnecessary (void*) conversions
  net: ethtool: coalesce: try to make user settings stick twice
  net: mana: Check if netdev/napi_alloc_frag returns single page
  net: mana: Rename mana_refill_rxoob and remove some empty lines
  net: veth: add page_pool stats
  ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds authored
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target, mpi3mr, hisi_sas, arcmsr). The major core change is the constification of the host templates (which touches everything) along with other minor fixups and clean ups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command() scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately scsi: cxlflash: s/semahpore/semaphore/ scsi: lpfc: Silence an incorrect device output scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation scsi: scsi_debug: Fix missing error code in scsi_debug_init() scsi: hisi_sas: Work around build failure in suspend function scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup() scsi: mpt3sas: Fix an issue when driver is being removed scsi: mpt3sas: Remove HBA BIOS version in the kernel log scsi: target: core: Fix invalid memory access scsi: scsi_debug: Drop sdebug_queue scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll() scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd scsi: scsi_debug: Use scsi_block_requests() to block queues scsi: scsi_debug: Protect block_unblock_all_queues() with mutex scsi: scsi_debug: Change shost list lock to a mutex ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Linus Torvalds authored
Pull ata updates from Damien Le Moal: - Many cleanups of the pata_parport driver and of its protocol modules (Ondrej) - Remove unused code (ata_id_xxx() functions) (Sergey) - Add Add UniPhier SATA controller DT bindings (Kunihiko) - Fix dependencies for the Freescale QorIQ AHCI SATA controller driver (Geert) - DT property handling improvements (Rob) * tag 'ata-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (57 commits) ata: pata_parport-bpck6: Declare mode_map as static ata: pata_parport-bpck6: Remove dependency on 64BIT ata: pata_parport-bpck6: reduce indents in bpck6_open ata: pata_parport-bpck6: delete ppc6lnx.c ata: pata_parport-bpck6: move defines and mode_map to bpck6.c ata: pata_parport-bpck6: move ppc6_wr_data_byte to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_rd_data_byte to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_send_cmd to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_deselect to bpck6.c and rename ata: pata_parport-bpck6: merge ppc6_select into bpck6_open ata: pata_parport-bpck6: move ppc6_open to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_wr_extout to bpck6.c and rename ata: pata_parport-bpck6: move ppc6_wait_for_fifo to bpck6.c and rename ata: pata_parport-bpck6: merge ppc6_wr_data_blk into bpck6_write_block ata: pata_parport-bpck6: merge ppc6_rd_data_blk into bpck6_read_block ata: pata_parport-bpck6: merge ppc6_wr_port16_blk into bpck6_write_block ata: pata_parport-bpck6: merge ppc6_rd_port16_blk into bpck6_read_block ata: pata_parport-bpck6: merge ppc6_wr_port into bpck6_write_regr ata: pata_parport-bpck6: merge ppc6_rd_port into bpck6_read_regr ata: pata_parport-bpck6: remove ppc6_close ...
-