- 15 Jul, 2021 3 commits
-
-
He Fengqing authored
In bpf_patch_insn_data(), we first use the bpf_patch_insn_single() to insert new instructions, then use adjust_insn_aux_data() to adjust insn_aux_data. If the old env->prog have no enough room for new inserted instructions, we use bpf_prog_realloc to construct new_prog and free the old env->prog. There have two errors here. First, if adjust_insn_aux_data() return ENOMEM, we should free the new_prog. Second, if adjust_insn_aux_data() return ENOMEM, bpf_patch_insn_data() will return NULL, and env->prog has been freed in bpf_prog_realloc, but we will use it in bpf_check(). So in this patch, we make the adjust_insn_aux_data() never fails. In bpf_patch_insn_data(), we first pre-malloc memory for the new insn_aux_data, then call bpf_patch_insn_single() to insert new instructions, at last call adjust_insn_aux_data() to adjust insn_aux_data. Fixes: 8041902d ("bpf: adjust insn_aux_data when patching insns") Signed-off-by: He Fengqing <hefengqing@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210714101815.164322-1-hefengqing@huawei.com
-
Kuniyuki Iwashima authored
Fix s/BPF_MAP_TYPE_REUSEPORT_ARRAY/BPF_MAP_TYPE_REUSEPORT_SOCKARRAY/ typo in bpf.h. Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210714124317.67526-1-kuniyu@amazon.co.jp
-
Alexei Starovoitov authored
Commit 47316f4a missed updating tools/.../bpf.h. Sync it. Fixes: 47316f4a ("bpf: Support input xdp_md context in BPF_PROG_TEST_RUN") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 12 Jul, 2021 1 commit
-
-
Martynas Pumputis authored
When loading a BPF program with a pinned map, the loader checks whether the pinned map can be reused, i.e. their properties match. To derive such of the pinned map, the loader invokes BPF_OBJ_GET_INFO_BY_FD and then does the comparison. Unfortunately, on < 4.12 kernels the BPF_OBJ_GET_INFO_BY_FD is not available, so loading the program fails with the following error: libbpf: failed to get map info for map FD 5: Invalid argument libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc/globals/cilium_call_policy': parameter mismatch" libbpf: map 'cilium_call_policy': error reusing pinned map libbpf: map 'cilium_call_policy': failed to create: Invalid argument(-22) libbpf: failed to load object 'bpf_overlay.o' To fix this, fallback to derivation of the map properties via /proc/$PID/fdinfo/$MAP_FD if BPF_OBJ_GET_INFO_BY_FD fails with EINVAL, which can be used as an indicator that the kernel doesn't support the latter. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210712125552.58705-1-m@lambda.lt
-
- 08 Jul, 2021 12 commits
-
-
Jesper Dangaard Brouer authored
Experience from production shows queue size of 192 is too small, as this caused packet drops during cpumap-enqueue on RX-CPU. This can be diagnosed with xdp_monitor sample program. This bpftrace program was used to diagnose the problem in more detail: bpftrace -e ' tracepoint:xdp:xdp_cpumap_kthread { @deq_bulk = lhist(args->processed,0,10,1); @drop_net = lhist(args->drops,0,10,1) } tracepoint:xdp:xdp_cpumap_enqueue { @enq_bulk = lhist(args->processed,0,10,1); @enq_drops = lhist(args->drops,0,10,1); }' Watch out for the @enq_drops counter. The @drop_net counter can happen when netstack gets invalid packets, so don't despair it can be natural, and that counter will likely disappear in newer kernels as it was a source of confusion (look at netstat info for reason of the netstack @drop_net counters). The production system was configured with CPU power-saving C6 state. Learn more in this blogpost[1]. And wakeup latency in usec for the states are: # grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency /sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0 /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2 /sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10 /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:133 Deepest state take 133 usec to wakeup from (133/10^6). The link speed is 25Gbit/s ((25*10^9/8) in bytes/sec). How many bytes can arrive with in 133 usec at this speed: (25*10^9/8)*(133/10^6) = 415625 bytes. With MTU size packets this is 275 packets, and with minimum Ethernet (incl intergap overhead) 84 bytes it is 4948 packets. Clearly default queue size is too small. Setting default cpumap queue to 2048 as worst-case (small packet) at 10Gbit/s is 1979 packets with 133 usec wakeup time, +64 packet before kthread wakeup call (due to xdp_do_flush) worst-case 2043 packets. Thus, if a packet burst on RX-CPU will enqueue packets to a remote cpumap CPU that is in deep-sleep state it can overrun the cpumap queue. The production system was also configured to avoid deep-sleep via: tuned-adm profile network-latency [1] https://jeremyeder.com/2013/08/30/oh-did-you-expect-the-cpu/Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/162523477604.786243.13372630844944530891.stgit@firesoul
-
Alexei Starovoitov authored
Kumar Kartikeya says: ==================== This small series makes some improvements to generic XDP mode and brings it closer to native XDP. Patch 1 splits out generic XDP processing into reusable parts, patch 2 adds pointer friendly wrappers for bitops (not have to cast back and forth the address of local pointer to unsigned long *), patch 3 implements generic cpumap support (details in commit) and patch 4 allows devmap bpf prog execution before generic_xdp_tx is called. Patch 5 just updates a couple of selftests to adapt to changes in behavior (in that specifying devmap/cpumap prog fd in generic mode is now allowed). Changelog: ---------- v5 -> v6 v5: https://lore.kernel.org/bpf/20210701002759.381983-1-memxor@gmail.com * Put rcpu->prog check before RCU-bh section to avoid do_softirq (Jesper) v4 -> v5 v4: https://lore.kernel.org/bpf/20210628114746.129669-1-memxor@gmail.com * Add comments and examples for new bitops macros (Alexei) v3 -> v4 v3: https://lore.kernel.org/bpf/20210622202835.1151230-1-memxor@gmail.com * Add detach now that attach of XDP program succeeds (Toke) * Clean up the test to use new ASSERT macros v2 -> v3 v2: https://lore.kernel.org/bpf/20210622195527.1110497-1-memxor@gmail.com * list_for_each_entry -> list_for_each_entry_safe (due to deletion of skb) v1 -> v2 v1: https://lore.kernel.org/bpf/20210620233200.855534-1-memxor@gmail.com * Move __ptr_{set,clear,test}_bit to bitops.h (Toke) Also changed argument order to match the bit op they wrap. * Remove map value size checking functions for cpumap/devmap (Toke) * Rework prog run for skb in cpu_map_kthread_run (Toke) * Set skb->dev to dst->dev after devmap prog has run * Don't set xdp rxq that will be overwritten in cpumap prog run ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Kumar Kartikeya Dwivedi authored
Support for cpumap and devmap entry progs in previous commits means the test needs to be updated for the new semantics. Also take this opportunity to convert it from CHECK macros to the new ASSERT macros. Since xdp_cpumap_attach has no subtest, put the sole test inside the test_xdp_cpumap_attach function. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-6-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
This lifts the restriction on running devmap BPF progs in generic redirect mode. To match native XDP behavior, it is invoked right before generic_xdp_tx is called, and only supports XDP_PASS/XDP_ABORTED/ XDP_DROP actions. We also return 0 even if devmap program drops the packet, as semantically redirect has already succeeded and the devmap prog is the last point before TX of the packet to device where it can deliver a verdict on the packet. This also means it must take care of freeing the skb, as xdp_do_generic_redirect callers only do that in case an error is returned. Since devmap entry prog is supported, remove the check in generic_xdp_install entirely. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-5-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
This change implements CPUMAP redirect support for generic XDP programs. The idea is to reuse the cpu map entry's queue that is used to push native xdp frames for redirecting skb to a different CPU. This will match native XDP behavior (in that RPS is invoked again for packet reinjected into networking stack). To be able to determine whether the incoming skb is from the driver or cpumap, we reuse skb->redirected bit that skips generic XDP processing when it is set. To always make use of this, CONFIG_NET_REDIRECT guard on it has been lifted and it is always available. >From the redirect side, we add the skb to ptr_ring with its lowest bit set to 1. This should be safe as skb is not 1-byte aligned. This allows kthread to discern between xdp_frames and sk_buff. On consumption of the ptr_ring item, the lowest bit is unset. In the end, the skb is simply added to the list that kthread is anyway going to maintain for xdp_frames converted to skb, and then received again by using netif_receive_skb_list. Bulking optimization for generic cpumap is left as an exercise for a future patch for now. Since cpumap entry progs are now supported, also remove check in generic_xdp_install for the cpumap. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-4-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
cpumap needs to set, clear, and test the lowest bit in skb pointer in various places. To make these checks less noisy, add pointer friendly bitop macros that also do some typechecking to sanitize the argument. These wrap the non-atomic bitops __set_bit, __clear_bit, and test_bit but for pointer arguments. Pointer's address has to be passed in and it is treated as an unsigned long *, since width and representation of pointer and unsigned long match on targets Linux supports. They are prefixed with double underscore to indicate lack of atomicity. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-3-memxor@gmail.com
-
Kumar Kartikeya Dwivedi authored
This helper can later be utilized in code that runs cpumap and devmap programs in generic redirect mode and adjust skb based on changes made to xdp_buff. When returning XDP_REDIRECT/XDP_TX, it invokes __skb_push, so whenever a generic redirect path invokes devmap/cpumap prog if set, it must __skb_pull again as we expect mac header to be pulled. It also drops the skb_reset_mac_len call after do_xdp_generic, as the mac_header and network_header are advanced by the same offset, so the difference (mac_len) remains constant. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-2-memxor@gmail.com
-
Alexei Starovoitov authored
Zvi Effron says: ==================== This patchset adds support for passing an xdp_md via ctx_in/ctx_out in bpf_attr for BPF_PROG_TEST_RUN of XDP programs. Patch 1 adds a function to validate XDP meta data lengths. Patch 2 adds initial support for passing XDP meta data in addition to packet data. Patch 3 adds support for also specifying the ingress interface and rx queue. Patch 4 adds selftests to ensure functionality is correct. Changelog: ---------- v7->v8 v7: https://lore.kernel.org/bpf/20210624211304.90807-1-zeffron@riotgames.com/ * Fix too long comment line in patch 3 v6->v7 v6: https://lore.kernel.org/bpf/20210617232904.1899-1-zeffron@riotgames.com/ * Add Yonghong Song's Acked-by to commit message in patch 1 * Add Yonghong Song's Acked-by to commit message in patch 2 * Extracted the post-update of the xdp_md context into a function (again) * Validate that the rx queue was registered with XDP info * Decrement the reference count on a found netdevice on failure to find a valid rx queue * Decrement the reference count on a found netdevice after the XDP program is run * Drop Yonghong Song's Acked-By for patch 3 because of patch changes * Improve a comment in the selftests * Drop Yonghong Song's Acked-By for patch 4 because of patch changes v5->v6 v5: https://lore.kernel.org/bpf/20210616224712.3243-1-zeffron@riotgames.com/ * Correct commit messages in patches 1 and 3 * Add Acked-by to commit message in patch 4 * Use gotos instead of returns to correctly free resources in bpf_prog_test_run_xdp * Rename xdp_metalen_valid to xdp_metalen_invalid * Improve the function signature for xdp_metalen_invalid * Merged declaration of ingress_ifindex and rx_queue_index into one line v4->v5 v4: https://lore.kernel.org/bpf/20210604220235.6758-1-zeffron@riotgames.com/ * Add new patch to introduce xdp_metalen_valid inline function to avoid duplicated code from net/core/filter.c * Correct size of bad_ctx in selftests * Make all declarations reverse Christmas tree * Move data check from xdp_convert_md_to_buff to bpf_prog_test_run_xdp * Merge xdp_convert_buff_to_md into bpf_prog_test_run_xdp * Fix line too long * Extracted common checks in selftests to a helper function * Removed redundant assignment in selftests * Reordered test cases in selftests * Check data against 0 instead of data_meta in selftests * Made selftests use EINVAL instead of hardcoded 22 * Dropped "_" from XDP function name * Changed casts in XDP program from unsigned long to long * Added a comment explaining the use of the loopback interface in selftests * Change parameter order in xdp_convert_md_to_buff to be input first * Assigned xdp->ingress_ifindex and xdp->rx_queue_index to local variables in xdp_convert_md_to_buff * Made use of "meta data" versus "metadata" consistent in comments and commit messages v3->v4 v3: https://lore.kernel.org/bpf/20210602190815.8096-1-zeffron@riotgames.com/ * Clean up nits * Validate xdp_md->data_end in bpf_prog_test_run_xdp * Remove intermediate metalen variables v2 -> v3 v2: https://lore.kernel.org/bpf/20210527201341.7128-1-zeffron@riotgames.com/ * Check errno first in selftests * Use DECLARE_LIBBPF_OPTS * Rename tattr to opts in selftests * Remove extra new line * Rename convert_xdpmd_to_xdpb to xdp_convert_md_to_buff * Rename convert_xdpb_to_xdpmd to xdp_convert_buff_to_md * Move declaration of device and rxqueue in xdp_convert_md_to_buff to patch 2 * Reorder the kfree calls in bpf_prog_test_run_xdp v1 -> v2 v1: https://lore.kernel.org/bpf/20210524220555.251473-1-zeffron@riotgames.com * Fix null pointer dereference with no context * Use the BPF skeleton and replace CHECK with ASSERT macros ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Zvi Effron authored
Add a test for using xdp_md as a context to BPF_PROG_TEST_RUN for XDP programs. The test uses a BPF program that takes in a return value from XDP meta data, then reduces the size of the XDP meta data by 4 bytes. Test cases validate the possible failure cases for passing in invalid xdp_md contexts, that the return value is successfully passed in, and that the adjusted meta data is successfully copied out. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-5-zeffron@riotgames.com
-
Zvi Effron authored
Support specifying the ingress_ifindex and rx_queue_index of xdp_md contexts for BPF_PROG_TEST_RUN. The intended use case is to allow testing XDP programs that make decisions based on the ingress interface or RX queue. If ingress_ifindex is specified, look up the device by the provided index in the current namespace and use its xdp_rxq for the xdp_buff. If the rx_queue_index is out of range, or is non-zero when the ingress_ifindex is 0, return -EINVAL. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-4-zeffron@riotgames.com
-
Zvi Effron authored
Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for BPF_PROG_TEST_RUN. The intended use case is to pass some XDP meta data to the test runs of XDP programs that are used as tail calls. For programs that use bpf_prog_test_run_xdp, support xdp_md input and output. Unlike with an actual xdp_md during a non-test run, data_meta must be 0 because it must point to the start of the provided user data. From the initial xdp_md, use data and data_end to adjust the pointers in the generated xdp_buff. All other non-zero fields are prohibited (with EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially different) xdp_md back to the userspace. We require all fields of input xdp_md except the ones we explicitly support to be set to zero. The expectation is that in the future we might add support for more fields and we want to fail explicitly if the user runs the program on the kernel where we don't yet support them. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
-
Zvi Effron authored
This commit prepares to use the XDP meta data length check in multiple places by making it into a static inline function instead of a literal. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-2-zeffron@riotgames.com
-
- 01 Jul, 2021 12 commits
-
-
David S. Miller authored
Marek Behún says: ==================== dsa: mv88e6xxx: Topaz fixes here comes some fixes for the Topaz family (Marvell 88E6141 / 88E6341) which I found out about when I compared the Topaz' operations structure with that one of Peridot (6390). This is v2. In v1, I accidentally sent patches generated from wrong branch and the 5th patch does not contain a necessary change in serdes.c. Changes from v1: - the fifth patch, "enable SerDes RX stats for Topaz", needs another change in serdes.c - Andrew's Reviewed-by to 1,2,3,4 and 6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Commit bf3504ce ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") added support for dumping SerDes PCS registers via ethtool -d for Peridot. The same implementation is also valid for Topaz, but was not enabled at the time. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: bf3504ce ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Commit 0df95287 ("mv88e6xxx: Add serdes Rx statistics") added support for RX statistics on SerDes ports for Peridot. This same implementation is also valid for Topaz, but was not enabled at the time. We need to use the generic .serdes_get_lane() method instead of the Peridot specific one in the stats methods so that on Topaz the proper one is used. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 0df95287 ("mv88e6xxx: Add serdes Rx statistics") Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Commit 23e8b470 ("net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.") introduced ATU hash algorithm access via devlink, but did not enable it for Topaz. Enable this feature also for Topaz. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 23e8b470 ("net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Commit 9e5baf9b ("net: dsa: mv88e6xxx: add RMU disable op") introduced .rmu_disable() method with implementation for several models, but forgot to add Topaz, which can use the Peridot implementation. Use the Peridot implementation of .rmu_disable() on Topaz. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 9e5baf9b ("net: dsa: mv88e6xxx: add RMU disable op") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Commit 40cff8fc ("net: dsa: mv88e6xxx: Fix stats histogram mode") introduced wrong .stats_set_histogram() method for Topaz family. The Peridot method should be used instead. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: 40cff8fc ("net: dsa: mv88e6xxx: Fix stats histogram mode") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marek Behún authored
Commit f3a2cd32 ("net: dsa: mv88e6xxx: introduce .port_set_policy") introduced .port_set_policy() method with implementation for several models, but forgot to add Topaz, which can use the 6352 implementation. Use the 6352 implementation of .port_set_policy() on Topaz. Signed-off-by: Marek Behún <kabel@kernel.org> Fixes: f3a2cd32 ("net: dsa: mv88e6xxx: introduce .port_set_policy") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
The DSA core has a layered structure, and even though we end up returning 0 (success) to user space when setting a bonding/team upper that can't be offloaded, some parts of the framework actually need to know that we couldn't offload that. For example, if dsa_switch_lag_join returns 0 as it currently does, dsa_port_lag_join has no way to tell a successful offload from a software fallback, and it will call dsa_port_bridge_join afterwards. Then we'll think we're offloading the bridge master of the LAG, when in fact we're not even offloading the LAG. In turn, this will make us set skb->offload_fwd_mark = true, which is incorrect and the bridge doesn't like it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Geetha sowjanya says: ==================== Dynamic LMTST region setup This patch series allows RVU PF/VF to allocate memory for LMTST operations instead of using memory reserved by firmware which is mapped as device memory. The LMTST mapping table contains the RVU PF/VF LMTST memory base address entries. This table is used by hardware for LMTST operations. Patch1 introduces new mailbox message to update the LMTST table with the new allocated memory address. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geetha sowjanya authored
The current driver uses static LMTST region allocated by firmware. This memory gets populated as PF/VF BAR2. RVU PF/VF driver ioremap the memory as device memory for NIX/NPA operation. Since the memory is mapped as device memory we see performance degration. To address this issue this patch implements runtime memory allocation. RVU PF/VF allocates memory during device probe and share the base address with RVU AF. RVU AF then configure the LMT MAP table accordingly. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geetha sowjanya authored
This patch extends the lmtst_tbl_setup_req mbox to support run time LMTST configuration. RVU PF/VF and DPDK/ODP allocates a LMT region, creates a translation entry for a device via VFIO IOCTLs. This IOVA is shared with AF through above mbox. AF then uses RVU_SMMU transulation Widget and gets PA for the IOVA and updates the LMTtable entry for that device. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Harman Kalra authored
Introducing a new mailbox to support updating lmt entries and common lmt base address scheme i.e. multiple pcifuncs can share lmt region to reduce L1 cache pressure for application. Parameters passed to mailbox includes the primary pcifunc value whose lmt regions will be shared by other secondary pcifuncs. Here secondary pcifunc will be the one who is calling the mailbox. For example: By default each pcifunc has its own LMT base address: PCIFUNC1 LMT_BASE_ADDR A PCIFUNC2 LMT_BASE_ADDR B PCIFUNC3 LMT_BASE_ADDR C PCIFUNC4 LMT_BASE_ADDR D Application will choose PCIFUNC1 as base/primary pcifunc and as and when other pcifunc(secondary pcifuncs) gets probed, this mailbox will be called and LMTST table will be updated as: PCIFUNC1 LMT_BASE_ADDR A PCIFUNC2 LMT_BASE_ADDR A PCIFUNC3 LMT_BASE_ADDR A PCIFUNC4 LMT_BASE_ADDR A On FLR lmtst map table gets resetted to the default lmt base addresses for all secondary pcifuncs. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Jun, 2021 12 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds authored
Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Ingo Molnar: - Fix a small inconsistency (bug) in load tracking, caught by a new warning that several people reported. - Flip CONFIG_SCHED_CORE to default-disabled, and update the Kconfig help text. * tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Disable CONFIG_SCHED_CORE by default sched/fair: Ensure _sum and _avg values stay consistent
-
git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds authored
Pull microblaze updates from Michal Simek: - Remove unused PAGE_UP/DOWN macros - Fix trivial spelling mistake * tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze: arch: microblaze: Fix spelling mistake "vesion" -> "version" microblaze: Cleanup unused functions
-
git://github.com/micah-morton/linuxLinus Torvalds authored
Pull SafeSetID update from Micah Morton: "One very minor code cleanup change that marks a variable as __initdata" * tag 'safesetid-5.14' of git://github.com/micah-morton/linux: LSM: SafeSetID: Mark safesetid_initialized as __initdata
-
git://github.com/cschaufler/smack-nextLinus Torvalds authored
Pull smack updates from Casey Schaufler: "There is nothing more significant than an improvement to a byte count check in smackfs. All changes have been in next for weeks" * tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next: Smack: fix doc warning Revert "Smack: Handle io_uring kernel thread privileges" smackfs: restrict bytes count in smk_set_cipso() security/smack/: fix misspellings using codespell tool
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/auditLinus Torvalds authored
Pull audit updates from Paul Moore: "Another merge window, another small audit pull request. Four patches in total: one is cosmetic, one removes an unnecessary initialization, one renames some enum values to prevent name collisions, and one converts list_del()/list_add() to list_move(). None of these are earth shattering and all pass the audit-testsuite tests while merging cleanly on top of your tree from earlier today" * tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: remove unnecessary 'ret' initialization audit: remove trailing spaces and tabs audit: Use list_move instead of list_del/list_add audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition audit: add blank line after variable declarations
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinuxLinus Torvalds authored
Pull SELinux updates from Paul Moore: - The slow_avc_audit() function is now non-blocking so we can remove the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of avc_has_perm(). - Use kmemdup() instead of kcalloc()+copy when copying parts of the SELinux policydb. - The InfiniBand device name is now passed by reference when possible in the SELinux code, removing a strncpy(). - Minor cleanups including: constification of avtab function args, removal of useless LSM/XFRM function args, SELinux kdoc fixes, and removal of redundant assignments. * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit() selinux: slow_avc_audit has become non-blocking selinux: Fix kernel-doc selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC lsm_audit,selinux: pass IB device name by reference selinux: Remove redundant assignment to rc selinux: Corrected comment to match kernel-doc comment selinux: delete selinux_xfrm_policy_lookup() useless argument selinux: constify some avtab function arguments selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull clang feature updates from Kees Cook: - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the face of the noinstr attribute, paving the way for PGO and fixing GCOV. (Nick Desaulniers) - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor) - Small fixes to CFI. (Mark Rutland, Nathan Chancellor) * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR compiler_attributes.h: cleanups for GCC 4.9+ compiler_attributes.h: define __no_profile, add to noinstr x86, lto: Enable Clang LTO for 32-bit as well CFI: Move function_nocfi() into compiler.h MAINTAINERS: Add Clang CFI section
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block driver updates from Jens Axboe: "Pretty calm round, mostly just NVMe and a bit of MD: - NVMe updates (via Christoph) - improve the APST configuration algorithm (Alexey Bogoslavsky) - look for StorageD3Enable on companion ACPI device (Mario Limonciello) - allow selecting the network interface for TCP connections (Martin Belanger) - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King, Christoph) - move the ACPI StorageD3 code to drivers/acpi/ and add quirks for certain AMD CPUs (Mario Limonciello) - zoned device support for nvmet (Chaitanya Kulkarni) - fix the rules for changing the serial number in nvmet (Noam Gottlieb) - various small fixes and cleanups (Dan Carpenter, JK Kim, Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert Uytterhoeven, Daniel Wagner) - MD updates (Via Song) - iostats rewrite (Guoqing Jiang) - raid5 lock contention optimization (Gal Ofri) - Fall through warning fix (Gustavo) - Misc fixes (Gustavo, Jiapeng)" * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits) nvmet: use NVMET_MAX_NAMESPACES to set nn value loop: Fix missing discard support when using LOOP_CONFIGURE nvme.h: add missing nvme_lba_range_type endianness annotations nvme: remove zeroout memset call for struct nvme-pci: remove zeroout memset call for struct nvmet: remove zeroout memset call for struct nvmet: add ZBD over ZNS backend support nvmet: add Command Set Identifier support nvmet: add nvmet_req_bio put helper for backends nvmet: add req cns error complete helper block: export blk_next_bio() nvmet: remove local variable nvmet: use nvme status value directly nvmet: use u32 type for the local variable nsid nvmet: use u32 for nvmet_subsys max_nsid nvmet: use req->cmd directly in file-ns fast path nvmet: use req->cmd directly in bdev-ns fast path nvmet: make ver stable once connection established nvmet: allow mn change if subsys not discovered nvmet: make sn stable once connection was established ...
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull core block updates from Jens Axboe: - disk events cleanup (Christoph) - gendisk and request queue allocation simplifications (Christoph) - bdev_disk_changed cleanups (Christoph) - IO priority improvements (Bart) - Chained bio completion trace fix (Edward) - blk-wbt fixes (Jan) - blk-wbt enable/disable fix (Zhang) - Scheduler dispatch improvements (Jan, Ming) - Shared tagset scheduler improvements (John) - BFQ updates (Paolo, Luca, Pietro) - BFQ lock inversion fix (Jan) - Documentation improvements (Kir) - CLONE_IO block cgroup fix (Tejun) - Remove of ancient and deprecated block dump feature (zhangyi) - Discard merge fix (Ming) - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas, Yang) * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits) block: fix discard request merge block/mq-deadline: Remove a WARN_ON_ONCE() call blk-mq: update hctx->dispatch_busy in case of real scheduler blk: Fix lock inversion between ioc lock and bfqd lock bfq: Remove merged request already in bfq_requests_merged() block: pass a gendisk to bdev_disk_changed block: move bdev_disk_changed block: add the events* attributes to disk_attrs block: move the disk events code to a separate file block: fix trace completion for chained bio block/partitions/msdos: Fix typo inidicator -> indicator block, bfq: reset waker pointer with shared queues block, bfq: check waker only for queues with no in-flight I/O block, bfq: avoid delayed merge of async queues block, bfq: boost throughput by extending queue-merging times block, bfq: consider also creation time in delayed stable merge block, bfq: fix delayed stable merge check block, bfq: let also stably merged queues enjoy weight raising blk-wbt: make sure throttle is enabled properly blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds authored
Pull HID updates from Jiri Kosina: - patch series that ensures that hid-multitouch driver disables touch and button-press reporting on hid-mt devices during suspend when the device is not configured as a wakeup-source, from Hans de Goede - support for ISH DMA on Intel EHL platform, from Even Xu - support for Renoir and Cezanne SoCs, Ambient Light Sensor and Human Presence Detection sensor for amd-sfh driver, from Basavaraj Natikar - other assorted code cleanups and device-specific fixes/quirks * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (45 commits) HID: thrustmaster: Switch to kmemdup() when allocate change_request HID: multitouch: Disable event reporting on suspend when the device is not a wakeup-source HID: logitech-dj: Implement may_wakeup ll-driver callback HID: usbhid: Implement may_wakeup ll-driver callback HID: core: Add hid_hw_may_wakeup() function HID: input: Add support for Programmable Buttons HID: wacom: Correct base usage for capacitive ExpressKey status bits HID: amd_sfh: Add initial support for HPD sensor HID: amd_sfh: Extend ALS support for newer AMD platform HID: amd_sfh: Extend driver capabilities for multi-generation support HID: surface-hid: Fix get-report request HID: sony: fix freeze when inserting ghlive ps3/wii dongles HID: usbkbd: Avoid GFP_ATOMIC when GFP_KERNEL is possible HID: amd_sfh: change in maintainer HID: intel-ish-hid: ipc: Specify that EHL no cache snooping HID: intel-ish-hid: ishtp: Add dma_no_cache_snooping() callback HID: intel-ish-hid: Set ISH driver depends on x86 HID: hid-input: add Surface Go battery quirk HID: intel-ish-hid: Fix minor typos in comments HID: usbmouse: Avoid GFP_ATOMIC when GFP_KERNEL is possible ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/ras/rasLinus Torvalds authored
Pull EDAC updates from Tony Luck: "Various fixes and support for new CPUs: - Clean up error messages from thunderx_edac - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload - Use %pR to print resources in aspeed_edac - Add Yazen Ghannam as MAINTAINER for AMD edac drivers - Fix Ice Lake and Sapphire Rapids drivers to report correct "near" or "far" device for errors in 2LM configurations - Add support of on package high bandwidth memory in Sapphire Rapids - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for ICL-NNPI, Tiger Lake and Alder Lake) - Don't even try to load Intel EDAC drivers when running as a guest - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6" * tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/igen6: fix core dependency EDAC/Intel: Do not load EDAC driver when running as a guest EDAC/igen6: Add Intel Alder Lake SoC support EDAC/igen6: Add Intel Tiger Lake SoC support EDAC/igen6: Add Intel ICL-NNPI SoC support EDAC/i10nm: Add support for high bandwidth memory EDAC/i10nm: Add detection of memory levels for ICX/SPR servers EDAC/skx_common: Add new ADXL components for 2-level memory MAINTAINERS: Make Yazen Ghannam maintainer for EDAC-AMD64 EDAC/aspeed: Use proper format string for printing resource EDAC/ti: Add missing MODULE_DEVICE_TABLE EDAC/thunderx: Remove irrelevant variable from error messages
-