- 19 Jul, 2022 1 commit
-
-
Moshe Shemesh authored
Add a lock_class_key per devlink instance to avoid a false-positive DEADLOCK warning from lockdep when driver code locks more than one devlink instance, for example in the flow that opens VFs. Kernel log:
[ 101.433802] ============================================
[ 101.433803] WARNING: possible recursive locking detected
[ 101.433810] 5.19.0-rc1+ #35 Not tainted
[ 101.433812] --------------------------------------------
[ 101.433813] bash/892 is trying to acquire lock:
[ 101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core]
[ 101.433909] but task is already holding lock:
[ 101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[ 101.433989] other info that might help us debug this:
[ 101.433990]  Possible unsafe locking scenario:
[ 101.433991]        CPU0
[ 101.433991]        ----
[ 101.433992]   lock(&devlink->lock);
[ 101.433993]   lock(&devlink->lock);
[ 101.433995]  *** DEADLOCK ***
[ 101.433996]  May be due to missing lock nesting notation
[ 101.433996] 6 locks held by bash/892:
[ 101.433998]  #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0
[ 101.434009]  #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520
[ 101.434017]  #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520
[ 101.434023]  #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310
[ 101.434031]  #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[ 101.434108]  #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430
[ 101.434116] stack backtrace:
[ 101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35
[ 101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 101.434130] Call Trace:
[ 101.434133]  <TASK>
[ 101.434135]  dump_stack_lvl+0x57/0x7d
[ 101.434145]  __lock_acquire.cold+0x1df/0x3e7
[ 101.434151]  ? register_lock_class+0x1880/0x1880
[ 101.434157]  lock_acquire+0x1c1/0x550
[ 101.434160]  ? probe_one+0x3c/0x690 [mlx5_core]
[ 101.434229]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 101.434232]  ? __xa_alloc+0x1ed/0x2d0
[ 101.434236]  ? ksys_write+0xf3/0x1d0
[ 101.434239]  __mutex_lock+0x12c/0x14b0
[ 101.434243]  ? probe_one+0x3c/0x690 [mlx5_core]
[ 101.434312]  ? probe_one+0x3c/0x690 [mlx5_core]
[ 101.434380]  ? devlink_alloc_ns+0x11b/0x910
[ 101.434385]  ? mutex_lock_io_nested+0x1320/0x1320
[ 101.434388]  ? lockdep_init_map_type+0x21a/0x7d0
[ 101.434391]  ? lockdep_init_map_type+0x21a/0x7d0
[ 101.434393]  ? __init_swait_queue_head+0x70/0xd0
[ 101.434397]  probe_one+0x3c/0x690 [mlx5_core]
[ 101.434467]  pci_device_probe+0x1b4/0x480
[ 101.434471]  really_probe+0x1e0/0xaa0
[ 101.434474]  __driver_probe_device+0x219/0x480
[ 101.434478]  driver_probe_device+0x49/0x130
[ 101.434481]  __device_attach_driver+0x1b8/0x280
[ 101.434484]  ? driver_allows_async_probing+0x140/0x140
[ 101.434487]  bus_for_each_drv+0x123/0x1a0
[ 101.434489]  ? bus_for_each_dev+0x1a0/0x1a0
[ 101.434491]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 101.434494]  ? trace_hardirqs_on+0x2d/0x100
[ 101.434498]  __device_attach+0x1a3/0x430
[ 101.434501]  ? device_driver_attach+0x1e0/0x1e0
[ 101.434503]  ? pci_bridge_d3_possible+0x1e0/0x1e0
[ 101.434506]  ? pci_create_resource_files+0xeb/0x190
[ 101.434511]  pci_bus_add_device+0x6c/0xa0
[ 101.434514]  pci_iov_add_virtfn+0x9e4/0xe00
[ 101.434517]  ? trace_hardirqs_on+0x2d/0x100
[ 101.434521]  sriov_enable+0x64a/0xca0
[ 101.434524]  ? pcibios_sriov_disable+0x10/0x10
[ 101.434528]  mlx5_core_sriov_configure+0xab/0x280 [mlx5_core]
[ 101.434602]  sriov_numvfs_store+0x20a/0x310
[ 101.434605]  ? sriov_totalvfs_show+0xc0/0xc0
[ 101.434608]  ? sysfs_file_ops+0x170/0x170
[ 101.434611]  ? sysfs_file_ops+0x117/0x170
[ 101.434614]  ? sysfs_file_ops+0x170/0x170
[ 101.434616]  kernfs_fop_write_iter+0x348/0x520
[ 101.434619]  new_sync_write+0x2e5/0x520
[ 101.434621]  ? new_sync_read+0x520/0x520
[ 101.434624]  ? lock_acquire+0x1c1/0x550
[ 101.434626]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 101.434630]  vfs_write+0x5cb/0x8d0
[ 101.434633]  ksys_write+0xf3/0x1d0
[ 101.434635]  ? __x64_sys_read+0xb0/0xb0
[ 101.434638]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 101.434640]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 101.434643]  do_syscall_64+0x3d/0x90
[ 101.434647]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 101.434650] RIP: 0033:0x7f5ff536b2f7
[ 101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7
[ 101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001
[ 101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001
[ 101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002
[ 101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700
[ 101.434673]  </TASK>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
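A minimal sketch of the per-instance lock class idea (the structure and field names here are illustrative, not the exact upstream patch): each instance registers its own lockdep key and attaches it to its mutex, so lockdep no longer treats two different instances' locks as the same class and stops reporting nested acquisition as recursive locking.

    #include <linux/lockdep.h>
    #include <linux/mutex.h>

    struct foo_instance {
        struct mutex lock;
        struct lock_class_key lock_key;   /* per-instance, not a static key */
    };

    static void foo_instance_init(struct foo_instance *inst)
    {
        lockdep_register_key(&inst->lock_key);
        mutex_init(&inst->lock);
        lockdep_set_class(&inst->lock, &inst->lock_key);
    }

    static void foo_instance_destroy(struct foo_instance *inst)
    {
        mutex_destroy(&inst->lock);
        lockdep_unregister_key(&inst->lock_key);
    }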
-
- 18 Jul, 2022 21 commits
-
-
Sieng-Piaw Liew authored
Use netif_napi_add_tx() for NAPI in Tx direction instead of the regular netif_napi_add() function. Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
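For context, a hedged sketch of the conversion (the driver names are hypothetical): netif_napi_add_tx() registers a Tx-only NAPI and drops the Rx-oriented weight argument of the generic helper.

    #include <linux/netdevice.h>

    /* Hypothetical driver context; the NAPI registration call is the point. */
    struct foo_priv {
        struct napi_struct tx_napi;
    };

    static int foo_tx_poll(struct napi_struct *napi, int budget)
    {
        /* reclaim completed Tx descriptors here ... */
        napi_complete_done(napi, 0);
        return 0;
    }

    static void foo_register_tx_napi(struct net_device *netdev,
                                     struct foo_priv *priv)
    {
        /* Was: netif_napi_add(netdev, &priv->tx_napi, foo_tx_poll, 64); */
        netif_napi_add_tx(netdev, &priv->tx_napi, foo_tx_poll);
    }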
-
Arun Ramadoss authored
This patch removes the of_match_ptr() wrapper from the ksz_dt_ids reference, which produced an unused-variable warning: of_match_ptr() evaluates to NULL when CONFIG_OF is disabled, leaving the match table unreferenced. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
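A hedged illustration of the pattern (the compatible string and surrounding driver structure are abbreviated): referencing the table directly keeps it used regardless of CONFIG_OF.

    #include <linux/mod_devicetable.h>
    #include <linux/of.h>

    static const struct of_device_id ksz_dt_ids[] = {
        { .compatible = "microchip,ksz9477" },   /* illustrative entry */
        { /* sentinel */ }
    };

    /* Before: .of_match_table = of_match_ptr(ksz_dt_ids),
     *         -> NULL when !CONFIG_OF, so ksz_dt_ids becomes unused.
     * After:  .of_match_table = ksz_dt_ids,
     *         -> table is always referenced, no warning. */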
-
David S. Miller authored
Jakub Kicinski says:
====================
tls: rx: avoid skb_cow_data()

TLS calls skb_cow_data() on the skb it received from strparser whenever it needs to hold onto the skb with the decrypted data. (The alternative is decrypting directly into a user space buffer, in which case the input skb is not modified or used afterwards.) TLS needs the decrypted skb:
- almost always with TLS 1.3 (unless the new NoPad is enabled);
- when the user space buffer is too small to fit the record;
- when BPF sockmap is enabled.
Most of the time the skb we get out of strparser is a clone of a 64kB data unit coalesced by GRO. To make things worse, skb_cow_data() tries to produce a linear skb and allocates it with GFP_ATOMIC. This occasionally fails even under moderate memory pressure.
This patch set rejigs TLS Rx so that we no longer expect decryption in place. The decryption handlers return an skb which may or may not be the skb from strparser. For TLS 1.3 this results in a 20-30% performance improvement without NoPad enabled.
v2: rebase after 3d8c51b2 ("net/tls: Check for errors in tls_device_init")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We currently CoW Rx skbs whenever we can't decrypt to a user space buffer. The skbs can be enormous (64kB) and CoW does a linear alloc which has a strong chance of failing under memory pressure. Or even without, skb_cow_data() assumes GFP_ATOMIC. Allocate a new frag'd skb and decrypt into it. We finally take advantage of the decrypted skb getting returned via darg. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
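As a rough sketch of the direction (not the exact helper the patch adds), a fragmented destination skb can be built with alloc_skb_with_frags(), which spreads the payload over page fragments instead of demanding one large linear allocation the way skb_cow_data() does:

    #include <linux/skbuff.h>

    /* Allocate a destination skb whose payload lives in page frags,
     * avoiding a large linear GFP_ATOMIC allocation. recvmsg() runs in
     * process context, so GFP_KERNEL is assumed here. */
    static struct sk_buff *alloc_fragged_dst(unsigned int len)
    {
        int err;
        struct sk_buff *skb;

        skb = alloc_skb_with_frags(0, len, PAGE_ALLOC_COSTLY_ORDER,
                                   &err, GFP_KERNEL);
        /* The caller still accounts skb->len / skb->data_len as it fills
         * the frags with decrypted data. */
        return skb;
    }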
-
Jakub Kicinski authored
The "zero-copy" path in SW TLS will engage either for no skbs or for all but last. If the recvmsg parameters are right and the socket can do ZC we'll ZC until the iterator can't fit a full record at which point we'll decrypt one more record and copy over the necessary bits to fill up the request. The only reason we hold onto the ZC skbs which went thru the async path until the end of recvmsg() is to count bytes. We need an accurate count of zc'ed bytes so that we can calculate how much of the non-zc'd data to copy. To allow freeing input skbs on the ZC path count only how much of the list we'll need to consume. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Async crypto currently benefits from the fact that we decrypt in place. When we allow input and output to be different skbs we will have to hang onto the input while we move to the next record. Clone the inputs and keep them on a list. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Async crypto TLS Rx currently waits for crypto to be done in order to strip the TLS header and trailer. Simplify the code by moving the pointers immediately; since only TLS 1.2 is supported here, there is no message padding. This simplifies the decryption into a new skb in the next patch, as we don't have to worry about input vs output skb in the decrypt_done() handler any more. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Instead of using ctx->recv_pkt after decryption read the skb from darg.skb. This moves the decision of what the "output skb" is to the decrypt handlers. For now after decrypt handler returns successfully ctx->recv_pkt is simply moved to darg.skb, but it will change soon. Note that tls_decrypt_sg() cannot clear the ctx->recv_pkt because it gets called to re-encrypt (i.e. by the device offload). So we need an awkward temporary if() in tls_rx_one_record(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Callers always pass ctx->recv_pkt into decrypt_skb_update(), and it propagates it to its callees. This may give someone the false impression that those functions can accept any valid skb containing a TLS record. That's not the case, the record sequence number is read from the context, and they can only take the next record coming out of the strp. Let the functions get the skb from the context instead of passing it in. This will also make it cleaner to return a different skb than ctx->recv_pkt as the decrypted one later on. Since we're touching the definition of decrypt_skb_update() use this as an opportunity to rename it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
I already forgot to transform darg from input to output semantics once on the NIC inline crypto fastpath. To avoid this happening again, create a device equivalent of decrypt_internal(): a function responsible for decryption and for transforming darg. While at it, rename decrypt_internal() to a hopefully slightly more meaningful name. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We no longer allow a decrypted skb to remain linked to ctx->recv_pkt. Anything on the list is decrypted, anything on ctx->recv_pkt needs to be decrypted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Detach the skb from ctx->recv_pkt after decryption is done, even if we can't consume it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
I thought that having the skb either always on the ctx->rx_list or ctx->recv_pkt will simplify the handling, as we would not have to remember to flip it from one to the other on exit paths. This became a little harder to justify with the fix for BPF sockmaps. Subsequent changes will make the situation even worse. Queue the skbs only when really needed. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
recvmsg() in TLS gets data from the skb list (rx_list) or fresh skbs we read from TCP via strparser. The former holds skbs which were already decrypted for peek, or decrypted and partially consumed. tls_wait_data() only notices the appearance of fresh skbs coming out of TCP (or psock). It is possible, if there is a concurrent call to peek() and recv(), that the peek() will move the data from input to rx_list without recv() noticing. recv() will then read data out of order or never wake up. This is not a practical use case/concern, but it makes the self tests less reliable. This patch solves the problem by allowing only one reader in. Because having multiple processes calling read()/peek() is not normal, avoid adding a lock and fast-path the single-reader case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
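One generic way to express "only one reader at a time, fast-path the uncontended case" is a flag plus a wait queue; this is a sketch of the idea, not the exact tls_sw implementation, and it assumes the flag itself is protected by an outer lock (such as the socket lock) just as the real code would be.

    #include <linux/wait.h>
    #include <linux/sched.h>

    struct rx_reader_gate {
        bool              busy;   /* protected by an outer (socket) lock */
        wait_queue_head_t wq;
    };

    static void rx_reader_gate_init(struct rx_reader_gate *g)
    {
        g->busy = false;
        init_waitqueue_head(&g->wq);
    }

    /* Fast path: the common single-reader case just sets the flag.
     * Slow path: a concurrent reader sleeps until the first one leaves. */
    static int rx_reader_lock(struct rx_reader_gate *g)
    {
        while (g->busy) {
            int err = wait_event_interruptible(g->wq, !g->busy);

            if (err)
                return err;
        }
        g->busy = true;
        return 0;
    }

    static void rx_reader_unlock(struct rx_reader_gate *g)
    {
        g->busy = false;
        wake_up(&g->wq);
    }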
-
David S. Miller authored
Wen Gu says:
====================
net/smc: Introduce virtually contiguous buffers for SMC-R

On long-running enterprise production servers, high-order contiguous memory pages are usually very rare and in most cases we can only get fragmented pages. When replacing TCP with SMC-R in such production scenarios, attempting to allocate high-order physically contiguous sndbufs and RMBs may result in frequent memory compaction, which can cause unexpected hangs and further stability risks.
So this patch set is aimed at allowing an SMC-R link group to use virtually contiguous sndbufs and RMBs to avoid the potential issues mentioned above. Whether to use physically or virtually contiguous buffers can be set by the sysctl smcr_buf_type.
Note that using virtually contiguous buffers brings an acceptable performance regression, which can be divided into two parts:
1) regression in the data path, caused by the additional address translation of the sndbuf by the RNIC in Tx. In general, translating addresses through the MTT is fast; according to qperf tests, this regression is basically less than 10% in latency and bandwidth (see patch 5/6 for details).
2) regression in the buffer initialization and destruction path, caused by the additional MR operations on sndbufs. Thanks to the link group buffer reuse mechanism, the impact of this regression decreases as the number of buffer reuses increases.
Patch set overview:
- Patches 1/6 and 2/6 simplify and optimize the DMA sync operations, which reduces overhead on the data path, especially when using virtually contiguous buffers;
- Patches 3/6 and 4/6 introduce a sysctl smcr_buf_type to set the type of buffers in newly created link groups;
- Patch 5/6 allows SMC-R to use virtually contiguous sndbufs and RMBs, including buffer creation, destruction, MR operation and access;
- Patch 6/6 extends the netlink attribute for the buffer type of an SMC-R link group.
v1->v2:
- Patch 5/6 fixes a build issue on 32bit;
- Patch 3/6 adds a description of the new sysctl in smc-sysctl.rst.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wen Gu authored
Extend SMC-R link group netlink attribute SMC_GEN_LGR_SMCR. Introduce SMC_NLA_LGR_R_BUF_TYPE to show the buffer type of SMC-R link group. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
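A hedged sketch of what exposing such an attribute typically looks like in a netlink fill handler (the wrapper function is illustrative; error labels and the surrounding message assembly are omitted):

    #include <net/netlink.h>

    /* Emit the link group's buffer type as a u8 attribute so userspace
     * tooling can display it. SMC_NLA_LGR_R_BUF_TYPE is the attribute
     * named in the commit message. */
    static int smc_nl_put_buf_type_example(struct sk_buff *skb, u8 buf_type)
    {
        if (nla_put_u8(skb, SMC_NLA_LGR_R_BUF_TYPE, buf_type))
            return -EMSGSIZE;
        return 0;
    }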
-
Wen Gu authored
On long-running enterprise production servers, high-order contiguous memory pages are usually very rare and in most cases we can only get fragmented pages. When replacing TCP with SMC-R in such production scenarios, attempting to allocate high-order physically contiguous sndbufs and RMBs may result in frequent memory compaction, which can cause unexpected hangs and further stability risks.
So this patch is aimed at allowing an SMC-R link group to use virtually contiguous sndbufs and RMBs to avoid the potential issues mentioned above. Whether to use physically or virtually contiguous buffers can be set by the sysctl smcr_buf_type.
Note that using virtually contiguous buffers brings an acceptable performance regression, which can be divided into two parts:
1) regression in the data path, caused by the additional address translation of the sndbuf by the RNIC in Tx. In general, translating addresses through the MTT is fast. Taking 256KB sndbuf and RMB as an example, the qperf latency and bandwidth comparisons between physically and virtually contiguous buffers are as follows:
- client: smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2 -t 5 -vu tcp_{bw|lat}
- server: smc_run taskset -c <cpu> qperf
[latency]
msgsize     tcp            smcr          smcr-use-virt-buf
1          11.17 us        7.56 us        7.51 us (-0.67%)
2          10.65 us        7.74 us        7.56 us (-2.31%)
4          11.11 us        7.52 us        7.59 us ( 0.84%)
8          10.83 us        7.55 us        7.51 us (-0.48%)
16         11.21 us        7.46 us        7.51 us ( 0.71%)
32         10.65 us        7.53 us        7.58 us ( 0.61%)
64         10.95 us        7.74 us        7.80 us ( 0.76%)
128        11.14 us        7.83 us        7.87 us ( 0.47%)
256        10.97 us        7.94 us        7.92 us (-0.28%)
512        11.23 us        7.94 us        8.20 us ( 3.25%)
1024       11.60 us        8.12 us        8.20 us ( 0.96%)
2048       14.04 us        8.30 us        8.51 us ( 2.49%)
4096       16.88 us        9.13 us        9.07 us (-0.64%)
8192       22.50 us       10.56 us       11.22 us ( 6.26%)
16384      28.99 us       12.88 us       13.83 us ( 7.37%)
32768      40.13 us       16.76 us       16.95 us ( 1.16%)
65536      68.70 us       24.68 us       24.85 us ( 0.68%)
[bandwidth]
msgsize     tcp              smcr             smcr-use-virt-buf
1           1.65 MB/s        1.59 MB/s        1.53 MB/s (-3.88%)
2           3.32 MB/s        3.17 MB/s        3.08 MB/s (-2.67%)
4           6.66 MB/s        6.33 MB/s        6.09 MB/s (-3.85%)
8          13.67 MB/s       13.45 MB/s       11.97 MB/s (-10.99%)
16         25.36 MB/s       27.15 MB/s       24.16 MB/s (-11.01%)
32         48.22 MB/s       54.24 MB/s       49.41 MB/s (-8.89%)
64        106.79 MB/s      107.32 MB/s       99.05 MB/s (-7.71%)
128       210.21 MB/s      202.46 MB/s      201.02 MB/s (-0.71%)
256       400.81 MB/s      416.81 MB/s      393.52 MB/s (-5.59%)
512       746.49 MB/s      834.12 MB/s      809.99 MB/s (-2.89%)
1024     1292.33 MB/s     1641.96 MB/s     1571.82 MB/s (-4.27%)
2048     2007.64 MB/s     2760.44 MB/s     2717.68 MB/s (-1.55%)
4096     2665.17 MB/s     4157.44 MB/s     4070.76 MB/s (-2.09%)
8192     3159.72 MB/s     4361.57 MB/s     4270.65 MB/s (-2.08%)
16384    4186.70 MB/s     4574.13 MB/s     4501.17 MB/s (-1.60%)
32768    4093.21 MB/s     4487.42 MB/s     4322.43 MB/s (-3.68%)
65536    4057.14 MB/s     4735.61 MB/s     4555.17 MB/s (-3.81%)
2) regression in the buffer initialization and destruction path, caused by the additional MR operations on sndbufs. Thanks to the link group buffer reuse mechanism, the impact of this regression decreases as the number of buffer reuses increases. Taking 256KB sndbuf and RMB as an example, the latency of some key SMC-R buffer-related functions obtained by bpftrace is as follows:
Function                      Phys-bufs     Virt-bufs
smcr_new_buf_create()          67154 ns      79164 ns
smc_ib_buf_map_sg()              525 ns        928 ns
smc_ib_get_memory_region()    162294 ns     161191 ns
smc_wr_reg_send()               9957 ns       9635 ns
smc_ib_put_memory_region()    203548 ns     198374 ns
smc_ib_buf_unmap_sg()            508 ns       1158 ns
------------
Test environment notes:
1. The above tests run on 2 VMs within the same host.
2. The NIC is ConnectX-4Lx, using SRIOV and passing through 2 VFs to each VM respectively.
3. The VMs' vCPUs are bound to different physical CPUs, and the bound physical CPUs are isolated by the `isolcpus=xxx` cmdline.
4. The NICs' queue number is set to 1.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wen Gu authored
This patch introduces a new SMC-R specific element buf_type in struct smc_link_group, for recording the value of sysctl smcr_buf_type when link group is created. New created link group will create and reuse buffers of the type specified by buf_type. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wen Gu authored
This patch introduces the sysctl smcr_buf_type for setting the type of SMC-R sndbufs and RMBs. Valid values include:
- SMCR_PHYS_CONT_BUFS, which means use physically contiguous buffers for better performance; this is the default value.
- SMCR_VIRT_CONT_BUFS, which means use virtually contiguous buffers in case physically contiguous memory is scarce.
- SMCR_MIXED_BUFS, which means first try to use physically contiguous buffers and, if they are not available, fall back to virtually contiguous buffers.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
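A minimal sketch of the mixed-mode fallback the knob describes. The enum names come from the commit message; the numeric values and the allocation helper are illustrative, and the real SMC code goes through its own buffer-descriptor paths rather than a raw allocator like this.

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    enum {
        SMCR_PHYS_CONT_BUFS = 0,   /* values assumed for the sketch */
        SMCR_VIRT_CONT_BUFS = 1,
        SMCR_MIXED_BUFS     = 2,
    };

    static void *smcr_alloc_buf_example(size_t len, int buf_type)
    {
        void *buf = NULL;

        /* Physically contiguous first, unless virt-only was requested. */
        if (buf_type == SMCR_PHYS_CONT_BUFS || buf_type == SMCR_MIXED_BUFS)
            buf = (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN,
                                           get_order(len));

        /* Virtually contiguous fallback (or the only choice for VIRT). */
        if (!buf && buf_type != SMCR_PHYS_CONT_BUFS)
            buf = vzalloc(len);

        /* At free time, is_vmalloc_addr(buf) tells vfree() and
         * free_pages() apart. */
        return buf;
    }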
-
Guangguan Wang authored
Some CPUs, such as Xeon, can guarantee DMA cache coherency, so there is no need to use the DMA sync APIs to flush caches on them. To avoid calling the DMA sync APIs on the I/O path, use dma_need_sync() to check whether an smc_buf_desc needs DMA sync when the smc_buf_desc is created. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
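A hedged sketch of the idea (the descriptor and field names are illustrative): record at buffer-creation time whether the mapping actually needs CPU/device syncs, and skip the sync calls on the hot path when it does not.

    #include <linux/dma-mapping.h>

    struct buf_desc_example {
        dma_addr_t dma_addr;
        bool       is_dma_need_sync;   /* cached result of dma_need_sync() */
    };

    static void buf_desc_map_done(struct device *dma_dev,
                                  struct buf_desc_example *desc)
    {
        desc->is_dma_need_sync = dma_need_sync(dma_dev, desc->dma_addr);
    }

    static void buf_desc_sync_for_device(struct device *dma_dev,
                                         struct buf_desc_example *desc,
                                         size_t len)
    {
        if (!desc->is_dma_need_sync)
            return;   /* coherent DMA: nothing to flush */
        dma_sync_single_for_device(dma_dev, desc->dma_addr, len,
                                   DMA_TO_DEVICE);
    }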
-
Guangguan Wang authored
smc_ib_sync_sg_for_cpu/device are the ops used for DMA memory cache consistency. SMC sndbufs are DMA buffers that the CPU writes and the PCIe device reads; for sndbufs only smc_ib_sync_sg_for_device is needed, and smc_ib_sync_sg_for_cpu is redundant as the PCIe device will not write the buffers. SMC RMBs are DMA buffers that the PCIe device writes and the CPU reads; for RMBs only smc_ib_sync_sg_for_cpu is needed, and smc_ib_sync_sg_for_device is redundant as the CPU will not write the buffers. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
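In scatterlist terms, the direction split described above boils down to something like this sketch (function names are illustrative; the real code operates on SMC's buffer descriptors):

    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    /* sndbuf: CPU writes, device reads -> sync towards the device only. */
    static void sndbuf_sync_before_send(struct device *dma_dev,
                                        struct scatterlist *sgl, int nents)
    {
        dma_sync_sg_for_device(dma_dev, sgl, nents, DMA_TO_DEVICE);
    }

    /* RMB: device writes, CPU reads -> sync towards the CPU only. */
    static void rmb_sync_before_read(struct device *dma_dev,
                                     struct scatterlist *sgl, int nents)
    {
        dma_sync_sg_for_cpu(dma_dev, sgl, nents, DMA_FROM_DEVICE);
    }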
-
- 16 Jul, 2022 4 commits
-
-
Jakub Kicinski authored
Jaehee Park says:
====================
net: ipv4/ipv6: new option to accept garp/untracked na only if in-network

The first patch adds an option to learn a neighbor from garp only if the source ip is in the same subnet as an address configured on the interface that received the garp message. The option has been added to arp_accept in ipv4. The same feature has been added to ndisc (patch 2).
For ipv6, the subnet filtering knob is an extension of the accept_untracked_na option introduced in these patches:
https://lore.kernel.org/all/642672cb-8b11-c78f-8975-f287ece9e89e@gmail.com/t/
https://lore.kernel.org/netdev/20220530101414.65439-1-aajith@arista.com/T/
The third patch contains selftests for testing the different options for accepting arp and neighbor advertisements.
====================
Link: https://lore.kernel.org/r/cover.1657755188.git.jhpark1013@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jaehee Park authored
ipv4 arp_accept has a new option '2' to create new neighbor entries only if the src ip is in the same subnet as an address configured on the interface that received the garp message. This selftest tests all options in arp_accept. ipv6 has a sysctl endpoint, accept_untracked_na, that defines the behavior for accepting untracked neighbor advertisements. A new option similar to that of arp_accept for learning only from the same subnet is added to accept_untracked_na. This selftest tests this new feature. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jaehee Park authored
This patch adds a third knob, '2', which extends the accept_untracked_na option to learn a neighbor only if the src ip is in the same subnet as an address configured on the interface that received the neighbor advertisement. This is similar to the arp_accept configuration for ipv4. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jaehee Park authored
In many deployments, we want the option to not learn a neighbor from garp if the src ip is not in the same subnet as an address configured on the interface that received the garp message. net.ipv4.arp_accept sysctl is currently used to control creation of a neigh from a received garp packet. This patch adds a new option '2' to net.ipv4.arp_accept which extends option '1' by including the subnet check. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
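Conceptually, the new mode adds a subnet check before creating the neighbour entry. A hedged sketch of the semantics follows; the helper that answers "is the sender in the same subnet as an address on this interface" is hypothetical here, while the real patch relies on the kernel's own address lookup.

    /* Hypothetical illustration of arp_accept == 2 semantics. */
    enum {
        ARP_ACCEPT_NONE        = 0,
        ARP_ACCEPT_ANY         = 1,
        ARP_ACCEPT_SAME_SUBNET = 2,
    };

    static bool garp_may_create_neigh(int arp_accept, bool src_in_dev_subnet)
    {
        switch (arp_accept) {
        case ARP_ACCEPT_ANY:
            return true;
        case ARP_ACCEPT_SAME_SUBNET:
            return src_in_dev_subnet;   /* only learn in-network senders */
        default:
            return false;               /* option 0: never learn from garp */
        }
    }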
-
- 15 Jul, 2022 9 commits
-
-
Sunil Goutham authored
When the number of LMACs active on a CGX/RPM is 3, the current NIX link credit configuration, which is based on a per-LMAC FIFO length calculated as 'lmac_fifo_len = total_fifo_len / 3', is incorrect. In HW one of the LMACs gets half of the FIFO and the rest get 1/4th each. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha Sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
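To make the arithmetic concrete, a sketch with an assumed total FIFO size (the constant and the choice of which LMAC index receives the larger share are illustrative; only the 1/2 + 1/4 + 1/4 split comes from the commit message):

    /* With 3 active LMACs the FIFO is not split evenly:
     * one LMAC gets 1/2 of it, the other two get 1/4 each.
     * Which LMAC gets the larger share is assumed to be index 0 here. */
    static unsigned int lmac_fifo_len_3lmacs(unsigned int total_fifo_len,
                                             unsigned int lmac_id)
    {
        return lmac_id == 0 ? total_fifo_len / 2 : total_fifo_len / 4;
    }

    /* Example with an assumed 196608-byte FIFO:
     * old formula: 196608 / 3 = 65536 for every LMAC
     * actual HW split: 98304 / 49152 / 49152 */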
-
Ratheesh Kannoth authored
Fix static analysis warnings reported by the smatch tool:
rvu_npc_hash.c:1232 rvu_npc_exact_del_table_entry_by_id() error: uninitialized symbol 'drop_mcam_idx'.
rvu_npc_hash.c:1312 rvu_npc_exact_add_table_entry() error: uninitialized symbol 'drop_mcam_idx'.
rvu_npc_hash.c:1391 rvu_npc_exact_update_table_entry() error: uninitialized symbol 'hash_index'.
rvu_npc_hash.c:1428 rvu_npc_exact_promisc_disable() error: uninitialized symbol 'drop_mcam_idx'.
rvu_npc_hash.c:1473 rvu_npc_exact_promisc_enable() error: uninitialized symbol 'drop_mcam_idx'.
otx2_dmac_flt.c:191 otx2_dmacflt_update() error: 'rsp' dereferencing possible ERR_PTR()
otx2_dmac_flt.c:60 otx2_dmacflt_add_pfmac() error: 'rsp' dereferencing possible ERR_PTR()
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Marangi authored
Move qca8k driver to qca dir in preparation for code split and introduction of ipq4019 switch based on qca8k. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peilin Ye authored
delay_timer has been unused since commit c3498d34 ("cbq: remove TCA_CBQ_OVL_STRATEGY support"). Delete it. Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Jakub Kicinski authored
Saeed Mahameed says:
====================
mlx5-updates-2022-07-13

1) Support 802.1ad for bridge offloads

Vlad Buslov Says:
=================
Current mlx5 bridge VLAN offload implementation only supports 802.1Q VLAN Ethernet protocol. That protocol type is assumed by default and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification is ignored. In order to support dynamically setting VLAN protocol handle SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification by flushing FDB and re-creating VLAN modify header actions with a new protocol. Implement support for 802.1ad protocol by saving the current VLAN protocol to per-bridge variable and re-create the necessary flow groups according to its current value (either use cvlan or svlan flow fields).
==================

2) debugfs to count ongoing FW commands
3) debugfs to query eswitch vport firmware diagnostic counters
4) Add missing meter configuration in flow action
5) Some misc cleanup

* tag 'mlx5-updates-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Remove the duplicating check for striding RQ when enabling LRO
  net/mlx5e: Move the LRO-XSK check to mlx5e_fix_features
  net/mlx5e: Extend flower police validation
  net/mlx5e: configure meter in flow action
  net/mlx5e: Removed useless code in function
  net/mlx5: Bridge, implement QinQ support
  net/mlx5: Bridge, implement infrastructure for VLAN protocol change
  net/mlx5: Bridge, extract VLAN push/pop actions creation
  net/mlx5: Bridge, rename filter fg to vlan_filter
  net/mlx5: Bridge, refactor groups sizes and indices
  net/mlx5: debugfs, Add num of in-use FW command interface slots
  net/mlx5: Expose vnic diagnostic counters for eswitch managed vports
  net/mlx5: Use software VHCA id when it's supported
  net/mlx5: Introduce ifc bits for using software vhca id
  net/mlx5: Use the bitmap API to allocate bitmaps
====================
Link: https://lore.kernel.org/r/20220713225859.401241-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jiri Pirko says:
====================
net: devlink: couple of trivial fixes

Just a couple of trivial fixes I found on the way.
====================
Link: https://lore.kernel.org/r/20220713141853.2992014-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
Return directly, without storing the return value in an intermediate variable, at the end of the devlink_port_new_notify() function. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
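The cleanup amounts to dropping the temporary; a sketch of the pattern (the do_send() helper stands in for the real netlink send call and is hypothetical):

    #include <linux/skbuff.h>

    static int do_send(struct sk_buff *msg)
    {
        return 0;   /* placeholder for the actual send helper */
    }

    /* Before: store the result only to return it. */
    static int send_notify_before(struct sk_buff *msg)
    {
        int err;

        err = do_send(msg);
        return err;
    }

    /* After: return the call's result directly. */
    static int send_notify_after(struct sk_buff *msg)
    {
        return do_send(msg);
    }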
-
Jiri Pirko authored
Fix the typo in the name of the devlink_port_new_notifiy() function. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Pirko authored
The return value is not used, so change the return value type to void. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 14 Jul, 2022 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Jakub Kicinski authored
include/net/sock.h
  310731e2 ("net: Fix data-races around sysctl_mem.")
  e70f3c70 ("Revert "net: set SK_MEM_QUANTUM to 4096"")
  https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/
net/ipv4/fib_semantics.c
  747c1430 ("ip: fix dflt addr selection for connected nexthop")
  d62607c3 ("net: rename reference+tracking helpers")
net/tls/tls.h
include/net/tls.h
  3d8c51b2 ("net/tls: Check for errors in tls_device_init")
  58790314 ("tls: create an internal header")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Nathan Chancellor authored
Clang warns:
  arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]
  DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
                      ^
  arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here
  extern u64 x86_spec_ctrl_current;
             ^
  1 error generated.
The declaration should be using DECLARE_PER_CPU instead so all attributes stay in sync.
Cc: stable@vger.kernel.org
Fixes: fc02735b ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
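The fix follows the usual per-CPU declaration pattern: DEFINE_PER_CPU() in exactly one .c file, DECLARE_PER_CPU() with matching attributes in the header that other files include. A sketch (the two snippets live in different files):

    #include <linux/percpu.h>
    #include <linux/types.h>

    /* Header (e.g. nospec-branch.h): declare, do not define.
     * DECLARE_PER_CPU carries the same section attributes as the
     * definition, so clang no longer sees a mismatch. */
    DECLARE_PER_CPU(u64, x86_spec_ctrl_current);

    /* Single .c file (e.g. bugs.c): the one real definition. */
    DEFINE_PER_CPU(u64, x86_spec_ctrl_current);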
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Linus Torvalds authored
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, bpf and wireless.
  Still no major regressions, the release continues to be calm. An uptick of fixes this time around due to trivial data race fixes and patches flowing down from subtrees. There has been a few driver fixes (particularly a few fixes for false positives due to 66e4c8d9 which went into -next in May!) that make me worry the wide testing is not exactly fully through. So "calm" but not "let's just cut the final ASAP" vibes over here.

  Current release - regressions:
   - wifi: rtw88: fix write to const table of channel parameters

  Current release - new code bugs:
   - mac80211: add gfp_t arg to ieeee80211_obss_color_collision_notify
   - mlx5:
      - TC, allow offload from uplink to other PF's VF
      - Lag, decouple FDB selection and shared FDB
      - Lag, correct get the port select mode str
   - bnxt_en: fix and simplify XDP transmit path
   - r8152: fix accessing unset transport header

  Previous releases - regressions:
   - conntrack: fix crash due to confirmed bit load reordering (after atomic -> refcount conversion)
   - stmmac: dwc-qos: disable split header for Tegra194

  Previous releases - always broken:
   - mlx5e: ring the TX doorbell on DMA errors
   - bpf: make sure mac_header was set before using it
   - mac80211: do not wake queues on a vif that is being stopped
   - mac80211: fix queue selection for mesh/OCB interfaces
   - ip: fix dflt addr selection for connected nexthop
   - seg6: fix skb checksums for SRH encapsulation/insertion
   - xdp: fix spurious packet loss in generic XDP TX path
   - bunch of sysctl data race fixes
   - nf_log: incorrect offset to network header

  Misc:
   - bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs"

* tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  nfp: flower: configure tunnel neighbour on cmsg rx
  net/tls: Check for errors in tls_device_init
  MAINTAINERS: Add an additional maintainer to the AMD XGBE driver
  xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
  selftests/net: test nexthop without gw
  ip: fix dflt addr selection for connected nexthop
  net: atlantic: remove aq_nic_deinit() when resume
  net: atlantic: remove deep parameter on suspend/resume functions
  sfc: fix kernel panic when creating VF
  seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
  seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
  seg6: fix skb checksum evaluation in SRH encapsulation/insertion
  sfc: fix use after free when disabling sriov
  net: sunhme: output link status with a single print.
  r8152: fix accessing unset transport header
  net: stmmac: fix leaks in probe
  net: ftgmac100: Hold reference returned by of_get_child_by_name()
  nexthop: Fix data-races around nexthop_compat_mode.
  ipv4: Fix data-races around sysctl_ip_dynaddr.
  tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
  ...
-
git://git.samba.org/sfrench/cifs-2.6 Linus Torvalds authored
Pull cifs fixes from Steve French:
 "Three smb3 client fixes:
  - two multichannel fixes: fix a potential deadlock freeing a channel, and fix a race condition on failed creation of a new channel
  - mount failure fix: work around a server bug in some common older Samba servers by avoiding padding at the end of the negotiate protocol request"

* tag '5.19-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: workaround negprot bug in some Samba servers
  cifs: remove unnecessary locking of chan_lock while freeing session
  cifs: fix race condition with delayed threads
-
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Linus Torvalds authored
Pull nfsd fixes from Chuck Lever:
 "Notable regression fixes:
  - Enable SETATTR(time_create) to fix regression with Mac OS clients
  - Fix a lockd crasher and broken NLM UNLCK behavior"

* tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  lockd: fix nlm_close_files
  lockd: set fl_owner when unlocking files
  NFSD: Decode NFSv4 birth time attribute
-