- 20 Dec, 2019 8 commits
-
-
Rahul Lakkireddy authored
Properly initialize refcount to 1 when hardware queue arrays for TC-MQPRIO offload have been freshly allocated. Otherwise, following warning is observed. Also fix up error path to only free hardware queue arrays when refcount reaches 0. [ 130.075342] ------------[ cut here ]------------ [ 130.075343] refcount_t: addition on 0; use-after-free. [ 130.075355] WARNING: CPU: 0 PID: 10870 at lib/refcount.c:25 refcount_warn_saturate+0xe1/0x100 [ 130.075356] Modules linked in: sch_mqprio iptable_nat ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_umad iw_cxgb4 libcxgb ib_uverbs x86_pkg_temp_thermal cxgb4 igb [ 130.075361] CPU: 0 PID: 10870 Comm: tc Kdump: loaded Not tainted 5.5.0-rc1+ #11 [ 130.075362] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2 01/16/2015 [ 130.075363] RIP: 0010:refcount_warn_saturate+0xe1/0x100 [ 130.075364] Code: e8 14 41 c1 ff 0f 0b c3 80 3d 44 f4 10 01 00 0f 85 63 ff ff ff 48 c7 c7 38 9f 83 8c 31 c0 c6 05 2e f4 10 01 01 e8 ef 40 c1 ff <0f> 0b c3 48 c7 c7 10 9f 83 8c 31 c0 c6 05 17 f4 10 01 01 e8 d7 40 [ 130.075365] RSP: 0018:ffffa48d00c0b768 EFLAGS: 00010286 [ 130.075366] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001 [ 130.075366] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffff8a2e9fa187d0 [ 130.075367] RBP: ffff8a2e93890000 R08: 0000000000000398 R09: 000000000000003c [ 130.075367] R10: 00000000000142a0 R11: 0000000000000397 R12: ffffa48d00c0b848 [ 130.075368] R13: ffff8a2e94746498 R14: ffff8a2e966f7000 R15: 0000000000000031 [ 130.075368] FS: 00007f689015f840(0000) GS:ffff8a2e9fa00000(0000) knlGS:0000000000000000 [ 130.075369] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 130.075369] CR2: 00000000006762a0 CR3: 00000007cf164005 CR4: 00000000001606f0 [ 130.075370] Call Trace: [ 130.075377] cxgb4_setup_tc_mqprio+0xbee/0xc30 [cxgb4] [ 130.075382] ? cxgb4_ethofld_restart+0x50/0x50 [cxgb4] [ 130.075384] ? pfifo_fast_init+0x7e/0xf0 [ 130.075386] mqprio_init+0x5f4/0x630 [sch_mqprio] [ 130.075389] qdisc_create+0x1bf/0x4a0 [ 130.075390] tc_modify_qdisc+0x1ff/0x770 [ 130.075392] rtnetlink_rcv_msg+0x28b/0x350 [ 130.075394] ? rtnl_calcit.isra.32+0x110/0x110 [ 130.075395] netlink_rcv_skb+0xc6/0x100 [ 130.075396] netlink_unicast+0x1db/0x330 [ 130.075397] netlink_sendmsg+0x2f5/0x460 [ 130.075399] ? _copy_from_user+0x2e/0x60 [ 130.075400] sock_sendmsg+0x59/0x70 [ 130.075401] ____sys_sendmsg+0x1f0/0x230 [ 130.075402] ? copy_msghdr_from_user+0xd7/0x140 [ 130.075403] ___sys_sendmsg+0x77/0xb0 [ 130.075404] ? ___sys_recvmsg+0x84/0xb0 [ 130.075406] ? __handle_mm_fault+0x377/0xaf0 [ 130.075407] __sys_sendmsg+0x53/0xa0 [ 130.075409] do_syscall_64+0x44/0x130 [ 130.075412] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 130.075413] RIP: 0033:0x7f688f13af10 [ 130.075414] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 [ 130.075414] RSP: 002b:00007ffe6c7d9988 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 130.075415] RAX: ffffffffffffffda RBX: 00000000006703a0 RCX: 00007f688f13af10 [ 130.075415] RDX: 0000000000000000 RSI: 00007ffe6c7d99f0 RDI: 0000000000000003 [ 130.075416] RBP: 000000005df38312 R08: 0000000000000002 R09: 0000000000008000 [ 130.075416] R10: 00007ffe6c7d93e0 R11: 0000000000000246 R12: 0000000000000000 [ 130.075417] R13: 00007ffe6c7e9c50 R14: 0000000000000001 R15: 000000000067c600 [ 130.075418] ---[ end trace 8fbb3bf36a8671db ]--- v2: - Move the refcount_set() closer to where the hardware queue arrays are being allocated. - Fix up error path to only free hardware queue arrays when refcount reaches 0. Fixes: 2d0cb84d ("cxgb4: add ETHOFLD hardware queue support") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Davide Caratti says: ==================== net/sched: cls_u32: fix refcount leak a refcount leak in the error path of u32_change() has been recently introduced. It can be observed with the following commands: [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 97 \ > u32 match ip src 127.0.0.1/32 indev notexist20 flowid 1:1 action drop RTNETLINK answers: Invalid argument We have an error talking to the kernel [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 98 \ > handle 42:42 u32 divisor 256 Error: cls_u32: Divisor can only be used on a hash table. We have an error talking to the kernel [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 99 \ > u32 ht 47:47 Error: cls_u32: Specified hash table not found. We have an error talking to the kernel they all legitimately return -EINVAL; however, they leave semi-configured filters at eth0 tc ingress: [root@f31 ~]# tc filter show dev eth0 ingress filter protocol ip pref 97 u32 chain 0 filter protocol ip pref 97 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 98 u32 chain 0 filter protocol ip pref 98 u32 chain 0 fh 801: ht divisor 1 filter protocol ip pref 99 u32 chain 0 filter protocol ip pref 99 u32 chain 0 fh 802: ht divisor 1 With older kernels, filters were unconditionally considered empty (and thus de-refcounted) on the error path of ->change(). After commit 8b64678e ("net: sched: refactor tp insert/delete for concurrent execution"), filters were considered empty when the walk() function didn't set 'walker.stop' to 1. Finally, with commit 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty"), tc filters are considered empty unless the walker function is called with a non-NULL handle. This last change doesn't fit cls_u32 design, because at least the "root hnode" is (almost) always non-NULL, as it's allocated in u32_init(). - patch 1/2 is a proposal to restore the original kernel behavior, where no filter was installed in the error path of u32_change(). - patch 2/2 adds tdc selftests that can be ued to verify the correct behavior of u32 in the error path of ->change(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
- move test "e9a3 - Add u32 with source match" to u32.json, and change the match pattern to catch all hnodes - add testcases for relevant error paths of cls_u32 module Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
when users replace cls_u32 filters with new ones having wrong parameters, so that u32_change() fails to validate them, the kernel doesn't roll-back correctly, and leaves semi-configured rules. Fix this in u32_walk(), avoiding a call to the walker function on filters that don't have a match rule connected. The side effect is, these "empty" filters are not even dumped when present; but that shouldn't be a problem as long as we are restoring the original behaviour, where semi-configured filters were not even added in the error path of u32_change(). Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Aditya Pakki authored
In s3fwrn5_fw_recv_frame, if fw_info->rsp is not empty, the current code causes a crash via BUG_ON. However, s3fwrn5_fw_send_msg does not crash in such a scenario. The patch replaces the BUG_ON by returning the error to the callers and frees up skb. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Antoine Tenart says: ==================== net: macb: fix probing of PHY not described in the dt The macb Ethernet driver supports various ways of referencing its network PHY. When a device tree is used the PHY can be referenced with a phy-handle or, if connected to its internal MDIO bus, described in a child node. Some platforms omitted the PHY description while connecting the PHY to the internal MDIO bus and in such cases the MDIO bus has to be scanned "manually" by the macb driver. Prior to the phylink conversion the driver registered the MDIO bus with of_mdiobus_register and then in case the PHY couldn't be retrieved using dt or using phy_find_first (because registering an MDIO bus with of_mdiobus_register masks all PHYs) the macb driver was "manually" scanning the MDIO bus (like mdiobus_register does). The phylink conversion did break this particular case but reimplementing the manual scan of the bus in the macb driver wouldn't be very clean. The solution seems to be registering the MDIO bus based on if the PHYs are described in the device tree or not. There are multiple ways to do this, none is perfect. I chose to check if any of the child nodes of the macb node was a network PHY and based on this to register the MDIO bus with the of_ helper or not. The drawback is boards referencing the PHY through phy-handle, would scan the entire MDIO bus of the macb at boot time (as the MDIO bus would be registered with mdiobus_register). For this solution to work properly of_mdiobus_child_is_phy has to be exported, which means the patch doing so has to be backported to -stable as well. Another possible solution could have been to simply check if the macb node has a child node by counting its sub-nodes. This isn't techically perfect, as there could be other sub-nodes (in practice this should be fine, fixed-link being taken care of in the driver). We could also simply s/of_mdiobus_register/mdiobus_register/ but that could break boards using the PHY description in child node as a selector (which really would be not a proper way to do this...). The real issue here being having PHYs not described in the dt but we have dt backward compatibility, so we have to live with that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
This patch fixes the case where the PHY isn't described in the device tree. This is due to the way the MDIO bus is registered in the driver: whether the PHY is described in the device tree or not, the bus is registered through of_mdiobus_register. The function masks all the PHYs and only allow probing the ones described in the device tree. Prior to the Phylink conversion this was also done but later on in the driver the MDIO bus was manually scanned to circumvent the fact that the PHY wasn't described. This patch fixes it in a proper way, by registering the MDIO bus based on if the PHY attached to a given interface is described in the device tree or not. Fixes: 7897b071 ("net: macb: convert to phylink") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
This patch exports of_mdiobus_child_is_phy, allowing to check if a child node is a network PHY. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Dec, 2019 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2019-12-19 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 269 insertions(+), 108 deletions(-). The main changes are: 1) Fix lack of synchronization between xsk wakeup and destroying resources used by xsk wakeup, from Maxim Mikityanskiy. 2) Fix pruning with tail call patching, untrack programs in case of verifier error and fix a cgroup local storage tracking bug, from Daniel Borkmann. 3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer. 4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when only cBPF is present, from Alexander Lobakin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Expand dummy prog generation such that we can easily check on return codes and add few more test cases to make sure we keep on tracking pruning behavior. # ./test_verifier [...] #1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK #1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED Also verified that JIT dump of added test cases looks good. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/df7200b6021444fd369376d227de917357285b65.1576789878.git.daniel@iogearbox.net
-
Daniel Borkmann authored
While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort implementation, I noticed a strange BPF SNAT engine behavior from time to time. In some cases it would do the correct SNAT/DNAT service translation, but at a random point in time it would just stop and perform an unexpected translation after SYN, SYN/ACK and stack would send a RST back. While initially assuming that there is some sort of a race condition in BPF code, adding trace_printk()s for debugging purposes at some point seemed to have resolved the issue auto-magically. Digging deeper on this Heisenbug and reducing the trace_printk() calls to an absolute minimum, it turns out that a single call would suffice to trigger / not trigger the seen RST issue, even though the logic of the program itself remains unchanged. Turns out the single call changed verifier pruning behavior to get everything to work. Reconstructing a minimal test case, the incorrect JIT dump looked as follows: # bpftool p d j i 11346 0xffffffffc0cba96c: [...] 21: movzbq 0x30(%rdi),%rax 26: cmp $0xd,%rax 2a: je 0x000000000000003a 2c: xor %edx,%edx 2e: movabs $0xffff89cc74e85800,%rsi 38: jmp 0x0000000000000049 3a: mov $0x2,%edx 3f: movabs $0xffff89cc74e85800,%rsi 49: mov -0x224(%rbp),%eax 4f: cmp $0x20,%eax 52: ja 0x0000000000000062 54: add $0x1,%eax 57: mov %eax,-0x224(%rbp) 5d: jmpq 0xffffffffffff6911 62: mov $0x1,%eax [...] Hence, unexpectedly, JIT emitted a direct jump even though retpoline based one would have been needed since in line 2c and 3a we have different slot keys in BPF reg r3. Verifier log of the test case reveals what happened: 0: (b7) r0 = 14 1: (73) *(u8 *)(r1 +48) = r0 2: (71) r0 = *(u8 *)(r1 +48) 3: (15) if r0 == 0xd goto pc+4 R0_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0 4: (b7) r3 = 0 5: (18) r2 = 0xffff89cc74d54a00 7: (05) goto pc+3 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit from 3 to 8: R0_w=inv13 R1=ctx(id=0,off=0,imm=0) R10=fp0 8: (b7) r3 = 2 9: (18) r2 = 0xffff89cc74d54a00 11: safe processed 13 insns (limit 1000000) [...] Second branch is pruned by verifier since considered safe, but issue is that record_func_key() couldn't have seen the index in line 3a and therefore decided that emitting a direct jump at this location was okay. Fix this by reusing our backtracking logic for precise scalar verification in order to prevent pruning on the slot key. This means verifier will track content of r3 all the way backwards and only prune if both scalars were unknown in state equivalence check and therefore poisoned in the first place in record_func_key(). The range is [x,x] in record_func_key() case since the slot always would have to be constant immediate. Correct verification after fix: 0: (b7) r0 = 14 1: (73) *(u8 *)(r1 +48) = r0 2: (71) r0 = *(u8 *)(r1 +48) 3: (15) if r0 == 0xd goto pc+4 R0_w=invP(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0 4: (b7) r3 = 0 5: (18) r2 = 0x0 7: (05) goto pc+3 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit from 3 to 8: R0_w=invP13 R1=ctx(id=0,off=0,imm=0) R10=fp0 8: (b7) r3 = 2 9: (18) r2 = 0x0 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit processed 15 insns (limit 1000000) [...] And correct corresponding JIT dump: # bpftool p d j i 11 0xffffffffc0dc34c4: [...] 21: movzbq 0x30(%rdi),%rax 26: cmp $0xd,%rax 2a: je 0x000000000000003a 2c: xor %edx,%edx 2e: movabs $0xffff9928b4c02200,%rsi 38: jmp 0x0000000000000049 3a: mov $0x2,%edx 3f: movabs $0xffff9928b4c02200,%rsi 49: cmp $0x4,%rdx 4d: jae 0x0000000000000093 4f: and $0x3,%edx 52: mov %edx,%edx 54: cmp %edx,0x24(%rsi) 57: jbe 0x0000000000000093 59: mov -0x224(%rbp),%eax 5f: cmp $0x20,%eax 62: ja 0x0000000000000093 64: add $0x1,%eax 67: mov %eax,-0x224(%rbp) 6d: mov 0x110(%rsi,%rdx,8),%rax 75: test %rax,%rax 78: je 0x0000000000000093 7a: mov 0x30(%rax),%rax 7e: add $0x19,%rax 82: callq 0x000000000000008e 87: pause 89: lfence 8c: jmp 0x0000000000000087 8e: mov %rax,(%rsp) 92: retq 93: mov $0x1,%eax [...] Also explicitly adding explicit env->allow_ptr_leaks to fixup_bpf_calls() since backtracking is enabled under former (direct jumps as well, but use different test). In case of only tracking different map pointers as in c93552c4 ("bpf: properly enforce index mask to prevent out-of-bounds speculation"), pruning cannot make such short-cuts, neither if there are paths with scalar and non-scalar types as r3. mark_chain_precision() is only needed after we know that register_is_const(). If it was not the case, we already poison the key on first path and non-const key in later paths are not matching the scalar range in regsafe() either. Cilium NodePort testing passes fine as well now. Note, released kernels not affected. Fixes: d2e4c1e6 ("bpf: Constant map key tracking for prog array pokes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/ac43ffdeb7386c5bd688761ed266f3722bb39823.1576789878.git.daniel@iogearbox.net
-
Alexander Lobakin authored
proc_dointvec_minmax_bpf_restricted() has been firstly introduced in commit 2e4a3098 ("bpf: restrict access to core bpf sysctls") under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in ede95a63 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations"), because a new sysctl, bpf_jit_limit, made use of it. Finally, this parameter has become long instead of integer with fdadd049 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K") and thus, a new proc_dolongvec_minmax_bpf_restricted() has been added. With this last change, we got back to that proc_dointvec_minmax_bpf_restricted() is used only under CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been brought back. So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n since v4.20 we have: CC net/core/sysctl_net_core.o net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function] 292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again. Fixes: fdadd049 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K") Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191218091821.7080-1-alobakin@dlink.ru
-
Daniel Borkmann authored
Maxim Mikityanskiy says: ==================== This series addresses the issue described in the commit message of the first patch: lack of synchronization between XSK wakeup and destroying the resources used by XSK wakeup. The idea is similar to napi_synchronize. The series contains fixes for the drivers that implement XSK. v2 incorporates changes suggested by Björn: 1. Call synchronize_rcu in Intel drivers only if the XDP program is being unloaded. 2. Don't forget rcu_read_lock when wakeup is called from xsk_poll. 3. Use xs->zc as the condition to call ndo_xsk_wakeup. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Maxim Mikityanskiy authored
Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN. 2. After switching the XDP program, call synchronize_rcu to let ixgbe_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down. 4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to avoid using the XDP ring when it's already destroyed. synchronize_rcu is called from ixgbe_txrx_ring_disable. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-5-maximmi@mellanox.com
-
Maxim Mikityanskiy authored
Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. i40e_down already calls synchronize_rcu. On i40e_down either __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in i40e_xsk_wakeup (the former is already checked there). 2. After switching the XDP program, call synchronize_rcu to let i40e_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down (see i40e_prep_for_reset and i40e_pf_quiesce_all_vsi). 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-4-maximmi@mellanox.com
-
Maxim Mikityanskiy authored
After disabling resources necessary for XSK (the XDP program, channels, XSK queues), use synchronize_rcu to wait until the XSK wakeup function finishes, before freeing the resources. Suspend XSK wakeups during switching channels. If the XDP program is being removed, synchronize_rcu before closing the old channels to allow XSK wakeup to complete. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-3-maximmi@mellanox.com
-
Maxim Mikityanskiy authored
The XSK wakeup callback in drivers makes some sanity checks before triggering NAPI. However, some configuration changes may occur during this function that affect the result of those checks. For example, the interface can go down, and all the resources will be destroyed after the checks in the wakeup function, but before it attempts to use these resources. Wrap this callback in rcu_read_lock to allow driver to synchronize_rcu before actually destroying the resources. xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup wrapped into the RCU lock. After this commit, xsk_poll starts using xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide ndo_xsk_wakeup should be called. It also fixes a bug introduced with the need_wakeup feature: a non-zero-copy socket may be used with a driver supporting zero-copy, and in this case ndo_xsk_wakeup should not be called, so the xs->zc check is the correct one. Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-2-maximmi@mellanox.com
-
- 18 Dec, 2019 19 commits
-
-
David S. Miller authored
Jose Abreu says: ==================== net: stmmac: Fixes for -net Fixes for stmmac. 1) Fixes the filtering selftests (again) for cases when the number of multicast filters are not enough. 2) Fixes SPH feature for MTU > default. 3) Fixes the behavior of accepting invalid MTU values. 4) Fixes FCS stripping for multi-descriptor packets. 5) Fixes the change of RX buffer size in XGMAC. 6) Fixes RX buffer size alignment. 7) Fixes the 16KB buffer alignment. 8) Fixes the enabling of 16KB buffer size feature. 9) Always arm the TX coalesce timer so that missed interrupts do not cause a TX queue timeout. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
If TX Coalesce timer is enabled we should always arm it, otherwise we may hit the case where an interrupt is missed and the TX Queue will timeout. Arming the timer does not necessarly mean it will run the tx_clean() because this function is wrapped around NAPI launcher. Fixes: 9125cdd1 ("stmmac: add the initial tx coalesce schema") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
XGMAC supports maximum MTU that can go to 16KB. Lets add this check in the calculation of RX buffer size. Fixes: 7ac6653a ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
The 16KB RX Buffer must also be 16 byte aligned. Fix it. Fixes: 7ac6653a ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
We need to align the RX buffer size to at least 16 byte so that IP doesn't mis-behave. This is required by HW. Changes from v2: - Align UP and not DOWN (David) Fixes: 7ac6653a ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
When switching between buffer sizes we need to clear the previous value. Fixes: d6ddfacd ("net: stmmac: Add DMA related callbacks for XGMAC2") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Only the last received buffer contains the FCS field. Check for end of packet before trying to strip the FCS field. Fixes: 88ebe2cf ("net: stmmac: Rework stmmac_rx()") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
The maximum MTU value is determined by the maximum size of TX FIFO so that a full packet can fit in the FIFO. Add a check for this in the MTU change callback. Also check if provided and rounded MTU does not passes the maximum limit of 16K. Changes from v2: - Align MTU before checking if its valid Fixes: 7ac6653a ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Split Header feature needs to know the size of RX buffer but current code is determining it too late. Fix this by moving the RX buffer computation to earlier stage. Changes from v2: - Do not try to align already aligned buffer size Fixes: 67afd6d1 ("net: stmmac: Add Split Header support and enable it in XGMAC cores") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
When running the MC and UC filter tests we setup a multicast address that its expected to be blocked. If the number of available multicast registers is zero, driver will always pass the multicast packets which will fail the test. Check if available multicast addresses is enough before running the tests. Fixes: 091810db ("net: stmmac: Introduce selftests support") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jia-Ju Bai authored
The kernel may sleep while holding a spinlock. The function call path (from bottom to top) in Linux 4.19 is: net/nfc/nci/uart.c, 349: nci_skb_alloc in nci_uart_default_recv_buf net/nfc/nci/uart.c, 255: (FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive net/nfc/nci/uart.c, 254: spin_lock in nci_uart_tty_receive nci_skb_alloc(GFP_KERNEL) can sleep at runtime. (FUNC_PTR) means a function pointer is called. To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for nci_skb_alloc(). This bug is found by a static analysis tool STCheck written by myself. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jouni Hogander authored
Dev_hold has to be called always in rx_queue_add_kobject. Otherwise usage count drops below 0 in case of failure in kobject_init_and_add. Fixes: b8eb7183 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject") Reported-by: syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Miller <davem@davemloft.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
As flower rules are added, they are given a stats ID based on the number of rules that can be supported in firmware. Only after the initial allocation of all available IDs does the driver begin to reuse those that have been released. The initial allocation of IDs was modified to account for multiple memory units on the offloaded device. However, this introduced a bug whereby the counter that controls the IDs could be decremented before the ID was assigned (where it is further decremented). This means that the stats ID could be assigned as -1/0xfffffff which is out of range. Fix this by only decrementing the main counter after the current ID has been assigned. Fixes: 467322e2 ("nfp: flower: support multiple memory units for filter offloads") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Dooks (Codethink) authored
dsa_link_touch() is not exported, or defined outside of the file it is in so make it static to avoid the following warning: net/dsa/dsa2.c:127:17: warning: symbol 'dsa_link_touch' was not declared. Should it be static? Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Oleksij Rempel authored
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_probe': drivers/net/ethernet/atheros/ag71xx.c:1776:30: warning: passing argument 2 of 'of_get_phy_mode' makes pointer from integer without a cast [-Wint-conversion] In file included from drivers/net/ethernet/atheros/ag71xx.c:33: ./include/linux/of_net.h:15:69: note: expected 'phy_interface_t *' {aka 'enum <anonymous> *'} but argument is of type 'int' Fixes: 0c65b2b9 ("net: of_get_phy_mode: Change API to solve int/unit warnings") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Randy Dunlap authored
Fix missing '*' kernel-doc notation that causes this warning: ../include/linux/netdevice.h:1779: warning: bad line: spinlock Fixes: ab92d68f ("net: core: add generic lockdep keys") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sk->sk_pacing_shift can be read and written without lock synchronization. This patch adds annotations to document this fact and avoid future syzbot complains. This might also avoid unexpected false sharing in sk_pacing_shift_update(), as the compiler could remove the conditional check and always write over sk->sk_pacing_shift : if (sk->sk_pacing_shift != val) sk->sk_pacing_shift = val; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
ql_alloc_large_buffers() has the usual RX buffer allocation loop where it allocates skbs and maps them for DMA. It also treats failure as a fatal error. There are (at least) three bugs in the error paths: 1. ql_free_large_buffers() assumes that the lrg_buf[] entry for the first buffer that couldn't be allocated will have .skb == NULL. But the qla_buf[] array is not zero-initialised. 2. ql_free_large_buffers() DMA-unmaps all skbs in lrg_buf[]. This is incorrect for the last allocated skb, if DMA mapping failed. 3. Commit 1acb8f2a ("net: qlogic: Fix memory leak in ql_alloc_large_buffers") added a direct call to dev_kfree_skb_any() after the skb is recorded in lrg_buf[], so ql_free_large_buffers() will double-free it. The bugs are somewhat inter-twined, so fix them all at once: * Clear each entry in qla_buf[] before attempting to allocate an skb for it. This goes half-way to fixing bug 1. * Set the .skb field only after the skb is DMA-mapped. This fixes the rest. Fixes: 1357bfcf ("qla3xxx: Dynamically size the rx buffer queue ...") Fixes: 0f8ab89e ("qla3xxx: Check return code from pci_map_single() ...") Fixes: 1acb8f2a ("net: qlogic: Fix memory leak in ql_alloc_large_buffers") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
syzbot reported a memory leak when an allocation fails within genradix_prealloc() for output streams. That's because genradix_prealloc() leaves initialized members initialized when the issue happens and SCTP stack will abort the current initialization but without cleaning up such members. The fix here is to always call genradix_free() when genradix_prealloc() fails, for output and also input streams, as it suffers from the same issue. Reported-by: syzbot+772d9e36c490b18d51d1@syzkaller.appspotmail.com Fixes: 2075e50c ("sctp: convert to genradix") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Dec, 2019 4 commits
-
-
David S. Miller authored
Merge tag 'wireless-drivers-2019-12-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.5 First set of fixes for v5.5. Fixing security issues, some regressions and few major bugs. mwifiex * security fix for handling country Information Elements (CVE-2019-14895) * security fix for handling TDLS Information Elements ath9k * fix endian issue with ath9k_pci_owl_loader mt76 * fix default mac address handling iwlwifi * fix merge damage which lead to firmware crashing during boot on some devices * fix device initialisation regression on some devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
Upon reusing the ptp_qoriq driver, the ptp_qoriq_free() function was used on the remove path to free any allocated resources. The ptp_qoriq IRQ is among these resources that are freed in ptp_qoriq_free() even though it is also a managed one (allocated using devm_request_threaded_irq). Drop the resource managed version of requesting the IRQ in order to not trigger a double free of the interrupt as below: [ 226.731005] Trying to free already-free IRQ 126 [ 226.735533] WARNING: CPU: 6 PID: 749 at kernel/irq/manage.c:1707 __free_irq+0x9c/0x2b8 [ 226.743435] Modules linked in: [ 226.746480] CPU: 6 PID: 749 Comm: bash Tainted: G W 5.4.0-03629-gfd7102c32b2c-dirty #912 [ 226.755857] Hardware name: NXP Layerscape LX2160ARDB (DT) [ 226.761244] pstate: 40000085 (nZcv daIf -PAN -UAO) [ 226.766022] pc : __free_irq+0x9c/0x2b8 [ 226.769758] lr : __free_irq+0x9c/0x2b8 [ 226.773493] sp : ffff8000125039f0 (...) [ 226.856275] Call trace: [ 226.858710] __free_irq+0x9c/0x2b8 [ 226.862098] free_irq+0x30/0x70 [ 226.865229] devm_irq_release+0x14/0x20 [ 226.869054] release_nodes+0x1b0/0x220 [ 226.872790] devres_release_all+0x34/0x50 [ 226.876790] device_release_driver_internal+0x100/0x1c0 Fixes: d346c9e8 ("dpaa2-ptp: reuse ptp_qoriq driver") Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Recently noticed that we're tracking programs related to local storage maps through their prog pointer. This is a wrong assumption since the prog pointer can still change throughout the verification process, for example, whenever bpf_patch_insn_single() is called. Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign() is not guaranteed to be the same as we pass in bpf_cgroup_storage_release() and the map would therefore remain in busy state forever. Fix this by using the prog's aux pointer which is stable throughout verification and beyond. Fixes: de9cbbaa ("bpf: introduce cgroup storage maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/1471c69eca3022218666f909bc927a92388fd09e.1576580332.git.daniel@iogearbox.net
-
David S. Miller authored
Merge tag 'mac80211-for-net-2019-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A handful of fixes: * disable AQL on most drivers, addressing the iwlwifi issues * fix double-free on network namespace changes * fix TID field in frames injected through monitor interfaces * fix ieee80211_calc_rx_airtime() * fix NULL pointer dereference in rfkill (and remove BUG_ON) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-