- 09 Mar, 2023 3 commits
-
-
Eric Dumazet authored
syzbot reported struct pid leak [1]. Issue is that queue_oob() calls maybe_add_creds() which potentially holds a reference on a pid. But skb->destructor is not set (either directly or by calling unix_scm_to_skb()) This means that subsequent kfree_skb() or consume_skb() would leak this reference. In this fix, I chose to fully support scm even for the OOB message. [1] BUG: memory leak unreferenced object 0xffff8881053e7f80 (size 128): comm "syz-executor242", pid 5066, jiffies 4294946079 (age 13.220s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff812ae26a>] alloc_pid+0x6a/0x560 kernel/pid.c:180 [<ffffffff812718df>] copy_process+0x169f/0x26c0 kernel/fork.c:2285 [<ffffffff81272b37>] kernel_clone+0xf7/0x610 kernel/fork.c:2684 [<ffffffff812730cc>] __do_sys_clone+0x7c/0xb0 kernel/fork.c:2825 [<ffffffff849ad699>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff849ad699>] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 [<ffffffff84a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 314001f0 ("af_unix: Add OOB support") Reported-by: syzbot+7699d9e5635c10253a27@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Rao Shoaib <rao.shoaib@oracle.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230307164530.771896-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
This reverts commit d5e2d038. We have a report of this chip being used on a SURECOM EP-320X-S 100/10M Ethernet PCI Adapter which could still have been purchased in some parts of the world 3 years ago. Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=217151 Fixes: d5e2d038 ("eth: fealnx: delete the driver for Myson MTD-800") Link: https://lore.kernel.org/r/20230307171930.4008454-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
The MT7530 switch from the MT7621 SoC has 2 ports which can be set up as internal: port 5 and 6. Arınç reports that the GMAC1 attached to port 5 receives corrupted frames, unless port 6 (attached to GMAC0) has been brought up by the driver. This is true regardless of whether port 5 is used as a user port or as a CPU port (carrying DSA tags). Offline debugging (blind for me) which began in the linked thread showed experimentally that the configuration done by the driver for port 6 contains a step which is needed by port 5 as well - the write to CORE_GSWPLL_GRP2 (note that I've no idea as to what it does, apart from the comment "Set core clock into 500Mhz"). Prints put by Arınç show that the reset value of CORE_GSWPLL_GRP2 is RG_GSWPLL_POSDIV_500M(1) | RG_GSWPLL_FBKDIV_500M(40) (0x128), both on the MCM MT7530 from the MT7621 SoC, as well as on the standalone MT7530 from MT7623NI Bananapi BPI-R2. Apparently, port 5 on the standalone MT7530 can work under both values of the register, while on the MT7621 SoC it cannot. The call path that triggers the register write is: mt753x_phylink_mac_config() for port 6 -> mt753x_pad_setup() -> mt7530_pad_clk_setup() so this fully explains the behavior noticed by Arınç, that bringing port 6 up is necessary. The simplest fix for the problem is to extract the register writes which are needed for both port 5 and 6 into a common mt7530_pll_setup() function, which is called at mt7530_setup() time, immediately after switch reset. We can argue that this mirrors the code layout introduced in mt7531_setup() by commit 42bc4faf ("net: mt7531: only do PLL once after the reset"), in that the PLL setup has the exact same positioning, and further work to consolidate the separate setup() functions is not hindered. Testing confirms that: - the slight reordering of writes to MT7530_P6ECR and to CORE_GSWPLL_GRP1 / CORE_GSWPLL_GRP2 introduced by this change does not appear to cause problems for the operation of port 6 on MT7621 and on MT7623 (where port 5 also always worked) - packets sent through port 5 are not corrupted anymore, regardless of whether port 6 is enabled by phylink or not (or even present in the device tree) My algorithm for determining the Fixes: tag is as follows. Testing shows that some logic from mt7530_pad_clk_setup() is needed even for port 5. Prior to commit ca366d6c ("net: dsa: mt7530: Convert to PHYLINK API"), a call did exist for all phy_is_pseudo_fixed_link() ports - so port 5 included. That commit replaced it with a temporary "Port 5 is not supported!" comment, and the following commit 38f790a8 ("net: dsa: mt7530: Add support for port 5") replaced that comment with a configuration procedure in mt7530_setup_port5() which was insufficient for port 5 to work. I'm laying the blame on the patch that claimed support for port 5, although one would have also needed the change from commit c3b8e079 ("net: dsa: mt7530: setup core clock even in TRGMII mode") for the write to be performed completely independently from port 6's configuration. Thanks go to Arınç for describing the problem, for debugging and for testing. Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/netdev/f297c2c4-6e7c-57ac-2394-f6025d309b9d@arinc9.com/ Fixes: 38f790a8 ("net: dsa: mt7530: Add support for port 5") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230307155411.868573-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 08 Mar, 2023 3 commits
-
-
Daniel Machon authored
Fix deletion of existing DSCP mappings in the APP table. Adding and deleting DSCP entries are replicated per-port, since the mapping table is global for all ports in the chip. Whenever a mapping for a DSCP value already exists, the old mapping is deleted first. However, it is only deleted for the specified port. Fix this by calling sparx5_dcb_ieee_delapp() instead of dcb_ieee_delapp() as it ought to be. Reproduce: // Map and remap DSCP value 63 $ dcb app add dev eth0 dscp-prio 63:1 $ dcb app add dev eth0 dscp-prio 63:2 $ dcb app show dev eth0 dscp-prio dscp-prio 63:2 $ dcb app show dev eth1 dscp-prio dscp-prio 63:1 63:2 <-- 63:1 should not be there Fixes: 8dcf69a6 ("net: microchip: sparx5: add support for offloading dscp table") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Suman Ghosh authored
NDC caches contexts of frequently used queue's (Rx and Tx queues) contexts. Due to a HW errata when NDC detects fault/poision while accessing contexts it could go into an illegal state where a cache line could get locked forever. To makesure all cache lines in NDC are available for optimum performance upon fault/lockerror/posion errors scan through all cache lines in NDC and clear the lock bit. Fixes: 4a3581cd ("octeontx2-af: NPA AQ instruction enqueue support") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
D. Wythe authored
Before determining whether the msg has unsupported options, it has been prematurely terminated by the wrong status check. For the application, the general usages of MSG_FASTOPEN likes fd = socket(...) /* rather than connect */ sendto(fd, data, len, MSG_FASTOPEN) Hence, We need to check the flag before state check, because the sock state here is always SMC_INIT when applications tries MSG_FASTOPEN. Once we found unsupported options, fallback it to TCP. Fixes: ee9dfbef ("net/smc: handle sockopts forcing fallback") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> v2 -> v1: Optimize code style Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Mar, 2023 9 commits
-
-
Jakub Kicinski authored
I was intending to make all the Netlink Spec code BSD-3-Clause to ease the adoption but it appears that: - I fumbled the uAPI and used "GPL WITH uAPI note" there - it gives people pause as they expect GPL in the kernel As suggested by Chuck re-license under dual. This gives us benefit of full BSD freedom while fulfilling the broad "kernel is under GPL" expectations. Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/ Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Stephen Hemminger authored
Map all my old email addresses to current address. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20230306194405.108236-1-stephen@networkplumber.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Map Maxim's old corporate addresses to his personal one. Link: https://lore.kernel.org/r/20230306192018.3894988-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Fedor Pchelkin authored
cb_context should be freed on the error path in nfc_se_io as stated by commit 25ff6f8a ("nfc: fix memory leak of se_io context in nfc_genl_se_io"). Make the error path in nfc_se_io unwind everything in reverse order, i.e. free the cb_context after unlocking the device. Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230306212650.230322-1-pchelkin@ispras.ruSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Enrico Sau authored
Add the following Telit FE990 composition: 0x1080: tty, adb, rmnet, tty, tty, tty, tty Signed-off-by: Enrico Sau <enrico.sau@gmail.com> Link: https://lore.kernel.org/r/20230306120528.198842-1-enrico.sau@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Enrico Sau authored
Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FE990 0x1081 composition in order to avoid bind error. Signed-off-by: Enrico Sau <enrico.sau@gmail.com> Link: https://lore.kernel.org/r/20230306115933.198259-1-enrico.sau@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restore ctnetlink zero mark in events and dump, from Ivan Delalande. 2) Fix deadlock due to missing disabled bh in tproxy, from Florian Westphal. 3) Safer maximum chain load in conntrack, from Eric Dumazet. * 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: conntrack: adopt safer max chain length netfilter: tproxy: fix deadlock due to missing BH disable netfilter: ctnetlink: revert to dumping mark regardless of event type ==================== Link: https://lore.kernel.org/r/20230307100424.2037-1-pablo@netfilter.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
Customers using GKE 1.25 and 1.26 are facing conntrack issues root caused to commit c9c3b681 ("netfilter: conntrack: make max chain length random"). Even if we assume Uniform Hashing, a bucket often reachs 8 chained items while the load factor of the hash table is smaller than 0.5 With a limit of 16, we reach load factors of 3. With a limit of 32, we reach load factors of 11. With a limit of 40, we reach load factors of 15. With a limit of 50, we reach load factors of 24. This patch changes MIN_CHAINLEN to 50, to minimize risks. Ideally, we could in the future add a cushion based on expected load factor (2 * nf_conntrack_max / nf_conntrack_buckets), because some setups might expect unusual values. Fixes: c9c3b681 ("netfilter: conntrack: make max chain length random") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf 2023-03-06 We've added 8 non-merge commits during the last 7 day(s) which contain a total of 9 files changed, 64 insertions(+), 18 deletions(-). The main changes are: 1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier, that is, keep resolving such instances instead of bailing out, from Lorenz Bauer. 2) Fix BPF test framework with regards to xdp_frame info misplacement in the "live packet" code, from Alexander Lobakin. 3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX, from Liu Jian. 4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n, from Randy Dunlap. 5) Several BPF doc fixes with either broken links or external instead of internal doc links, from Bagas Sanjaya. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: check that modifier resolves after pointer btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES bpf, doc: Link to submitting-patches.rst for general patch submission info bpf, doc: Do not link to docs.kernel.org for kselftest link bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() riscv, bpf: Fix patch_text implicit declaration bpf, docs: Fix link to BTF doc ==================== Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 06 Mar, 2023 15 commits
-
-
Jakub Kicinski authored
Adrien reports that incorrect data is transmitted when a single page straddles multiple records. We would transmit the same data in all iterations of the loop. Reported-by: Adrien Moulin <amoulin@corp.free.fr> Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr Fixes: c1318b39 ("tls: Add opt-in zerocopy mode of sendfile()") Tested-by: Adrien Moulin <amoulin@corp.free.fr> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
Fix data corruption issue with SerDes connected PHYs operating at 1.25 Gbps speed where we could previously observe about 30% packet loss while the bad packet counter was increasing. As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531 switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using rate-adaptation to 2500Base-X mode, this issue only got exposed now when we started trying to use SFP modules operating with 1.25 Gbps with the BananaPi R3 board. The fix is to set bit 12 which disables the RX FIFO clear function when setting up MAC MCR, MediaTek SDK did the same change stating: "If without this patch, kernel might receive invalid packets that are corrupted by GMAC."[1] [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63 Fixes: 42c03844 ("net-next: mediatek: add support for MediaTek MT7622 SoC") Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Currently link up can't be detected in forced mode if polling isn't used. Only link up interrupt source we have is aneg complete which isn't applicable in forced mode. Therefore we have to use energy-on as link up indicator. Fixes: 73654945 ("net: phy: smsc: skip ENERGYON interrupt if disabled") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
Lorenz Bauer says: ==================== See the first patch for a detailed explanation. v2: - Move RESOLVE_TBD assignment out of the loop (Martin) ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Lorenz Bauer authored
Add a regression test that ensures that a VAR pointing at a modifier which follows a PTR (or STRUCT or ARRAY) is resolved correctly by the datasec validator. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Lorenz Bauer authored
btf_datasec_resolve contains a bug that causes the following BTF to fail loading: [1] DATASEC a size=2 vlen=2 type_id=4 offset=0 size=1 type_id=7 offset=1 size=1 [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none) [3] PTR (anon) type_id=2 [4] VAR a type_id=3 linkage=0 [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none) [6] TYPEDEF td type_id=5 [7] VAR b type_id=6 linkage=0 This error message is printed during btf_check_all_types: [1] DATASEC a size=2 vlen=2 type_id=7 offset=1 size=1 Invalid type By tracing btf_*_resolve we can pinpoint the problem: btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0 btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0 btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0 btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0 btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22 The last invocation of btf_datasec_resolve should invoke btf_var_resolve by means of env_stack_push, instead it returns EINVAL. The reason is that env_stack_push is never executed for the second VAR. if (!env_type_is_resolve_sink(env, var_type) && !env_type_is_resolved(env, var_type_id)) { env_stack_set_next_member(env, i + 1); return env_stack_push(env, var_type, var_type_id); } env_type_is_resolve_sink() changes its behaviour based on resolve_mode. For RESOLVE_PTR, we can simplify the if condition to the following: (btf_type_is_modifier() || btf_type_is_ptr) && !env_type_is_resolved() Since we're dealing with a VAR the clause evaluates to false. This is not sufficient to trigger the bug however. The log output and EINVAL are only generated if btf_type_id_size() fails. if (!btf_type_id_size(btf, &type_id, &type_size)) { btf_verifier_log_vsi(env, v->t, vsi, "Invalid type"); return -EINVAL; } Most types are sized, so for example a VAR referring to an INT is not a problem. The bug is only triggered if a VAR points at a modifier. Since we skipped btf_var_resolve that modifier was also never resolved, which means that btf_resolved_type_id returns 0 aka VOID for the modifier. This in turn causes btf_type_id_size to return NULL, triggering EINVAL. To summarise, the following conditions are necessary: - VAR pointing at PTR, STRUCT, UNION or ARRAY - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or TYPE_TAG The fix is to reset resolve_mode to RESOLVE_TBD before attempting to resolve a VAR from a DATASEC. Fixes: 1dc92851 ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Alexander Lobakin authored
&xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed *before* the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Bagas Sanjaya authored
The link for patch submission information in general refers to index page for "Working with the kernel development community" section of kernel docs, whereas the link should have been Documentation/process/submitting-patches.rst instead. Fix it by replacing the index target with the appropriate doc. Fixes: 54222838 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230228074523.11493-3-bagasdotme@gmail.com
-
Bagas Sanjaya authored
The question on how to run BPF selftests have a reference link to kernel selftest documentation (Documentation/dev-tools/kselftest.rst). However, it uses external link to the documentation at kernel.org/docs (aka docs.kernel.org) instead, which requires Internet access. Fix this and replace the link with internal linking, by using :doc: directive while keeping the anchor text. Fixes: b7a27c3a ("bpf, doc: howto use/run the BPF selftests") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230228074523.11493-2-bagasdotme@gmail.com
-
Florian Westphal authored
The xtables packet traverser performs an unconditional local_bh_disable(), but the nf_tables evaluation loop does not. Functions that are called from either xtables or nftables must assume that they can be called in process context. inet_twsk_deschedule_put() assumes that no softirq interrupt can occur. If tproxy is used from nf_tables its possible that we'll deadlock trying to aquire a lock already held in process context. Add a small helper that takes care of this and use it. Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/ Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support") Reported-and-tested-by: Major Dávid <major.david@balasys.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Ivan Delalande authored
It seems that change was unintentional, we have userspace code that needs the mark while listening for events like REPLY, DESTROY, etc. Also include 0-marks in requested dumps, as they were before that fix. Fixes: 1feeae07 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark") Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Selvin Xavier authored
Following warning reported by KASAN during driver unload ================================================================== BUG: KASAN: double-free in bnxt_remove_one+0x103/0x200 [bnxt_en] Free of addr ffff88814e8dd4c0 by task rmmod/17469 CPU: 47 PID: 17469 Comm: rmmod Kdump: loaded Tainted: G S 6.2.0-rc7+ #2 Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.3.10 08/15/2019 Call Trace: <TASK> dump_stack_lvl+0x33/0x46 print_report+0x17b/0x4b3 ? __call_rcu_common.constprop.79+0x27e/0x8c0 ? __pfx_free_object_rcu+0x10/0x10 ? __virt_addr_valid+0xe3/0x160 ? bnxt_remove_one+0x103/0x200 [bnxt_en] kasan_report_invalid_free+0x64/0xd0 ? bnxt_remove_one+0x103/0x200 [bnxt_en] ? bnxt_remove_one+0x103/0x200 [bnxt_en] __kasan_slab_free+0x179/0x1c0 ? bnxt_remove_one+0x103/0x200 [bnxt_en] __kmem_cache_free+0x194/0x350 bnxt_remove_one+0x103/0x200 [bnxt_en] pci_device_remove+0x62/0x110 device_release_driver_internal+0xf6/0x1c0 driver_detach+0x76/0xe0 bus_remove_driver+0x89/0x160 pci_unregister_driver+0x26/0x110 ? strncpy_from_user+0x188/0x1c0 bnxt_exit+0xc/0x24 [bnxt_en] __x64_sys_delete_module+0x21f/0x390 ? __pfx___x64_sys_delete_module+0x10/0x10 ? __pfx_mem_cgroup_handle_over_high+0x10/0x10 ? _raw_spin_lock+0x87/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __audit_syscall_entry+0x185/0x210 ? ktime_get_coarse_real_ts64+0x51/0x80 ? syscall_trace_enter.isra.18+0x126/0x1a0 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7effcb6fd71b Code: 73 01 c3 48 8b 0d 6d 17 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 17 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffeada270b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 00005623660e0750 RCX: 00007effcb6fd71b RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005623660e07b8 RBP: 0000000000000000 R08: 00007ffeada26031 R09: 0000000000000000 R10: 00007effcb771280 R11: 0000000000000206 R12: 00007ffeada272e0 R13: 00007ffeada28bc4 R14: 00005623660e02a0 R15: 00005623660e0750 </TASK> Auxiliary device structures are freed in bnxt_aux_dev_release. So avoid calling kfree from bnxt_remove_one. Also, set bp->edev to NULL before freeing the auxilary private structure. Fixes: d80d88b0 ("bnxt_en: Add auxiliary driver support") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
The driver needs to keep track of all the possible concurrent TPA (GRO/LRO) completions on the aggregation ring. On P5 chips, the maximum number of concurrent TPA is 256 and the amount of memory we allocate is order-5 on systems using 4K pages. Memory allocation failure has been reported: NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1 Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022 Call Trace: dump_stack+0x57/0x6e warn_alloc.cold.120+0x7b/0xdd ? _cond_resched+0x15/0x30 ? __alloc_pages_direct_compact+0x15f/0x170 __alloc_pages_slowpath.constprop.108+0xc58/0xc70 __alloc_pages_nodemask+0x2d0/0x300 kmalloc_order+0x24/0xe0 kmalloc_order_trace+0x19/0x80 bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en] ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en] __bnxt_open_nic+0x12e/0x780 [bnxt_en] bnxt_open+0x10b/0x240 [bnxt_en] __dev_open+0xe9/0x180 __dev_change_flags+0x1af/0x220 dev_change_flags+0x21/0x60 do_setlink+0x35c/0x1100 Instead of allocating this big chunk of memory and dividing it up for the concurrent TPA instances, allocate each small chunk separately for each TPA instance. This will reduce it to order-0 allocations. Fixes: 79632e9b ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
The locking in phy_probe() and phy_remove() does very little to prevent any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA warnings. Remove it. ====================================================== WARNING: possible circular locking dependency detected 6.2.0-dirty #1108 Tainted: G W E ------------------------------------------------------ ip/415 is trying to acquire lock: ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy] but task is already holding lock: ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 rtnl_lock+0x24/0x30 sfp_bus_add_upstream+0x34/0x150 phy_sfp_probe+0x4c/0x94 [libphy] mv3310_probe+0x148/0x184 [marvell10g] phy_probe+0x8c/0x200 [libphy] call_driver_probe+0xbc/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __device_attach_driver+0xc4/0x160 bus_for_each_drv+0x80/0xe0 __device_attach+0xb0/0x1f0 device_initial_probe+0x1c/0x2c bus_probe_device+0xa4/0xb0 device_add+0x360/0x53c phy_device_register+0x60/0xa4 [libphy] fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio] fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio] of_mdiobus_register+0x140/0x340 [of_mdio] orion_mdio_probe+0x298/0x3c0 [mvmdio] platform_probe+0x70/0xe0 call_driver_probe+0x34/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __driver_attach+0x104/0x210 bus_for_each_dev+0x78/0xdc driver_attach+0x2c/0x3c bus_add_driver+0x184/0x240 driver_register+0x80/0x13c __platform_driver_register+0x30/0x3c xt_compat_calc_jump+0x28/0xa4 [x_tables] do_one_initcall+0x50/0x1b0 do_init_module+0x50/0x1fc load_module+0x684/0x744 __do_sys_finit_module+0xc4/0x140 __arm64_sys_finit_module+0x28/0x34 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 -> #0 (&dev->lock){+.+.}-{3:3}: check_prev_add+0xb4/0xc80 validate_chain+0x414/0x47c __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 phy_attach_direct+0x17c/0x3a0 [libphy] phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink] phylink_fwnode_phy_connect+0x48/0x60 [phylink] mvpp2_open+0xec/0x2e0 [mvpp2] __dev_open+0x104/0x214 __dev_change_flags+0x1d4/0x254 dev_change_flags+0x2c/0x7c do_setlink+0x254/0xa50 __rtnl_newlink+0x430/0x514 rtnl_newlink+0x58/0x8c rtnetlink_rcv_msg+0x17c/0x560 netlink_rcv_skb+0x64/0x150 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x1d4/0x2b4 netlink_sendmsg+0x1a4/0x400 ____sys_sendmsg+0x228/0x290 ___sys_sendmsg+0x88/0xec __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&dev->lock); lock(rtnl_mutex); lock(&dev->lock); *** DEADLOCK *** Fixes: 298e54fa ("net: phy: add core phylib sfp support") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rongguang Wei authored
When MAC is not support PMT, driver will check PHY's WoL capability and set device wakeup capability in stmmac_init_phy(). We can enable the WoL through ethtool, the driver would enable the device wake up flag. Now the device_may_wakeup() return true. But if there is a way which enable the PHY's WoL capability derectly, like in BIOS. The driver would not know the enable thing and would not set the device wake up flag. The phy_suspend may failed like this: [ 32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16 [ 32.409065] PM: Device stmmac-1:00 failed to suspend: error -16 [ 32.409067] PM: Some devices failed to suspend, or early wake event detected Add to set the device wakeup enable flag according to the get_wol function result in PHY can fix the error in this scene. v2: add a Fixes tag. Fixes: 1d8e5b0f ("net: stmmac: Support WOL with phy") Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Mar, 2023 10 commits
-
-
Liu Jian authored
When the buffer length of the recvmsg system call is 0, we got the flollowing soft lockup problem: watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149] CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:remove_wait_queue+0xb/0xc0 Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20 RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768 RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040 RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7 R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800 R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0 FS: 00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_msg_wait_data+0x279/0x2f0 tcp_bpf_recvmsg_parser+0x3c6/0x490 inet_recvmsg+0x280/0x290 sock_recvmsg+0xfc/0x120 ____sys_recvmsg+0x160/0x3d0 ___sys_recvmsg+0xf0/0x180 __sys_recvmsg+0xea/0x1a0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The logic in tcp_bpf_recvmsg_parser is as follows: msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { wait data; goto msg_bytes_ready; } In this case, "copied" always is 0, the infinite loop occurs. According to the Linux system call man page, 0 should be returned in this case. Therefore, in tcp_bpf_recvmsg_parser(), if the length is 0, directly return. Also modify several other functions with the same problem. Fixes: 1f5be6b3 ("udp: Implement udp_bpf_recvmsg() for sockmap") Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()") Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
-
David S. Miller authored
Simon Horman says: ==================== nfp: fix incorrect IPsec checksum handling this short series resolves two problems with IPsec checksum handling in the nfp driver. * PATCH 1/3, 2/3: Correct setting of checksum flags. One patch for each of the nfd3 and nfdk datapaths. * Patch 3/3: Correct configuration of NETIF_F_CSUM_MASK so that the stack does not unecessarily calculate csums for IPsec offload packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huanhuan Wang authored
When esp-tx-csum-offload is set to on, the protocol stack shouldn't calculate the IPsec offload packet's csum, but it does. Because the callback `.ndo_features_check` incorrectly masked NETIF_F_CSUM_MASK bit. Fixes: 57f273ad ("nfp: add framework to support ipsec offloading") Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huanhuan Wang authored
The csum flag of IPsec packet are set repeatedly. Therefore, the csum flag set of IPsec and non-IPsec packet need to be distinguished. As the ipv6 header does not have a csum field, so l3-csum flag is not required to be set for ipv6 case. Fixes: 436396f2 ("nfp: support IPsec offloading for NFP3800") Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huanhuan Wang authored
The csum flag of IPsec packet are set repeatedly. Therefore, the csum flag set of IPsec and non-IPsec packet need to be distinguished. As the ipv6 header does not have a csum field, so l3-csum flag is not required to be set for ipv6 case. L4-csum flag include the tcp csum flag and udp csum flag, we shouldn't set the udp and tcp csum flag at the same time for one packet, should set l4-csum flag according to the transport layer is tcp or udp. Fixes: 57f273ad ("nfp: add framework to support ipsec offloading") Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Oros authored
ice_get_module_eeprom() is broken since commit e9c9692c ("ice: Reimplement module reads used by ethtool") In this refactor, ice_get_module_eeprom() reads the eeprom in blocks of size 8. But the condition that should protect the buffer overflow ignores the last block. The last block always contains zeros. Bug uncovered by ethtool upstream commit 9538f384b535 ("netlink: eeprom: Defer page requests to individual parsers") After this commit, ethtool reads a block with length = 1; to read the SFF-8024 identifier value. unpatched driver: $ ethtool -m enp65s0f0np0 offset 0x90 length 8 Offset Values ------ ------ 0x0090: 00 00 00 00 00 00 00 00 $ ethtool -m enp65s0f0np0 offset 0x90 length 12 Offset Values ------ ------ 0x0090: 00 00 01 a0 4d 65 6c 6c 00 00 00 00 $ $ ethtool -m enp65s0f0np0 Offset Values ------ ------ 0x0000: 11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00 0x0070: 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 patched driver: $ ethtool -m enp65s0f0np0 offset 0x90 length 8 Offset Values ------ ------ 0x0090: 00 00 01 a0 4d 65 6c 6c $ ethtool -m enp65s0f0np0 offset 0x90 length 12 Offset Values ------ ------ 0x0090: 00 00 01 a0 4d 65 6c 6c 61 6e 6f 78 $ ethtool -m enp65s0f0np0 Identifier : 0x11 (QSFP28) Extended identifier : 0x00 Extended identifier description : 1.5W max. Power consumption Extended identifier description : No CDR in TX, No CDR in RX Extended identifier description : High Power Class (> 3.5 W) not enabled Connector : 0x23 (No separable connector) Transceiver codes : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Transceiver type : 40G Ethernet: 40G Base-CR4 Transceiver type : 25G Ethernet: 25G Base-CR CA-N Encoding : 0x05 (64B/66B) BR, Nominal : 25500Mbps Rate identifier : 0x00 Length (SMF,km) : 0km Length (OM3 50um) : 0m Length (OM2 50um) : 0m Length (OM1 62.5um) : 0m Length (Copper or Active cable) : 1m Transmitter technology : 0xa0 (Copper cable unequalized) Attenuation at 2.5GHz : 4db Attenuation at 5.0GHz : 5db Attenuation at 7.0GHz : 7db Attenuation at 12.9GHz : 10db ........ .... Fixes: e9c9692c ("ice: Reimplement module reads used by ethtool") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== tools: ynl: fix subset use and change default value for attrs/ops Fix a problem in subsetting, which will become apparent when the devlink family comes after the merge window. Even tho none of the existing families need this, we don't want someone to get "inspired" by the current, incorrect code when using specs in other languages. Change the default value for the first attr/op. This is a slight behavior change so needs to go in now. The diffstat of the last patch should serve as the clearest justification there.. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Now that the codegen rules had been changed we can update the specs to reflect the new default. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Pretty much all families use value: 1 or reserve as unspec the first entry in attribute set and the first operation. Make this the default. Update documentation (the doc for values of operations just refers back to doc for attrs so updating only attrs). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
To avoid having to repeat the entire definition of an attribute (including the value) use the Attr object from the original set. In fact this is already the documented expectation. Fixes: be5bea1c ("net: add basic C code generators for Netlink") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-