- 06 Jan, 2023 27 commits
-
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== devlink: code split and structured instance walk Split devlink.c into a handful of files, trying to keep the "core" code away from all the command-specific implementations. The core code has been quite scattered until now. Going forward we can consider using a source file per-subobject, I think that it's quite beneficial to newcomers (based on relative ease with which folks contribute to ethtool vs devlink). But this series doesn't split everything out, yet - partially due to backporting concerns, but mostly due to lack of time. Bulk of the netlink command handling is left in a leftover.c file. Introduce a context structure for dumps, and use it to store the devlink instance ID of the last dumped devlink instance. This means we don't have to restart the walk from 0 each time. Finally - introduce a "structured walk". A centralized dump handler in devlink/netlink.c which walks the devlink instances, deals with refcounting/locking, simplifying the per-object implementations quite a bit. Inspired by the ethtool code. v1: https://lore.kernel.org/all/20230104041636.226398-1-kuba@kernel.org/ RFC: https://lore.kernel.org/all/20221215020155.1619839-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230105040531.353563-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Soon we'll have to check if a devlink instance is alive after locking it. Convert to the by-instance dumping scheme to make refactoring easier. Most of the subobject code no longer has to worry about any devlink locking / lifetime rules (the only ones that still do are the two subject types which stubbornly use their own locking). Both dump and do callbacks are given a devlink instance which is already locked and good-to-access (do from the .pre_doit handler, dump from the new dump indirection). Note that we'll now check presence of an op (e.g. for sb_pool_get) under the devlink instance lock, that will soon be necessary anyway, because we don't hold refs on the driver modules so the memory in which ops live may be gone for a dead instance, after upcoming locking changes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Most dumpit implementations walk the devlink instances. This requires careful lock taking and reference dropping. Factor the loop out and provide just a callback to handle a single instance dump. Convert one user as an example, other users converted in the next change. Slightly inspired by ethtool netlink code. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit(). This way all dumps will take the instance lock in the main iteration loop directly, making refactoring and reading the code easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Use xarray id for cases of sub-objects which are iterated in a function. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Use xarray id for cases of simple sub-object iteration. We'll now use the state->instance for the devlink instances and state->idx for subobject index. Moving the definition of idx into the inner loop makes sense, so while at it also move other sub-object local variables into the loop. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
xarray gives each devlink instance an id and allows us to restart walk based on that id quite neatly. This is nice both from the perspective of code brevity and from the stability of the dump (devlink instances disappearing from before the resumption point will not cause inconsistent dumps). This patch takes care of simple cases where state->idx counts devlink instances only. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Walk devlink instances only once. Dump the instance reporters and port reporters before moving to the next instance. User space should not depend on ordering of messages. This will make improving stability of the walk easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Looks like devlinks_xa_find_get() was intended to get the mark from the @filter argument. It doesn't actually use @filter, passing DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other than registered is unlikely so drop @filter argument completely. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
The start variables made the code clearer when we had to access cb->args[0] directly, as the name args doesn't explain much. Now that we use a structure to hold state this seems no longer needed. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Create a dump context structure instead of using cb->args as an unsigned long array. This is a pure conversion which is intended to be as much of a noop as possible. Subsequent changes will use this to simplify the code. The two non-trivial parts are: - devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0] to see if devlink_fmsg_dumpit() has already been called (whether this is the first msg), but doesn't use the exact value, so we can drop the local variable there already - devlink_nl_cmd_region_read_dumpit() uses args[0] for address but we'll use args[1] now, shouldn't matter Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
We encourage casting struct netlink_callback::ctx to a local struct (in a comment above the field). Provide a convenience macro for checking if the local struct fits into the ctx. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Move out the netlink glue into a separate file. Leave the ops in the old file because we'd have to export a ton of functions. Going forward we should switch to split ops which will let us to put the new ops in the netlink.c file. Pure code move, no functional changes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Move core code into a separate file. It's spread around the main file which makes refactoring and figuring out how devlink works harder. Move the xarray, all the most core devlink instance code out like locking, ref counting, alloc, register, etc. Leave port stuff in leftover.c, if we want to move port code it'd probably be to its own file. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
To make the upcoming change a pure(er?) code move rename devlink_netdevice_event -> devlink_port_netdevice_event. This makes it clear that it only touches ports and doesn't belong cleanly in the core. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
The devlink code is hard to navigate with 13kLoC in one file. I really like the way Michal split the ethtool into per-command files and core. It'd probably be too much to split it all up, but we can at least separate the core parts out of the per-cmd implementations and put it in a directory so that new commands can be separate files. Move the code, subsequent commit will do a partial split. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: simplify IPA interrupt handling One of the IPA's two IRQs fires when data on a suspended channel is available (to request that the channel--or system--be resumed to recieve the pending data). This interrupt also handles a few conditions signaled by the embedded microcontroller. For this "IPA interrupt", the current code requires a handler to be dynamically registered for each interrupt condition. Any condition that has no registered handler is quietly ignored. This design is derived from the downstream IPA driver implementation. There isn't any need for this complexity. Even in the downstream code, only four of the available 30 or so IPA interrupt conditions are ever handled. So these handlers can pretty easily just be called directly in the main IRQ handler function. This series simplifies the interrupt handling code by having the small number of IPA interrupt handlers be called directly, rather than having them be registered dynamically. Version 2 just adds a missing forward-reference, as suggested by Caleb. ==================== Link: https://lore.kernel.org/r/20230104175233.2862874-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
We can call the two IPA interrupt handler functions directly; there's no need to maintain the array of handler function pointers any more. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The dynamic assignment of IPA interrupt handlers isn't needed; we only handle three IPA interrupt types, and their handler functions are now assigned directly. We can get rid of ipa_interrupt_add() and ipa_interrupt_remove() now, because they serve no purpose. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Declare the microcontroller IPA interrupt handler publicly, and assign it directly in ipa_interrupt_config(). Make the SUSPEND IPA interrupt handler public, and rename it ipa_power_suspend_handler(). Assign it directly in ipa_interrupt_config() as well. This makes it unnecessary to do this in ipa_interrupt_add(). Make similar changes for removing IPA interrupt handlers. The next two patches will finish the cleanup, removing the add/remove functions and the handler array entirely. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Expose ipa_interrupt_enable() and have functions that register IPA interrupt handlers enable them directly, rather than having the registration process do that. Do the same for disabling IPA interrupt handlers. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create new function ipa_interrupt_enable() to encapsulate enabling one of the IPA interrupt types. Introduce ipa_interrupt_disable() to reverse that operation. Add a helper function to factor out the common register update used by both. Use these in ipa_interrupt_add() and ipa_interrupt_remove(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The prototype for an IPA interrupt handler supplies the IPA interrupt ID, so it's possible to use a single function to handle any type of microcontroller interrupt. Introduce ipa_uc_interrupt_handler(), which calls the event or the response handler depending on the IRQ ID provided. Register the new function as the handler for both microcontroller IPA interrupt types. The called functions don't use their "irq_id" arguments, so remove them. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Lorenzo Bianconi says: ==================== enetc: unlock XDP_REDIRECT for XDP non-linear buffers Unlock XDP_REDIRECT for S/G XDP buffer and rely on XDP stack to properly take care of the frames. Rely on XDP_FLAGS_HAS_FRAGS flag to check if it really necessary to access non-linear part of the xdp_buff/xdp_frame. ==================== Link: https://lore.kernel.org/r/cover.1672840490.git.lorenzo@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lorenzo Bianconi authored
Move XDP skb_shared_info structure initialization in from enetc_map_rx_buff_to_xdp() to enetc_add_rx_buff_to_xdp() and do not always access skb_shared_info in the xdp_buff/xdp_frame since it is located in a different cacheline with respect to hard_start and data xdp pointers. Rely on XDP_FLAGS_HAS_FRAGS flag to check if it really necessary to access non-linear part of the xdp_buff/xdp_frame. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lorenzo Bianconi authored
Remove xdp_redirect_sg counter and the related ethtool entry since it is no longer used. Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lorenzo Bianconi authored
Even if full XDP_REDIRECT is not supported yet for non-linear XDP buffers since we allow redirecting just into CPUMAPs, unlock XDP_REDIRECT for S/G XDP buffer and rely on XDP stack to properly take care of the frames. Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 05 Jan, 2023 13 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, wifi, and netfilter. Current release - regressions: - bpf: fix nullness propagation for reg to reg comparisons, avoid null-deref - inet: control sockets should not use current thread task_frag - bpf: always use maximal size for copy_array() - eth: bnxt_en: don't link netdev to a devlink port for VFs Current release - new code bugs: - rxrpc: fix a couple of potential use-after-frees - netfilter: conntrack: fix IPv6 exthdr error check - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes - eth: dsa: qca8k: various fixes for the in-band register access - eth: nfp: fix schedule in atomic context when sync mc address - eth: renesas: rswitch: fix getting mac address from device tree - mobile: ipa: use proper endpoint mask for suspend Previous releases - regressions: - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by Jiri / python tests - net: tc: don't intepret cls results when asked to drop, fix oob-access - vrf: determine the dst using the original ifindex for multicast - eth: bnxt_en: - fix XDP RX path if BPF adjusted packet length - fix HDS (header placement) and jumbo thresholds for RX packets - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf, avoid memory corruptions Previous releases - always broken: - ulp: prevent ULP without clone op from entering the LISTEN status - veth: fix race with AF_XDP exposing old or uninitialized descriptors - bpf: - pull before calling skb_postpull_rcsum() (fix checksum support and avoid a WARN()) - fix panic due to wrong pageattr of im->image (when livepatch and kretfunc coexist) - keep a reference to the mm, in case the task is dead - mptcp: fix deadlock in fastopen error path - netfilter: - nf_tables: perform type checking for existing sets - nf_tables: honor set timeout and garbage collection updates - ipset: fix hash:net,port,net hang with /0 subnet - ipset: avoid hung task warning when adding/deleting entries - selftests: net: - fix cmsg_so_mark.sh test hang on non-x86 systems - fix the arp_ndisc_evict_nocarrier test for IPv6 - usb: rndis_host: secure rndis_query check against int overflow - eth: r8169: fix dmar pte write access during suspend/resume with WOL - eth: lan966x: fix configuration of the PCS - eth: sparx5: fix reading of the MAC address - eth: qed: allow sleep in qed_mcp_trace_dump() - eth: hns3: - fix interrupts re-initialization after VF FLR - fix handling of promisc when MAC addr table gets full - refine the handling for VF heartbeat - eth: mlx5: - properly handle ingress QinQ-tagged packets on VST - fix io_eq_size and event_eq_size params validation on big endian - fix RoCE setting at HCA level if not supported at all - don't turn CQE compression on by default for IPoIB - eth: ena: - fix toeplitz initial hash key value - account for the number of XDP-processed bytes in interface stats - fix rx_copybreak value update Misc: - ethtool: harden phy stat handling against buggy drivers - docs: netdev: convert maintainer's doc from FAQ to a normal document" * tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits) caif: fix memory leak in cfctrl_linkup_request() inet: control sockets should not use current thread task_frag net/ulp: prevent ULP without clone op from entering the LISTEN status qed: allow sleep in qed_mcp_trace_dump() MAINTAINERS: Update maintainers for ptp_vmw driver usb: rndis_host: Secure rndis_query check against int overflow net: dpaa: Fix dtsec check for PCS availability octeontx2-pf: Fix lmtst ID used in aura free drivers/net/bonding/bond_3ad: return when there's no aggregator netfilter: ipset: Rework long task execution when adding/deleting entries netfilter: ipset: fix hash:net,port,net hang with /0 subnet net: sparx5: Fix reading of the MAC address vxlan: Fix memory leaks in error path net: sched: htb: fix htb_classify() kernel-doc net: sched: cbq: dont intepret cls results when asked to drop net: sched: atm: dont intepret cls results when asked to drop dt-bindings: net: marvell,orion-mdio: Fix examples dt-bindings: net: sun8i-emac: Add phy-supply property net: ipa: use proper endpoint mask for suspend selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linuxLinus Torvalds authored
Pull gpio fixes from Bartosz Golaszewski: "A reference leak fix, two fixes for using uninitialized variables and more drivers converted to using immutable irqchips: - fix a reference leak in gpio-sifive - fix a potential use of an uninitialized variable in core gpiolib - fix a potential use of an uninitialized variable in gpio-pca953x - make GPIO irqchips immutable in gpio-pmic-eic-sprd, gpio-eic-sprd and gpio-sprd" * tag 'gpio-fixes-for-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: sifive: Fix refcount leak in sifive_gpio_probe gpio: sprd: Make the irqchip immutable gpio: pmic-eic-sprd: Make the irqchip immutable gpio: eic-sprd: Make the irqchip immutable gpio: pca953x: avoid to use uninitialized value pinctrl gpiolib: Fix using uninitialized lookup-flags on ACPI platforms
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdevLinus Torvalds authored
Pull fbdev fixes from Helge Deller: - Fix Matrox G200eW initialization failure - Fix build failure of offb driver when built as module - Optimize stack usage in omapfb * tag 'fbdev-for-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: omapfb: avoid stack overflow warning fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB fbdev: atyfb: use strscpy() to instead of strncpy() fbdev: omapfb: use strscpy() to instead of strncpy() fbdev: make offb driver tristate
-
Paolo Abeni authored
Siddharth Vadapalli says: ==================== Add support for QSGMII mode for J721e CPSW9G to am65-cpsw driver Add compatible to am65-cpsw driver for J721e CPSW9G, which contains 8 external ports and 1 internal host port. Add support to power on and power off the SERDES PHY which is used by the CPSW MAC. ========= Changelog ========= v5: https://lore.kernel.org/r/20221109042203.375042-1-s-vadapalli@ti.com/ v4: https://lore.kernel.org/r/20221108080606.124596-1-s-vadapalli@ti.com/ v3: https://lore.kernel.org/r/20221026090957.180592-1-s-vadapalli@ti.com/ v2: https://lore.kernel.org/r/20221018085810.151327-1-s-vadapalli@ti.com/ v1: https://lore.kernel.org/r/20220914095053.189851-1-s-vadapalli@ti.com/ ==================== Link: https://lore.kernel.org/r/20230104103432.1126403-1-s-vadapalli@ti.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Siddharth Vadapalli authored
Use PHY framework APIs to initialize the SERDES PHY connected to CPSW MAC. Define the functions am65_cpsw_disable_phy(), am65_cpsw_enable_phy(), am65_cpsw_disable_serdes_phy() and am65_cpsw_enable_serdes_phy(). Add new member "serdes_phy" to struct "am65_cpsw_slave_data" to store the SERDES PHY for each port, if it exists. Use it later while disabling the SERDES PHY for each port. Power on and initialize the SerDes PHY in am65_cpsw_nuss_init_slave_ports() by invoking am65_cpsw_enable_serdes_phy(). Power off the SerDes PHY in am65_cpsw_nuss_remove() by invoking am65_cpsw_disable_serdes_phy(). Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Siddharth Vadapalli authored
CPSW9G in J721e supports additional modes like QSGMII. Add new compatible for J721e in am65-cpsw driver. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Siddharth Vadapalli authored
Update bindings for TI K3 J721e SoC which contains 9 ports (8 external ports) CPSW9G module and add compatible for it. Changes made: - Add new compatible ti,j721e-cpswxg-nuss for CPSW9G. - Extend pattern properties for new compatible. - Change maximum number of CPSW ports to 8 for new compatible. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Arnd Bergmann authored
The dsi_irq_stats structure is a little too big to fit on the stack of a 32-bit task, depending on the specific gcc options: fbdev/omap2/omapfb/dss/dsi.c: In function 'dsi_dump_dsidev_irqs': fbdev/omap2/omapfb/dss/dsi.c:1621:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Since this is only a debugfs file, performance is not critical, so just dynamically allocate it, and print an error message in there in place of a failure code when the allocation fails. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Helge Deller <deller@gmx.de>
-
Zhengchao Shao authored
When linktype is unknown or kzalloc failed in cfctrl_linkup_request(), pkt is not released. Add release process to error path. Fixes: b482cd20 ("net-caif: add CAIF core protocol stack") Fixes: 8d545c8f ("caif: Disconnect without waiting for response") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
Because ICMP handlers run from softirq contexts, they must not use current thread task_frag. Previously, all sockets allocated by inet_ctl_sock_create() would use the per-socket page fragment, with no chance of recursion. Fixes: 98123866 ("Treewide: Stop corrupting socket's task_frag") Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Coddington <bcodding@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
When an ULP-enabled socket enters the LISTEN status, the listener ULP data pointer is copied inside the child/accepted sockets by sk_clone_lock(). The relevant ULP can take care of de-duplicating the context pointer via the clone() operation, but only MPTCP and SMC implement such op. Other ULPs may end-up with a double-free at socket disposal time. We can't simply clear the ULP data at clone time, as TLS replaces the socket ops with custom ones assuming a valid TLS ULP context is available. Instead completely prevent clone-less ULP sockets from entering the LISTEN status. Fixes: 734942cc ("tcp: ULP infrastructure") Reported-by: slipper <slipper.alive@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Caleb Sander authored
By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop that can run 500K times, so calls to qed_mcp_nvm_rd_cmd() may block the current thread for over 5s. We observed thread scheduling delays over 700ms in production, with stacktraces pointing to this code as the culprit. qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted. It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd(). Add a "can sleep" parameter to qed_find_nvram_image() and qed_nvram_read() so they can sleep during qed_mcp_trace_dump(). qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(), called only by qed_mcp_trace_dump(), allow these functions to sleep. I can't tell if the other caller (qed_grc_dump_mcp_hw_dump()) can sleep, so keep b_can_sleep set to false when it calls these functions. An example stacktrace from a custom warning we added to the kernel showing a thread that has not scheduled despite long needing resched: [ 2745.362925,17] ------------[ cut here ]------------ [ 2745.362941,17] WARNING: CPU: 23 PID: 5640 at arch/x86/kernel/irq.c:233 do_IRQ+0x15e/0x1a0() [ 2745.362946,17] Thread not rescheduled for 744 ms after irq 99 [ 2745.362956,17] Modules linked in: ... [ 2745.363339,17] CPU: 23 PID: 5640 Comm: lldpd Tainted: P O 4.4.182+ #202104120910+6d1da174272d.61x [ 2745.363343,17] Hardware name: FOXCONN MercuryB/Quicksilver Controller, BIOS H11P1N09 07/08/2020 [ 2745.363346,17] 0000000000000000 ffff885ec07c3ed8 ffffffff8131eb2f ffff885ec07c3f20 [ 2745.363358,17] ffffffff81d14f64 ffff885ec07c3f10 ffffffff81072ac2 ffff88be98ed0000 [ 2745.363369,17] 0000000000000063 0000000000000174 0000000000000074 0000000000000000 [ 2745.363379,17] Call Trace: [ 2745.363382,17] <IRQ> [<ffffffff8131eb2f>] dump_stack+0x8e/0xcf [ 2745.363393,17] [<ffffffff81072ac2>] warn_slowpath_common+0x82/0xc0 [ 2745.363398,17] [<ffffffff81072b4c>] warn_slowpath_fmt+0x4c/0x50 [ 2745.363404,17] [<ffffffff810d5a8e>] ? rcu_irq_exit+0xae/0xc0 [ 2745.363408,17] [<ffffffff817c99fe>] do_IRQ+0x15e/0x1a0 [ 2745.363413,17] [<ffffffff817c7ac9>] common_interrupt+0x89/0x89 [ 2745.363416,17] <EOI> [<ffffffff8132aa74>] ? delay_tsc+0x24/0x50 [ 2745.363425,17] [<ffffffff8132aa04>] __udelay+0x34/0x40 [ 2745.363457,17] [<ffffffffa04d45ff>] qed_mcp_cmd_and_union+0x36f/0x7d0 [qed] [ 2745.363473,17] [<ffffffffa04d5ced>] qed_mcp_nvm_rd_cmd+0x4d/0x90 [qed] [ 2745.363490,17] [<ffffffffa04e1dc7>] qed_mcp_trace_dump+0x4a7/0x630 [qed] [ 2745.363504,17] [<ffffffffa04e2556>] ? qed_fw_asserts_dump+0x1d6/0x1f0 [qed] [ 2745.363520,17] [<ffffffffa04e4ea7>] qed_dbg_mcp_trace_get_dump_buf_size+0x37/0x80 [qed] [ 2745.363536,17] [<ffffffffa04ea881>] qed_dbg_feature_size+0x61/0xa0 [qed] [ 2745.363551,17] [<ffffffffa04eb427>] qed_dbg_all_data_size+0x247/0x260 [qed] [ 2745.363560,17] [<ffffffffa0482c10>] qede_get_regs_len+0x30/0x40 [qede] [ 2745.363566,17] [<ffffffff816c9783>] ethtool_get_drvinfo+0xe3/0x190 [ 2745.363570,17] [<ffffffff816cc152>] dev_ethtool+0x1362/0x2140 [ 2745.363575,17] [<ffffffff8109bcc6>] ? finish_task_switch+0x76/0x260 [ 2745.363580,17] [<ffffffff817c2116>] ? __schedule+0x3c6/0x9d0 [ 2745.363585,17] [<ffffffff810dbd50>] ? hrtimer_start_range_ns+0x1d0/0x370 [ 2745.363589,17] [<ffffffff816c1e5b>] ? dev_get_by_name_rcu+0x6b/0x90 [ 2745.363594,17] [<ffffffff816de6a8>] dev_ioctl+0xe8/0x710 [ 2745.363599,17] [<ffffffff816a58a8>] sock_do_ioctl+0x48/0x60 [ 2745.363603,17] [<ffffffff816a5d87>] sock_ioctl+0x1c7/0x280 [ 2745.363608,17] [<ffffffff8111f393>] ? seccomp_phase1+0x83/0x220 [ 2745.363612,17] [<ffffffff811e3503>] do_vfs_ioctl+0x2b3/0x4e0 [ 2745.363616,17] [<ffffffff811e3771>] SyS_ioctl+0x41/0x70 [ 2745.363619,17] [<ffffffff817c6ffe>] entry_SYSCALL_64_fastpath+0x1e/0x79 [ 2745.363622,17] ---[ end trace f6954aa440266421 ]--- Fixes: c965db44 ("qed: Add support for debug data collection") Signed-off-by: Caleb Sander <csander@purestorage.com> Acked-by: Alok Prasad <palok@marvell.com> Link: https://lore.kernel.org/r/20230103233021.1457646-1-csander@purestorage.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-