- 10 Mar, 2020 7 commits
Heiner Kallweit authored
So far tx_skb->skb is the only member of the two structs that is not reset. Make understanding the code easier by resetting both structs completely in rtl8169_unmap_tx_skb. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
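A minimal user-space sketch of the idea, assuming two hypothetical structures: `struct tx_desc` for the hardware descriptor and `struct tx_info` for the driver's per-slot bookkeeping. These are not the r8169 definitions. Clearing both with memset() means no member, including the skb pointer, can be left stale after unmapping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hardware TX descriptor (not the r8169 layout). */
struct tx_desc {
	uint32_t opts1;
	uint32_t opts2;
	uint64_t addr;
};

/* Hypothetical driver-side bookkeeping for one TX slot. */
struct tx_info {
	void *skb;      /* packet owned by this slot, if any */
	uint32_t len;   /* mapped length in bytes */
};

/* Reset both structs completely instead of clearing selected members. */
static void unmap_tx_slot(struct tx_desc *desc, struct tx_info *info)
{
	/* A real driver would unmap the DMA buffer here first. */
	memset(desc, 0, sizeof(*desc));
	memset(info, 0, sizeof(*info));
}
```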
-
Heiner Kallweit authored
Slightly improve the code by converting this while loop to a for loop. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
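The commit message does not show the loop itself, so this is only an illustration of the transformation on a hypothetical helper that clears `n` ring slots starting at `start`. Both versions behave identically; the for form keeps the index bookkeeping in one place.

```c
#include <stddef.h>

/* Before: while loop with the counters updated by hand (hypothetical helper). */
static unsigned int clear_range_while(unsigned int *slots, size_t size,
				      size_t start, size_t n)
{
	unsigned int cleared = 0;

	while (n > 0) {
		size_t entry = start % size;

		if (slots[entry]) {
			slots[entry] = 0;
			cleared++;
		}
		start++;
		n--;
	}
	return cleared;
}

/* After: the same logic as a for loop. */
static unsigned int clear_range_for(unsigned int *slots, size_t size,
				    size_t start, size_t n)
{
	unsigned int cleared = 0;

	for (size_t i = 0; i < n; i++) {
		size_t entry = (start + i) % size;

		if (slots[entry]) {
			slots[entry] = 0;
			cleared++;
		}
	}
	return cleared;
}
```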
-
David S. Miller authored
Julian Wiedmann says: ==================== s390/qeth: updates 2020-03-06 please apply the following patch series for qeth to netdev's net-next tree. Just a small update to take care of a regression with respect to IRQ handling in net-next, reported by Qian Cai. The fix needs some qdio layer changes, so you will find Vasily's Acked-by in that patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
After recent cleanups this is just a complicated wrapper around a u32*. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Once the call to qdio_establish() has completed, qdio is free to deliver data IRQs to the device driver's IRQ poll handler. For qeth (the only qdio driver that currently uses IRQ polling) this is problematic, since the IRQs can arrive before its NAPI instance is even registered. Calling napi_schedule() from qeth_qdio_start_poll() then crashes in various nasty ways. Until recently qeth checked for IFF_UP to drop such early interrupts, but that's fragile as well since it doesn't enforce any ordering. Fix this properly by bringing up the qdio device in IRQS_DISABLED mode, and have the driver explicitly opt-in to receive data IRQs. qeth does so from qeth_open(), which kick-starts a NAPI poll and then calls qdio_start_irq() from qeth_poll(). Also add a matching qdio_stop_irq() in qeth_stop() to switch the qdio dataplane back into a disabled state. Fixes: 3d35dbe6 ("s390/qeth: don't check for IFF_UP when scheduling napi") CC: Qian Cai <cai@lca.pw> Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
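The ordering fix can be modelled as a gate that starts closed. This is a single-threaded sketch with hypothetical names (`struct qdev`, `qdev_start_irq()` and friends); it is not the qdio API. After establish, data IRQs are held back until the driver explicitly opts in, and stopping closes the gate again.

```c
#include <stdbool.h>

/* Hypothetical model of a qdio-like device's data-IRQ gate. */
struct qdev {
	bool irqs_enabled;        /* false right after establish (IRQS_DISABLED) */
	unsigned int delivered;   /* IRQs passed to the driver's poll handler */
	unsigned int held_back;   /* IRQs not forwarded before the driver opted in */
};

static void qdev_establish(struct qdev *d)
{
	d->irqs_enabled = false;  /* bring the device up with data IRQs disabled */
	d->delivered = 0;
	d->held_back = 0;
}

/* Driver opts in once its poll machinery exists and is running. */
static void qdev_start_irq(struct qdev *d) { d->irqs_enabled = true; }

/* Driver switches the data plane back to the disabled state on stop. */
static void qdev_stop_irq(struct qdev *d) { d->irqs_enabled = false; }

/* Called for each data interrupt raised by the hardware. */
static void qdev_data_irq(struct qdev *d)
{
	if (d->irqs_enabled)
		d->delivered++;   /* safe: the handler's NAPI instance exists */
	else
		d->held_back++;   /* pending data is left for a later poll */
}
```

In the real driver the opt-in happens from the NAPI poll that qeth_open() kick-starts, so any data that arrived while the gate was closed is still picked up.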
-
Leslie Monis authored
Commit 105e808c ("pie: remove pie_vars->accu_prob_overflows") changes the scale of probability values in PIE from (2^64 - 1) to (2^56 - 1). This affects the precision of tc_pie_xstats->prob in user space. This patch ensures user space is unaffected. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
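The scale difference is 64 - 56 = 8 bits, so one straightforward way to keep the user-space value on the old scale is a left shift by 8 before reporting it. This sketch assumes that approach; the function name is hypothetical and this is not quoted from the patch.

```c
#include <stdint.h>

/* Internal PIE probability scale after commit 105e808c: 0 .. 2^56 - 1. */
#define PIE_PROB_MAX_INTERNAL ((UINT64_C(1) << 56) - 1)

/* Convert to the 0 .. 2^64 - 1 scale that tc_pie_xstats->prob used before. */
static uint64_t pie_prob_to_user(uint64_t prob)
{
	/* Shift by the 8 bits of lost range; the low 8 bits become zero. */
	return prob << (64 - 56);
}
```

Note that the internal maximum maps to 2^64 - 256 rather than exactly 2^64 - 1; the precision lost in the low bits is what the internal scale change already gave up.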
-
Yousuk Seung authored
Add TCP_NLA_BYTES_NOTSENT to SCM_TIMESTAMPING_OPT_STATS that reports bytes in the write queue but not sent. This is the same metric as what is exported with tcp_info.tcpi_notsent_bytes. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
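"Bytes in the write queue but not sent" is the distance between the sequence number after the last queued byte and the next sequence number to send, clamped at zero. A sketch with a hypothetical struct holding just those two fields; the subtraction is done modulo 2^32 so it survives sequence-number wraparound.

```c
#include <stdint.h>

/* Hypothetical subset of TCP sender state. */
struct tcp_snd_state {
	uint32_t write_seq;  /* sequence number after the last byte queued by the app */
	uint32_t snd_nxt;    /* next sequence number to be sent */
};

static uint32_t tcp_bytes_notsent(const struct tcp_snd_state *s)
{
	/* Wrap-safe difference, interpreted as signed and clamped at zero. */
	int32_t diff = (int32_t)(s->write_seq - s->snd_nxt);

	return diff > 0 ? (uint32_t)diff : 0;
}
```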
-
- 09 Mar, 2020 33 commits
Thomas Bogendoerfer authored
Commit a8d0f11e ("MIPS: SGI-IP27: Enable ethernet phy on second Origin 200 module") fixes the root cause of undetected PHYs. Therefore the workaround can go away now. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alex Elder says: ==================== net: introduce Qualcomm IPA driver (UPDATED) This series presents the driver for the Qualcomm IP Accelerator (IPA). This is version 2 of this updated series. It includes the following small changes since the previous version: - Now based on net-next instead of v5.6-rc - Config option now named CONFIG_QCOM_IPA - Some minor cleanup in the GSI code - Small change to replenish logic - No longer depends on remoteproc bug fixes What follows is basically the same explanation as was posted previously. -Alex I have posted earlier versions of this code previously, but it has undergone quite a bit of development since the last time, so rather than calling it "version 3" I'm just treating it as a new series (indicating it's been updated in this message). The fast/data path is the same as before. But the driver now (nearly) supports a second platform, its transaction handling has been generalized and improved, and modem activities are now handled in a more unified way. This series is available (based on net-next) in branch "ipa_updated-v2" in this git repository: https://git.linaro.org/people/alex.elder/linux.git The branch depends on one other small patch that I sent out for review earlier: https://lore.kernel.org/lkml/20200306042302.17602-1-elder@linaro.org/ I want to address some of the discussion that arose last time. First, there was the WWAN discussion. Here's the history: - This was last posted nine months ago. - Reviewers at that time favored developing a new WWAN subsystem that would be used for managing devices like this. And the suggestion was to not accept this driver until that could be developed. - Along the way, Apple acquired much of Intel's modem business. And as a result, the generic framework became less pressing.
- I did participate in the WWAN subsystem design, however, and although it went dormant for a while it's been resurrected: https://lore.kernel.org/netdev/20200225100053.16385-1-johannes@sipsolutions.net/ - Unfortunately the proposed WWAN design was not an easy fit with Qualcomm's integrated modem interfaces. Given that rmnet is a supported link type in the upstream "iproute2" package (more on this below), I have opted not to integrate with any WWAN subsystem. So in summary, this driver does not integrate with a generic WWAN framework. And I'd like it to be accepted upstream despite that. Next, Arnd Bergmann had some concerns about flow control. (Note: some of my discussions with Arnd about this were offline.) The overall architecture here also involves the "rmnet" driver (drivers/net/ethernet/qualcomm/rmnet). The rmnet driver presents a network device for use. It connects with another network device presented by the IPA driver. The rmnet driver wraps (and unwraps) packets transferred to (and from) the IPA driver with QMAP headers.

     ---------------
     | rmnet_data0 |     <-- "real" netdev
     ---------------
            ||           }- QMAP spoken here
     --------------
     | rmnet_ipa0 |      <-- also netdev, transporting QMAP packets
     --------------
            ||
     --------------
    ( IPA hardware )
     --------------

Arnd's concern was that the rmnet_data0 network device does not have the benefit of information about the state of the underlying IPA hardware in order to be effective in controlling TX flow. The feared result is over-buffering of TX packets (bufferbloat). I began working on some simple experiments to see whether (or how much) his concern was warranted. But it turned out that completing these experiments was much more work than had been hoped. The rmnet driver is present in the upstream kernel. There is also support for the rmnet link type in the upstream "ip" user space command in the "iproute2" package.
Changing the layering of rmnet over IPA likely involves deprecating the rmnet driver and its support in "iproute2". I would really rather not go down that path. There is precedent for this sort of layering of network devices (L2TP, VLAN). And any architecture like this would suffer from the issues Arnd mentioned; the problem is not limited to rmnet and IPA. I do think this is a problem worth solving, but the prudent thing to do might be to try to solve it more generally. So to summarize on this issue, this driver does not attempt to change the way the rmnet and IPA drivers work together. And even though I think Arnd's concerns warrant more investigation, I'd like this driver to be accepted upstream without any change to this architecture. Finally, a more technical description for the series, and some acknowledgements to some people who contributed to it. The IPA is a component present in some Qualcomm SoCs that allows network functions such as aggregation, filtering, routing, and NAT to be performed without active involvement of the main application processor (AP). In this initial patch series these advanced features are not implemented. The IPA driver simply provides a network interface that makes the modem's LTE network available in Linux. This initial series supports only the Qualcomm SDM845 SoC. The Qualcomm SC7180 SoC is partially supported, and support for other platforms will follow. This code is derived from a driver developed by Qualcomm. A version of the original source can be seen here: https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree in the "drivers/platform/msm/ipa" directory. Many were involved in developing this, but the following individuals deserve explicit acknowledgement for their substantial contributions: Abhishek Choubey, Ady Abraham, Chaitanya Pratapa, David Arinzon, Ghanim Fodi, Gidon Studinski, Ravi Gummadidala, Shihuan Liu, Skylar Chang ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Add IPA-related nodes and definitions to "sdm845.dtsi". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Add an entry in the MAINTAINERS file for the Qualcomm IPA driver Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Add build and Kconfig support for the Qualcomm IPA driver. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch implements two forms of out-of-band communication between the AP and modem. - QMI is a mechanism that allows clients running on the AP to interact with services running on the modem (and vice-versa). The AP IPA driver uses QMI to communicate with the corresponding IPA driver resident on the modem, to agree on parameters used with the IPA hardware and to ensure both sides are ready before entering operational mode. - SMP2P is a more primitive mechanism available for the modem and AP to communicate with each other. It provides a means for either the AP or modem to interrupt the other, and furthermore, to provide 32 bits worth of information. The IPA driver uses SMP2P to tell the modem what the state of the IPA clock was in the event of a crash. This allows the modem to safely access the IPA hardware (or avoid doing so) when a crash occurs, for example, to access information within the IPA hardware. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch includes code implementing the modem functionality. There are several communication paths between the AP and modem, separate from the main data path provided by IPA. SMP2P provides primitive messaging and interrupt capability, and QMI allows more complex out-of-band messaging to occur between entities on the AP and modem. (SMP2P and QMI support are added by the next patch.) Management of these (plus the network device implementing the data path) is done by code within "ipa_modem.c". Sort of unrelated, this patch also includes the code supporting the microcontroller CPU present on the IPA. The microcontroller can be used to implement special handling of packets, but at this time we don't support that. Still, it is a component that needs to be initialized, and in the event of a crash we need to do some synchronization between the AP and the microcontroller. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
One TX endpoint (per EE) is used for issuing immediate commands to the IPA. These commands request activities beyond simple data transfers to be done by the IPA hardware. For example, the IPA is able to manage routing packets among endpoints, and immediate commands are used to configure tables used for that routing. Immediate commands are built on top of GSI transactions. They are different from normal transfers (in that they use a special endpoint, and their "payload" is interpreted differently), so separate functions are used to issue immediate command transactions. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch contains code implementing filter and routing tables for the IPA. A filter table allows rules to be used for filtering packets that depart the AP at an endpoint. A filter table entry contains the address of a set of rules to apply for each endpoint that supports filtering. A routing table allows packets to be routed to an endpoint based on packet metadata. It is also a table whose entries each contain the address of a set of routing rules to apply. Neither filtering nor routing is supported by the current driver. All table entries refer to rules that mean "no filtering" and "no routing." Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
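Since every table entry is just the address of a rule set, "no filtering" and "no routing" reduce to pointing every entry at one shared null rule. A sketch with a hypothetical table size and a hypothetical null-rule address; the real layout lives in IPA-local memory and is hardware-specific.

```c
#include <stddef.h>
#include <stdint.h>

#define RULE_TABLE_ENTRIES 8  /* hypothetical; the real count depends on the hardware */

/* Each entry holds the address (in IPA-local memory) of a rule set. */
struct rule_table {
	uint64_t entry[RULE_TABLE_ENTRIES];
};

/* Point every entry at one shared rule meaning "no filtering" / "no routing". */
static void rule_table_init_default(struct rule_table *t, uint64_t null_rule_addr)
{
	for (size_t i = 0; i < RULE_TABLE_ENTRIES; i++)
		t->entry[i] = null_rule_addr;
}
```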
-
Alex Elder authored
This patch includes the code implementing an IPA endpoint. This is the primary abstraction implemented by the IPA. An endpoint is one end of a network connection between two entities physically connected to the IPA. Specifically, the AP and the modem implement endpoints, and an (AP endpoint, modem endpoint) pair implements the transfer of network data in one direction between the AP and modem. Endpoints are built on top of GSI channels, but IPA endpoints represent the higher-level functionality that the IPA provides. Data can be sent through a GSI channel, but it is the IPA endpoint that represents what is on the "other end" to receive that data. Other functionality, including aggregation, checksum offload and (at some future date) IP routing and filtering are all associated with the IPA endpoint. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch implements GSI transactions. A GSI transaction is a structure that represents a single request (consisting of one or more TREs) sent to the GSI hardware. The last TRE in a transaction includes a flag requesting that the GSI interrupt the AP to notify that it has completed. TREs are executed and completed strictly in order. For this reason, the completion of a single TRE implies that all previous TREs (in particular all of those "earlier" in a transaction) have completed. Whenever there is a need to send a request (a set of TREs) to the IPA, a GSI transaction is allocated, specifying the number of TREs that will be required. Details of the request (e.g. transfer offsets and length) are represented in a Linux scatterlist array that is incorporated in the transaction structure. Once all commands (TREs) are added to a transaction it is committed. When the hardware signals that the request has completed, a callback function allows for cleanup or follow-up activity to be performed before the transaction is freed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
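The two key properties (only the last TRE of a transaction requests an interrupt, and in-order completion means one completion covers everything before it) can be sketched with a hypothetical ring model. Names such as `irq_on_done` and `channel_complete_through()` are illustrative only; the scatterlist and completion callback are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 16  /* hypothetical number of TRE slots */

/* Hypothetical transfer ring element. */
struct tre {
	unsigned int len;
	bool irq_on_done;   /* ask the hardware to interrupt when this TRE completes */
};

struct channel {
	struct tre ring[RING_SIZE];
	size_t queued;      /* TREs ever queued (free-running count) */
	size_t completed;   /* TREs known to have completed */
};

/* A transaction is a contiguous run of TREs in the ring. */
struct trans {
	size_t first;
	size_t count;
};

/* Fill and commit one transaction; only its last TRE requests an interrupt. */
static bool trans_commit(struct channel *ch, const unsigned int *lens, size_t n,
			 struct trans *t)
{
	if (n == 0 || ch->queued - ch->completed + n > RING_SIZE)
		return false;   /* empty request or not enough free slots */
	t->first = ch->queued;
	t->count = n;
	for (size_t i = 0; i < n; i++) {
		struct tre *tre = &ch->ring[(ch->queued + i) % RING_SIZE];

		tre->len = lens[i];
		tre->irq_on_done = (i == n - 1);
	}
	ch->queued += n;
	return true;
}

/* TREs complete in order, so completing TRE 'idx' implies all earlier ones did. */
static void channel_complete_through(struct channel *ch, size_t idx)
{
	if (idx + 1 > ch->completed)
		ch->completed = idx + 1;
}

static bool trans_done(const struct channel *ch, const struct trans *t)
{
	return ch->completed >= t->first + t->count;
}
```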
-
Alex Elder authored
This patch provides interface functions supplied by the IPA layer that are called from the GSI layer. One function is called when a GSI transaction has completed. The others allow the GSI layer to inform the IPA layer when the hardware has been told it has new TREs to execute, and when the hardware has indicated transactions have completed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch includes "gsi.c", which implements the generic software interface (GSI) for IPA. The generic software interface abstracts channels, which provide a means of transferring data either from the AP to the IPA, or from the IPA to the AP. A ring buffer of "transfer elements" (TREs) is used to describe data transfers to perform. The AP writes a doorbell register associated with a channel to let it know it has added new entries (for an AP->IPA channel) or has finished processing entries (for an IPA->AP channel). Each channel also has an event ring buffer, used by the IPA to communicate information about events related to a channel (for example, the completion of TREs). The IPA writes its own doorbell register, which triggers an interrupt on the AP, to signal that new event information has arrived. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
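The doorbell protocol for an AP->IPA channel can be sketched as follows: the AP fills TREs into the ring, then one register write publishes the new write index. The ring size, field names and the `volatile` stand-in for the doorbell register are hypothetical; no real GSI register layout is implied.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRE_COUNT 8  /* hypothetical ring size */

/* Hypothetical AP->IPA channel ring plus a simulated doorbell register. */
struct chan_ring {
	uint32_t tre_len[TRE_COUNT];
	uint32_t wp;                 /* AP write index (free-running) */
	uint32_t rp;                 /* index the hardware has consumed up to */
	volatile uint32_t doorbell;  /* stand-in for the channel doorbell register */
};

/* Add one TRE describing a transfer; fails if the ring is full. */
static bool ring_add(struct chan_ring *r, uint32_t len)
{
	if (r->wp - r->rp == TRE_COUNT)
		return false;
	r->tre_len[r->wp % TRE_COUNT] = len;
	r->wp++;
	return true;
}

/* One register write tells the hardware about every TRE added so far. */
static void ring_doorbell(struct chan_ring *r)
{
	r->doorbell = r->wp;
}
```

Batching several TREs before a single doorbell write keeps the number of (relatively expensive) register writes down.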
-
Alex Elder authored
The Generic Software Interface is a layer of the IPA driver that abstracts the underlying hardware. The next patch includes the main code for GSI (including some additional documentation). This patch just includes three GSI header files. - "gsi.h" is the top-level GSI header file. The GSI structure it defines is embedded within the IPA structure. The main abstraction implemented by the GSI code is the channel, and this header exposes several operations that can be performed on a GSI channel. - "gsi_private.h" exposes some definitions that are intended to be private, used only by the main GSI code and the GSI transaction code (defined in an upcoming patch). - Like "ipa_reg.h", "gsi_reg.h" defines the offsets of the 32-bit registers used by the GSI layer, along with masks that define the position and width of fields less than 32 bits located within these registers. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch incorporates three source files (and their headers). They're grouped into one patch mainly for the purpose of making the number and size of patches in this series somewhat reasonable. - "ipa_clock.c" and "ipa_clock.h" implement clocking for the IPA device. The IPA has a single core clock managed by the common clock framework. In addition, the IPA has three buses whose bandwidth is managed by the Linux interconnect framework. At this time the core clock and all three buses are either on or off; we don't yet do any more fine-grained management than that. The core clock and interconnects are enabled and disabled as a unit, using a unified clock-like abstraction, ipa_clock_get()/ipa_clock_put(). - "ipa_interrupt.c" and "ipa_interrupt.h" implement IPA interrupts. The IPA interrupt is one of two hardware IRQs used by the IPA driver (the other is the GSI interrupt, described in a separate patch). Several types of interrupt are handled by the IPA IRQ handler; these are not part of the data/fast path. - The IPA has a region of local memory that is accessible by the AP (and modem). Within that region are areas with certain defined purposes. "ipa_mem.c" and "ipa_mem.h" define those regions, and implement their initialization. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
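Treating the core clock and the three interconnects "as a unit" behind a get/put pair amounts to reference counting: the first get turns everything on, the last put turns everything off. A single-threaded sketch with hypothetical names; it is not the implementation of ipa_clock_get()/ipa_clock_put(), which would also need locking, and the on/off ordering shown is illustrative.

```c
#include <stdbool.h>

#define IPA_ICC_PATHS 3  /* the three buses mentioned in the text */

/* Hypothetical state standing in for the core clock and interconnect votes. */
struct ipa_power {
	unsigned int count;   /* outstanding get() references */
	bool core_clk_on;
	bool icc_on[IPA_ICC_PATHS];
};

/* First reference turns everything on as a unit. */
static void ipa_power_get(struct ipa_power *p)
{
	if (p->count++ == 0) {
		for (int i = 0; i < IPA_ICC_PATHS; i++)
			p->icc_on[i] = true;
		p->core_clk_on = true;
	}
}

/* Last reference turns everything off as a unit. */
static void ipa_power_put(struct ipa_power *p)
{
	if (p->count == 0)
		return;   /* unbalanced put; ignored in this sketch */
	if (--p->count == 0) {
		p->core_clk_on = false;
		for (int i = 0; i < IPA_ICC_PATHS; i++)
			p->icc_on[i] = false;
	}
}
```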
-
Alex Elder authored
This patch defines configuration data that is used to specify some of the details of IPA hardware supported by the driver. It is built as Device Tree match data, discovered at boot time. The driver supports the Qualcomm SDM845 SoC. Data for the Qualcomm SC7180 is also defined here, but it is not yet completely supported. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
This patch includes four source files that represent some basic "main program" code for the IPA driver. They are: - "ipa.h" defines the top-level IPA structure which represents an IPA device throughout the code. - "ipa_main.c" contains the platform driver probe function, along with some general code used during initialization. - "ipa_reg.h" defines the offsets of the 32-bit registers used for the IPA device, along with masks that define the position and width of fields within these registers. - "version.h" defines some symbolic IPA version numbers. Each file includes some documentation that provides a little more overview of how the code is organized and used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Add the binding definitions for the "qcom,ipa" device tree node. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Set up a subdev in the q6v5 modem remoteproc driver that generates event notifications for the IPA driver to use for initialization and recovery following a modem shutdown or crash. A pair of new functions provides a way for the IPA driver to register and deregister a notification callback function that will be called whenever modem events (about to boot, running, about to shut down, etc.) occur. A void pointer value (provided by the IPA driver at registration time) and an event type are supplied to the callback function. One event, MODEM_REMOVING, is signaled whenever the q6v5 driver is about to remove the notification subdevice. It requires the IPA driver to deregister its callback. This sub-device is only used by the modem subsystem (MSS) driver, so the code that adds the new subdev and allows registration and deregistration of the notifier is found in "qcom_q6v5_mss.c". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
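The register/deregister pair with an opaque data pointer follows the usual C callback pattern. A sketch with hypothetical function and type names; only MODEM_REMOVING appears in the text, the other event names are made up to match the described states, and this is not the q6v5 API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical event names; only MODEM_REMOVING comes from the text. */
enum modem_event {
	MODEM_STARTING,   /* about to boot */
	MODEM_RUNNING,
	MODEM_STOPPING,   /* about to shut down */
	MODEM_REMOVING,   /* notification subdev going away: client must deregister */
};

typedef void (*modem_notify_cb)(void *data, enum modem_event event);

struct modem_notifier {
	modem_notify_cb cb;
	void *data;       /* opaque pointer supplied at registration time */
};

static int modem_notify_register(struct modem_notifier *n, modem_notify_cb cb,
				 void *data)
{
	if (!cb)
		return -EINVAL;
	if (n->cb)
		return -EBUSY;    /* this sketch allows a single client */
	n->cb = cb;
	n->data = data;
	return 0;
}

static void modem_notify_deregister(struct modem_notifier *n)
{
	n->cb = NULL;
	n->data = NULL;
}

/* Deliver an event to the registered client, if any. */
static void modem_notify(struct modem_notifier *n, enum modem_event event)
{
	if (n->cb)
		n->cb(n->data, event);
}

/* Example client callback: records the last event in an int passed as 'data'. */
static void record_event(void *data, enum modem_event event)
{
	*(int *)data = (int)event;
}
```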
-
David S. Miller authored
Sascha Hauer says: ==================== QorIQ DPAA: Use random MAC address when none is given Use random MAC addresses when they are not provided in the device tree. Tested on LS1046ARDB. Changes in v3: addressed all MAC types, removed some redundant code in dtsec in the process ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Madalin Bucur authored
If there is no valid MAC address in the device tree, use a random MAC address. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
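In the kernel this check and fallback are normally done with helpers such as is_valid_ether_addr() and eth_random_addr(). The user-space sketch below only mirrors their logic with hypothetical names: a valid unicast address is non-zero with the multicast bit clear, and a random address additionally gets the locally administered bit set. rand() is used purely for illustration and is not a suitable entropy source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Valid unicast MAC: not all zeros and the multicast (I/G) bit clear. */
static bool mac_is_valid(const uint8_t a[6])
{
	static const uint8_t zero[6];

	return !(a[0] & 0x01) && memcmp(a, zero, 6) != 0;
}

/* Random, locally administered, unicast address. */
static void mac_random(uint8_t a[6])
{
	for (int i = 0; i < 6; i++)
		a[i] = (uint8_t)(rand() & 0xff);
	a[0] &= 0xfe;   /* clear multicast bit */
	a[0] |= 0x02;   /* set locally administered bit */
}

/* Use the device-tree address if it is valid, otherwise fall back to random. */
static void mac_choose(const uint8_t *dt_mac, uint8_t out[6])
{
	if (dt_mac && mac_is_valid(dt_mac))
		memcpy(out, dt_mac, 6);
	else
		mac_random(out);
}
```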
-
Madalin Bucur authored
Allow the initialization of the MAC to be performed even if the device tree does not provide a valid MAC address. Later a random MAC address should be assigned by the Ethernet driver. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Madalin Bucur authored
Reuse set_mac_address() in the init() function. Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Updates. This series includes simplification and improvement of NAPI polling logic in bnxt_poll_p5(). The improvements will prevent starving the async events from firmware if we are in continuous NAPI polling. The rest of the patches include cleanups, a better return code for firmware busy, and a fix to clear the devlink port type properly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasundhara Volam authored
Similar to other drivers, properly clear the devlink port type when removing the device before unregistration. Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasundhara Volam authored
If a firmware command returns the error code HWRM_ERR_CODE_BUSY, which means it cannot handle the command due to a conflicting command from another function, convert it to -EAGAIN. If it is an ethtool operation, this error code will be returned to userspace. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
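The conversion is a plain status-to-errno mapping. A sketch with hypothetical names and values: `FW_ERR_BUSY` stands in for HWRM_ERR_CODE_BUSY (whose real numeric value is not assumed here), and the -EIO catch-all is an assumption for illustration only.

```c
#include <errno.h>

/* Hypothetical firmware status codes; FW_ERR_BUSY stands in for HWRM_ERR_CODE_BUSY. */
enum fw_status {
	FW_OK = 0,
	FW_ERR_BUSY,      /* conflicting command from another function */
	FW_ERR_OTHER,
};

/* Map a firmware status to a negative errno for callers (and ethtool users). */
static int fw_status_to_errno(enum fw_status status)
{
	switch (status) {
	case FW_OK:
		return 0;
	case FW_ERR_BUSY:
		return -EAGAIN;   /* transient: the caller may retry */
	default:
		return -EIO;      /* assumed catch-all for this sketch */
	}
}
```

Returning -EAGAIN tells user space the failure is transient and the request may succeed if retried, which fits a busy condition better than a generic I/O error.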
-
Vasundhara Volam authored
Return code is not needed in some of these functions, as the return code from firmware message is ignored. Remove the unused rc variable and also convert functions to void. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasundhara Volam authored
As part of converting error code in firmware message to standard code, checking for firmware return code is removed in most of the places. Remove the assignment of return code where the function can directly return. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
The driver stores a copy of the DCB settings that have been applied to the firmware. After firmware reset, the firmware settings are gone and revert to the defaults. Clear the driver's copy so that if there is a DCBNL request to get the settings, the driver will retrieve the current settings from the firmware. lldpad keeps the DCB settings in userspace and will re-apply the settings if it is running. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
When we are in continuous NAPI polling mode, the current code in bnxt_poll_p5() will only process the completion rings and will not process the NQ until the interrupt is re-enabled. This logic works and will not cause RX or TX starvation, but async events in the NQ may be delayed for the duration of continuous NAPI polling. These async events may be firmware or VF events. Continue to handle the NQ after we are done polling the completion rings. This actually simplifies the code in bnxt_poll_p5(). Acknowledge the NQ so these async events will not overflow. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Simplify the function by removing the 'all' parameter. In the current code, the caller has to specify whether to update/arm both completion rings with the 'all' parameter. Instead of this, we can just update/arm all the completion rings that have been polled. By setting cpr->had_work_done earlier in __bnxt_poll_work(), we know which completion ring has been polled and can just update/arm all the completion rings with cpr->had_work_done set. This simplifies the function with one less parameter and works just as well. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
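Letting a per-ring flag decide which rings to update/arm removes the need for the caller to pass an 'all' argument. A sketch with a hypothetical `struct cp_ring`; the doorbell write is simulated with a counter and none of this is the bnxt code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion ring state. */
struct cp_ring {
	bool had_work_done;            /* set as soon as the ring is polled */
	unsigned int doorbell_writes;  /* stand-in for updating/arming the ring */
};

/* Update/arm exactly the rings that were polled; no 'all' argument needed. */
static void poll_done_arm(struct cp_ring *rings, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!rings[i].had_work_done)
			continue;
		rings[i].doorbell_writes++;
		rings[i].had_work_done = false;
	}
}
```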
-
Michael Chan authored
In bnxt_poll_p5(), the logic polls for up to 2 completion rings (RX and TX) for work. In the current code, if we reach budget polling the first completion ring, we will stop. If the other completion ring has work to do, we will handle it when NAPI calls us back. This is not optimal. We potentially leave an unprocessed entry in the NQ. When we are finally done with NAPI polling and re-enable the interrupt, the remaining entry in the NQ will cause an interrupt to be triggered immediately for no reason. Modify the code in bnxt_poll_p5() to keep looping until all NQ entries are handled even if the first completion ring has reached budget. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
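A simplified model of the new loop: every NQ entry is consumed even after the budget is exhausted, so async events are handled and no stale entry re-triggers the interrupt. Ring polling simply receives whatever budget remains (possibly zero). All types and names are hypothetical, and the real driver's handling of leftover ring work on the next NAPI pass is not modelled.

```c
#include <stddef.h>

enum nq_type { NQ_CQ_NOTIFY, NQ_ASYNC_EVENT };

/* Hypothetical NQ entry: either "ring X has work" or an async event. */
struct nq_entry {
	enum nq_type type;
	size_t ring;   /* completion ring index for NQ_CQ_NOTIFY */
};

struct poll_state {
	unsigned int pending[2];     /* work waiting on rings 0 (RX) and 1 (TX) */
	unsigned int async_handled;  /* firmware/VF events processed */
};

/* Poll one completion ring for at most 'budget' units of work. */
static unsigned int poll_ring(struct poll_state *s, size_t ring, unsigned int budget)
{
	unsigned int n = s->pending[ring] < budget ? s->pending[ring] : budget;

	s->pending[ring] -= n;
	return n;
}

/* Consume every NQ entry, even once the budget is used up. */
static unsigned int poll_nq(struct poll_state *s, const struct nq_entry *nq,
			    size_t nq_len, unsigned int budget)
{
	unsigned int work = 0;

	for (size_t i = 0; i < nq_len; i++) {
		if (nq[i].type == NQ_CQ_NOTIFY)
			work += poll_ring(s, nq[i].ring, budget - work);  /* never underflows */
		else
			s->async_handled++;
	}
	return work;
}
```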
-
Eric Dumazet authored
Convert zones_lock spinlock to zones_mutex mutex, and struct (tcf_ct_flow_table)->ref to a refcount, so that the control path can use regular GFP_KERNEL allocations from standard process context. This is more robust in case of memory pressure. The refcount is needed because tcf_ct_flow_table_put() can be called from an RCU callback, thus in BH context. The issue was spotted by syzbot, as rhashtable_init() was called with a spinlock held, which is bad since GFP_KERNEL allocations can sleep. Note to developers: please make sure your patches are tested with CONFIG_DEBUG_ATOMIC_SLEEP=y

BUG: sleeping function called from invalid context at mm/slab.h:565
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9582, name: syz-executor610
2 locks held by syz-executor610/9582:
 #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:72 [inline]
 #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x3f9/0xad0 net/core/rtnetlink.c:5437
 #1: ffffffff8a3961b8 (zones_lock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline]
 #1: ffffffff8a3961b8 (zones_lock){+...}, at: tcf_ct_flow_table_get+0xa3/0x1700 net/sched/act_ct.c:67
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 0 PID: 9582 Comm: syz-executor610 Not tainted 5.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 ___might_sleep.cold+0x1f4/0x23d kernel/sched/core.c:6798
 slab_pre_alloc_hook mm/slab.h:565 [inline]
 slab_alloc_node mm/slab.c:3227 [inline]
 kmem_cache_alloc_node_trace+0x272/0x790 mm/slab.c:3593
 __do_kmalloc_node mm/slab.c:3615 [inline]
 __kmalloc_node+0x38/0x60 mm/slab.c:3623
 kmalloc_node include/linux/slab.h:578 [inline]
 kvmalloc_node+0x61/0xf0 mm/util.c:574
 kvmalloc include/linux/mm.h:645 [inline]
 kvzalloc include/linux/mm.h:653 [inline]
 bucket_table_alloc+0x8b/0x480 lib/rhashtable.c:175
 rhashtable_init+0x3d2/0x750 lib/rhashtable.c:1054
 nf_flow_table_init+0x16d/0x310 net/netfilter/nf_flow_table_core.c:498
 tcf_ct_flow_table_get+0xe33/0x1700 net/sched/act_ct.c:82
 tcf_ct_init+0xba4/0x18a6 net/sched/act_ct.c:1050
 tcf_action_init_1+0x697/0xa20 net/sched/act_api.c:945
 tcf_action_init+0x1e9/0x2f0 net/sched/act_api.c:1001
 tcf_action_add+0xdb/0x370 net/sched/act_api.c:1411
 tc_ctl_action+0x366/0x456 net/sched/act_api.c:1466
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5440
 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2478
 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
 netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2343
 ___sys_sendmsg+0x100/0x170 net/socket.c:2397
 __sys_sendmsg+0xec/0x1b0 net/socket.c:2430
 do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4403d9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd719af218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9
RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000005 R09: 00000000004002c8
R10: 0000000000000008 R11: 00000000000

Fixes: c34b961a ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Blakey <paulb@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
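The locking split can be sketched in user space, assuming a pthread mutex in place of zones_mutex, a C11 atomic counter in place of refcount_t, and a hypothetical one-slot table in place of the rhashtable. The get path runs in process context, so it may hold a sleeping lock while allocating. The put path only does an atomic decrement (safe where sleeping is forbidden), and the final cleanup is deferred to process context. `ref_inc_not_zero()` mirrors what refcount_inc_not_zero() provides in the kernel; all other names are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct flow_table {
	int zone;
	atomic_uint ref;
};

static pthread_mutex_t zones_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct flow_table *zone_slot;  /* hypothetical one-entry "hash table" */

/* Take a reference only if the table is not already being torn down. */
static bool ref_inc_not_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	while (old != 0)
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	return false;
}

/* Process context: holding a mutex while allocating (and possibly blocking) is fine. */
static struct flow_table *flow_table_get(int zone)
{
	struct flow_table *ft;

	pthread_mutex_lock(&zones_mutex);
	ft = zone_slot;
	if (ft && ft->zone == zone && ref_inc_not_zero(&ft->ref))
		goto out;
	ft = malloc(sizeof(*ft));  /* analogous to a GFP_KERNEL allocation */
	if (ft) {
		ft->zone = zone;
		atomic_init(&ft->ref, 1);
		zone_slot = ft;
	}
out:
	pthread_mutex_unlock(&zones_mutex);
	return ft;
}

/* May run where sleeping is forbidden: atomic decrement only, no mutex.
 * Returns true when the last reference is gone and cleanup must be deferred. */
static bool flow_table_put(struct flow_table *ft)
{
	return atomic_fetch_sub(&ft->ref, 1) == 1;
}

/* Deferred cleanup, run later from process context. */
static void flow_table_cleanup(struct flow_table *ft)
{
	pthread_mutex_lock(&zones_mutex);
	if (zone_slot == ft)
		zone_slot = NULL;
	pthread_mutex_unlock(&zones_mutex);
	free(ft);
}
```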
-