- 12 Feb, 2021 21 commits
-
-
Shay Drory authored
In dual port mode, setting roce enabled/disable for the slave device have no effect. e.g.: the slave device roce status remain unchanged. Therefore disable it and add an error message. Enable or disable roce of the master device affect both master and slave devices. Fixes: cc9defcb ("net/mlx5: Handle "enable_roce" devlink param") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shay Drory authored
Devlink reload can't be allowed on a multi port slave device, because reload of slave device doesn't take effect. The right flow is to disable devlink reload for multi port slave device. Hence, disabling it in mlx5_core probing. Fixes: 4383cfcc ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maxim Mikityanskiy authored
wait_for_resync is unreliable - if it timeouts, priv_rx will be freed anyway. However, mlx5e_ktls_handle_get_psv_completion will be called sooner or later, leading to use-after-free. For example, it can happen if a CQ error happened, and ICOSQ stopped, but later on the queues are destroyed, and ICOSQ is flushed with mlx5e_free_icosq_descs. This patch converts the lifecycle of priv_rx to fully refcount-based, so that the struct won't be freed before the refcount goes to zero. Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maxim Mikityanskiy authored
The commit mentioned below has split the parameters of ICOSQ and async ICOSQ, but it contained a typo: the CQ parameters were swapped for ICOSQ and async ICOSQ. Async ICOSQ is longer than the normal ICOSQ, and the CQ size must be the same as the size of the corresponding SQ, but due to this bug, the CQ of async ICOSQ was much shorter than async ICOSQ itself. It led to overflows of the CQ with such messages in dmesg, in particular, when running multiple kTLS-offloaded streams: mlx5_core 0000:08:00.0: cq_err_event_notifier:529:(pid 9422): CQ error on CQN 0x406, syndrome 0x1 mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000406 event=0x04 This commit fixes the issue by using the corresponding parameters for ICOSQ and async ICOSQ. Fixes: c293ac92 ("net/mlx5e: Refactor build channel params") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maxim Mikityanskiy authored
The commit cited below switched from using napi_synchronize to synchronize_rcu to have a guarantee that it will finish in finite time. However, on average, synchronize_rcu takes more time than napi_synchronize. Given that it's called multiple times per channel on deactivation, it accumulates to a significant amount, which causes timeouts in some applications (for example, when using bonding with NetworkManager). This commit replaces synchronize_rcu with synchronize_net, which is faster when called under rtnl_lock, allowing to speed up the described flow. Fixes: 9c25a22d ("net/mlx5e: Use synchronize_rcu to sync with NAPI") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Shay Drory authored
Currently, when we discover a fatal error, we are queueing a work that will wait for a lock in order to enter the device to error state. Meanwhile, FW commands are still being processed, and gets timeouts. This can block the driver for few minutes before the work will manage to get the lock and enter to error state. Setting the device to error state before queueing health work, in order to avoid FW commands being processed while the work is waiting for the lock. Fixes: c1d4d2e9 ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maxim Mikityanskiy authored
struct mlx5e_params contains fields ({rx,tx}_cq_moderation) that depend on two things: whether DIM is enabled and the state of a private flag (MLX5E_PFLAG_{RX,TX}_CQE_BASED_MODER). Whenever the DIM state changes, mlx5e_reset_{rx,tx}_moderation is called to update the fields, however, only if the channels are open. The flow where the channels are closed misses the required update of the fields. This commit moves the calls of mlx5e_reset_{rx,tx}_moderation, so that they run in both flows. Fixes: ebeaf084 ("net/mlx5e: Properly set default values when disabling adaptive moderation") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Maxim Mikityanskiy authored
When mlx5e_ethtool_set_coalesce doesn't change DIM state (enabled/disabled), it calls mlx5e_set_priv_channels_coalesce unconditionally, which in turn invokes a firmware command to set interrupt moderation parameters. It shouldn't happen while DIM manages those parameters dynamically (it might even be happening at the same time). This patch fixes it by splitting mlx5e_set_priv_channels_coalesce into two functions (for RX and TX) and calling them only when DIM is disabled (for RX and TX respectively). Fixes: cb3c7fd4 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Raed Salem authored
This limitation was inherited by previous Innova (FPGA) IPsec implementation, it uses its private set of RQ handlers which does not support XDP, for Connect-X this is no longer true. Fix by keeping this limitation only for Innova IPsec supporting devices, as otherwise this limitation effectively wrongly blocks XDP for all future Connect-X devices for all flows even if IPsec offload is not used. Fixes: 2d64663c ("net/mlx5: IPsec: Add HW crypto offload support") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Raed Salem authored
This limitation was inherited by previous Innova (FPGA) IPsec implementation, it uses its private set of RQ handlers which does not support striding rq, for Connect-X this is no longer true. Fix by keeping this limitation only for Innova IPsec supporting devices, as otherwise this limitation effectively wrongly blocks striding RQs for all future Connect-X devices for all flows even if IPsec offload is not used. Fixes: 2d64663c ("net/mlx5: IPsec: Add HW crypto offload support") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Parav Pandit authored
rate_bytes_ps is a 64-bit field. It passed as 32-bit field to apply_police_params(). Due to this when police rate is higher than 4Gbps, 32-bit calculation ignores the carry. This results in incorrect rate configurationn the device. Fix it by performing 64-bit calculation. Fixes: fcb64c0f ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
David S. Miller authored
Mat Martineau says: ==================== mptcp: Miscellaneous fixes Here are some MPTCP fixes for the -net tree, addressing various issues we have seen thanks to syzkaller and other testing: Patch 1 correctly propagates errors at connection time and for TCP fallback connections. Patch 2 sets the expected poll() events on SEND_SHUTDOWN. Patch 3 fixes a retranmit crash and unneeded retransmissions. Patch 4 fixes possible uninitialized data on the error path during socket creation. Patch 5 addresses a problem with MPTCP window updates. Patch 6 fixes a case where MPTCP retransmission can get stuck. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Currently we do not schedule the MPTCP retransmission timer after pushing the data when such action happens in the subflow context. This may cause hang-up on active-backup scenarios, or even when only single subflow msks are involved, if we lost some peer's ack. Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Move mptcp_cleanup_rbuf() related checks inside the mentioned helper and extend them to mirror TCP checks more closely. Additionally drop the 'rmem_pending' hack, since commit 87952603 ("mptcp: protect the rx path with the msk socket spinlock") we can use instead 'rmem_released'. Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The mptcp subflow route_req() callback performs the subflow req initialization after the route_req() check. If the latter fails, mptcp-specific bits of the current request sockets are left uninitialized. The above causes bad things at req socket disposal time, when the mptcp resources are cleared. This change addresses the issue by splitting subflow_init_req() into the actual initialization and the mptcp-specific checks. The initialization is moved before any possibly failing check. Reported-by: Christoph Paasch <cpaasch@apple.com> Fixes: 7ea851d1 ("tcp: merge 'init_req' and 'route_req' functions") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Syzkaller was able to trigger the following splat again: WARNING: CPU: 1 PID: 12512 at net/mptcp/protocol.c:761 mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761 Modules linked in: CPU: 1 PID: 12512 Comm: kworker/1:6 Not tainted 5.10.0-rc6 #52 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761 Code: e8 4b 0c ad ff e8 56 21 88 fe 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00 00 00 00 48 83 c4 40 5b 5d 41 5c c3 e8 36 21 88 fe <0f> 0b 41 bc c8 00 00 00 eb 98 e8 e7 b1 af fe e9 30 ff ff ff 48 c7 RSP: 0018:ffffc900018c7c68 EFLAGS: 00010293 RAX: ffff888108cb1c80 RBX: 1ffff92000318f8d RCX: ffffffff82ad0307 RDX: 0000000000000000 RSI: ffffffff82ad036a RDI: 0000000000000007 RBP: ffff888113e2d000 R08: ffff888108cb1c80 R09: ffffed10227c5ab7 R10: ffff888113e2d5b7 R11: ffffed10227c5ab6 R12: 0000000000000000 R13: ffff88801f100000 R14: ffff888113e2d5b0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd76a874ef8 CR3: 000000001689c005 CR4: 0000000000170ee0 Call Trace: mptcp_worker+0xaa4/0x1560 net/mptcp/protocol.c:2334 process_one_work+0x8d3/0x1200 kernel/workqueue.c:2272 worker_thread+0x9c/0x1090 kernel/workqueue.c:2418 kthread+0x303/0x410 kernel/kthread.c:292 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296 The mptcp_worker tries to update the MPTCP retransmission timer even if such timer is not currently scheduled. The mptcp_rtx_head() return value is bogus: we can have enqueued data not yet transmitted. The above may additionally cause spurious, unneeded MPTCP-level retransmissions. Fix the issue adding an explicit clearing of the rtx queue before trying to retransmit and checking for unacked data. Additionally drop an unneeded timer stop call and the unused mptcp_rtx_tail() helper. Reported-by: Christoph Paasch <cpaasch@apple.com> Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The current mptcp_poll() implementation gives unexpected results after shutdown(SEND_SHUTDOWN) and when the msk status is TCP_CLOSE. Set the correct mask. Fixes: 8edf0864 ("mptcp: rework poll+nospace handling") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Currently all errors received on msk subflows are ignored. We need to catch at least the errors on connect() and on fallback sockets. Use a custom sk_error_report callback at subflow level, and do the real action under the msk socket lock - via the usual sock_owned_by_user()/release_callback() schema. Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Claudiu reported that on his system S2R cuts off power to the PHY and after resuming certain PHY settings are lost. The PM folks confirmed that cutting off power to selected components in S2R is a valid case. Therefore resuming from S2R, same as from hibernation, has to assume that the PHY has power-on defaults. As a consequence use the restore callback also as resume callback. In addition make sure that the interrupt configuration is restored. Let's do this in phy_init_hw() and ensure that after this call actual interrupt configuration is in sync with phydev->interrupts. Currently, if interrupt was enabled before hibernation, we would resume with interrupt disabled because that's the power-on default. This fix applies cleanly only after the commit marked as fixed. I don't have an affected system, therefore change is compile-tested only. [0] https://lore.kernel.org/netdev/1610120754-14331-1-git-send-email-claudiu.beznea@microchip.com/ Fixes: 611d779a ("net: phy: fix MDIO bus PM PHY resuming") Reported-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ioana Ciornei authored
If xdp_do_redirect() fails, the calling driver should handle recycling or freeing of the page associated with the frame. The dpaa2-eth driver didn't do either of them and just incremented a counter. Fix this by trying to DMA map back the page and recycle it or, if the mapping fails, just free it. Fixes: d678be1d ("dpaa2-eth: add XDP_REDIRECT support") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tong Zhang authored
FSL_ENETC_MDIO use symbols from PHYLIB (MDIO_BUS) and MDIO_DEVRES, however there are no dependency specified in Kconfig ERROR: modpost: "__mdiobus_register" [drivers/net/ethernet/freescale/enetc/fsl-enetc-mdio.ko] undefined! ERROR: modpost: "mdiobus_unregister" [drivers/net/ethernet/freescale/enetc/fsl-enetc-mdio.ko] undefined! ERROR: modpost: "devm_mdiobus_alloc_size" [drivers/net/ethernet/freescale/enetc/fsl-enetc-mdio.ko] undefined! add depends on MDIO_DEVRES && MDIO_BUS Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Feb, 2021 7 commits
-
-
Nathan Rossi authored
The aq_nic_start function can fail in a variety of cases which leaves the device in broken state. An example case where the start function fails is the request_threaded_irq which can be interrupted, resulting in a EINTR result. This can be manually triggered by bringing the link up (e.g. ip link set up) and triggering a SIGINT on the initiating process (e.g. Ctrl+C). This would put the device into a half configured state. Subsequently bringing the link up again would cause the napi_enable to BUG. In order to correctly clean up the failed attempt to start a device call aq_nic_stop. Signed-off-by: Nathan Rossi <nathan.rossi@digi.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: 2 bug fixes. Two unrelated fixes. The first one fixes intermittent false TX timeouts during ring reconfigurations. The second one fixes a formatting discrepancy between the stored and the running FW versions. Please also queue these for -stable. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vasundhara Volam authored
The running fw.psid version is in decimal format but the stored fw.psid is in hex format. This can mislead the user to reset the NIC to activate the stored version to become the running version. Fix it to display the stored fw.psid in decimal format. Fixes: 1388875b ("bnxt_en: Add stored FW version info to devlink info_get cb.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Edwin Peer authored
A TX queue can potentially immediately timeout after it is stopped and the last TX timestamp on that queue was more than 5 seconds ago with carrier still up. Prevent these intermittent false TX timeouts by bringing down carrier first before calling netif_tx_disable(). Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sukadev Bhattiprolu authored
If set_link_state() fails for any reason, we still cleanup the adapter state and cannot recover from a partial close anyway. So set the adapter to CLOSED state. That way if a new soft/hard reset is processed, the adapter will remain in the CLOSED state until the next ibmvnic_open(). Fixes: 01d9bd79 ("ibmvnic: Reorganize device close") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reported-by: Abdul Haleem <abdhalee@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tcp_rmem[1] has been changed to 131072, we should update the documentation to reflect this. Fixes: a337531b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Zhibin Liu <zhibinliu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Point out where patchwork bot's code lives, and that we don't want people posting stuff that doesn't build. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Feb, 2021 12 commits
-
-
wenxu authored
Reject the unsupported and invalid ct_state flags of cls flower rules. Fixes: e0ace68a ("net/sched: cls_flower: Add matching on conntrack info") Signed-off-by: wenxu <wenxu@ucloud.cn> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fixes from Rafael Wysocki: "Address a performance regression related to scale-invariance on x86 that may prevent turbo CPU frequencies from being used in certain workloads on systems using acpi-cpufreq as the CPU performance scaling driver and schedutil as the scaling governor" * tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there cpufreq: ACPI: Extend frequency tables to cover boost frequencies
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull ACPI fix from Rafael Wysocki: "Revert a problematic ACPICA commit that changed the code to attempt to update memory regions which may be read-only on some systems (Ard Biesheuvel)" * tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"
-
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengineLinus Torvalds authored
Pull dmaengine fixes from Vinod Koul: "Some late fixes for dmaengine: Core: - fix channel device_node deletion Driver fixes: - dw: revert of runtime pm enabling - idxd: device state fix, interrupt completion and list corruption - ti: resource leak * tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine dw: Revert "dmaengine: dw: Enable runtime PM" dmaengine: idxd: check device state before issue command dmaengine: ti: k3-udma: Fix a resource leak in an error handling path dmaengine: move channel device_node deletion to driver dmaengine: idxd: fix misc interrupt completion dmaengine: idxd: Fix list corruption in description completion
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from David Miller: "Another pile of networing fixes: 1) ath9k build error fix from Arnd Bergmann 2) dma memory leak fix in mediatec driver from Lorenzo Bianconi. 3) bpf int3 kprobe fix from Alexei Starovoitov. 4) bpf stackmap integer overflow fix from Bui Quang Minh. 5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from Christoph Schemmel. 6) Don't update deleted entry in xt_recent netfilter module, from Jazsef Kadlecsik. 7) Use after free in nftables, fix from Pablo Neira Ayuso. 8) Header checksum fix in flowtable from Sven Auhagen. 9) Validate user controlled length in qrtr code, from Sabyrzhan Tasbolatov. 10) Fix race in xen/netback, from Juergen Gross, 11) New device ID in cxgb4, from Raju Rangoju. 12) Fix ring locking in rxrpc release call, from David Howells. 13) Don't return LAPB error codes from x25_open(), from Xie He. 14) Missing error returns in gsi_channel_setup() from Alex Elder. 15) Get skb_copy_and_csum_datagram working properly with odd segment sizes, from Willem de Bruijn. 16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean. 17) Do teardown on probe failure in DSA, from Vladimir Oltean. 18) Fix compilation failures of txtimestamp selftest, from Vadim Fedorenko. 19) Limit rx per-napi gro queue size to fix latency regression, from Eric Dumazet. 20) dpaa_eth xdp fixes from Camelia Groza. 21) Missing txq mode update when switching CBS off, in stmmac driver, from Mohammad Athari Bin Ismail. 22) Failover pending logic fix in ibmvnic driver, from Sukadev Bhattiprolu. 23) Null deref fix in vmw_vsock, from Norbert Slusarek. 24) Missing verdict update in xdp paths of ena driver, from Shay Agroskin. 25) seq_file iteration fix in sctp from Neil Brown. 26) bpf 32-bit src register truncation fix on div/mod, from Daniel Borkmann. 27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann. 28) Fix locking in vsock_shutdown(), from Stefano Garzarella. 29) Various missing index bound checks in hns3 driver, from Yufeng Mo. 30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from Vladimir Oltean. 31) Don't mix up stp and mrp port states in bridge layer, from Horatiu Vultur. 32) Fix locking during netif_tx_disable(), from Edwin Peer" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) bpf: Fix 32 bit src register truncation on div/mod bpf: Fix verifier jmp32 pruning decision logic bpf: Fix verifier jsgt branch analysis on max bound vsock: fix locking in vsock_shutdown() net: hns3: add a check for index in hclge_get_rss_key() net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() net: hns3: add a check for queue_id in hclge_reset_vf_queue() net: dsa: felix: implement port flushing on .phylink_mac_link_down switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state net: watchdog: hold device global xmit lock during tx disable netfilter: nftables: relax check for stateful expressions in set definition netfilter: conntrack: skip identical origin tuple in same zone only vsock/virtio: update credit only if socket is not closed net: fix iteration for sctp transport seq_files net: ena: Update XDP verdict upon failure net/vmw_vsock: improve locking in vsock_connect_timeout() net/vmw_vsock: fix NULL pointer dereference ibmvnic: Clear failover_pending if unable to schedule net: stmmac: set TxQ mode back to DCB after disabling CBS ...
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, mremap, tmpfs, selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and firmware" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: nilfs2: make splice write available again mm, slub: better heuristic for number of cpus when calculating slab order Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" MAINTAINERS: update Andrey Ryabinin's email address selftests/vm: rename file run_vmtests to run_vmtests.sh tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 mm/mremap: fix BUILD_BUG_ON() error in get_extent firmware_loader: align .builtin_fw to 8 kasan: fix stack traces dependency for HW_TAGS squashfs: add more sanity checks in xattr id lookup squashfs: add more sanity checks in inode lookup squashfs: add more sanity checks in id lookup squashfs: avoid out of bounds writes in decompressors
-
Joachim Henke authored
Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was caused by commit 36e2c742 ("fs: don't allow splice read/write without explicit ops"). This patch initializes the splice_write field in file_operations, like most file systems do, to restore the functionality. Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.comSigned-off-by: Joachim Henke <joachim.henke@t-systems.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
When creating a new kmem cache, SLUB determines how large the slab pages will based on number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/free from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit 045ab8c9 ("mm/slub: let number of online CPUs determine the slub page order") changed it to nr_online_cpus(). However, for kmem caches created early before CPUs are onlined, this may lead to permamently low slab page sizes. Vincent reports a regression [1] of hackbench on arm64 systems: "I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16 v5.11-rc4 : 9.135sec (+/- 0.45%) v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%) v5.10: 3.136sec (+/- 0.40%)" Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention: "i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread. Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert. So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis" Clearly num_online_cpus() doesn't work too early in bootup. We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has been already shown as dangerous, and removed in 32a6f409 ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler. We could use num_present_cpus() that should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but that still doesn't work on arm64 [4] as explained in [5]. So this patch tries to determine the best available value without specific arch knowledge. - num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly - nr_cpu_ids otherwise This should fix the reported regressions while also keeping the effect of 045ab8c9 for PowerPC systems. It's possible there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated, so these (if they exist) would keep the large orders based on nr_cpu_ids as was before 045ab8c9. [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/ [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/ [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/ [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/ Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz Fixes: 045ab8c9 ("mm/slub: let number of online CPUs determine the slub page order") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vincent Guittot <vincent.guittot@linaro.org> Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jann Horn <jannh@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2021-02-10 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 8 day(s) which contain a total of 3 files changed, 22 insertions(+), 21 deletions(-). The main changes are: 1) Fix missed execution of kprobes BPF progs when kprobe is firing via int3, from Alexei Starovoitov. 2) Fix potential integer overflow in map max_entries for stackmap on 32 bit archs, from Bui Quang Minh. 3) Fix a verifier pruning and a insn rewrite issue related to 32 bit ops, from Daniel Borkmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> c# Please enter a commit message to explain why this merge is necessary,
-
Johannes Weiner authored
This reverts commit 536d3bf2, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff9291 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: 536d3bf2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrey Ryabinin authored
Update my email, @virtuozzo.com will stop working shortly. Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.comSigned-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rong Chen authored
Commit c2aa8afc has renamed run_vmtests in Makefile, but the file still uses the old name. The kernel test robot reported the following issue: # selftests: vm: run_vmtests.sh # Warning: file run_vmtests.sh is missing! not ok 1 selftests: vm: run_vmtests.sh Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com Fixes: c2aa8afc (selftests/vm: rename run_vmtests --> run_vmtests.sh) Signed-off-by: Rong Chen <rong.a.chen@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-