Commit 572f9caa authored by Moshe Shemesh's avatar Moshe Shemesh Committed by Jakub Kicinski

net/mlx5: Fix missing lock on sync reset reload

On sync reset reload work, when remote host updates devlink on reload
actions performed on that host, it misses taking devlink lock before
calling devlink_remote_reload_actions_performed() which results in
triggering lock assert like the following:

WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50
…
 CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S      W          6.10.0-rc2+ #116
 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015
 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core]
 RIP: 0010:devl_assert_locked+0x3e/0x50
…
 Call Trace:
  <TASK>
  ? __warn+0xa4/0x210
  ? devl_assert_locked+0x3e/0x50
  ? report_bug+0x160/0x280
  ? handle_bug+0x3f/0x80
  ? exc_invalid_op+0x17/0x40
  ? asm_exc_invalid_op+0x1a/0x20
  ? devl_assert_locked+0x3e/0x50
  devlink_notify+0x88/0x2b0
  ? mlx5_attach_device+0x20c/0x230 [mlx5_core]
  ? __pfx_devlink_notify+0x10/0x10
  ? process_one_work+0x4b6/0xbb0
  process_one_work+0x4b6/0xbb0
[…]

Fixes: 84a433a4 ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: default avatarMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
Reviewed-by: default avatarWojciech Drewek <wojciech.drewek@intel.com>
Link: https://patch.msgid.link/20240730061638.1831002-6-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
parent 3fda84dc
...@@ -207,6 +207,7 @@ int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev) ...@@ -207,6 +207,7 @@ int mlx5_fw_reset_set_live_patch(struct mlx5_core_dev *dev)
static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unloaded) static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unloaded)
{ {
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset; struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
struct devlink *devlink = priv_to_devlink(dev);
/* if this is the driver that initiated the fw reset, devlink completed the reload */ /* if this is the driver that initiated the fw reset, devlink completed the reload */
if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) { if (test_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags)) {
...@@ -218,9 +219,11 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unload ...@@ -218,9 +219,11 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev, bool unload
mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n"); mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
else else
mlx5_load_one(dev, true); mlx5_load_one(dev, true);
devlink_remote_reload_actions_performed(priv_to_devlink(dev), 0, devl_lock(devlink);
devlink_remote_reload_actions_performed(devlink, 0,
BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) | BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE)); BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE));
devl_unlock(devlink);
} }
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment