1. 22 Nov, 2022 6 commits
    • net/mlx5: Lag, avoid lockdep warnings · 0d4e8ed1
      Eli Cohen authored
      ldev->lock is used to serialize lag change operations. Since multiport
      eswitch functionality was added, we now change the mode dynamically.
      However, acquiring ldev->lock in that path is not allowed, as it could
      lead to a deadlock, as reported by the lockdep mechanism.
      
      [  836.154963] WARNING: possible circular locking dependency detected
      [  836.155850] 5.19.0-rc5_net_56b7df2 #1 Not tainted
      [  836.156549] ------------------------------------------------------
      [  836.157418] handler1/12198 is trying to acquire lock:
      [  836.158178] ffff888187d52b58 (&ldev->lock){+.+.}-{3:3}, at: mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.159575]
      [  836.159575] but task is already holding lock:
      [  836.160474] ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
      [  836.161669] which lock already depends on the new lock.
      [  836.162905]
      [  836.162905] the existing dependency chain (in reverse order) is:
      [  836.164008] -> #3 (&block->cb_lock){++++}-{3:3}:
      [  836.164946]        down_write+0x25/0x60
      [  836.165548]        tcf_block_get_ext+0x1c6/0x5d0
      [  836.166253]        ingress_init+0x74/0xa0 [sch_ingress]
      [  836.167028]        qdisc_create.constprop.0+0x130/0x5e0
      [  836.167805]        tc_modify_qdisc+0x481/0x9f0
      [  836.168490]        rtnetlink_rcv_msg+0x16e/0x5a0
      [  836.169189]        netlink_rcv_skb+0x4e/0xf0
      [  836.169861]        netlink_unicast+0x190/0x250
      [  836.170543]        netlink_sendmsg+0x243/0x4b0
      [  836.171226]        sock_sendmsg+0x33/0x40
      [  836.171860]        ____sys_sendmsg+0x1d1/0x1f0
      [  836.172535]        ___sys_sendmsg+0xab/0xf0
      [  836.173183]        __sys_sendmsg+0x51/0x90
      [  836.173836]        do_syscall_64+0x3d/0x90
      [  836.174471]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  836.175282]
      
      [  836.175282] -> #2 (rtnl_mutex){+.+.}-{3:3}:
      [  836.176190]        __mutex_lock+0x6b/0xf80
      [  836.176830]        register_netdevice_notifier+0x21/0x120
      [  836.177631]        rtnetlink_init+0x2d/0x1e9
      [  836.178289]        netlink_proto_init+0x163/0x179
      [  836.178994]        do_one_initcall+0x63/0x300
      [  836.179672]        kernel_init_freeable+0x2cb/0x31b
      [  836.180403]        kernel_init+0x17/0x140
      [  836.181035]        ret_from_fork+0x1f/0x30
      
      [  836.181687] -> #1 (pernet_ops_rwsem){+.+.}-{3:3}:
      [  836.182628]        down_write+0x25/0x60
      [  836.183235]        unregister_netdevice_notifier+0x1c/0xb0
      [  836.184029]        mlx5_ib_roce_cleanup+0x94/0x120 [mlx5_ib]
      [  836.184855]        __mlx5_ib_remove+0x35/0x60 [mlx5_ib]
      [  836.185637]        mlx5_eswitch_unregister_vport_reps+0x22f/0x440 [mlx5_core]
      [  836.186698]        auxiliary_bus_remove+0x18/0x30
      [  836.187409]        device_release_driver_internal+0x1f6/0x270
      [  836.188253]        bus_remove_device+0xef/0x160
      [  836.188939]        device_del+0x18b/0x3f0
      [  836.189562]        mlx5_rescan_drivers_locked+0xd6/0x2d0 [mlx5_core]
      [  836.190516]        mlx5_lag_remove_devices+0x69/0xe0 [mlx5_core]
      [  836.191414]        mlx5_do_bond_work+0x441/0x620 [mlx5_core]
      [  836.192278]        process_one_work+0x25c/0x590
      [  836.192963]        worker_thread+0x4f/0x3d0
      [  836.193609]        kthread+0xcb/0xf0
      [  836.194189]        ret_from_fork+0x1f/0x30
      
      [  836.194826] -> #0 (&ldev->lock){+.+.}-{3:3}:
      [  836.195734]        __lock_acquire+0x15b8/0x2a10
      [  836.196426]        lock_acquire+0xce/0x2d0
      [  836.197057]        __mutex_lock+0x6b/0xf80
      [  836.197708]        mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.198575]        tc_act_parse_mirred+0x25b/0x800 [mlx5_core]
      [  836.199467]        parse_tc_actions+0x168/0x5a0 [mlx5_core]
      [  836.200340]        __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core]
      [  836.201241]        mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core]
      [  836.202187]        tc_setup_cb_add+0xd7/0x200
      [  836.202856]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  836.203739]        fl_change+0xbbe/0x1730 [cls_flower]
      [  836.204501]        tc_new_tfilter+0x407/0xd90
      [  836.205168]        rtnetlink_rcv_msg+0x406/0x5a0
      [  836.205877]        netlink_rcv_skb+0x4e/0xf0
      [  836.206535]        netlink_unicast+0x190/0x250
      [  836.207217]        netlink_sendmsg+0x243/0x4b0
      [  836.207915]        sock_sendmsg+0x33/0x40
      [  836.208538]        ____sys_sendmsg+0x1d1/0x1f0
      [  836.209219]        ___sys_sendmsg+0xab/0xf0
      [  836.209878]        __sys_sendmsg+0x51/0x90
      [  836.210510]        do_syscall_64+0x3d/0x90
      [  836.211137]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  836.211954] other info that might help us debug this:
      [  836.213174] Chain exists of:
      [  836.213174]   &ldev->lock --> rtnl_mutex --> &block->cb_lock
      [  836.214650]  Possible unsafe locking scenario:
      [  836.214650]
      [  836.215574]        CPU0                    CPU1
      [  836.216255]        ----                    ----
      [  836.216943]   lock(&block->cb_lock);
      [  836.217518]                                lock(rtnl_mutex);
      [  836.218348]                                lock(&block->cb_lock);
      [  836.219212]   lock(&ldev->lock);
      [  836.219758]
      [  836.219758]  *** DEADLOCK ***
      [  836.219758]
      [  836.220747] 2 locks held by handler1/12198:
      [  836.221390]  #0: ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
      [  836.222646]  #1: ffff88810c9a92c0 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
      
      [  836.224063] stack backtrace:
      [  836.224799] CPU: 6 PID: 12198 Comm: handler1 Not tainted 5.19.0-rc5_net_56b7df2 #1
      [  836.225923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  836.227476] Call Trace:
      [  836.227929]  <TASK>
      [  836.228332]  dump_stack_lvl+0x57/0x7d
      [  836.228924]  check_noncircular+0x104/0x120
      [  836.229562]  __lock_acquire+0x15b8/0x2a10
      [  836.230201]  lock_acquire+0xce/0x2d0
      [  836.230776]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.231614]  ? find_held_lock+0x2b/0x80
      [  836.232221]  __mutex_lock+0x6b/0xf80
      [  836.232799]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.233636]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.234451]  ? xa_load+0xc3/0x190
      [  836.234995]  mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.235803]  tc_act_parse_mirred+0x25b/0x800 [mlx5_core]
      [  836.236636]  ? tc_act_can_offload_mirred+0x135/0x210 [mlx5_core]
      [  836.237550]  parse_tc_actions+0x168/0x5a0 [mlx5_core]
      [  836.238364]  __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core]
      [  836.239202]  mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core]
      [  836.240076]  ? lock_acquire+0xce/0x2d0
      [  836.240668]  ? tc_setup_cb_add+0x5b/0x200
      [  836.241294]  tc_setup_cb_add+0xd7/0x200
      [  836.241917]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  836.242709]  fl_change+0xbbe/0x1730 [cls_flower]
      [  836.243408]  tc_new_tfilter+0x407/0xd90
      [  836.244043]  ? tc_del_tfilter+0x880/0x880
      [  836.244672]  rtnetlink_rcv_msg+0x406/0x5a0
      [  836.245310]  ? netlink_deliver_tap+0x7a/0x4b0
      [  836.245991]  ? if_nlmsg_stats_size+0x2b0/0x2b0
      [  836.246675]  netlink_rcv_skb+0x4e/0xf0
      [  836.258046]  netlink_unicast+0x190/0x250
      [  836.258669]  netlink_sendmsg+0x243/0x4b0
      [  836.259288]  sock_sendmsg+0x33/0x40
      [  836.259857]  ____sys_sendmsg+0x1d1/0x1f0
      [  836.260473]  ___sys_sendmsg+0xab/0xf0
      [  836.261064]  ? lock_acquire+0xce/0x2d0
      [  836.261669]  ? find_held_lock+0x2b/0x80
      [  836.262272]  ? __fget_files+0xb9/0x190
      [  836.262871]  ? __fget_files+0xd3/0x190
      [  836.263462]  __sys_sendmsg+0x51/0x90
      [  836.264064]  do_syscall_64+0x3d/0x90
      [  836.264652]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  836.265425] RIP: 0033:0x7fdbe5e2677d
      
      [  836.266012] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ba ee
      ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f
      05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ee ee ff ff 48
      [  836.268485] RSP: 002b:00007fdbe48a75a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [  836.269598] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fdbe5e2677d
      [  836.270576] RDX: 0000000000000000 RSI: 00007fdbe48a7640 RDI: 000000000000003c
      [  836.271565] RBP: 00007fdbe48a8368 R08: 0000000000000000 R09: 0000000000000000
      [  836.272546] R10: 00007fdbe48a84b0 R11: 0000000000000293 R12: 0000557bd17dc860
      [  836.273527] R13: 0000000000000000 R14: 0000557bd17dc860 R15: 00007fdbe48a7640
      
      [  836.274521]  </TASK>
      
      To avoid changing the mode while holding ldev->lock in the configure
      flow, we queue a work item on the lag workqueue and wait on a
      completion object.
      
      In addition, we remove the lock from mlx5_lag_do_mirred() since it is
      not really protecting anything.
      
      It should be noted that an actual deadlock has not been observed.
      
      Signed-off-by: Eli Cohen <elic@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
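The circular dependency in the lockdep report above can be reproduced as a small graph check. This is only an illustrative sketch, not how lockdep is implemented: the edges are read off the four entries (#3 to #0) of the report, each meaning "the second lock was acquired while the first was held", and the helper name `find_cycle` is hypothetical.

```python
def find_cycle(edges):
    """Return one cycle as a list of lock names (first == last), or None."""
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:        # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Dependencies already recorded, as inferred from entries #1..#3 of the report.
EXISTING = [
    ("ldev->lock", "pernet_ops_rwsem"),   # #1: mlx5_do_bond_work path
    ("pernet_ops_rwsem", "rtnl_mutex"),   # #2: register_netdevice_notifier
    ("rtnl_mutex", "block->cb_lock"),     # #3: tc_modify_qdisc path
]
# Entry #0: mlx5_lag_do_mirred takes ldev->lock under block->cb_lock.
NEW = ("block->cb_lock", "ldev->lock")

cycle = find_cycle(EXISTING + [NEW])
print(" --> ".join(cycle))
```

Without the `NEW` edge the graph has no cycle, which matches the commit's approach of not taking ldev->lock in the configure flow.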
    • net/mlx5: Fix handling of entry refcount when command is not issued to FW · aaf2e65c
      Moshe Shemesh authored
      When the command interface is down, or the command is not allowed, the
      driver did not increment the entry refcount, but might still have
      decremented it as part of forced completion handling.
      
      Fix that by always incrementing and decrementing the refcount, making it
      symmetric for all flows.
      
      Fixes: 50b2412b ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
      Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reported-by: Jack Wang <jinpu.wang@ionos.com>
      Tested-by: Jack Wang <jinpu.wang@ionos.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
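The refcount asymmetry described in this commit can be shown with a toy model. This is a minimal sketch under stated assumptions, not the driver's code: `CmdEntry`, `run_command`, and the `symmetric` flag are hypothetical names, and the model assumes the submitter holds one reference of its own.

```python
class CmdEntry:
    """Hypothetical stand-in for a command entry; not the driver's struct."""

    def __init__(self):
        self.refcount = 1      # reference held by the submitter
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        if self.refcount <= 0:
            raise RuntimeError("refcount underflow")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # entry would be released here


def run_command(entry, issued_to_fw, symmetric):
    """Model one command; returns the refcount left afterwards.

    symmetric=False models the old flow: the reference is taken only when
    the command is actually issued to firmware.
    symmetric=True models the fixed flow: the reference is always taken.
    """
    if issued_to_fw or symmetric:
        entry.get()
    # Completion handling (real or forced) drops one reference in all flows.
    entry.put()
    return entry.refcount


old = CmdEntry()
run_command(old, issued_to_fw=False, symmetric=False)
print(old.refcount, old.freed)   # submitter's reference was dropped too early

new = CmdEntry()
run_command(new, issued_to_fw=False, symmetric=True)
print(new.refcount, new.freed)
```

In the asymmetric variant, a command that is never issued loses the submitter's own reference, which is the kind of imbalance the commit removes.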
    • net/mlx5: cmdif, Print info on any firmware cmd failure to tracepoint · 870c2481
      Moshe Shemesh authored
      While moving to the new CMD API (quiet API), some pre-existing flows may
      call the new API function, which on error returns the error instead of
      printing it as was done previously. For such flows we bring back the
      print, but this time to a tracepoint, so that sys admins can check for
      errors, especially for commands using the new quiet API.
      
      Tracepoint output example:
               devlink-1333    [001] .....   822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22)
      
      Fixes: f23519e5 ("net/mlx5: cmdif, Add new api for command execution")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
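For admins who want to post-process such trace lines, the following sketch extracts the fields from the example output above. The pattern is inferred from that single example line only, so the exact format is an assumption, and `parse_mlx5_cmd_failure` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one example line in the commit message.
MLX5_CMD_FAILURE = re.compile(
    r"mlx5_cmd: (?P<cmd>\w+)\((?P<opcode>0x[0-9a-fA-F]+)\) "
    r"op_mod\((?P<op_mod>0x[0-9a-fA-F]+)\) failed, "
    r"status (?P<status>.+?)\((?P<status_code>0x[0-9a-fA-F]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-fA-F]+)\), "
    r"err\((?P<err>-?\d+)\)"
)


def parse_mlx5_cmd_failure(line):
    """Return a dict of fields from one trace line, or None if it does not match."""
    match = MLX5_CMD_FAILURE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cmd": fields["cmd"],
        "opcode": int(fields["opcode"], 16),
        "op_mod": int(fields["op_mod"], 16),
        "status": fields["status"],
        "status_code": int(fields["status_code"], 16),
        "syndrome": int(fields["syndrome"], 16),
        "err": int(fields["err"]),
    }


EXAMPLE = (
    "devlink-1333    [001] .....   822.746922: mlx5_cmd: ACCESS_REG(0x805) "
    "op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22)"
)
print(parse_mlx5_cmd_failure(EXAMPLE))
```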
    • net/mlx5: SF: Fix probing active SFs during driver probe phase · 4f57332d
      Shay Drory authored
      When SF devices and SF port representors are located on different
      functions, unloading and reloading the SF parent driver does not
      recreate the SFs already present in the device.
      Fix it by querying SFs and probing active SFs during the driver probe
      phase.
      
      Fixes: 90d010b8 ("net/mlx5: SF, Add auxiliary device support")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix FW tracer timestamp calculation · 61db3d7b
      Moshe Shemesh authored
      Fix a bug in the calculation of the FW tracer timestamp. Subtracting one
      in the calculation should affect only bits 52_7 and not bits 6_0 of the
      timestamp; otherwise bits 6_0 are always set in this calculation.
      
      Fixes: 70dd6fdb ("net/mlx5: FW tracer, parse traces and kernel tracing support")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Feras Daoud <ferasda@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
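The bit arithmetic behind this fix can be illustrated as follows. This is a sketch of the failure mode, not the patch itself: the mask names `MASK_52_7` and `MASK_6_0` and both helper functions are assumptions derived from the bit ranges named in the commit message. Subtracting one from a value whose low seven bits are zero borrows through those bits and sets all of them, so OR-ing in the real low bits has no effect.

```python
MASK_6_0 = 0x7F                              # bits 6..0
MASK_52_7 = ((1 << 53) - 1) & ~MASK_6_0      # bits 52..7


def merge_buggy(event_ts, trace_low):
    """Decrement applied after masking: the borrow sets bits 6..0 to all ones."""
    return ((event_ts & MASK_52_7) - 1) | (trace_low & MASK_6_0)


def merge_upper_only(event_ts, trace_low):
    """Decrement confined to the bits 52..7 field; bits 6..0 come from trace_low."""
    upper = (event_ts & MASK_52_7) >> 7
    upper = (upper - 1) & (MASK_52_7 >> 7)   # wrap within the 46-bit field
    return (upper << 7) | (trace_low & MASK_6_0)


event_ts = 0x1000    # upper field = 32, low bits = 0
trace_low = 0x05

print(hex(merge_buggy(event_ts, trace_low)))
print(hex(merge_upper_only(event_ts, trace_low)))
```

With these inputs both variants agree on bits 52_7, but only the second keeps the low bits supplied by the trace entry.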
    • net/mlx5: Do not query pci info while pci disabled · 394164f9
      Roy Novich authored
      The driver should not interact with PCI while PCI is disabled. Trying to
      do so may leave it unable to get vital signs during PCI reset, in which
      case the driver times out and fails to recover.
      
      Fixes: fad1783a ("net/mlx5: Print more info on pci error handlers")
      Signed-off-by: Roy Novich <royno@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 21 Nov, 2022 9 commits
  3. 19 Nov, 2022 11 commits
  4. 18 Nov, 2022 14 commits