1. 13 Sep, 2024 5 commits
  2. 11 Sep, 2024 7 commits
  3. 10 Sep, 2024 10 commits
  4. 09 Sep, 2024 10 commits
  5. 04 Sep, 2024 2 commits
    • IB/core: Fix ib_cache_setup_one error flow cleanup · 1403c8b1
      Patrisious Haddad authored
      When ib_cache_update returns an error, we exit ib_cache_setup_one
      immediately without proper cleanup, even though gid_table_setup_one
      has already succeeded by that point. This results in the kernel
      WARN below.
      
      Fix the issue by calling gid_table_cleanup_one before returning
      the error.
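      
      The error-flow pattern described above can be sketched in user space as
      follows. This is a simplified illustration, not the kernel code: the
      mock_* names, struct mock_device, and the fail_cache_update knob are
      all hypothetical stand-ins, and the real functions operate per port on
      struct ib_device.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device; only what the sketch needs. */
struct mock_device {
    int *gid_table;         /* allocated by mock_gid_table_setup_one() */
    bool fail_cache_update; /* knob to force the update step to fail */
};

/* Stand-in for gid_table_setup_one(): acquires the first resource. */
static int mock_gid_table_setup_one(struct mock_device *dev)
{
    dev->gid_table = calloc(16, sizeof(*dev->gid_table));
    return dev->gid_table ? 0 : -ENOMEM;
}

/* Stand-in for gid_table_cleanup_one(): undoes the setup step. */
static void mock_gid_table_cleanup_one(struct mock_device *dev)
{
    free(dev->gid_table);
    dev->gid_table = NULL;
}

/* Stand-in for ib_cache_update(): the step that may fail. */
static int mock_cache_update(struct mock_device *dev)
{
    return dev->fail_cache_update ? -EIO : 0;
}

/* Stand-in for ib_cache_setup_one() with the cleanup added. */
int mock_cache_setup_one(struct mock_device *dev)
{
    int err;

    err = mock_gid_table_setup_one(dev);
    if (err)
        return err;

    err = mock_cache_update(dev);
    if (err) {
        /* Undo the earlier successful setup before bailing out. */
        mock_gid_table_cleanup_one(dev);
        return err;
    }

    return 0;
}
```

      In this sketch, a failing update leaves gid_table set to NULL, so a
      later release path finds nothing left to clean up.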
      
      WARNING: CPU: 4 PID: 922 at drivers/infiniband/core/cache.c:806 gid_table_release_one+0x181/0x1a0
      Modules linked in:
      CPU: 4 UID: 0 PID: 922 Comm: c_repro Not tainted 6.11.0-rc1+ #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:gid_table_release_one+0x181/0x1a0
      Code: 44 8b 38 75 0c e8 2f cb 34 ff 4d 8b b5 28 05 00 00 e8 23 cb 34 ff 44 89 f9 89 da 4c 89 f6 48 c7 c7 d0 58 14 83 e8 4f de 21 ff <0f> 0b 4c 8b 75 30 e9 54 ff ff ff 48 83 c4 10 5b 5d 41 5c 41 5d 41
      RSP: 0018:ffffc90002b835b0 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c8527
      RDX: 0000000000000000 RSI: ffffffff811c8534 RDI: 0000000000000001
      RBP: ffff8881011b3d00 R08: ffff88810b3abe00 R09: 205d303839303631
      R10: 666572207972746e R11: 72746e6520444947 R12: 0000000000000001
      R13: ffff888106390000 R14: ffff8881011f2110 R15: 0000000000000001
      FS:  00007fecc3b70800(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000340 CR3: 000000010435a001 CR4: 00000000003706b0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ? show_regs+0x94/0xa0
       ? __warn+0x9e/0x1c0
       ? gid_table_release_one+0x181/0x1a0
       ? report_bug+0x1f9/0x340
       ? gid_table_release_one+0x181/0x1a0
       ? handle_bug+0xa2/0x110
       ? exc_invalid_op+0x31/0xa0
       ? asm_exc_invalid_op+0x16/0x20
       ? __warn_printk+0xc7/0x180
       ? __warn_printk+0xd4/0x180
       ? gid_table_release_one+0x181/0x1a0
       ib_device_release+0x71/0xe0
       ? __pfx_ib_device_release+0x10/0x10
       device_release+0x44/0xd0
       kobject_put+0x135/0x3d0
       put_device+0x20/0x30
       rxe_net_add+0x7d/0xa0
       rxe_newlink+0xd7/0x190
       nldev_newlink+0x1b0/0x2a0
       ? __pfx_nldev_newlink+0x10/0x10
       rdma_nl_rcv_msg+0x1ad/0x2e0
       rdma_nl_rcv_skb.constprop.0+0x176/0x210
       netlink_unicast+0x2de/0x400
       netlink_sendmsg+0x306/0x660
       __sock_sendmsg+0x110/0x120
       ____sys_sendmsg+0x30e/0x390
       ___sys_sendmsg+0x9b/0xf0
       ? kstrtouint+0x6e/0xa0
       ? kstrtouint_from_user+0x7c/0xb0
       ? get_pid_task+0xb0/0xd0
       ? proc_fail_nth_write+0x5b/0x140
       ? __fget_light+0x9a/0x200
       ? preempt_count_add+0x47/0xa0
       __sys_sendmsg+0x61/0xd0
       do_syscall_64+0x50/0x110
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Fixes: 1901b91f ("IB/core: Fix potential NULL pointer dereference in pkey cache")
      Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Link: https://patch.msgid.link/79137687d829899b0b1c9835fcb4b258004c439a.1725273354.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/mlx5: Fix UMR pd cleanup on error flow of driver init · 112e6e83
      Chris Mi authored
      The cited commit moves the pd allocation from function
      mlx5r_umr_resource_cleanup() to a new function mlx5r_umr_cleanup().
      As a result, the fix in commit [1] no longer works, and the error
      flow hits panic [2].
      
      Fix it by checking the pd pointer to avoid the panic when it is NULL.
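      
      A minimal user-space sketch of the NULL check described above. The
      mock_* names and both structs are hypothetical stand-ins rather than
      the mlx5_ib definitions; the sketch only assumes, as the trace in [2]
      suggests, that the dealloc routine dereferences the pd it is given.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a protection domain object. */
struct mock_pd {
    int handle;
};

/* Hypothetical stand-in for the UMR state held by the device. */
struct mock_umr_dev {
    struct mock_pd *pd; /* stays NULL if init failed before allocation */
    int dealloc_calls;  /* counts calls, for illustration only */
};

/* Stand-in for the dealloc routine: it reads through pd, so pd must be valid. */
static void mock_dealloc_pd(struct mock_umr_dev *dev, struct mock_pd *pd)
{
    (void)pd->handle;
    dev->dealloc_calls++;
    free(pd);
}

/* Stand-in for the cleanup function with the NULL check added. */
void mock_umr_cleanup(struct mock_umr_dev *dev)
{
    /* On an early error flow the pd was never allocated: nothing to free. */
    if (dev->pd) {
        mock_dealloc_pd(dev, dev->pd);
        dev->pd = NULL;
    }
}
```

      With the check in place, calling the cleanup on a device whose pd was
      never allocated is a no-op instead of a NULL dereference.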
      
      [1] RDMA/mlx5: Fix UMR cleanup on error flow of driver init
      [2]
       [  347.567063] infiniband mlx5_0: Couldn't register device with driver model
       [  347.591382] BUG: kernel NULL pointer dereference, address: 0000000000000020
       [  347.593438] #PF: supervisor read access in kernel mode
       [  347.595176] #PF: error_code(0x0000) - not-present page
       [  347.596962] PGD 0 P4D 0
       [  347.601361] RIP: 0010:ib_dealloc_pd_user+0x12/0xc0 [ib_core]
       [  347.604171] RSP: 0018:ffff888106293b10 EFLAGS: 00010282
       [  347.604834] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000
       [  347.605672] RDX: ffff888106293ad0 RSI: 0000000000000000 RDI: 0000000000000000
       [  347.606529] RBP: 0000000000000000 R08: ffff888106293ae0 R09: ffff888106293ae0
       [  347.607379] R10: 0000000000000a06 R11: 0000000000000000 R12: 0000000000000000
       [  347.608224] R13: ffffffffa0704dc0 R14: 0000000000000001 R15: 0000000000000001
       [  347.609067] FS:  00007fdc720cd9c0(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
       [  347.610094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  347.610727] CR2: 0000000000000020 CR3: 0000000103012003 CR4: 0000000000370eb0
       [  347.611421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [  347.612113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [  347.612804] Call Trace:
       [  347.613130]  <TASK>
       [  347.613417]  ? __die+0x20/0x60
       [  347.613793]  ? page_fault_oops+0x150/0x3e0
       [  347.614243]  ? free_msg+0x68/0x80 [mlx5_core]
       [  347.614840]  ? cmd_exec+0x48f/0x11d0 [mlx5_core]
       [  347.615359]  ? exc_page_fault+0x74/0x130
       [  347.615808]  ? asm_exc_page_fault+0x22/0x30
       [  347.616273]  ? ib_dealloc_pd_user+0x12/0xc0 [ib_core]
       [  347.616801]  mlx5r_umr_cleanup+0x23/0x90 [mlx5_ib]
       [  347.617365]  mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x36/0x40 [mlx5_ib]
       [  347.618025]  __mlx5_ib_add+0x96/0xd0 [mlx5_ib]
       [  347.618539]  mlx5r_probe+0xe9/0x310 [mlx5_ib]
       [  347.619032]  ? kernfs_add_one+0x107/0x150
       [  347.619478]  ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib]
       [  347.619984]  auxiliary_bus_probe+0x3e/0x90
       [  347.620448]  really_probe+0xc5/0x3a0
       [  347.620857]  __driver_probe_device+0x80/0x160
       [  347.621325]  driver_probe_device+0x1e/0x90
       [  347.621770]  __driver_attach+0xec/0x1c0
       [  347.622213]  ? __device_attach_driver+0x100/0x100
       [  347.622724]  bus_for_each_dev+0x71/0xc0
       [  347.623151]  bus_add_driver+0xed/0x240
       [  347.623570]  driver_register+0x58/0x100
       [  347.623998]  __auxiliary_driver_register+0x6a/0xc0
       [  347.624499]  ? driver_register+0xae/0x100
       [  347.624940]  ? 0xffffffffa0893000
       [  347.625329]  mlx5_ib_init+0x16a/0x1e0 [mlx5_ib]
       [  347.625845]  do_one_initcall+0x4a/0x2a0
       [  347.626273]  ? gcov_event+0x2e2/0x3a0
       [  347.626706]  do_init_module+0x8a/0x260
       [  347.627126]  init_module_from_file+0x8b/0xd0
       [  347.627596]  __x64_sys_finit_module+0x1ca/0x2f0
       [  347.628089]  do_syscall_64+0x4c/0x100
      
      Fixes: 63842011 ("IB/mlx5: Create UMR QP just before first reg_mr occurs")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
      Link: https://patch.msgid.link/778c40c60287992da5d6ec92bb07b67f7bb5e6ef.1725273295.git.leon@kernel.org
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
  6. 02 Sep, 2024 6 commits