    IB/hfi1: Correct tid qp rcd to match verbs context · cc78076a
    Mike Marciniszyn authored
    The qp priv rcd pointer doesn't match the context being used for verbs,
    causing issues when 9B and kdeth packets are processed by different
    receive contexts and hence on different CPUs.
    
    When running on different CPUs, the following panic can occur:
    
     WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
     list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230
     CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.2.3.el7_lustre.x86_64 #1
     Call Trace:
      <IRQ>  [<ffffffffb7b0d78e>] dump_stack+0x19/0x1b
      [<ffffffffb74916d8>] __warn+0xd8/0x100
      [<ffffffffb749175f>] warn_slowpath_fmt+0x5f/0x80
      [<ffffffffb7768671>] __list_del_entry+0xa1/0xd0
      [<ffffffffc0c7a945>] process_rcv_qp_work+0xb5/0x160 [hfi1]
      [<ffffffffc0c7bc2b>] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1]
      [<ffffffffc0c70683>] receive_context_interrupt+0x23/0x40 [hfi1]
      [<ffffffffb7540a94>] __handle_irq_event_percpu+0x44/0x1c0
      [<ffffffffb7540c42>] handle_irq_event_percpu+0x32/0x80
      [<ffffffffb7540ccc>] handle_irq_event+0x3c/0x60
      [<ffffffffb7543a1f>] handle_edge_irq+0x7f/0x150
      [<ffffffffb742d504>] handle_irq+0xe4/0x1a0
      [<ffffffffb7b23f7d>] do_IRQ+0x4d/0xf0
      [<ffffffffb7b16362>] common_interrupt+0x162/0x162
      <EOI>  [<ffffffffb775a326>] ? memcpy+0x6/0x110
      [<ffffffffc109210d>] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs]
      [<ffffffffc10920f0>] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs]
      [<ffffffffc1093257>] abd_iterate_func+0x97/0x120 [zfs]
      [<ffffffffc10934d9>] abd_copy_from_buf_off+0x39/0x60 [zfs]
      [<ffffffffc109b828>] arc_write_ready+0x178/0x300 [zfs]
      [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
      [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
      [<ffffffffc1164d05>] zio_ready+0x65/0x3d0 [zfs]
      [<ffffffffc04d725e>] ? tsd_get_by_thread+0x2e/0x50 [spl]
      [<ffffffffc04d1318>] ? taskq_member+0x18/0x30 [spl]
      [<ffffffffc115ef22>] zio_execute+0xa2/0x100 [zfs]
      [<ffffffffc04d1d2c>] taskq_thread+0x2ac/0x4f0 [spl]
      [<ffffffffb74cee80>] ? wake_up_state+0x20/0x20
      [<ffffffffc115ee80>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
      [<ffffffffc04d1a80>] ? taskq_thread_spawn+0x60/0x60 [spl]
      [<ffffffffb74bae31>] kthread+0xd1/0xe0
      [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40
      [<ffffffffb7b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21
      [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40
    
    Fix by reading the map entry in the same manner as the hardware so that
    the kdeth and verbs contexts match.
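
    The idea behind the fix can be illustrated with a small user-space
    model. This is only a sketch, not the driver code: every name here
    (model_dev, model_get_qp_map, model_qp_to_ctxt) is hypothetical, and
    the table layout (eight 8-bit context numbers packed into each 64-bit
    register, indexed by the QP number shifted by a QoS shift) is an
    assumption made for illustration. The point is that software derives
    the receive context through the same table lookup the hardware would
    use, so both sides agree on the context.

```c
#include <stdint.h>

/* ASSUMPTION: 256 map entries, eight 8-bit entries per 64-bit register. */
#define MODEL_QP_MAP_ENTRIES   256
#define MODEL_ENTRIES_PER_REG  8

/* Hypothetical stand-in for the device state; not the real driver struct. */
struct model_dev {
    uint64_t qp_map[MODEL_QP_MAP_ENTRIES / MODEL_ENTRIES_PER_REG];
    unsigned int qos_shift; /* low QPN bits dropped before the lookup */
};

/* Program one map entry: map index -> receive context number. */
static void model_set_qp_map(struct model_dev *dd, uint8_t idx, uint8_t ctxt)
{
    unsigned int shift = (idx % MODEL_ENTRIES_PER_REG) * 8;
    uint64_t *reg = &dd->qp_map[idx / MODEL_ENTRIES_PER_REG];

    *reg = (*reg & ~((uint64_t)0xff << shift)) | ((uint64_t)ctxt << shift);
}

/* Read one map entry back by extracting its byte from the packed register. */
static uint8_t model_get_qp_map(const struct model_dev *dd, uint8_t idx)
{
    uint64_t reg = dd->qp_map[idx / MODEL_ENTRIES_PER_REG];

    return (uint8_t)(reg >> ((idx % MODEL_ENTRIES_PER_REG) * 8));
}

/*
 * Pick the receive context for a QP by consulting the map table rather
 * than computing it independently. QP 0 is pinned to context 0 in this
 * model (an assumption).
 */
static uint8_t model_qp_to_ctxt(const struct model_dev *dd, uint32_t qpn)
{
    if (qpn == 0)
        return 0;
    return model_get_qp_map(dd, (uint8_t)(qpn >> dd->qos_shift));
}
```

    With qos_shift set to 1 and map entry 5 programmed to context 3, QP
    numbers 10 and 11 both resolve to context 3 in this model, because
    both shift down to index 5.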
    
    Cc: <stable@vger.kernel.org>
    Fixes: 5190f052 ("IB/hfi1: Allow the driver to initialize QP priv struct")
    Reviewed-by: Kaike Wan <kaike.wan@intel.com>
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>