• Dust Li's avatar
    net/smc: fix kernel panic caused by race of smc_sock · 349d4312
    Dust Li authored
    A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
    but smc_release() has already freed it.
    
    [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
    [ 4570.696048] #PF: supervisor write access in kernel mode
    [ 4570.696728] #PF: error_code(0x0002) - not-present page
    [ 4570.697401] PGD 0 P4D 0
    [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
    [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
    [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
    [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
    <...>
    [ 4570.711446] Call Trace:
    [ 4570.711746]  <IRQ>
    [ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
    [ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
    [ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
    [ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
    [ 4570.714083]  __do_softirq+0x123/0x2f4
    [ 4570.714521]  irq_exit_rcu+0xc4/0xf0
    [ 4570.714934]  common_interrupt+0xba/0xe0
    
    Though smc_cdc_tx_handler() checked the existence of smc connection,
    smc_release() may have already dismissed and released the smc socket
    before smc_cdc_tx_handler() further visits it.
    
    smc_cdc_tx_handler()           |smc_release()
    if (!conn)                     |
                                   |
                                   |smc_cdc_tx_dismiss_slots()
                                   |      smc_cdc_tx_dismisser()
                                   |
                                   |sock_put(&smc->sk) <- last sock_put,
                                   |                      smc_sock freed
    bh_lock_sock(&smc->sk) (panic) |
    
    To make sure we won't receive any CDC messages after we free the
    smc_sock, add a refcount on the smc_connection for inflight CDC
    message(posted to the QP but haven't received related CQE), and
    don't release the smc_connection until all the inflight CDC messages
    haven been done, for both success or failed ones.
    
    Using refcount on CDC messages brings another problem: when the link
    is going to be destroyed, smcr_link_clear() will reset the QP, which
    then remove all the pending CQEs related to the QP in the CQ. To make
    sure all the CQEs will always come back so the refcount on the
    smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
    by smc_ib_modify_qp_error().
    And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we
    need to wait for all pending WQEs done, or we may encounter use-after-
    free when handling CQEs.
    
    For IB device removal routine, we need to wait for all the QPs on that
    device been destroyed before we can destroy CQs on the device, or
    the refcount on smc_connection won't reach 0 and smc_sock cannot be
    released.
    
    Fixes: 5f08318f ("smc: connection data control (CDC)")
    Reported-by: default avatarWen Gu <guwen@linux.alibaba.com>
    Signed-off-by: default avatarDust Li <dust.li@linux.alibaba.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    349d4312
smc.h 8.97 KB