    net/smc: Resolve the race between SMC-R link access and clear · 20c9398d
    Wen Gu authored
    We encountered some crashes caused by a race between SMC-R
    link access and link clear, triggered by abnormal link group
    termination such as a port error.
    
    Here is an example of such a crash:
    
     BUG: kernel NULL pointer dereference, address: 0000000000000000
     Workqueue: smc_hs_wq smc_listen_work [smc]
     RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
     Call Trace:
      <TASK>
      ? __smc_buf_create+0x75a/0x950 [smc]
      smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
      smc_listen_work+0xf72/0x1230 [smc]
      ? process_one_work+0x25c/0x600
      process_one_work+0x25c/0x600
      worker_thread+0x4f/0x3a0
      ? process_one_work+0x600/0x600
      kthread+0x15d/0x1a0
      ? set_kthread_struct+0x40/0x40
      ret_from_fork+0x1f/0x30
      </TASK>
    
    smc_listen_work()                     __smc_lgr_terminate()
    ---------------------------------------------------------------
                                        | smc_lgr_free()
                                        |  |- smcr_link_clear()
                                        |      |- memset(lnk, 0)
    smc_listen_rdma_reg()               |
     |- smcr_lgr_reg_rmbs()             |
         |- smc_llc_flow_initiate()     |
             |- access lnk->lgr (panic) |
    
    These crashes share one cause: SMC-R link resources are cleared
    while some functions are still accessing them. This patch fixes
    the issue by introducing a reference count for SMC-R links and
    ensuring that the sensitive resources of a link are not cleared
    until its reference count reaches zero.
    
    The operations on the SMC-R link reference count can be
    summarized as follows:
    
    object          [hold or initialized as 1]         [put]
    --------------------------------------------------------------------
    links           smcr_link_init()                   smcr_link_clear()
    connections     smc_conn_create()                  smc_conn_free()
    
    In this way, an SMC-R link is cleared only after all the SMC
    connections on top of it have been freed, which avoids unsafe
    references to SMC-R links.
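
The hold/put pairing described above can be sketched in user space as follows. This is a minimal model under stated assumptions, not the code of this patch: `struct demo_link` and every `demo_*` function are hypothetical names, and C11 atomics stand in for the kernel's `refcount_t`. It only shows that a clear request issued while a connection still holds the link is deferred until the last put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SMC-R link; NOT the kernel's struct smc_link. */
struct demo_link {
	atomic_int refcnt;	/* models the link reference count */
	void *lgr;		/* models the lnk->lgr back-pointer */
	bool cleared;		/* true once resources were really cleared */
};

/* Final teardown: runs only when the last reference is dropped. */
static void demo_link_do_clear(struct demo_link *lnk)
{
	lnk->lgr = NULL;
	lnk->cleared = true;
}

static void demo_link_hold(struct demo_link *lnk)
{
	atomic_fetch_add(&lnk->refcnt, 1);
}

static void demo_link_put(struct demo_link *lnk)
{
	/* atomic_fetch_sub() returns the old value; 1 means this was the last ref. */
	if (atomic_fetch_sub(&lnk->refcnt, 1) == 1)
		demo_link_do_clear(lnk);
}

/* "links" row of the table: init starts the count at 1, clear drops it. */
static void demo_link_init(struct demo_link *lnk, void *lgr)
{
	atomic_init(&lnk->refcnt, 1);
	lnk->lgr = lgr;
	lnk->cleared = false;
}

static void demo_link_clear(struct demo_link *lnk)
{
	demo_link_put(lnk);
}

/* "connections" row of the table: create holds the link, free puts it. */
static void demo_conn_create(struct demo_link *lnk)
{
	demo_link_hold(lnk);
}

static void demo_conn_free(struct demo_link *lnk)
{
	demo_link_put(lnk);
}

/* Replays the ordering from the commit message; returns 0 on success. */
static int demo_run(void)
{
	int lgr_obj = 0;
	struct demo_link lnk;

	demo_link_init(&lnk, &lgr_obj);
	demo_conn_create(&lnk);		/* a connection starts using the link */

	demo_link_clear(&lnk);		/* abnormal termination asks for a clear */
	assert(!lnk.cleared);		/* deferred: the connection still holds it */
	assert(lnk.lgr == &lgr_obj);	/* so lnk->lgr is still safe to read */

	demo_conn_free(&lnk);		/* last reference dropped */
	assert(lnk.cleared);		/* only now are the resources cleared */
	return 0;
}
```

Calling `demo_run()` from a small `main()` exercises the sequence. The point of the design is that the clear path no longer wipes the link directly; it only drops the initial reference, so whichever side puts last performs the teardown.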
    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
smc_core.c 66.2 KB