    scsi: zfcp: fix infinite iteration on ERP ready list · fa89adba
    Jens Remus authored
    zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
    rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
    adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
    processed asynchronously and concurrently.
    
    Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
    calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
    zfcp_dbf_rec_trig().  zfcp_dbf_rec_trig() acquires the DBF REC spin lock
    and then iterates with list_for_each() over the adapter's ERP ready list
    without holding the ERP lock. This opens a race window in which the
    current list entry can be moved to another list, causing list_for_each()
    to iterate forever on the wrong list, because erp_ready_head is never
    encountered as the terminating condition.
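
    The effect can be reproduced in a single thread. The following is a
    minimal userspace sketch, not the zfcp code: struct list_head and the
    list helpers are simplified stand-ins for the kernel ones, and
    walk_ready_list() with its max_steps bound is a hypothetical helper that
    exists only so the demonstration terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's circular struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink entry from its current list and add it to another one. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}

/*
 * Walk the ready list the way list_for_each() does. If move_during_walk is
 * set, the current entry is moved to the running list in the middle of the
 * walk, as a concurrent ERP thread could do. Returns the number of loop
 * iterations, or -1 if the ready head was not reached within max_steps
 * (hypothetical bound; the real loop has none).
 */
int walk_ready_list(int move_during_walk, int max_steps)
{
	struct list_head ready, running, action;
	struct list_head *pos;
	int steps = 0;

	assert(max_steps > 0);
	list_init(&ready);
	list_init(&running);
	list_add(&action, &ready);

	for (pos = ready.next; pos != &ready; pos = pos->next) {
		if (steps == max_steps)
			return -1;	/* the unbounded loop would never stop */
		if (move_during_walk && steps == 0)
			list_move(pos, &running);
		steps++;
	}
	return steps;
}
```

    Without the move, the walk ends after the single entry. With the move,
    pos alternates between the moved entry and the running list head, so the
    ready list head is never seen again.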
    
    Meanwhile the ERP action can be processed in the ERP thread by
    zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
    lock and then calls zfcp_erp_action_to_running() to move the ERP action
    from the ready to the running list.  zfcp_erp_action_to_running() can
    move the ERP action using list_move() just during the aforementioned
    race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
    zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
    held by the infinitely looping kworker, it effectively spins forever.
    
    Example Sequence Diagram:
    
    Process                ERP Thread             rport_work
    -------------------    -------------------    -------------------
    zfcp_erp_adapter_reopen()
    zfcp_erp_adapter_block()
    zfcp_scsi_schedule_rports_block()
    lock ERP                                      zfcp_scsi_rport_work()
    zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
    list_add_tail() on ready                      !(rport_task==RPORT_ADD)
    wake_up() ERP thread                          zfcp_scsi_rport_block()
    zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
    unlock ERP                                    lock DBF REC
    zfcp_erp_wait()        lock ERP
    |                      zfcp_erp_action_to_running()
    |                                             list_for_each() ready
    |                      list_move()              current entry
    |                        ready to running
    |                      zfcp_dbf_rec_run()       endless loop over running
    |                      zfcp_dbf_rec_run_lvl()
    |                      lock DBF REC spins forever
    
    Any adapter recovery can trigger this, such as setting the device offline
    or rebooting.
    
    V4.9 commit 4eeaa4f3 ("zfcp: close window with unblocked rport
    during rport gone") introduced additional tracing of (un)blocking of
    rports. It missed that the adapter->erp_lock must be held when calling
    zfcp_dbf_rec_trig().
    
    This fix uses the approach formerly introduced by commit aa0fec62
    ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that was
    later removed by commit ae0904f6 ("[SCSI] zfcp: Redesign of the debug
    tracing for recovery actions.").
    
    Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
    acquires and releases the adapter->erp_lock for read.
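
    The shape of such a wrapper can be sketched in userspace as follows. This
    is an illustrative model under stated assumptions, not the patch itself:
    struct model_rwlock is a single-threaded counter standing in for
    adapter->erp_lock, and all model_* names are hypothetical.

```c
#include <assert.h>

/* Hypothetical single-threaded stand-in for adapter->erp_lock. */
struct model_rwlock {
	int readers;
};

struct model_adapter {
	struct model_rwlock erp_lock;
	int traces;	/* number of trace records written */
};

static void model_read_lock(struct model_rwlock *lock)
{
	lock->readers++;
}

static void model_read_unlock(struct model_rwlock *lock)
{
	assert(lock->readers > 0);
	lock->readers--;
}

/*
 * Stand-in for a trace function that walks the ERP lists and therefore
 * requires the caller to hold erp_lock. Returns -1 if the lock is not held.
 */
int model_rec_trig(struct model_adapter *adapter)
{
	if (adapter->erp_lock.readers == 0)
		return -1;	/* would iterate the ERP lists unprotected */
	adapter->traces++;
	return 0;
}

/* Wrapper for callers that do not hold erp_lock: take it for read. */
int model_rec_trig_lock(struct model_adapter *adapter)
{
	int ret;

	model_read_lock(&adapter->erp_lock);
	ret = model_rec_trig(adapter);
	model_read_unlock(&adapter->erp_lock);
	return ret;
}
```

    Callers that already hold the lock keep using the plain function; callers
    that do not, such as the rport (un)blocking path, use the wrapper.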

    Reported-by: Sebastian Ott <sebott@linux.ibm.com>
    Signed-off-by: Jens Remus <jremus@linux.ibm.com>
    Fixes: 4eeaa4f3 ("zfcp: close window with unblocked rport during rport gone")
    Cc: <stable@vger.kernel.org> # 2.6.32+
    Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
    Signed-off-by: Steffen Maier <maier@linux.ibm.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
zfcp_dbf.c 21.9 KB