    scsi: zfcp: Fix missing auto port scan and thus missing target ports · 4da8c5f7
    Steffen Maier authored
    Case (1):
      The only waiter on wka_port->completion_wq is zfcp_fc_wka_port_get()
      trying to open a WKA port. As such it should only be woken up by WKA port
      *open* responses, not by WKA port close responses.
    
    Case (2):
      A close WKA port response coming in just after having sent a new open WKA
      port request and before blocking for the open response with wait_event()
      in zfcp_fc_wka_port_get() erroneously renders the wait_event a NOP
      because the close handler overwrites wka_port->status. Hence the
      wait_event condition is erroneously true and it does not enter blocking
      state.
    
    Without this fix, one of the following time-space sequences happens with
    non-negligible probability, depending on timing:
    
    user process        ERP thread zfcp work queue tasklet system work queue
    ============        ========== =============== ======= =================
    $ echo 1 > online
    zfcp_ccw_set_online
    zfcp_ccw_activate
    zfcp_erp_adapter_reopen
    msleep scan backoff zfcp_erp_strategy
    |                   ...
    |                   zfcp_erp_action_cleanup
    |                   ...
    |                   queue delayed scan_work
    |                   queue ns_up_work
    |                              ns_up_work:
    |                              zfcp_fc_wka_port_get
    |                               open wka request
    |                                              open response
    |                              GSPN FC-GS
    |                              RSPN FC-GS [NPIV-only]
    |                              zfcp_fc_wka_port_put
    |                               (--wka->refcount==0)
    |                               sched delayed wka->work
    |
    ~~~Case (1)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    zfcp_erp_wait
    flush scan_work
    |                                                      wka->work:
    |                                                      wka->status=CLOSING
    |                                                      close wka request
    |                              scan_work:
    |                              zfcp_fc_wka_port_get
    |                               (wka->status==CLOSING)
    |                               wka->status=OPENING
    |                               open wka request
    |                               wait_event
    |                               |              close response
    |                               |              wka->status=OFFLINE
    |                               |              wake_up /*WRONG*/
    ~~~Case (2)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |                                                      wka->work:
    |                                                      wka->status=CLOSING
    |                                                      close wka request
    zfcp_erp_wait
    flush scan_work
    |                              scan_work:
    |                              zfcp_fc_wka_port_get
    |                               (wka->status==CLOSING)
    |                               wka->status=OPENING
    |                               open wka request
    |                                              close response
    |                                              wka->status=OFFLINE
    |                                              wake_up /*WRONG&NOP*/
    |                               wait_event /*NOP*/
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |                               (wka->status!=ONLINE)
    |                               return -EIO
    |                              return early
                                                   open response
                                                   wka->status=ONLINE
                                                   wake_up /*NOP*/
    
    So we erroneously end up with no automatic port scan. This is a big problem
    when it happens during boot. The timing is influenced by v3.19 commit
    18f87a67 ("zfcp: auto port scan resiliency").
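
    The Case (2) sequence above can be replayed deterministically in user
    space. The following is only a minimal sketch, not driver code: the enum,
    wait_condition_true() and replay_port_get() are hypothetical names, and
    the wait condition (status is ONLINE or OFFLINE) is an assumption derived
    from the diagram, where the waiter proceeds as soon as the close handler
    has set OFFLINE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the WKA port status values. */
enum wka_status { WKA_OFFLINE, WKA_OPENING, WKA_ONLINE, WKA_CLOSING };

/* Assumed wait_event condition: the waiter proceeds once the port has
 * left its transitional states, i.e. it is ONLINE or OFFLINE. */
static bool wait_condition_true(enum wka_status s)
{
	return s == WKA_ONLINE || s == WKA_OFFLINE;
}

/* Replays the port_get path of the diagram. With stale_close_response set,
 * the response to the earlier close request arrives after the new open
 * request was sent but before the waiter blocks (Case 2).
 * Returns 0 if the port ends up usable, -EIO otherwise. */
int replay_port_get(bool stale_close_response)
{
	/* wka->work already ran: status is CLOSING, close request is in flight */
	enum wka_status status = WKA_CLOSING;

	/* scan_work: port_get sees CLOSING and starts a new open */
	status = WKA_OPENING;

	if (stale_close_response)
		status = WKA_OFFLINE;	/* close handler overwrites the status */
	else
		status = WKA_ONLINE;	/* open response arrives while waiting */

	/* In both branches the waiter gets past wait_event here. */
	assert(wait_condition_true(status));

	return status == WKA_ONLINE ? 0 : -EIO;
}
```

    In this model, replay_port_get(true) takes the failing path of the
    diagram and returns -EIO, while replay_port_get(false) returns 0.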
    
    Fix it by fully mutually excluding zfcp_fc_wka_port_get() and
    zfcp_fc_wka_port_offline(). For that to work, we make the latter block
    until we get the response to the WKA port close request. In order not to penalize
    the system workqueue, we move wka_port->work to our own adapter workqueue.
    Note that before v2.6.30 commit 828bc121 ("[SCSI] zfcp: Set WKA-port to
    offline on adapter deactivation"), zfcp did block in
    zfcp_fc_wka_port_offline() as well, but with a different condition.
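
    The effect of blocking in the offline path can be illustrated with a
    sequential model. This is a sketch under stated assumptions, not the
    actual patch: struct wka_model, the FIFO of in-flight responses, and all
    model_* functions are hypothetical, and mutual exclusion is modeled simply
    by each function running to completion before the next one starts. The
    block_for_close flag selects between the old behavior (return right after
    sending the close request) and the behavior described above (wait for the
    close response).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum wka_status { WKA_OFFLINE, WKA_OPENING, WKA_ONLINE, WKA_CLOSING };
enum response { CLOSE_RESPONSE, OPEN_RESPONSE };

/* Hypothetical model: one WKA port plus a FIFO of in-flight responses. */
struct wka_model {
	enum wka_status status;
	enum response pending[4];
	int head, tail;
};

/* Sending a request queues the response that will eventually come back. */
static void send_request(struct wka_model *p, enum response r)
{
	p->pending[p->tail++] = r;
}

/* Response handler: applies the oldest in-flight response to the status. */
static void deliver_one(struct wka_model *p)
{
	assert(p->head < p->tail);	/* a response must be in flight */
	enum response r = p->pending[p->head++];
	p->status = (r == OPEN_RESPONSE) ? WKA_ONLINE : WKA_OFFLINE;
}

/* Models the offline path; block_for_close selects the fixed behavior. */
static void model_port_offline(struct wka_model *p, bool block_for_close)
{
	p->status = WKA_CLOSING;
	send_request(p, CLOSE_RESPONSE);
	while (block_for_close && p->status != WKA_OFFLINE)
		deliver_one(p);		/* wait for the close response */
}

/* Models the get path: open if needed, then wait for ONLINE or OFFLINE. */
static int model_port_get(struct wka_model *p)
{
	if (p->status == WKA_OFFLINE || p->status == WKA_CLOSING) {
		p->status = WKA_OPENING;
		send_request(p, OPEN_RESPONSE);
	}
	while (p->status != WKA_ONLINE && p->status != WKA_OFFLINE)
		deliver_one(p);		/* first response to arrive wins */
	return p->status == WKA_ONLINE ? 0 : -EIO;
}

/* Runs offline followed by get on a port that starts out ONLINE. */
int run_scenario(bool block_for_close)
{
	struct wka_model p = { .status = WKA_ONLINE };

	model_port_offline(&p, block_for_close);
	return model_port_get(&p);
}
```

    In this model, run_scenario(false) returns -EIO because the stale close
    response is the first one the waiter sees, whereas run_scenario(true)
    returns 0 because the close response is consumed before the open starts.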
    
    While at it, make non-functional cleanups to improve code readability in
    zfcp_fc_wka_port_get(). If we cannot send the WKA port open request, don't
    rely on the subsequent wait_event condition to immediately let this case
    pass without blocking. Also, don't rely on the additional condition
    handling the refcount being skipped just to finally return with -EIO.
    
    Link: https://lore.kernel.org/r/20220729162529.1620730-1-maier@linux.ibm.com
    Fixes: 5ab944f9 ("[SCSI] zfcp: attach and release SAN nameserver port on demand")
    Cc: <stable@vger.kernel.org> #v2.6.28+
    Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
    Signed-off-by: Steffen Maier <maier@linux.ibm.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
zfcp_fc.c 30.7 KB