    nvme-rdma: fix concurrent reset and reconnect · d5bf4b7f
    Sagi Grimberg authored
    The ctrl state machine now allows a transition from RESETTING to
    RECONNECTING.  In nvme-rdma, when we receive an rdma cm DISCONNECTED event,
    we trigger nvme_rdma_error_recovery.  This also happens when we execute a
    controller reset: we issue a cm disconnect request and receive a cm
    disconnect reply.  As a result, the reset work and the error recovery work
    can run concurrently.
    
    Until now, the state machine prevented the error recovery work from
    running as a result of a controller reset (RESETTING -> RECONNECTING was
    not allowed).
    
    To fix this, we adopt the FC state machine approach: we always transition
    from LIVE to RESETTING and only then to RECONNECTING.  We do this both
    for the error recovery work and the controller reset work:
    
     1. transition to RESETTING
     2. tear down the controller association
     3. transition to RECONNECTING
    
    This restores the protection against the reset work and the error
    recovery work running concurrently.
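
The ordering above can be illustrated with a small userspace model. This is only a sketch, not the driver code: the names `ctrl_state`, `change_state`, `begin_recovery` and `finish_teardown` are hypothetical, the transition table covers only the three states named in this message, and the locking that a real state-change helper would need is omitted.

```c
#include <stdbool.h>

/* Simplified controller states; names follow the commit message. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING };

struct ctrl {
	enum ctrl_state state;
};

/*
 * Hypothetical stand-in for a state-change helper: the state is updated
 * and true is returned only if the transition is allowed in this model.
 * Assumption: a real implementation would do this under a lock.
 */
static bool change_state(struct ctrl *c, enum ctrl_state new_state)
{
	bool ok = false;

	switch (new_state) {
	case CTRL_RESETTING:
		ok = (c->state == CTRL_LIVE);
		break;
	case CTRL_RECONNECTING:
		ok = (c->state == CTRL_RESETTING);
		break;
	case CTRL_LIVE:
		ok = (c->state == CTRL_RECONNECTING);
		break;
	}
	if (ok)
		c->state = new_state;
	return ok;
}

/*
 * Step 1, shared by the reset path and the error recovery path.
 * Only the first caller wins; a second caller sees RESETTING and backs off.
 */
static bool begin_recovery(struct ctrl *c)
{
	return change_state(c, CTRL_RESETTING);
}

/* Steps 2 and 3: association teardown would happen here, then RECONNECTING. */
static bool finish_teardown(struct ctrl *c)
{
	return change_state(c, CTRL_RECONNECTING);
}
```

In this model, whichever path calls `begin_recovery` first owns the teardown; the other path fails the LIVE to RESETTING transition and has nothing left to do, which is the mutual exclusion the list above describes.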
    
    Fixes: 3cec7f9d ("nvme: allow controller RESETTING to RECONNECTING transition")
    Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
rdma.c 52.1 KB