• James Smart's avatar
    scsi: lpfc: Fix random heartbeat timeouts during heavy IO · cf1a1d3e
    James Smart authored
    NVME targets appear to randomly disconnect from the initiator when
    running heavy IO.
    
    The error is due to the host aggregate (across all controllers) io load
    was beyond the maximum exchange count for nvme on the adapter. The
    driver was properly returning a resource busy status, but the io load
    was so great heartbeat commands would be bounced and not have a
    successful retry within the fuzz amount for the nvme heartbeat (yes, a
    very high io load!). Thus the target was terminating the controller due
    to a keep alive failure.
    
    Resolve by reserving a few exchanges (by counters) which can be used
    when the adapter is out of normal exchanges and the command is a NVME
    heartbeat command. As counters are used, while the reserved command is
    outstanding, as soon as any other exchange completes, the counters are
    adjusted and the reserved count is replenished. The heartbeat completes
    execution in a normal fashion.
    Signed-off-by: default avatarDick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: default avatarJames Smart <james.smart@broadcom.com>
    Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    cf1a1d3e
lpfc_nvme.c 84.1 KB