    firewire: fw-sbp2: (try to) avoid I/O errors during reconnect · 2e2705bd
    Stefan Richter authored
    While fw-sbp2 takes the necessary time to reconnect to a logical unit
    after bus reset, the SCSI core keeps sending new commands.  They are all
    immediately completed with host busy status, and application clients or
    filesystems will break quickly.  The SCSI device might even be taken
    offline:  http://bugzilla.kernel.org/show_bug.cgi?id=9734
    
    The only remedy seems to be to block the SCSI device until reconnect.
    Alas, the SCSI core has no useful API to block only one logical unit,
    i.e. the scsi_device; therefore we block the entire Scsi_Host.  This
    currently corresponds to an SBP-2 target.  In case of targets with
    multiple logical units, we need to satisfy the dependencies between
    logical units by carefully tracking the blocking state of the target and
    its units.  We block all logical units of a target as soon as one of
    them needs to be blocked, and keep them blocked until all of them are
    ready to be unblocked.
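
The block/unblock rule described above can be sketched as a per-target counter. This is only a minimal user-space illustration, not the driver's code: `struct target`, `struct logical_unit`, `lu_block()` and `lu_unblock()` are hypothetical names, the `host_blocked` flag merely stands in for calling scsi_block_requests() and scsi_unblock_requests() on the Scsi_Host, and the locking a real driver would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the driver's structures. */
struct target {
	int blocked_units;  /* logical units that currently need blocking */
	bool host_blocked;  /* stands in for the Scsi_Host block state */
};

struct logical_unit {
	struct target *tgt;
	bool blocked;       /* does this unit currently need blocking? */
};

/* Block the whole target as soon as the first unit needs it. */
static void lu_block(struct logical_unit *lu)
{
	if (lu->blocked)
		return;                         /* already counted */
	lu->blocked = true;
	if (++lu->tgt->blocked_units == 1)
		lu->tgt->host_blocked = true;   /* would call scsi_block_requests() */
}

/* Unblock the target only when the last blocked unit is ready again. */
static void lu_unblock(struct logical_unit *lu)
{
	if (!lu->blocked)
		return;                         /* nothing to undo */
	lu->blocked = false;
	assert(lu->tgt->blocked_units > 0);
	if (--lu->tgt->blocked_units == 0)
		lu->tgt->host_blocked = false;  /* would call scsi_unblock_requests() */
}
```

The per-unit flag makes both calls idempotent, so a unit that is blocked twice (for example by two bus resets in a row) is still counted only once.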
    
    Furthermore, as the history of the old sbp2 driver has shown, the
    scsi_block_requests() API is a minefield with high potential of
    deadlocks.  We therefore take extra measures to keep logical units
    unblocked during __scsi_add_device() and during shutdown.
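
One conceivable way to keep a target unblocked during such critical phases is an inhibit counter that overrides block requests. The sketch below is an assumption for illustration only; the message above does not say how the driver does it. `struct guarded_target`, `target_inhibit_blocking()`, `target_allow_blocking()` and `target_try_block()` are hypothetical names, and locking is again omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical structure for illustration; not the driver's code. */
struct guarded_target {
	int dont_block;     /* > 0 while a device scan or shutdown is running */
	bool host_blocked;  /* stands in for the Scsi_Host block state */
};

/* Enter a phase that must not run blocked; lift any block in force. */
static void target_inhibit_blocking(struct guarded_target *t)
{
	t->dont_block++;
	t->host_blocked = false;  /* would call scsi_unblock_requests() */
}

/* Leave the phase; later block requests are honoured again. */
static void target_allow_blocking(struct guarded_target *t)
{
	assert(t->dont_block > 0);
	t->dont_block--;
}

/* Returns true if the block was applied, false if it was inhibited. */
static bool target_try_block(struct guarded_target *t)
{
	if (t->dont_block > 0)
		return false;
	t->host_blocked = true;   /* would call scsi_block_requests() */
	return true;
}
```

A counter rather than a flag lets the two phases nest or overlap without one of them re-enabling blocking while the other is still in progress.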
    
    This avoids I/O errors during reconnect in many cases, but alas not in
    all.  There may still be errors after a re-login had to be performed.
    Also, some bridges have been seen to cease fetching management ORBs if
    I/O went on up until a bus reset.  In these cases, all management ORBs
    time out after mgt_orb_timeout.  The old sbp2 driver is less vulnerable
    or maybe not vulnerable to this, for as yet unknown reasons.
    
    Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>