    [SCSI] scsi_dh: propagate SCSI device deletion · db422318
    Menny Hamburger authored
    Currently, when scsi_dh_activate() returns with an error
    (e.g. SCSI_DH_NOSYS) the activate_complete callback is not called and
    the error is not propagated to DM mpath.
    
    When a SCSI device attached to a device handler is deleted, userland
    processes currently performing I/O on the device will have their I/O
    hang forever.
    
    - Set SCSI_DH_NOSYS error when the handler is in the process of being
      deleted (e.g. the SCSI device is in an SDEV_CANCEL or SDEV_DEL state).
    
    - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
    
    - Call the activate_complete callback function directly from
      scsi_dh_activate if an error has been set (when either the scsi_dh
      internal data has already been deleted or is in the process of being
      deleted).
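
    The three changes above can be sketched as a small user-space model.
    This is not the kernel code from scsi_dh.c: every model_* and MODEL_*
    name is hypothetical, the enum values are arbitrary placeholders, and
    letting the "being deleted" check take precedence over the offline
    check is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SDEV_* device states (values arbitrary). */
enum model_sdev_state {
    MODEL_SDEV_RUNNING,
    MODEL_SDEV_OFFLINE,
    MODEL_SDEV_CANCEL,
    MODEL_SDEV_DEL
};

/* Hypothetical stand-ins for the SCSI_DH_* error codes (values arbitrary). */
enum model_dh_err {
    MODEL_DH_OK = 0,
    MODEL_DH_DEV_OFFLINED,
    MODEL_DH_NOSYS
};

/* Completion callback, in the spirit of activate_complete. */
typedef void (*model_complete_fn)(void *data, int err);

struct model_sdev {
    enum model_sdev_state state;
    int has_handler_data; /* 0 once the handler's internal data is gone */
};

/* Records what the callback was told, so a caller can inspect it. */
struct model_result {
    int called;
    int err;
};

static void model_record_result(void *data, int err)
{
    struct model_result *res = data;

    res->called++;
    res->err = err;
}

/*
 * Model of the activation entry point: pick an error from the device
 * state and, if one is set, invoke the callback directly instead of
 * returning without ever notifying the caller.
 */
static int model_dh_activate(struct model_sdev *sdev,
                             model_complete_fn fn, void *data)
{
    int err = MODEL_DH_OK;

    assert(sdev != NULL);

    if (!sdev->has_handler_data ||
        sdev->state == MODEL_SDEV_CANCEL ||
        sdev->state == MODEL_SDEV_DEL)
        err = MODEL_DH_NOSYS;          /* handler deleted or being deleted */
    else if (sdev->state == MODEL_SDEV_OFFLINE)
        err = MODEL_DH_DEV_OFFLINED;   /* device is offline */

    if (err != MODEL_DH_OK) {
        if (fn)
            fn(data, err);             /* propagate the error to the caller */
        return err;
    }

    /*
     * Normal path: a real handler would start activation and complete
     * asynchronously. The model simply completes at once.
     */
    if (fn)
        fn(data, MODEL_DH_OK);
    return MODEL_DH_OK;
}
```

    In this model, a caller that passes model_record_result always learns
    the outcome through the callback, including when the device is in a
    deleted or offline state, which is the behaviour the multipath layer
    needs in order to fail the path rather than wait indefinitely.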
    
    The patch was tested in an iSCSI environment with the RDAC H/W handler
    and multipath.  In the following reproduction process, dd's I/O will
    hang forever and the only way to release it is to reboot the machine:
    1) Perform I/O on a multipath device:
        dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
    2) Delete all slave SCSI devices contained in the mpath device:
       I)  In an iSCSI environment, the easiest way to do this is by
       stopping iSCSI:
           /etc/init.d/iscsi stop
       II) Another way to delete the devices is by applying the following
       bash scriptlet:
           dm_devs=$(ls /sys/block/ | grep dm- | xargs)
           for dm_dev in $dm_devs; do
             devices=$(ls /sys/block/$dm_dev/slaves)
             for device in $devices; do
                echo 1 > /sys/block/$device/device/delete
             done
           done
    
    NOTE: when DM mpath's fail_path uses blk_abort_queue this scsi_dh change
    isn't strictly required.  However, DM mpath's call to blk_abort_queue
    will soon be reverted because it has proven to be unsafe due to a race
    (between blk_abort_queue and scsi_request_fn) that can lead to list
    corruption.  Therefore we cannot rely on blk_abort_queue via fail_path,
    but even if we could, this scsi_dh change is still preferable.
    
    Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Reviewed-by: Babu Moger <babu.moger@lsi.com>
    Signed-off-by: James Bottomley <James.Bottomley@suse.de>
scsi_dh.c 14.9 KB