hpsa: rework controller command submission
Allow driver initiated commands to have a timeout. It does not yet try to do anything with timeouts on such commands. We are sending a reset in order to get rid of a command we want to abort. If we make it return on the same reply queue as the command we want to abort, the completion of the aborted command will not race with the completion of the reset command. Rename hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd(), since this function is the interface for issuing commands to the controller and not the "core" of that implementation. Add a parameter to it which allows the caller to specify the reply queue to be used. Modify existing callers to specify the default reply queue. Rename __hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd_core(), since this routine is the "core" implementation of the "do simple command" function and there is no longer any other function with a similar name. Modify the existing callers of this routine (other than hpsa_scsi_do_simple_cmd()) to instead call hpsa_scsi_do_simple_cmd(), since it will now accept the reply_queue paramenter, and it provides a controller lock-up check. (Also, tweak two related message strings to make them distinct from each other.) Submitting a command to a locked up controller always results in a timeout, so check for controller lock-up before submitting. This is to enable fixing a race between command completions and abort completions on different reply queues in a subsequent patch. We want to be able to specify which reply queue an abort completion should occur on so that it cannot race the completion of the command it is trying to abort. The following race was possible in theory: 1. Abort command is sent to hardware. 2. Command to be aborted simultaneously completes on another reply queue. 3. Hardware receives abort command, decides command has already completed and indicates this to the driver via another different reply queue. 4. driver processes abort completion finds that the hardware does not know about the command, concludes that therefore the command cannot complete, returns SUCCESS indicating to the mid-layer that the scsi_cmnd may be re-used. 5. Command from step 2 is processed and completed back to scsi mid layer (after we already promised that would never happen.) Fix by forcing aborts to complete on the same reply queue as the command they are aborting. Piggybacking device rescanning functionality onto the lockup detection thread is not a good idea because if the controller locks up during device rescanning, then the thread could get stuck, then the lockup isn't detected. Use separate work queues for device rescanning and lockup detection. Detect controller lockup in abort handler. After a lockup is detected, return DO_NO_CONNECT which results in immediate termination of commands rather than DID_ERR which results in retries. Modify detect_controller_lockup() to return the result, to remove the need for a separate check. Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Showing
This diff is collapsed.
Please register or sign in to comment