- 04 Jan, 2018 1 commit
-
-
Rafael David Tinoco authored
If, for any reason, userland shuts down iscsi transport interfaces before proper logouts - like when logging in to LUNs manually, without logging out on server shutdown, or when automated scripts can't umount/logout from logged LUNs - kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths. PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c This happens because iscsi_eh_cmd_timed_out(), the transport layer timeout helper, would tell the queue timeout function (scsi_times_out) to reset the request timer over and over, until the session state is back to logged in state. Unfortunately, during server shutdown, this might never happen again. Other option would be "not to handle" the issue in the transport layer. That would trigger the error handler logic, which would also need the session state to be logged in again. Best option, for such case, is to tell upper layers that the command was handled during the transport layer error handler helper, marking it as DID_NO_CONNECT, which will allow completion and inform about the problem. After the session was marked as ISCSI_STATE_FAILED, due to the first timeout during the server shutdown phase, all subsequent cmds will fail to be queued, allowing upper logic to fail faster. Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 21 Dec, 2017 20 commits
-
-
James Smart authored
Prior patch mixed up what argument in the macro was what, so min value was placed as the "default" argument, and the default value was placed as the "min" argument. Thus, when the default was applied, it looked like the default was smaller than the allowed min. Swap argument postions to correct. [mkp: fixed checkpatch warning] Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Himanshu Madhani authored
This patch fixes following warnings reported by smatch: drivers/scsi/qla2xxx/qla_mid.c:586 qla25xx_delete_req_que() error: we previously assumed 'req' could be null (see line 580) drivers/scsi/qla2xxx/qla_mid.c:602 qla25xx_delete_rsp_que() error: we previously assumed 'rsp' could be null (see line 596) Fixes: 7867b98d ("scsi: qla2xxx: Fix memory leak in dual/target mode") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Jia-Ju Bai authored
The driver may sleep under a spinlock. The function call path is: qedi_cpu_offline (acquire the spinlock) qedi_fp_process_cqes qedi_mtask_completion qedi_process_tmf_resp kzalloc(GFP_KERNEL) --> may sleep To fix it, GFP_KERNEL is replaced with GFP_ATOMIC. This bug is found by my static analysis tool(DSAC) and checked by my code review. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ching Huang authored
Simplify arcmsr_request_device_map routine. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ching Huang authored
Simplify all arcmsr_hbaX_get_config routine by call a new get_adapter_config function. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ching Huang authored
Simplify arcmsr_hbaE_get_config function. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ching Huang authored
Waiting for iop firmware ready before issue get_config command to iop for adapter type A and D. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ching Huang authored
Simplify arcmsr_hbaC_get_config function. Signed-off-by: Ching Huang <ching2048@areca.com.tw> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Update the driver version to 11.4.0.6 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
If log verbose in not turned on, its hard to tell when certain error paths get hit. Add stats counters and corresponding logic to debugfs/sysfs to aid understanding what paths were traversed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
When unregistering a remote port the lpfc driver would eventually wait for the remoteport_unreg done callback. But the driver never completed the io aborts that would allow the connections to terminate thus the unreg done callback was never issued. Turns out the coding style of the driver allowed for the wait to occur on the same cpu that the deferred isr is called on. The blocking for the wait, blocked the isr, and as the isr didn't run, the io aborts wouldn't finish. Turns out there was never a good reason to block waiting for the unreg done in the first place. The driver can continue execution and the ref counting within the driver will do the right thing. Resolve by removing the wait and patching up a few cases where the ref counting didn't look right - mainly cases where the remote port comes back before the aborts had completed and the unreg done had been called. Additionally, a few places which used pointer values to guide driver actions weren't protected by lock, so correct those. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
In the lpfc discovery engine, when as a nvme target, where the driver was performing mailbox io with the adapter for port login when a NVME PRLI is received from the host. Rather than queue and eventually get back to sending a response after the mailbox traffic, the driver rejected the io with an error response. Turns out this particular initiator didn't like the rejection values (unable to process command/command in progress) so it never attempted a retry of the PRLI. Thus the host never established nvme connectivity with the lpfc target. By changing the rejection values (to Logical Busy/nothing more), the initiator accepted the response and would retry the PRLI, resulting in nvme connectivity. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
When enabled for both SCSI and NVME support, and connected pt2pt to a SCSI only target, the driver nodelist entry for the remote port is left in PRLI_ISSUE state and no SCSI LUNs are discovered. Works fine if only configured for SCSI support. Error was due to some of the prli points still reflecting the need to send only 1 PRLI. On a lot of fabric configs, targets were NVME only, which meant the fabric-reported protocol attributes were only telling the driver one protocol or the other. Thus things worked fine. With pt2pt, the driver must send a PRLI for both protocols as there are no hints on what the target supports. Thus pt2pt targets were hitting the multiple PRLI issues. Complete the dual PRLI support. Track explicitly whether scsi (fcp) or nvme prli's have been sent. Accurately track protocol support detected on each node as reported by the fabric or probed by PRLI traffic. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Increased the sizes of the SCSI WQ's and CQ's so that SCSI operation is similar to that used by NVME. However, size increase restricted only to those newer adapters that can support the larger WQE size, thus bigger queue sizes. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Handling a rcv'ed PRLI incorrectly can cause the ndlp to end up in the wrong state or the driver to ACC and PRLI when it should send LS_RJT. The cause was due to the driver not properly looking at the PRLI type and taking the multiple protocol support into consideration. Resolved by adding checks in the various PRLI receive points to validate PRLI type and reject if not valid for the enabled protocols and mode (host vs target). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
The driver is all set to handle the defer_rcv api for the nvmet_fc transport, yet didn't properly recognize the return status when the defer_rcv occurred. The driver treated it simply as an error and aborted the io. Several residual issues occurred at that point. Finish the defer_rcv support: recognize the return status when the io request is being handled in a deferred style. This stops the rogue aborts; Replenish the async cmd rcv buffer in the deferred receive if needed. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
NVME targets appear to randomly disconnect from the initiator when running heavy IO. The error is due to the host aggregate (across all controllers) io load was beyond the maximum exchange count for nvme on the adapter. The driver was properly returning a resource busy status, but the io load was so great heartbeat commands would be bounced and not have a successful retry within the fuzz amount for the nvme heartbeat (yes, a very high io load!). Thus the target was terminating the controller due to a keep alive failure. Resolve by reserving a few exchanges (by counters) which can be used when the adapter is out of normal exchanges and the command is a NVME heartbeat command. As counters are used, while the reserved command is outstanding, as soon as any other exchange completes, the counters are adjusted and the reserved count is replenished. The heartbeat completes execution in a normal fashion. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
For v3 hw SAS, it supports configuring power state from D0 to D3 for entering Low Power status and power state from D3 to D0 for quit Low Power status. When power state from D0 to D3, HW will send FLR to clear the registers of ECAM and BAR space, and when power state from D3 to D0, it will clear the registers of ECAM space only. So when suspend, need to do like controller reset (including disable interrupts/DQ/PHY/BUS), and also release slots after FLR. When resume, re-config the registers of BAR space. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
In function sas_suspend_devices(), it requires callback lldd_port_deformed callback to be implemented if lldd_port_deformed is implemented. So add a stub for lldd_port_deformed. Callback lldd_port_deformed was not required as the port deformation is done elsewhere in the LLDD. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
This patch fix SAS_QUEUE_FULL problem. The test situation is close port while running IO. In sas_eh_handle_sas_errors(), SCSI EH will free sas_task of the device if lldd_I_T_nexus_reset() return TMF_RESP_FUNC_COMPLETE or -ENODEV. But in our SAS driver, we only free slots of the device when the return value is TMF_RESP_FUNC_COMPLETE. So if the return value is -ENODEV, the slot resource will not free any more. As an solution, we should also free slots of the device in lldd_I_T_nexus_reset() if the return value is -ENODEV. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 15 Dec, 2017 16 commits
-
-
Xiaofei Tan authored
We should do internal abort dev before TMF_ABORT_TASK_SET and TMF_LU_RESET. Because we may only have done internal abort for single IO in the earlier part of SCSI EH process. Even the internal abort to the single IO, we also don't know whether it is successful. Besides, we should release slots of the device in hisi_sas_abort_task_set() if the abort is successful. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
Normally, hardware should ensure that internal abort timeout will never happen. If happen, it would be an SoC failure. What's more, HW will not process any other commands if an internal abort hasn't return CQ, and they will time out also. So, we should judge the result of internal abort in SCSI EH, if it is failed, we should give up to do TMF/softreset and return failure to the upper layer directly. This patch do following things to achieve this: 1. When internal abort timeout happened, we set return value to -EIO in hisi_sas_internal_task_abort(). 2. If prep_abort() is not support, let hisi_sas_internal_task_abort() return TMF_RESP_FUNC_FAILED. 3. If hisi_sas_internal_task_abort() return an negative number, it can be thought that it not executed properly or internal abort timeout. Then we won't do behind TMF or softreset, and return failure directly. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
We should do link reset of PHY when identify timeout or STP link timeout. They are internal events of SOC and are notified to driver through interrupts of CHL_INT2. Besides, we should add an delay work to do link reset as it needs sleep. So, this patch add an new PHY event HISI_PHYE_LINK_RESET for this. Notes: v2 HW doesn't report the event of STP link timeout. So, we only need to handle event of identify timeout for v2 HW. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
Use an general way to do delay work for a PHY. Then it will be easier to add new delayed work for a PHY in future. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
Add port AXI errors handling for v2 hw. We do host controller reset for such errors. Besides, change port muli-bits ECC error handling, and we should also do host reset for such error. So, this patch put them in the same struct with port AXI error. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
Change code format of int_chnl_int_v2_hw() to be consistent with v3 hw to reduce an tag indent. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Add some print at some places such as error info and cq of exception IO, device found etc, and also adjust some log levels. All this to assist debugging ability. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
We use PCIe AER to support RAS feature for v3 hw. This driver should do following two things to support this: 1. Enable RAS interrupts, so that errors can be reported to RAS module. 2. Realize err_handler for sas_v3_pci_driver. Then if non-fatal error is detected, print error source and try to recover SAS controller. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
For v3 hw, each NCQ will return a CQ, so it is no need to acquire IPTT from ITCT, just acquire it from IPTT field of CQ. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
Sometimes it is required to know when the controller reset has completed and also if it has completed successfully. For such places, we call hisi_sas_controller_reset() directly before. That may lead to multiple calls to this function. This patch create a per-reset structure which contains a completion structure and status flag to know when the reset completes and also the status. It is also in hisi_hba.wq to do reset work. As all host reset works are done in hisi_hba.wq, we don't worry multiple calls to hisi_sas_controller_reset(). Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Do a couple of changes for when HISI_SAS_RESET_BIT is set for HBA: - Clearing ITCT is not necessary - Remove internal abort as it will fail during reset Flag sas_dev->dev_type is kept as SAS_PHY_UNUSED. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
This patch do following optimizations to host controller reset: 1. Unblock scsi requests before rescanning topology, as SCSI command need be used if new device is found during rescanning topology. 2. Remove drain_workqueue(hisi_hba->wq) and drain_workqueue(shost->work_q), as there is no need to ensure that all PHYs event are done before exiting host reset. 3. Improve message print level of host reset. Host reset is an important and very few occurrence event. We should know its progress even when not debugging. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
Currently refreshing the PHY port id after reset is done in the rescan topology function, which is quite late in the reset process. It could be moved earlier in the process, as the port id can be refreshed once the PHYs become ready. In addition to this, we should set the hisi_sas_dev port id to 0xff (invalid port id) if all PHYs of this port remain down for the same device. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiaofei Tan authored
In certain scenarios we may just want to clear the ITCT for a device, and not free other resources like the SATA bitmap using in v2 hw. To facilitate this, this patch relocates the code of clearing ITCT from free_device() to a new hw interface clear_itct(). Then for some hw, we should not realise free_device() if there's nothing left to do for it. [mkp: typo] Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
For function dma_unmap_sg(), the <nents> parameter should be number of elements in the scatterlist prior to the mapping, not after the mapping. Fix this usage. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
It is required to initialize the dq spinlock before use, which was not being done, so fix it. This issue can be detected when CONFIG_DEBUG_SPINLOCK is enabled. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 12 Dec, 2017 3 commits
-
-
Pravin Shedge authored
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Bart Van Assche authored
Avoid that building with gcc 7 and W=1 triggers warnings similar to the following: drivers/scsi/qla2xxx/qla_isr.c:1189:27: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Himanshu Madhani <himanshu.madhani@cavium.com> Cc: Quinn Tran <quinn.tran@cavium.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Nicolas Iooss authored
fnic_fcpio_icmnd_cmpl_handler() displays the value of sc with: FNIC_SCSI_DBG(KERN_INFO... "... sc = 0x%p" "scsi_status ..." ... As the literal strings get merged, the function uses %ps instead of the intended raw %p format. Fix this by inserting a space. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-