- 23 Dec, 2021 36 commits
-
-
Sreekanth Reddy authored
Currently the driver marks the controller as unrecoverable if there is an asynchronous reset or fault during the initialization, reinitialization post reset, and OS resume. Enhance driver to retry the initialization, re-initialization, and resume sequences for a maximum of 3 times if the controller became faulty or asynchronously reset due to a firmware activation during the initialization sequence. Link: https://lore.kernel.org/r/20211220141159.16117-15-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Move the IOC initialization's bring up logic to mpi3mr_bring_ioc_ready() routine. Link: https://lore.kernel.org/r/20211220141159.16117-14-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Separate out reply and sense buffer allocation and initialization into two routines and call only initialization routine while issuing the IOC Init request message. Also move out the event enable logic to a separate function. Link: https://lore.kernel.org/r/20211220141159.16117-13-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Save snapdump and fault the controller with the given reason code if it is already not in the fault or not in asynchronous reset. This ensures that soft reset is issued from the watchdog thread. This will also be used to handle initialization time faults/resets/timeout as in those cases immediate soft reset invocation is not required. Link: https://lore.kernel.org/r/20211220141159.16117-12-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Display IOC firmware package version by reading component image upload data. Link: https://lore.kernel.org/r/20211220141159.16117-11-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
The following special handling is needed for UNMAP commands issued to NVMe drives: - On B0 boards, if the parameter list length is greater than 24 and not a 16-byte multiple, then truncate the parameter list length to a 16-byte multiple. - On A0 boards, if the parameter list length is greater than block descriptor data length + 8, then truncate the parameter list length to block descriptor data length + 8 value. Link: https://lore.kernel.org/r/20211220141159.16117-10-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
- Increase internal command timeout to 60 seconds. - Enable 16 device removal handshake processing in parallel in the device removal handshake infrastructure. Link: https://lore.kernel.org/r/20211220141159.16117-9-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Add validation for various access statuses prior to exposing attached target device to the operating system. Link: https://lore.kernel.org/r/20211220141159.16117-8-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
The SAS4 Controller firmware exposes the SES devices in Managed PCIe Switch as a PCIe Device Type SCSI Device (MPI3_DEVICE0_PCIE_DEVICE_INFO_TYPE_SCSI_DEVICE). Driver is enhanced to handle this device type by: - Exposing the device to the upper layers and - Not updating any hardware sectors & virtual boundary settings as these settings are needed only for NVMe devices. Link: https://lore.kernel.org/r/20211220141159.16117-7-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Continued updating MPI3 headers. Link: https://lore.kernel.org/r/20211220141159.16117-6-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Update MPI3 headers. Link: https://lore.kernel.org/r/20211220141159.16117-5-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Don't issue the soft reset if internal commands are flushed out with reset status. Soft reset needs to be issued only if commands are really timed out. Link: https://lore.kernel.org/r/20211220141159.16117-4-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Use spin_lock_irqsave() instead of spin_lock() while acquiring reply_free_queue_lock & sbq_lock locks. Link: https://lore.kernel.org/r/20211220141159.16117-3-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Add debug print functions which will print messages based on logging_level bits enabled. Link: https://lore.kernel.org/r/20211220141159.16117-2-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222092247.928711-1-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222092048.925829-1-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The myrs devices supports 64-bit addressing, so remove the spurious GFP_DMA allocations. Link: https://lore.kernel.org/r/20211222091935.925624-1-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222091801.924745-1-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The driver doesn't express DMA addressing limitation under 32-bits anywhere else, so remove the spurious GFP_DMA allocation. Link: https://lore.kernel.org/r/20211222091630.922788-1-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The allocated buffers are used as a command payload, for which the block layer and/or DMA API do the proper bounce buffering if needed. Link: https://lore.kernel.org/r/20211222090842.920724-1-hch@lst.deReported-by: Baoquan He <bhe@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
The allocated buffers are used as a command payload, for which the block layer and/or DMA API do the proper bounce buffering if needed. Link: https://lore.kernel.org/r/20211222090311.916624-1-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
The controller may frequently enter and exit suspend for each I/O which we need to deal with. This is inefficient and may cause too much suspend and resume activity for the controller. To avoid this, use a default 5s autosuspend for the controller to stop frequently suspending and resuming. This value may still be modified via sysfs interfaces. Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.comAcked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Processing events such as PORTE_BROADCAST_RCVD may cause dependency issues for runtime power management support. Such a problem would be that handling a PORTE_BROADCAST_RCVD event requires that the host is resumed to send SMP commands. However, in resuming the host, the phyup events generated from re-enabling the phys are processed in the same workqueue as the original PORTE_BROADCAST_RCVD event. As such, the host will never finish resuming (as it waits for the phyup event processing), and then the PORTE_BROADCAST_RCVD event can't be processed as the SMP commands are blocked, and so we have a deadlock. Solve this problem by ensuring that libsas keeps the host active until completely finished phy or port events, such as PORTE_BYTES_DMAED. As such, we don't have to worry about resuming the host for processing individual SMP commands in this example. Link: https://lore.kernel.org/r/1639999298-244569-15-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
It is possible that controller may become suspended between processing a phyup interrupt and the event being processed by libsas. As such, we can't ensure the controller is active when processing the phyup event - this may cause the phyup event to be lost or other issues. To avoid any possible issues, add pm_runtime_get_noresume() in phyup interrupt handler and pm_runtime_put_sync() in the work handler exit to ensure that we stay always active. Since we only want to call pm_runtime_get_noresume() for v3 hw, signal this will a new event, HISI_PHYE_PHY_UP_PM. Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.comAcked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
During the processing of event PORT_BYTES_DMAED, the driver queues work DISCE_DISCOVER_DOMAIN and then flushes workqueue ha->disco_q. If a new phyup event occurs during resuming the controller, the work PORTE_BYTES_DMAED of new phy occurs before suspended phy's. The work DISCE_DISCOVER_DOMAIN of new phy requires an active SAS controller (it needs to resume SAS controller by function scsi_sysfs_add_sdev() and some other functions such as function add_device_link()). However, the activation of the SAS controller requires completion of work PORTE_BYTES_DMAED of suspended phys while it is blocked by new phy's work on ha->event_q. So there is a deadlock and it is released only after resume timeout. To solve the issue, defer works of new phys during suspend and queue those defer works after SAS controller becomes active. Link: https://lore.kernel.org/r/1639999298-244569-13-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
In the second part of function __sas_drain_work(), deferred work is queued. This functionality is required other places so factor it out into the function sas_queue_deferred_work(). Link: https://lore.kernel.org/r/1639999298-244569-12-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Add a flag SAS_HA_RESUMING and use it to indicate the state of resuming the host controller. Link: https://lore.kernel.org/r/1639999298-244569-11-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
When sending SMP I/Os to the host we need to ensure that the host is not suspended and can process the commands. This is a better approach than replying on the host to resume itself to handle such commands. Use pm_runtime_get_sync() and pm_runtime_put_sync() calls for the host when executing SMP I/Os. Link: https://lore.kernel.org/r/1639999298-244569-10-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Add some logs at the beginning and end of suspend/resume. Link: https://lore.kernel.org/r/1639999298-244569-9-git-send-email-chenxiang66@hisilicon.comAcked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
If a new disk is inserted through an expander when the host was suspended, it will not necessarily be detected as the topology is not re-scanned during resume. To detect possible changes in topology during suspension, insert a PORTE_BROADCAST_RCVD event per port when resuming to trigger a revalidation. Link: https://lore.kernel.org/r/1639999298-244569-8-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
phy_list_lock is not held when using asd_sas_port->phy_list in the mvsas driver. Add spin_lock/unlock in those places. Link: https://lore.kernel.org/r/1639999298-244569-7-git-send-email-chenxiang66@hisilicon.comSigned-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Most places that use asd_sas_port->phy_list are protected by spinlock asd_sas_port->phy_list_lock, however there are still some places which miss grabbing the lock. Add it in function hisi_sas_refresh_port_id() when accessing asd_sas_port->phy_list. This carries a risk that list mutates while at the same time dropping the lock in function hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list to avoid this risk. Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.comAcked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiang Chen authored
Most places that use asd_sas_port->phy_list in libsas are protected by spinlock asd_sas_port->phy_list_lock. However, there are still a few places which miss the lock. Add it in those places. Link: https://lore.kernel.org/r/1639999298-244569-5-git-send-email-chenxiang66@hisilicon.comReviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Alan Stern authored
John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: e27829dc ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
John Garry authored
This reverts commit b14a37e0. In that commit, we had to filter out phy-up events during suspend, as it work cause a deadlock between processing the phyup event and the resume HA function try to drain the HA event workqueue to complete the resume process. Now that we no longer try to drain the HA event queue during the HA resume processor, the deadlock would not occur, so remove the special handling for it. Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.comSigned-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
John Garry authored
For the hisi_sas driver, if a directly attached disk is removed during suspend, a hang will occur in the resume process: The background is that in commit 16fd4a7c ("scsi: hisi_sas: Add device link between SCSI devices and hisi_hba"), it is ensured that the HBA device cannot be runtime suspended when any SCSI device associated is active. Other drivers which use libsas don't worry about this as none support runtime suspend. The mentioned hang occurs when an disk is removed during suspend. In the removal process - from PHYE_RESUME_TIMEOUT event processing - we call into scsi_remove_device(), which is being processed in the HA event workqueue. Here we wait for all suppliers of the SCSI device to resume, which includes the HBA device (from the above commit). However the HBA device cannot resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from calling sas_resume_ha() -> sas_drain_work()). This is the deadlock. There does not appear to be any need for the sas_drain_work() to be called at all in sas_resume_ha() as it is not syncing against anything, so allow LLDDs to avoid this by providing a variant of sas_resume_ha() which does "sync", i.e. doesn't drain the event workqueue. Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.comSigned-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 17 Dec, 2021 4 commits
-
-
John Garry authored
Value 0 is used for SAM status and libsas exec_status bytes codes in sas_end_task() - use defined macros instead. In addition, change to proper enum types. Also replace SAM_STAT_CHECK_CONDITION with SAS_SAM_STAT_CHECK_CONDITION, the former being a proper member of enum exec_status. Link: https://lore.kernel.org/r/1639579061-179473-9-git-send-email-john.garry@huawei.comSigned-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Qi Liu authored
The OOB interrupt and phyup interrupt handlers may run out-of-order in high CPU usage scenarios. Since the hisi_sas_phy.timer is added in hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order execution will cause hisi_sas_phy.timer timeout to trigger. To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure that the timer won't be added after phyup handler completes. Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.comSigned-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Qi Liu authored
If we issue a controller reset command during executing a FLR a hung task may be found: Call trace: __switch_to+0x158/0x1cc __schedule+0x2e8/0x85c schedule+0x7c/0x110 schedule_timeout+0x190/0x1cc __down+0x7c/0xd4 down+0x5c/0x7c hisi_sas_task_exec+0x510/0x680 [hisi_sas_main] hisi_sas_queue_command+0x24/0x30 [hisi_sas_main] smp_execute_task_sg+0xf4/0x23c [libsas] sas_smp_phy_control+0x110/0x1e0 [libsas] transport_sas_phy_reset+0xc8/0x190 [libsas] phy_reset_work+0x2c/0x40 [libsas] process_one_work+0x1dc/0x48c worker_thread+0x15c/0x464 kthread+0x160/0x170 ret_from_fork+0x10/0x18 This is a race condition which occurs when the FLR completes first. Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem . Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.comSigned-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Qi Liu authored
A user may issue a control phy command from sysfs at any time, even if the controller is resetting. If a phy is disabled by hardreset/linkreset command before calling get_phys_state() in the reset path, the saved phy state may be incorrect. To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure that the controller reset may not run at the same time as when the phy control function is running. Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.comSigned-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-