- 21 Aug, 2020 36 commits
-
-
Denis Efremov authored
Remove cxgbi_alloc_big_mem(), cxgbi_free_big_mem() functions and use kvzalloc/kvfree instead. __GFP_NOWARN added to kvzalloc() call because we already print a warning in case of allocation fail. Link: https://lore.kernel.org/r/20200801133123.61834-1-efremov@linux.comSigned-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Andy Shevchenko authored
Use %*ph format to print small buffer as hex string. Link: https://lore.kernel.org/r/20200730153535.39691-1-andriy.shevchenko@linux.intel.comSigned-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Heng authored
Fixes coccicheck warning: ./drivers/scsi/mpt3sas/mpt3sas_base.c:5247:16-34: WARNING: dma_alloc_coherent use in ioc -> request already zeroes out memory, so memset is not needed dma_alloc_coherent() already zeroes out memory so memset() is not needed. Link: https://lore.kernel.org/r/1596079918-41115-4-git-send-email-liheng40@huawei.comSigned-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Heng authored
Fixes coccicheck warning: ./drivers/scsi/qla2xxx/qla_mbx.c:4928:15-33: WARNING: dma_alloc_coherent use in els_cmd_map already zeroes out memory, so memset is not needed dma_alloc_coherent() already zeroes out memory so memset() is not needed. Link: https://lore.kernel.org/r/1596079918-41115-3-git-send-email-liheng40@huawei.comSigned-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Heng authored
Fixes coccicheck warning: ./drivers/scsi/pmcraid.c:4709:3-21: WARNING: dma_alloc_coherent use in pinstance -> hrrq_start [ i ] already zeroes out memory, so memset is not needed dma_alloc_coherent() already zeroes out memory so memset() is not needed. Link: https://lore.kernel.org/r/1596079918-41115-2-git-send-email-liheng40@huawei.comSigned-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Heng authored
Fixes coccicheck warning: ./drivers/scsi/mvsas/mv_init.c:244:11-29: WARNING: dma_alloc_coherent use in mvi -> tx already zeroes out memory, so memset is not needed ./drivers/scsi/mvsas/mv_init.c:250:15-33: WARNING: dma_alloc_coherent use in mvi -> rx_fis already zeroes out memory, so memset is not needed ./drivers/scsi/mvsas/mv_init.c:256:11-29: WARNING: dma_alloc_coherent use in mvi -> rx already zeroes out memory, so memset is not needed ./drivers/scsi/mvsas/mv_init.c:265:13-31: WARNING: dma_alloc_coherent use in mvi -> slot already zeroes out memory, so memset is not needed dma_alloc_coherent() already zeroes out memory so memset() is not needed. Link: https://lore.kernel.org/r/1596078235-54002-1-git-send-email-liheng40@huawei.comSigned-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Heng authored
Remove casting the values returned by memory allocation function. Coccinelle emits WARNING: ./drivers/message/fusion/mptctl.c:2596:14-31: WARNING: casting value returned by memory allocation function to (SCSIDevicePage0_t *) is useless. ./drivers/message/fusion/mptctl.c:2660:15-32: WARNING: casting value returned by memory allocation function to (SCSIDevicePage3_t *) is useless. Link: https://lore.kernel.org/r/1596014390-18605-1-git-send-email-liheng40@huawei.comSigned-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Heng authored
Remove casting the values returned by memory allocation function. Coccinelle emits WARNING: ./drivers/message/fusion/mptfc.c:766:17-30: WARNING: casting value returned by memory allocation function to (FCPortPage0_t *) is useless. ./drivers/message/fusion/mptfc.c:907:17-30: WARNING: casting value returned by memory allocation function to (FCPortPage1_t *) is useless. [mkp: memset()] Link: https://lore.kernel.org/r/1596014354-59935-1-git-send-email-liheng40@huawei.comSigned-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Andy Teng authored
MediaTek UFS host now supports 2 lanes. Modify the lane count to 2. This modification shall not impact old 1-lane host because PA_CONNECTEDRXDATALANES and PA_CONNECTEDTXDATALANES will limit the target lanes properly during power mode change. So we could relax the limitation in ufs_dev_params. Link: https://lore.kernel.org/r/20200819084340.7021-1-stanley.chu@mediatek.comReviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Andy Teng <andy.teng@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Can Guo authored
Commit 5586dd8e ("scsi: ufs: Fix a race condition between error handler and runtime PM ops") moves the ufshcd_scsi_block_requests() inside err_handler() but forgets to remove the ufshcd_scsi_unblock_requests() in the early return path. Correct the mistake. Link: https://lore.kernel.org/r/1597798958-24322-1-git-send-email-cang@codeaurora.org Fixes: 5586dd8e ("scsi: ufs: Fix a race condition between error handler and runtime PM ops") Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Reviewed-by: Hongwu Su<hongwus@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Kiwoong Kim authored
Currently, the UFS driver busy waits for fDeviceInit to be cleared. Provide an upper bound and sleep between attempts instead of busy waiting. Link: https://lore.kernel.org/r/1597053747-75171-1-git-send-email-kwmad.kim@samsung.comTested-by: Kiwoong Kim <kwmad.kim@samsung.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Bean Huo authored
Link: https://lore.kernel.org/r/20200814095034.20709-3-huobean@gmail.comReviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Bean Huo authored
ufshcd_comp_devman_upiu() was poorly named leading people to think it was a completion function. Rename it to ufshcd_compose_devman_upiu(). Link: https://lore.kernel.org/r/20200814095034.20709-2-huobean@gmail.comReviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
Fix race between ELS completion and flushing ELS request. Link: https://lore.kernel.org/r/20200807110656.19965-8-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
Don't process ELS completion if event is flushed or cleaned up. Link: https://lore.kernel.org/r/20200807110656.19965-7-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
Initiate cleanup for ELS commands as well. Link: https://lore.kernel.org/r/20200807110656.19965-6-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
Send cleanup even for RRQ on timeout. Link: https://lore.kernel.org/r/20200807110656.19965-5-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
The timer is already cancelled when abort is completed, hence no need to cancel it again. Link: https://lore.kernel.org/r/20200807110656.19965-4-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
This is reported by Klockwork. Link: https://lore.kernel.org/r/20200807110656.19965-3-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Saurav Kashyap authored
The rport lock gets initialized during offload. If a non-FCP or non-target rport got logout then this rport will be uninitialized. KASAN was complaining because of it. ========= [ 14.384434] the code is fine but needs lockdep annotation. [ 14.384482] turning off the locking correctness validator. ======== Link: https://lore.kernel.org/r/20200807110656.19965-2-jhasan@marvell.comSigned-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sai Prakash Ranjan authored
MSM bus scaling has moved on to use interconnect framework and downstream bus scaling APIs like msm_bus_scale*() do not exist anymore in the kernel. Currently they are guarded by a config which also does not exist and hence there are no build failures reported. Remove these unused interfaces as they are currently no-ops and the scaling support that may be added in future will use interconnect API. Link: https://lore.kernel.org/r/20200804161033.15586-1-saiprakash.ranjan@codeaurora.orgSigned-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Link: https://lore.kernel.org/r/159622931040.30579.9167901134341507088.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Kevin Barnett authored
Add a counter to assist in verifying when RAID bypass is being used. Link: https://lore.kernel.org/r/159622930468.30579.13153724465552773544.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Kevin Barnett authored
Support device deletion via sysfs. I.e: echo 1 > /sys/block/sd<X>/device/delete Link: https://lore.kernel.org/r/159622929885.30579.2727491506675011534.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Kevin Barnett authored
Eliminate kernel panics when getting invalid responses from controller. Take controller offline instead of causing kernel panics. Link: https://lore.kernel.org/r/159622929306.30579.16523318707596752828.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Prasad Munirathnam <Prasad.Munirathnam@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mahesh Rajashekhara authored
Have OS rescan after logical volume expansion to reflect new size. Link: https://lore.kernel.org/r/159622928727.30579.298277463169866711.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mahesh Rajashekhara authored
VID_9005, DID_028F, SVID_9005 and SDID_080A. Link: https://lore.kernel.org/r/159622928143.30579.14769183842894725454.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Kevin Barnett authored
Eliminate issuing INQUIRYs to problematic devices by using information provided by controller. Link: https://lore.kernel.org/r/159622927172.30579.3960527536810532094.stgit@brunhildaReviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
Updated driver version to 35.100.00.00 Link: https://lore.kernel.org/r/1596096229-3341-8-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
If driver has not received the interrupt for the aborted SCSI command before processing the TM reply, driver polls all the reply descriptor pools looking for the reply for the aborted SCSI command before marking TM as FAILED. If it finds the reply, then it marks the TM as SUCCESS otherwise it marks it FAILED. scsih_tm_cmd_map_status() checks whether TM has aborted the timed out SCSI command or not. If TM has aborted the IO, then it returns SUCCESS else it returns FAILED. Link: https://lore.kernel.org/r/1596096229-3341-7-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
Add helper functions to check whether any SCSI command is outstanding on particular Target, LUN device. Also add function parameters 'channel', 'id' to function mpt3sas_scsih_issue_tm(). Link: https://lore.kernel.org/r/1596096229-3341-6-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
Rename Function _base_unmask_interrupts() to mpt3sas_base_unmask_interrupts() and _base_mask_interrupts() to mpt3sas_base_mask_interrupts(). Also add function declarion to mpt3sas_base.h Link: https://lore.kernel.org/r/1596096229-3341-5-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
It is not recommended to issue back-to-back host reset without any delay. However, if someone issues back-to-back host reset then we observe that target devices get unregistered and re-register with SML. And if OS drive is behind the HBA when it gets unregistered, then file-system goes into read-only mode. Normally during host reset, driver marks accessible target devices as responding and triggers the event MPT3SAS_REMOVE_UNRESPONDING_DEVICES to remove any non-responding devices through FW worker thread. While processing this event, driver unregisters the non-responding devices and clears the responding flag for all the devices. Currently, during host reset, driver is cancelling only those Firmware event works which are pending in Firmware event workqueue. It is not cancelling work which is currently running. Change the driver to cancel all events. Link: https://lore.kernel.org/r/1596096229-3341-4-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
When controller fails to transition to READY state during driver probe, dump the system interface register set. This will give snapshot of the firmware status for debugging driver load issues. Link: https://lore.kernel.org/r/1596096229-3341-3-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu S authored
Currently config_cmds.reply buffer is not memset to zero before posting config page request message. In some cases, for the current config request, the previous config reply is getting processed and we will observe PageType mismatch between request to reply buffer. It will be difficult to debug this type of issue and it confuses by thinking that HBA Firmware itself posted the wrong config reply. So it is better to memset the config_cmds.reply buffer with zeros before issuing the config request. Link: https://lore.kernel.org/r/1596096229-3341-2-git-send-email-suganath-prabu.subramani@broadcom.comSigned-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Can Guo authored
In current UFS task abort hook, namely ufshcd_abort(), if one task is aborted successfully, clk_gating.active_reqs held by this task is not decreased, which makes clk_gating.active_reqs stay above zero forever, thus clock gating would never happen. Instead of releasing resources of one task "manually", use the existing func __ufshcd_transfer_req_compl(). This change also eliminates a possible race of scsi_dma_unmap() from the real completion in IRQ handler path. Link: https://lore.kernel.org/r/1596975355-39813-10-git-send-email-cang@codeaurora.org Fixes: 1ab27c9c ("ufs: Add support for clock gating") CC: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 18 Aug, 2020 4 commits
-
-
Can Guo authored
The current IRQ handler blocks SCSI requests before scheduling eh_work, when error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume sends a SCSI cmd, most likely the SSU cmd, since SCSI requests are blocked, pm_runtime_get_sync() will never return because ufshcd_suspend/resume is blocked by the SCSI cmd. - In queuecommand path, hba->ufshcd_state check and ufshcd_send_command should stay under the same spin lock. This is to make sure that no more commands leak into doorbell after hba->ufshcd_state is changed. - Don't block SCSI requests before error handler starts to run, let error handler block SCSI requests when it is ready to start error recovery. - Don't let SCSI layer keep requeuing the SCSI cmds sent from HBA runtime PM ops, let them pass or fail them. Let them pass if eh_work is scheduled due to non-fatal errors. Fail them if eh_work is scheduled due to fatal errors, otherwise the cmds may eventually time out since UFS is in bad state, which gets error handler blocked for too long. If we fail the SCSI cmds sent from HBA runtime PM ops, HBA runtime PM ops fails too, but it does not hurt since error handler can recover HBA runtime PM error. Link: https://lore.kernel.org/r/1596975355-39813-9-git-send-email-cang@codeaurora.orgReviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Can Guo authored
Performing dumps in the IRQ handler causes system stability issues. Move dumps to the error handler and only print basic host registers here. Link: https://lore.kernel.org/r/1596975355-39813-8-git-send-email-cang@codeaurora.orgReviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Can Guo authored
The current error handler can not recover HBA runtime PM error if ufshcd_suspend/resume has failed due to UFS errors, e.g. hibern8 enter/exit error or SSU cmd error. When this happens, error handler may fail performing a full reset and restore because error handler always assumes that power, IRQs and clocks are ready after pm_runtime_get_sync returns, but actually they are not if ufshcd_resume fails[1]. If ufschd_suspend/resume fails due to UFS errors, runtime PM framework saves the error value to dev.power.runtime_error. After that, HBA dev runtime suspend/resume would not be invoked anymore unless runtime_error is cleared[2]. In case of ufshcd_suspend/resume fails due to UFS errors, for scenario [1], error handler cannot assume anything of pm_runtime_get_sync, meaning error handler should explicitly turn ON powers, IRQs and clocks again. To get the HBA runtime PM work as regard for scenario [2], error handler can clear the runtime_error by calling pm_runtime_set_active() if full reset and restore succeeds. And, more important, if pm_runtime_set_active() returns no error, which means runtime_error has been cleared, we also need to resume those scsi devices under HBA in case any of them has failed to be resumed due to HBA runtime resume failure. This is to unblock blk_queue_enter in case there are bios waiting inside it. Link: https://lore.kernel.org/r/1596975355-39813-7-git-send-email-cang@codeaurora.orgReviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Can Guo authored
Error recovery can be invoked from multiple code paths, including hibern8 enter/exit (from ufshcd_link_recovery), ufshcd_eh_host_reset_handler() and eh_work scheduled from IRQ context. Ultimately, these paths are all trying to invoke ufshcd_reset_and_restore() in either a synchronous or asynchronous manner. This causes problems: - If link recovery happens during ungate work, ufshcd_hold() would be called recursively. Although commit 53c12d0e ("scsi: ufs: fix error recovery after the hibern8 exit failure") fixed a deadlock due to recursive calls of ufshcd_hold() by adding a check of eh_in_progress into ufshcd_hold, this check allows eh_work to run in parallel while link recovery is running. - Similar concurrency can also happen when error recovery is invoked from ufshcd_eh_host_reset_handler and ufshcd_link_recovery. - Concurrency can even happen between eh_works. eh_work, currently queued on system_wq, is allowed to have multiple instances running in parallel, but we don't have proper protection for that. If any of above concurrency scenarios happen, error recovery would fail and lead ufs device and host into bad states. To fix the concurrency problem, this change queues eh_work on a single threaded workqueue and removes link recovery calls from the hibern8 enter/exit path. In addition, make use of eh_work in eh_host_reset_handler instead of calling ufshcd_reset_and_restore. This unifies the UFS error recovery mechanism. According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs when the link is broken. This essentially applies to any power mode change operations (since they all use PACP_PWR cmds in UniPro layer). So, if a power mode change operation (including AH8 enter/exit) fails, mark link state as UIC_LINK_BROKEN_STATE and schedule the eh_work. In this case, error handler needs to do a full reset and restore to recover the link back to active. Before the link state is recovered to active, ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors. Link: https://lore.kernel.org/r/1596975355-39813-6-git-send-email-cang@codeaurora.orgReviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-