- 01 Oct, 2019 40 commits
-
-
Arkadiusz Drabczyk authored
Fix several spelling typos in comments in csio_hw.c. Link: https://lore.kernel.org/r/20190912172546.16489-1-arkadiusz@drabczyk.orgSigned-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Navid Emamdoost authored
In bfad_im_get_stats if bfa_port_get_stats fails, allocated memory needs to be released. Link: https://lore.kernel.org/r/20190910234417.22151-1-navid.emamdoost@gmail.comSigned-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Markus Elfring authored
Simplify this function implementation by using a known function. Generated by: scripts/coccinelle/api/ptr_ret.cocci [mkp: applied by hand] Link: https://lore.kernel.org/r/9e667f19-434e-ed30-78cb-9ddc6323c51e@web.deSigned-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Colin Ian King authored
Don't populate the array setup_attrs on the stack but instead make it static const. Makes the object code smaller by 180 bytes. Before: text data bss dec hex filename 2140 224 0 2364 93c drivers/scsi/ufs/ufshcd-dwc.o After: text data bss dec hex filename 1863 320 0 2183 887 drivers/scsi/ufs/ufshcd-dwc.o (gcc version 9.2.1, amd64) Link: https://lore.kernel.org/r/20190906170104.10450-1-colin.king@canonical.comSigned-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Colin Ian King authored
Don't populate the array 'options' on the stack but instead make it static const. Makes the object code smaller by 143 bytes. Before: text data bss dec hex filename 94483 11272 1184 106939 1a1bb drivers/scsi/ips.o After: text data bss dec hex filename 94244 11368 1184 106796 1a12c drivers/scsi/ips.o (gcc version 9.2.1, amd64) Link: https://lore.kernel.org/r/20190906164522.5644-1-colin.king@canonical.comSigned-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Colin Ian King authored
Don't populate the array dev_cmd_err on the stack but instead make it static const. Makes the object code smaller by 80 bytes. Before: text data bss dec hex filename 21461 1564 0 23025 59f1 drivers/scsi/fnic/vnic_dev.o After: text data bss dec hex filename 21318 1628 0 22946 59a2 drivers/scsi/fnic/vnic_dev.o (gcc version 9.2.1, amd64) Link: https://lore.kernel.org/r/20190906163945.3889-1-colin.king@canonical.comSigned-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Colin Ian King authored
The variable rc is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Link: https://lore.kernel.org/r/20190905135017.23772-1-colin.king@canonical.com Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Colin Ian King authored
The pointer host is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Link: https://lore.kernel.org/r/20190905134229.21194-1-colin.king@canonical.com Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
YueHaibing authored
Fixes gcc '-Wunused-but-set-variable' warning: drivers/scsi/smartpqi/smartpqi_init.c: In function 'pqi_driver_version_show': drivers/scsi/smartpqi/smartpqi_init.c:6164:24: warning: variable 'ctrl_info' set but not used [-Wunused-but-set-variable] commit 6d90615f ("scsi: smartpqi: add sysfs entries") added it but it was never used. Also remove variable 'shost'. [mkp: commit desc] Link: https://lore.kernel.org/r/20190831130348.20552-1-yuehaibing@huawei.comReported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Colin Ian King authored
There is a statement that is indented one level too deeply, remove the tab, re-join broken line and remove some empty lines. Link: https://lore.kernel.org/r/20190831073903.7834-1-colin.king@canonical.com Addresses-Coverity: ("Indentation does not match nesting") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Bump mpt3sas driver version to 32.100.00.00 Link: https://lore.kernel.org/r/1568379890-18347-14-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Load driver with module parameter "max_msix_vectors". Value provided in module parameter is not used by mpt3sas driver. Driver loads with max controller supported MSI-X value. In _base_alloc_irq_vectors use reply_queue_count which is determined using user provided msix value insted of ioc->msix_vector_count which tells max supported msix value of the controller. Link: https://lore.kernel.org/r/1568379890-18347-13-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
If any faulty application issues an NVMe Encapsulated commands to HBA which doesn't support NVMe protocol then driver should return the command as invalid with the following message. "HBA doesn't support NVMe. Rejecting NVMe Encapsulated request." Otherwise below page fault kernel panic will be observed while building the PRPs as there is no PRP pools allocated for the HBA which doesn't support NVMe drives. RIP: 0010:_base_build_nvme_prp+0x3b/0xf0 [mpt3sas] Call Trace: _ctl_do_mpt_command+0x931/0x1120 [mpt3sas] _ctl_ioctl_main.isra.11+0xa28/0x11e0 [mpt3sas] ? prepare_to_wait+0xb0/0xb0 ? tty_ldisc_deref+0x16/0x20 _ctl_ioctl+0x1a/0x20 [mpt3sas] do_vfs_ioctl+0xaa/0x620 ? vfs_read+0x117/0x140 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x60/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [mkp: tweaked error string] Link: https://lore.kernel.org/r/1568379890-18347-12-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
The firmware image layout has been changed for Aero controllers. All compatible HBAs have to get Firmware Package version from Component Image Header layout. The Signature field in FW header is set to 0xEB000042 for products compatible with Component Image Header. For compatible controllers, driver fetches firmware package version from ApplicationSpecific field of Component Image Header. Link: https://lore.kernel.org/r/1568379890-18347-11-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Return the diag buffer release command with -EINVAL status if the buffer is already released. Link: https://lore.kernel.org/r/1568379890-18347-10-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Added a new status flag named MPT3_DIAG_BUFFER_IS_APP_OWNED and it will set whenever application registers the diag buffer & it will be cleared when application unregisters the buffer. When this flag is enabled, and if application issues diag buffer register command without releasing the buffer, then register command will be failed with -EINVAL status by saying that this buffer is already registered by the application. When user issues a trace buffer register command through sysfs parameter, and if trace buffer is in released stated but not yet unregistered by the application which was owning it, then driver will unregister the buffer by itself and freshly register the 1MB sized trace buffer with the HBA firmware. Link: https://lore.kernel.org/r/1568379890-18347-9-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
The diag buffer which is allocated during driver load time or through sysfs parameter is marked as driver allocated diag buffer. MPT3_DIAG_BUFFER_IS_DRIVER_ALLOCATED bit will be set for this buffer. This buffer won't be de-allocated even when application issues unregister command, driver just clears the registered status bit. Same buffer will be reused while re-registering the same diag buffer type by any application. While re-registering the same diag buffer type application has to register with the same size that the buffer was allocated during driver load time. This buffer size can be read by the application by issuing diag 'query' command. This always makes sure that the memory is available for applications for collecting the firmware logs. Only thing is that this won't allow the application to re-register the diag buffer with different size, but the buffer size which is allocated during driver load time will be enough for most of the cases for collecting the firmware logs. Link: https://lore.kernel.org/r/1568379890-18347-8-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Clear MPT3_DIAG_BUFFER_IS_RELEASED bit once diag buffer is re-registered after reading the buffer, else driver won't release the buffer and return the 'diag release' command with -EINVAL status saying that buffer is already released. Link: https://lore.kernel.org/r/1568379890-18347-7-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Application A has registered a diag buffer and looking for particular event to happen to release & read the trace buffer. Meanwhile application B has unregistered the diag buffer and now Application A can't get the required diag buffer. So proper diag buffer ownership is missing. Each application has to maintain its own Unique ID. Now driver has to save the Application's UniqueID for each diag buffer type when diag buffer is registered. And driver has to allow 'release', 'read' & 'unregister' diag commands only if application's UniqueID matches with saved UniqueID for the corresponding diag buffer type. When diag buffer is registered by the driver, then the UniqueID saved by the driver is "BRCM" (i.e. 0x4252434D) for SAS3 and above generations HBA devices. For SAS2 HBAs, driver keeps the legacy UniqueID 0x07075900 for maintaining compatibility with the legacy SAS2 application and this improvement won't be applicable for SAS2 HBA devices. Any application can own the buffer registered by the driver by sending diag register request to driver with same buffer type and size (Application can get the buffer size by sending 'query' command). Then driver changes the ownership of the buffer by saving application's UniqueID for that corresponding buffer type. Also, application can re-register the diag buffer with same size without un-registering it, but diag buffer should be released before re-registering it. By allowing this, driver no need to deallocate and allocate a new buffer for re-register command, same buffer can be re-used. Link: https://lore.kernel.org/r/1568379890-18347-6-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Memory leak can happen when diag buffer is released but not unregistered (where buffer is deallocated) by the user. During module unload time driver is not deallocating the buffer if the buffer is in released state. Deallocate the diag buffer during module unload time without any diag buffer status checks. Link: https://lore.kernel.org/r/1568379890-18347-5-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
When user issues diag register command from application with required size, and if driver unable to allocate the memory, then it will fail the register command. While failing the register command, driver is not currently clearing MPT3_CMD_PENDING bit in ctl_cmds.status variable which was set before trying to allocate the memory. As this bit is set, subsequent register command will be failed with BUSY status even when user wants to register the trace buffer will less memory. Clear MPT3_CMD_PENDING bit in ctl_cmds.status before returning the diag register command with no memory status. Link: https://lore.kernel.org/r/1568379890-18347-4-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Display message before releasing the diag buffer so that user knows which event caused the release of diag buffer. Releasing of diag buffer means HBA firmware stops posting the firmware logs on the registered diag buffer. Link: https://lore.kernel.org/r/1568379890-18347-3-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Sreekanth Reddy authored
Currently if user wishes to enable the host trace buffer during driver load time, then user has to load the driver with module parameter 'diag_buffer_enable' set to one. Alternatively now the user can enable host trace buffer by enabling the following fields in manufacturing page11 in NVDATA (nvdata xml is used while building HBA firmware image): * HostTraceBufferMaxSizeKB - Maximum trace buffer size in KB that host can allocate, * HostTraceBufferMinSizeKB - Minimum trace buffer size in KB atleast host should allocate, * HostTraceBufferDecrementSizeKB - size by which host can reduce from buffer size and retry the buffer allocation when buffer allocation failed with previous calculated buffer size. The driver will register the trace buffer automatically without any module parameter during boot time when above fields are enabled in manufacturing page11 in HBA firmware. Driver follows the following algorithm for enabling the host trace buffer during driver load time: * If user has loaded the driver with module parameter 'diag_buffer_enable' set to one, then driver allocates 2MB buffer and registers this buffer with HBA firmware for capturing the firmware trace logs. * Else driver reads manufacture page11 data and checks whether HostTraceBufferMaxSizeKB filed is zero or not? - If HostTraceBufferMaxSizeKB is non-zero then driver tries to allocate HostTraceBufferMaxSizeKB size of memory. If the buffer allocation is successful, then it will register this buffer with HBA firmware, else in a loop the driver will try again by reducing the current buffer size with HostTraceBufferDecrementSizeKB size until memory allocation is successful or buffer size falls below HostTraceBufferMinSizeKB. If the memory allocation is successful, then the buffer will be registered with the firmware. Else, if the buffer size falls below the HostTraceBufferMinSizeKB, then driver won't register trace buffer with HBA firmware. - If HostTraceBufferMaxSizeKB is zero, then driver won't register trace buffer with HBA firmware. Link: https://lore.kernel.org/r/1568379890-18347-2-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Update lpfc version to 12.4.0.1 Link: https://lore.kernel.org/r/20190922035906.10977-21-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Local variable fcp_txcmplq_cnt is initialized to 0 and then displayed in lpfc driver message 0387. Presumed residual (or unused) code from previous commit. Removed fcp_txcmplq_cnt. Link: https://lore.kernel.org/r/20190922035906.10977-20-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
T10 PI support on SLI-4-based FCoE adapters is not supported. A prior commit in the 12.4.0.0 stream added device recognition that would prevent T10 PI enablement. However, it didn't contain a complete device list. Thus some SLI-4 FCoE adapters still had T10 PI enabled. Fix by expanding the device list that identifies FCoE devices. Link: https://lore.kernel.org/r/20190922035906.10977-19-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
This patch updates ACQE handling for: - an EEPROM failure error reported by the adapter. - ensures that all data for any ACQE, recognized or not, is logged. - Given that all data is now logged unconditionally, the default case (unrecognized) data can be reduced. Link: https://lore.kernel.org/r/20190922035906.10977-18-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool before any associated sgl or cmd and rsp buffers are returned via their respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf structure is reused quickly, there may be a race condition between release of old and association of new resources. Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp buffer lists before releasing the lpfc_io_buf structure for re-use. Fixes: d79c9e9d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and spin_unlock_irq() and may unwittingly raising irq when it shouldn't. Hard deadlocks were seen around lpfc_scsi_prep_cmnd(). Fix by converting the locks to irqsave/irqrestore. Fixes: d79c9e9d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
While reviewing the CT behavior, issues with spinlock_irq were seen. The driver should be using spinlock_irqsave/irqrestore in the els flush routine. Changed to spinlock_irqsave/irqrestore. Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
After study, it was determined there was a double free of a CT iocb during execution of lpfc_offline_prep and lpfc_offline. The prep routine issued an abort for some CT iocbs, but the aborts did not complete fast enough for a subsequent routine that waits for completion. Thus the driver proceeded to lpfc_offline, which releases any pending iocbs. Unfortunately, the completions for the aborts were then received which re-released the ct iocbs. Turns out the issue for why the aborts didn't complete fast enough was not their time on the wire/in the adapter. It was the lpfc_work_done routine, which requires the adapter state to be UP before it calls lpfc_sli_handle_slow_ring_event() to process the completions. The issue is the prep routine takes the link down as part of it's processing. To fix, the following was performed: - Prevent the offline routine from releasing iocbs that have had aborts issued on them. Defer to the abort completions. Also means the driver fully waits for the completions. Given this change, the recognition of "driver-generated" status which then releases the iocb is no longer valid. As such, the change made in the commit 29601228 is reverted. As recognition of "driver-generated" status is no longer valid, this patch reverts the changes made in commit 29601228 ("scsi: lpfc: Fix leak of ELS completions on adapter reset") - Modify lpfc_work_done to allow slow path completions so that the abort completions aren't ignored. - Updated the fdmi path to recognize a CT request that fails due to the port being unusable. This stops FDMI retries. FDMI will be restarted on next link up. Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Scenarios were seen where a host hung when the system booted or the host was very slow in booting. The link would not come up and no luns were visible to the host. After investigation, this was found to be due to the introduction of a new ACQE that adapter may generate to report a adapter hw warning. The ACQE was delivered to the driver very early in adapter initialization, when the driver did not expect command completion. As part of handling this unexpected interrupt the an EQEs are consumed and discarded and the EQ rearmed. The issue is the CQ that cause the EQE and thus the interrupt was not processed and the CQ was left unarmed. Meaning it would no longer generate a new interrupt condition. Subsequent mailbox commands used to initialize the adapter use the same CQ, and as there was no completion interrupt generated, the driver never saw the mailbox commands complete and it would wait long command timeouts. Fix by having the early flush routine also process the related CQ and rearm the CQ. Link: https://lore.kernel.org/r/20190922035906.10977-13-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Coverity flagged several scenarios where checking of null pointer values wasn't consistent. Fix the code to that be consistent on checking. Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
When the port, running as a nvme target, receives an ABTS, it submits commands to the adapter to Abort i/o outstanding in the adapter. The Abort command formatting routine left a command field set to zero, which instructs the adapter to generate an ABTS on the wire as part of cleaning up the I/O. This is common operation for an initiator, but not for a target. Fix the driver to check whether an ABTS had been received for the I/O, and if so, change the Abort command formatting so that the ABTS generation is disabled (IA=1). No need to ABTS it when the other side already has. Also refactored the code such that there is a single routine being used for nvme or nvmet ABORT requests, and IA is an argument. Link: https://lore.kernel.org/r/20190922035906.10977-11-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
An issue was seen discovering all SCSI Luns when a target device undergoes link bounce. The driver currently does not qualify the FC4 support on the target. Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is that the target will reject the PRLI if it is not supported. If a PRLI times out, the driver will retry. The driver will not proceed with the device until both SCSI and NVMe PRLIs are resolved. In the failure case, the device is FCP only and does not respond to the NVMe PRLI, thus initiating the wait/retry loop in the driver. During that time, a RSCN is received (device bounced) causing the driver to issue a GID_FT. The GID_FT response comes back before the PRLI mess is resolved and it prematurely cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE state. Discovery with the target never completes or resets. Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes, thereby restarting the discovery process for the node. Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Faults are seen with RIP of lpfc_scsi_cmd_iocb_cmpl(). The failure is when lpfc_update_status is being called as part of the completion. After debugging, it was seen the issue was the shost pointer that the driver derived from the scsi cmd. The crash showed the cmd->device pointer being bogus, which is likely as the scsi devices were offlined prior. The bogus device pointer caused subsequent pointers derived from the location, specifically the vport, to be bogus. Fix by adjusting the calling sequence to pass in the vport rather than having to derive it from the cmd structure. Link: https://lore.kernel.org/r/20190922035906.10977-9-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Symptoms were seen of the driver not having valid data for mailbox commands. After debugging, the following sequence was found: The driver maintains a port-wide pointer of the mailbox command that is currently in execution. Once finished, the port-wide pointer is cleared (done in lpfc_sli4_mq_release()). The next mailbox command issued will set the next pointer and so on. The mailbox response data is only copied if there is a valid port-wide pointer. In the failing case, it was seen that a new mailbox command was being attempted in parallel with the completion. The parallel path was seeing the mailbox no long in use (flag check under lock) and thus set the port pointer. The completion path had cleared the active flag under lock, but had not touched the port pointer. The port pointer is cleared after the lock is released. In this case, the completion path cleared the just-set value by the parallel path. Fix by making the calls that clear mbox state/port pointer while under lock. Also slightly cleaned up the error path. Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
When target-side fault injections are made, the driver isn't reconnecting to the remote port. The driver is logging "2753" error messages which state: "PLOGI failure DID:1B2400 Status:x3/xf0240008" The failures status is indicating a Illegal field error, which points to the Temporary RPI field being used for the ELS. This error typically means the driver used an RPI that was already registered (shouldn't be registered if using it in this context). Study has found that if the driver were in discovery attempts and encountered an error, it wouldn't flag the temporary rpi in error. Yet the rpi was released for reallocation in these error paths and another ELS could allocate the rpi. In the failure situation a retry was done on an ELS that had encountered an error, and as the rpi wasn't marked in error, the ELS reused the rpi it originally allocated. But that rpi had been allocated by a different ELS issued after the original error and before the retry attempt. The different ELS had succeeded and the RPI was registered. Fix by marking the rpi state for the node to be in error, aka as needing reallocation, upon an error in the els processing. Error state marking is always done prior to release back to the internal rpi free list, which the driver wasn't doing in cases prior. Also enhanced some of the logging to help in the next case of problem troubleshooting. Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
A prior use-after-free mailbox fix solved it's problem by null'ing a ndlp pointer. However, further testing has shown that this change causes a later state change to occasionally be skipped, which results in a reference count never being decremented thus the rpi is never released, which causes a vport delete to never succeed. Revise the fix in the prior patch to no longer null the ndlp. Instead the RELEASE_RPI flag is set which will drive the release of the rpi. Given the new code was added at a deep indentation level, refactor the code block using a new routine that avoids the indentation issues. Fixes: 9b164068 ("scsi: lpfc: Fix use-after-free mailbox cmd completion") Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
The nvme-fc transport may call to abort an io on controller reset. If the driver is out of resources to issue an abort command, it just gives up and does nothing. The transport expects the lldd to always be able to terminate an io it has issued. At that point, the controller hangs waiting for aborted ios to be returned. Note: flaged by "6136" and "6176" error messages. Root issue was the adapter mis-allocated the number resources it allocated for command entries for the adapter. Convert the driver to allocate command resources based on the number of xris supported by the FC port - 1 resource for the original command and 1 resource for the abort request. Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-