- 06 Feb, 2024 18 commits
-
-
Martin K. Petersen authored
Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.4.0.0 This patch set contains fixes identified by static code analyzers, updates to log messaging, bug fixes related to discovery and congestion management, and clean up patches regarding the abuse of shost lock in the driver. The patches were cut against Martin's 6.9/scsi-queue tree. Link: https://lore.kernel.org/r/20240131185112.149731-1-justintee8345@gmail.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Update copyrights to 2024 for files modified in the 14.4.0.0 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-18-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Update lpfc version to 14.4.0.0. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-17-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change load_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Also, correct the test for FC_UNLOADING in lpfc_ct_handle_mibreq, which incorrectly tests vport->fc_flag rather than vport->load_flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-16-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change fc_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-15-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, replace shost_lock with an explicit fc_nodes_list_lock spinlock when accessing vport->fc_nodes lists. Although vport memory region is owned by shost->hostdata, it is driver private memory and an explicit fc_nodes list lock for fc_nodes list mutations is more appropriate than locking the entire shost. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-14-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
There is no reason to use the shost_lock to synchronize an LLDD statistics counter. Convert all the nlp state statistic counters into atomic_t. Corresponding zeroing, increments, and reads are converted to atomic versions. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-13-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Desiring to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, it has been determined that there is no need for shost_lock protection when retrieving fc_host port information because it is only for display to user via sysfs. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-12-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
The ACQE notification event to reset congestion statistics should be moved into the specific lpfc_sli4_async_sli_evt() routine instead of being processed from the generic lpfc_sli4_async_event_proc() routine. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-11-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
FPIN frequency is provided by the fabric in peer congestion notifications. Currently, the frequency is only logged in a message, but it should also be saved into the phba's cgn_fpin statistics member for proper application functionality. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-10-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
The "Nodelist not empty" log message and an accompanying delay may be observed when deleting an NPIV port or unloading the lpfc driver. This can occur due to receipt of an ABTS for which there is no corresponding login context or ndlp allocated. In such cases, the driver allocates a new ndlp object to send a BLS_RJT after which the ndlp object unintentionally remains in the NLP_STE_UNUSED_NODE state forever. Add a check to conditionally remove ndlp's initial reference count when queuing a BLS response. If the initial reference is removed, then set the NLP_DROPPED flag to notify other code paths. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-9-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Requests to delete an NPIV port may fail repeatedly if the initial request is received during discovery. If the FC_UNLOADING load_flag is set, then skip CT response processing for the physical port. This allows discovery processing for other lpfc_vport objects to reach their cmpl routines before deleting the vport. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-8-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Upon first RSCN receipt of a target server's remote port that is initially acting as an initiator function, the driver marks the ndlp->nlp_type as an initiator role. Then later, when processing an RSCN for a target function role switch, that ndlp remote port is permanently stuck as an initiator role and can never transition to be discovered as an updated target role function. Remove the NLP_RCV_PLOGI early return if statement clause so that the NLP_NPR_2B_DISC flag gets set. This allows for role change detections. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-7-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Remove the early return NLP_FABRIC check in lpfc_plogi_confirm_nport() because it is possible for switch domain controllers to change WWPN. As a result, allow lpfc_plogi_confirm_nport() to detect that a new ndlp should be initialized in such cases. The old ndlp object will be cleaned up when dev_loss_tmo callbk executes. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-6-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
D_ID swaps are common during cable swaps in a SAN. Thus, there's no reason to log the event at a KERN_ERR level with the trace event logger. Change the log level to KERN_INFO and the normal LOG_ELS flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-5-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
The sg_dma_len() API should be used to retrieve a scatterlist's length instead of directly accessing scatterlist->length. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-4-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
The call to lpfc_sli4_resume_rpi() in lpfc_rcv_padisc() may return an unsuccessful status. In such cases, the elsiocb is not issued, the completion is not called, and thus the elsiocb resource is leaked. Check return value after calling lpfc_sli4_resume_rpi() and conditionally release the elsiocb resource. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-3-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
A static code analyzer tool indicates that the local variable called status in the lpfc_sli4_repost_sgl_list() routine could be used to print garbage uninitialized values in the routine's log message. Fix by initializing to zero. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-2-justintee8345@gmail.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 30 Jan, 2024 22 commits
-
-
Martin K. Petersen authored
Mike Christie <michael.christie@oracle.com> says: The following patches were made over Linus's tree which contains a fix for sd which was not in Martin's branches. The patches allow scsi_execute_cmd users to have scsi-ml retry the cmd for it instead of the caller having to parse the error and loop itself. Link: https://lore.kernel.org/r/20240123002220.129141-1-michael.christie@oracle.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
Add some kunit tests for scsi_check_passthrough() so we can easily make sure we are hitting the cases for which it's difficult to replicate in hardware or even scsi_debug. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-20-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has the SCSI midlayer retry errors instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-19-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has get_sectorsize() have the SCSI midlayer retry errors instead of driving them itself. There is one behavior change where we no longer retry when scsi_execute_cmd() returns < 0, but we should be ok. We don't need to retry for failures like the queue being removed, and for the case where there are no tags/reqs the block layer waits/retries for us. For possible memory allocation failures from blk_rq_map_kern() we use GFP_NOIO, so retrying will probably not help. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-18-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has ses have the SCSI midlayer retry scsi_execute_cmd() errors instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-17-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has read_capacity_10() have the SCSI midlayer retry errors instead of driving them itself. There are 2 behavior changes with this patch: 1. There is one behavior change where we no longer retry when scsi_execute_cmd() returns < 0, but we should be ok. We don't need to retry for failures like the queue being removed, and for the case where there are no tags/reqs since the block layer waits/retries for us. For possible memory allocation failures from blk_rq_map_kern() we use GFP_NOIO, so retrying will probably not help. 2. For the specific UAs we checked for and retried, we would get READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left from the main loop's retries. Each UA now gets READ_CAPACITY_RETRIES_ON_RESET retries, and the other errors get up to 3 retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET is already 10 and is not based on anything specific like a spec or device, so the extra 3 we got from the main loop was probably just an accident and is not going to help. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-16-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
It's common to get a UA when doing PR commands. It could be due to a target restarting, transport level relogin or other PR commands like a release causing it. The upper layers don't get the sense and in some cases have no idea if it's a SCSI device, so this has the sd layer retry. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-15-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has scsi_report_lun_scan() have the SCSI midlayer retry errors instead of driving them itself. There is one behavior change where we no longer retry when scsi_execute_cmd() returns < 0, but we should be ok. We don't need to retry for failures like the queue being removed, and for the case where there are no tags/reqs the block layer waits/retries for us. For possible memory allocation failures from blk_rq_map_kern() we use GFP_NOIO, so retrying will probably not help. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-14-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has scsi_mode_sense() have the SCSI midlayer retry UAs instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-13-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has ch_do_scsi() have the SCSI midlayer retry UAs instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-12-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
unit_attention is not used so remove it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-11-michael.christie@oracle.comReviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has sd_sync_cache() have the SCSI midlayer retry errors instead of driving them itself. There is one behavior change where we no longer retry when scsi_execute_cmd() returns < 0, but we should be ok. We don't need to retry for failures like the queue being removed, and for the case where there are no tags/reqs the block layer waits/retries for us. For possible memory allocation failures from blk_rq_map_kern() we use GFP_NOIO, so retrying will probably not help. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-10-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has spi_execute() have the SCSI midlayer retry UAs instead of driving them. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-9-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has rdac have the SCSI midlayer retry errors instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-8-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has hp_sw have the SCSI midlayer retry scsi_execute_cmd() errors instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-7-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This simplifies sd_spinup_disk() so the SCSI midlayer retries errors for it. Note that we retried every UA except Medium Not Present and also if scsi_status_is_good() returned failed which would happen for all check conditions. In this patch we use SCMD_FAILURE_STAT_ANY which will trigger for the same conditions as when scsi_status_is_good() returns false and there is status. This will cover all CCs including UAs so there is no explicit failures array entry for UAs except for Medium Not Present which we don't want to retry. There is one behavior change where we no longer retry when scsi_execute_cmd() returns < 0, but we should be ok. We don't need to retry for failures like the queue being removed, and for the case where there are no tags/reqs the block layer waits/retries for us. For possible memory allocation failures from blk_rq_map_kern() we use GFP_NOIO, so retrying will probably not help. We do not handle the outside loop's retries because we want to sleep between tries and we don't support that yet. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-6-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
We currently reuse the cmd buffer for the TUR and START_STOP commands which requires us to reset the buffer when retrying. This has us use separate buffers for the 2 commands so we can make them const and I think it makes it easier to handle for retries but does not add too much extra to the stack use. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-5-michael.christie@oracle.comReviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
Description from: Martin Wilck <mwilck@suse.com>: The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts during SCSI device probing, causing device probing to fail. This has been observed in FCoE uplink failover tests, for example. This patch fixes the issue by retrying the INQUIRY. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-4-michael.christie@oracle.comReviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin Wilck <mwilck@suse.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
This has scsi_probe_lun() ask the SCSI midlayer to retry UAs instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-3-michael.christie@oracle.comAcked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Mike Christie authored
For passthrough we don't retry any error which we get a check condition for. This results in a lot of callers driving their own retries for all UAs, specific UAs, NOT_READY, specific sense values or any type of failure. This adds the core code to allow passthrough users to specify what errors they want the SCSI midlayer to retry for them. We can then convert users to drop a lot of their sense parsing and retry handling. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-2-michael.christie@oracle.comReviewed-by: John Garry <john.g.garry@oracle.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Zhijian authored
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). > ./drivers/scsi/pm8001/pm8001_ctl.c:883:8-16: WARNING: please use sysfs_emit No functional change intended CC: Jack Wang <jinpu.wang@cloud.ionos.com> CC: James E.J. Bottomley <jejb@linux.ibm.com> CC: Martin K. Petersen <martin.petersen@oracle.com> CC: linux-scsi@vger.kernel.org Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240116045151.3940401-32-lizhijian@fujitsu.comAcked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Li Zhijian authored
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). > ./drivers/scsi/isci/init.c:140:8-16: WARNING: please use sysfs_emit No functional change intended CC: Artur Paszkiewicz <artur.paszkiewicz@intel.com> CC: James E.J. Bottomley <jejb@linux.ibm.com> CC: Martin K. Petersen <martin.petersen@oracle.com> CC: linux-scsi@vger.kernel.org Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240116045151.3940401-25-lizhijian@fujitsu.comReviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-