Commits · 91fdd0788878862581128c86025c8a6262aeb868 · Kirill Smelkov / linux

20 May, 2022 14 commits

scsi: dpt_i2o: Drop redundant spinlock initialization · 91fdd078

Haowen Bai authored May 10, 2022

adpt_post_wait_lock was declared and initialized by DEFINE_SPINLOCK so we
don't need to call spin_lock_init(). Drop the call.

Link: https://lore.kernel.org/r/1652176024-3981-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

91fdd078

scsi: qedf: Remove redundant variable op · fc65df48

Colin Ian King authored May 17, 2022

The variable 'op' is assigned a value and is never read. The variable is
not used and is redundant, remove it.

Link: https://lore.kernel.org/r/20220517092518.93159-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fc65df48

scsi: hisi_sas: Fix memory ordering in hisi_sas_task_deliver() · 6c6ac8b7

John Garry authored May 17, 2022

The memories for the slot should be observed to be written prior to
observing the slot as ready.

Prior to commit 26fc0ea7 ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR"),
we had a spin_lock() + spin_unlock() immediately before marking the slot as
ready. The spin_unlock() - with release semantics - caused the slot memory
to be observed to be written.

Now that the spin_lock() + spin_unlock() is gone, use a smp_wmb().
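
The ordering requirement can be illustrated in userspace with C11 atomics. This is only a sketch under assumptions: struct slot, slot_deliver() and slot_try_consume() are hypothetical names, and a release store stands in for the smp_wmb() followed by the "ready" write that the commit describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot layout; the real hisi_sas slot structure differs. */
struct slot {
	uint32_t cmd;      /* payload written by the producer */
	uint32_t len;
	atomic_int ready;  /* flag observed by the consumer */
};

/* Producer: fill the payload first, then publish the slot. */
static void slot_deliver(struct slot *s, uint32_t cmd, uint32_t len)
{
	assert(s != NULL);
	s->cmd = cmd;
	s->len = len;
	/* Release store: the payload writes above cannot be reordered after it. */
	atomic_store_explicit(&s->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload only once it is published. */
static int slot_try_consume(struct slot *s, uint32_t *cmd, uint32_t *len)
{
	assert(s != NULL && cmd != NULL && len != NULL);
	/* Acquire load pairs with the release store in slot_deliver(). */
	if (!atomic_load_explicit(&s->ready, memory_order_acquire))
		return 0;
	*cmd = s->cmd;
	*len = s->len;
	return 1;
}
```

A consumer that sees ready == 1 is then guaranteed to see the cmd and len values written before the release store.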

Link: https://lore.kernel.org/r/1652774661-12935-1-git-send-email-john.garry@huawei.com
Fixes: 26fc0ea7 ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR")
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6c6ac8b7

scsi: fnic: Replace DMA mask of 64 bits with 47 bits · b559b99a

Karan Tilak Kumar authored May 13, 2022

Cisco VIC supports only 47 bits. If the host sends DMA addresses that are
greater than 47 bits, it causes work queue (WQ) errors in the VIC.
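
As a rough illustration of what a 47-bit mask permits, the sketch below defines a mask macro with the same shape as the kernel's DMA_BIT_MASK() and a hypothetical helper, dma_addr_fits(), that checks whether an address is representable. It is a userspace approximation, not the fnic code.

```c
#include <assert.h>
#include <stdint.h>

/* n low bits set; 64 is special-cased to avoid an undefined shift. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: returns 1 if addr fits within a bits-wide DMA mask. */
static int dma_addr_fits(uint64_t addr, unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return (addr & ~DMA_BIT_MASK(bits)) == 0;
}
```

With a 47-bit mask, the highest usable address is (1ULL << 47) - 1; an address with bit 47 set passes a 64-bit check but fails the 47-bit one.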

Link: https://lore.kernel.org/r/20220513205605.81788-1-kartilak@cisco.com
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Co-developed-by: Dhanraj Jhawar <djhawar@cisco.com>
Signed-off-by: Dhanraj Jhawar <djhawar@cisco.com>
Co-developed-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b559b99a

scsi: mpi3mr: Add target device related sysfs attributes · 9feb5c4c

Sreekanth Reddy authored May 17, 2022

Add sysfs attributes for exposing target device details such as SAS
address, firmware device handle, and persistent ID for the
controller-attached devices and RAID volumes.

Link: https://lore.kernel.org/r/20220517115310.13062-3-sreekanth.reddy@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9feb5c4c

scsi: mpi3mr: Add shost related sysfs attributes · e51e76ed

Sreekanth Reddy authored May 17, 2022

Add shost related sysfs attributes to display the controller's firmware
version, queue depth, number of requests, and number of reply queues. Also
add an attribute to set & get the logging_level.

Link: https://lore.kernel.org/r/20220517115310.13062-2-sreekanth.reddy@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e51e76ed

scsi: elx: efct: Remove redundant memset() statement · e79aaa9c

Harshit Mogalapalli authored May 05, 2022

As memset() of bmbx is immediately followed by a memcpy() where bmbx is the
destination, the memset() is redundant.

Link: https://lore.kernel.org/r/20220505143703.45441-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e79aaa9c

scsi: megaraid_sas: Remove redundant memset() statement · 2f9e9a7b

Harshit Mogalapalli authored May 05, 2022

As memset() of scmd->sense_buffer is immediately followed by a memcpy()
where scmd->sense_buffer is the destination, the memset() is redundant.

Link: https://lore.kernel.org/r/20220505143214.44908-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2f9e9a7b

scsi: mpi3mr: Return error if dma_alloc_coherent() fails · bc7896d3

Dan Carpenter authored May 05, 2022

Return -ENOMEM instead of success if dma_alloc_coherent() fails.
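
The bug class is a function that jumps to its exit label on allocation failure while the return variable still holds 0. A minimal userspace sketch follows; struct pel_ctx, pel_buf_alloc() and the injectable allocator are hypothetical, used only so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

/* Hypothetical context standing in for driver state. */
struct pel_ctx {
	void *buf;
	size_t len;
};

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}

static int pel_buf_alloc(struct pel_ctx *c, size_t len, alloc_fn alloc)
{
	int retval = 0;

	assert(c != NULL && alloc != NULL);
	c->buf = alloc(len);
	if (!c->buf) {
		/* Without this assignment the caller would see success. */
		retval = -ENOMEM;
		goto out;
	}
	c->len = len;
out:
	return retval;
}
```

Calling pel_buf_alloc() with failing_alloc returns -ENOMEM, while passing malloc returns 0 and leaves a usable buffer that the caller must free.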

Link: https://lore.kernel.org/r/YnOmMGHqCOtUCYQ1@kili
Fixes: 43ca1100 ("scsi: mpi3mr: Add support for PEL commands")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

bc7896d3

scsi: hisi_sas: Fix rescan after deleting a disk · e9dedc13

John Garry authored May 12, 2022

Removing an ATA device via sysfs means that the device may not be found
through re-scanning:

root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete
root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan
root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john#

The problem is that the rescan of the device may conflict with the device
being re-initialized, as follows:

 - In the rescan we call hisi_sas_slave_alloc() in store_scan() ->
   sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lun()
   -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device()
   In hisi_sas_init_device() we issue an IT nexus reset for ATA devices

 - That IT nexus reset causes the remote PHY to go down and this triggers a bcast
   event

 - In parallel libsas processes the bcast event, finds that the phy is down
   and marks the device as gone

The hard reset issued in hisi_sas_init_device() is unnecessary - as
described in the code comment - so remove it. Also set dev status as
HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call.

Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huawei.com
Fixes: 36c6b761 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback")
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e9dedc13

scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset · 71453bd9

John Garry authored May 12, 2022

We have seen errors like this when a SATA device is probed:

[524.566298] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=4096 ...
[524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00

Since commit 21c7e972 ("scsi: hisi_sas: Disable SATA disk phy for
severe I_T nexus reset failure"), we issue an ATA softreset to disks after
a phy reset to ensure that they are in sound working order. If the
softreset is issued before the remote phy has come back up then the
softreset will fail (errors as above). Remedy this by waiting for the phy
to come back up after the reset.

Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

71453bd9

scsi: libsas: Refactor sas_ata_hard_reset() · 057e5fc0

John Garry authored May 12, 2022

Create function sas_ata_wait_after_reset() from sas_ata_hard_reset() as
some LLDDs may want to check that a remote ATA phy is up after reset.

Link: https://lore.kernel.org/r/1652354134-171343-2-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

057e5fc0

scsi: mpt3sas: Update driver version to 42.100.00.00 · 53d5088d

Sreekanth Reddy authored May 11, 2022

Update driver version to 42.100.00.00.

Link: https://lore.kernel.org/r/20220511072621.30657-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

53d5088d

scsi: mpt3sas: Fix junk chars displayed while printing ChipName · 8e129add

Sreekanth Reddy authored May 11, 2022

Terminate string after copying 16 bytes of ChipName data from Manufacturing
Page0 to prevent %s from printing junk characters.
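
A fixed-width firmware field need not end in a NUL byte, so a %s format would keep reading past it. The sketch below shows the copy-then-terminate pattern; CHIP_NAME_LEN and copy_chip_name() are hypothetical names, not the mpt3sas code.

```c
#include <assert.h>
#include <string.h>

#define CHIP_NAME_LEN 16 /* fixed-width field; may lack a trailing NUL */

/* dst must hold at least CHIP_NAME_LEN + 1 bytes. */
static void copy_chip_name(char *dst, const unsigned char *field)
{
	assert(dst != NULL && field != NULL);
	memcpy(dst, field, CHIP_NAME_LEN);
	/* Explicit terminator so string functions stop at the field boundary. */
	dst[CHIP_NAME_LEN] = '\0';
}
```

The destination is sized one byte larger than the field so that the terminator never overwrites name data.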

Link: https://lore.kernel.org/r/20220511072621.30657-1-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8e129add

17 May, 2022 6 commits

scsi: ipr: Use kobj_to_dev() · aabd5fea

Minghao Chi authored May 10, 2022

Use kobj_to_dev() instead of open-coding it.
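
For readers unfamiliar with the helper, kobj_to_dev() wraps a container_of() computation from an embedded kobject back to its enclosing device. The userspace sketch below uses heavily simplified stand-in structs (the kernel's struct kobject and struct device have many more members) to show the pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins, not the kernel definitions. */
struct kobject {
	const char *name;
};

struct device {
	int id;
	struct kobject kobj;
};

/* One named helper instead of repeating container_of() at each call site. */
static struct device *kobj_to_dev(struct kobject *kobj)
{
	assert(kobj != NULL);
	return container_of(kobj, struct device, kobj);
}
```

Using the helper keeps the member name and type in one place, which is the point of replacing the open-coded version.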

Link: https://lore.kernel.org/r/20220510105113.1351891-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

aabd5fea

scsi: mpi3mr: Fix a NULL vs IS_ERR() bug in mpi3mr_bsg_init() · a25eafd1

Dan Carpenter authored May 06, 2022

The bsg_setup_queue() function does not return NULL.  It returns error
pointers.  Fix the check accordingly.
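
The distinction matters because an error pointer is a non-NULL value, so a NULL check silently passes. The sketch below re-creates the error-pointer convention in userspace; the helpers imitate the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() but are written here for illustration, and queue_setup() is a hypothetical function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical setup function that follows the error-pointer convention. */
static void *queue_setup(int fail)
{
	static int queue;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&queue;
}
```

A caller that writes `if (!q)` never catches the failure here; `if (IS_ERR(q)) return PTR_ERR(q);` is the matching check.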

Link: https://lore.kernel.org/r/YnUf7RQl+A3tigWh@kili
Fixes: 4268fa75 ("scsi: mpi3mr: Add bsg device support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a25eafd1

scsi: bnx2fc: Avoid using get_cpu() in bnx2fc_cmd_alloc() · 20f8932f

Sebastian Andrzej Siewior authored May 06, 2022

Using get_cpu() leads to disabling preemption and in this context it is not
possible to acquire the following spinlock_t on PREEMPT_RT because it
becomes a sleeping lock.

Commit 0ea5c275 ("[SCSI] bnx2fc: common free list for cleanup
commands") says that it is using get_cpu() as a fix in case the CPU is
preempted. While this might be true, the important part is that it is now
using the same CPU for locking and unlocking while previously it always
relied on smp_processor_id(). The data structure itself is protected with
a lock so it does not rely on CPU-local access.

Replace get_cpu() with raw_smp_processor_id() to obtain the current CPU
number which is used as an index for the per-CPU resource.

Link: https://lore.kernel.org/r/20220506105758.283887-5-bigeasy@linutronix.de
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

20f8932f

scsi: libfc: Remove get_cpu() semantics in fc_exch_em_alloc() · a0548edf

Davidlohr Bueso authored May 06, 2022

The get_cpu() in fc_exch_em_alloc() was introduced in commit f018b73a
("[SCSI] libfc, libfcoe, fcoe: use smp_processor_id() only when preempt
disabled") for no other reason than to simply use smp_processor_id()
without getting a warning, because everything is done with the pool->lock
held anyway.  However, get_cpu(), by disabling preemption, does not play
well with PREEMPT_RT, particularly when acquiring a regular (and thus
sleepable) spinlock.

Therefore remove the get_cpu() and just use the unstable value as we will
have CPU locality guarantees next by taking the lock.  The window of
migration, as noted by Sebastian, is small and even if it happens the
result is correct.

Link: https://lore.kernel.org/r/20211117025956.79616-2-dave@stgolabs.net
Link: https://lore.kernel.org/r/20220506105758.283887-4-bigeasy@linutronix.de
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a0548edf

scsi: fcoe: Use per-CPU API to update per-CPU statistics · a912460e

Sebastian Andrzej Siewior authored May 06, 2022

The per-CPU statistics (struct fc_stats) are updated by getting a stable
per-CPU pointer via get_cpu() + per_cpu_ptr() and then performing the
increment. This can be optimized by using this_cpu_*() which will do
whatever is needed on the architecture to perform the update safely and
efficiently. The read out of the individual value (fc_get_host_stats())
should be done by using READ_ONCE() instead of a plain-C access. The
difference is that READ_ONCE() will always perform a single access while
the plain-C access can be split by the compiler into two loads if it
appears beneficial. The usage of u64 has the side-effect that it is also
64bit wide on 32bit architectures and the read is always split into two
loads. This can lead to strange values if the read happens during an update
which alters both 32bit parts of the 64bit value. This can be circumvented
by either using 32bit variables on 32bit architectures or extending the
statistics with a sequence counter.

Use this_cpu_*() API to update the statistics and READ_ONCE() to read it.
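
The read side can be approximated in userspace with a volatile load, which is roughly what READ_ONCE() does for scalar types. Everything below is a sketch under assumptions: NR_CPUS_SKETCH, struct fc_stats_sketch and the helper names are hypothetical, and a plain array indexed by CPU number stands in for real per-CPU storage.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count */

struct fc_stats_sketch {
	uint64_t tx_frames;
};

static struct fc_stats_sketch percpu_stats[NR_CPUS_SKETCH];

/*
 * Volatile load: the compiler may not split, repeat or elide the access.
 * On 32-bit targets a u64 still needs two machine loads, which is the
 * tearing caveat the commit message mentions.
 */
static uint64_t read_once_u64(const uint64_t *p)
{
	return *(const volatile uint64_t *)p;
}

/* Writer side: each CPU only ever touches its own slot. */
static void stats_inc_tx(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	percpu_stats[cpu].tx_frames++;
}

/* Reader side: sum the per-CPU slots, one load per slot. */
static uint64_t stats_sum_tx(void)
{
	uint64_t sum = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += read_once_u64(&percpu_stats[cpu].tx_frames);
	return sum;
}
```

The sum is a snapshot that may lag concurrent updates, which is normally acceptable for statistics.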

Link: https://lore.kernel.org/r/20220506105758.283887-3-bigeasy@linutronix.de
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a912460e

scsi: fcoe: Add a local_lock to fcoe_percpu · 848b8977

Davidlohr Bueso authored May 06, 2022

fcoe_get_paged_crc_eof() relies on the caller having preemption disabled to
ensure the per-CPU fcoe_percpu context remains valid throughout the
call. This is done by either holding spinlocks (such as bnx2fc_global_lock
or qedf_global_lock) or the get_cpu() from fcoe_alloc_paged_crc_eof(). This
last one breaks PREEMPT_RT semantics as a memory allocation can occur and
end up sleeping in atomic context.

Introduce a local_lock_t to struct fcoe_percpu that will keep the non-RT
case the same, mapping to preempt_disable/enable, while RT will use a
per-CPU spinlock allowing the region to be preemptible but still maintain
CPU locality. The other users of fcoe_percpu are already safe in this
regard and do not require local_lock()ing.

Link: https://lore.kernel.org/r/20211117025956.79616-3-dave@stgolabs.net
Link: https://lore.kernel.org/r/20220506105758.283887-2-bigeasy@linutronix.de
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

848b8977

11 May, 2022 20 commits

scsi: target: iscsi: Rename iscsi_session to iscsit_session · 0873fe44

Max Gurtovoy authored Apr 28, 2022

The structure iscsi_session naming is used by the iSCSI initiator
driver. Rename the target session to iscsit_session to have more readable
code.

Link: https://lore.kernel.org/r/20220428092939.36768-3-mgurtovoy@nvidia.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0873fe44

scsi: target: iscsi: Rename iscsi_conn to iscsit_conn · be36d683

Max Gurtovoy authored Apr 28, 2022

The structure iscsi_conn naming is used by the iSCSI initiator
driver. Rename the target conn to iscsit_conn to have more readable code.

Link: https://lore.kernel.org/r/20220428092939.36768-2-mgurtovoy@nvidia.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

be36d683

scsi: target: iscsi: Rename iscsi_cmd to iscsit_cmd · 66cd9d4e

Max Gurtovoy authored Apr 28, 2022

The structure iscsi_cmd naming is used by the iSCSI initiator driver.
Rename the target cmd to iscsit_cmd to have more readable code.

Link: https://lore.kernel.org/r/20220428092939.36768-1-mgurtovoy@nvidia.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

66cd9d4e

scsi: mpi3mr: Return I/Os to an unrecoverable HBA with DID_ERROR · 256bd4f2

Sreekanth Reddy authored May 06, 2022

Complete all new I/O requests issued to an unrecoverable controller with
DID_ERROR status instead of returning the I/O requests with
SCSI_MLQUEUE_HOST_BUSY. This will prevent the infinite retries of the new
I/Os when a controller is in an unrecoverable state.

Link: https://lore.kernel.org/r/20220505184808.24049-1-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

256bd4f2

scsi: mpi3mr: Hidden drives not removed during soft reset · 2dd8389f

Sreekanth Reddy authored May 06, 2022

If any drive is missing during reset, the driver checks whether the device
is exposed to the OS. If it is, then it removes the device from the OS and
its own internal list. For hidden devices, even if they are found as
missing during reset, the driver is not removing them from its internal
list.

Modify driver to remove hidden devices from the driver's target device list
if they are missing during soft reset.

Link: https://lore.kernel.org/r/20220505184808.24049-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2dd8389f

scsi: mpi3mr: Increase I/O timeout value to 60s · 1aa529d4

Sreekanth Reddy authored May 06, 2022

Set each SCSI device's default I/O timeout and default error handling I/O
timeout to 60s.

Link: https://lore.kernel.org/r/20220505184808.24049-3-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1aa529d4

scsi: lpfc: Update lpfc version to 14.2.0.3 · fcb9e738

James Smart authored May 05, 2022

Update lpfc version to 14.2.0.3

Link: https://lore.kernel.org/r/20220506035519.50908-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fcb9e738

scsi: lpfc: Use sg_dma_address() and sg_dma_len() macros for NVMe I/O · a14396b6

James Smart authored May 05, 2022

NVMe I/O problems may be seen on IOMMU enabled platforms, with adapter I/Os
failing with transfer length mismatches.

The sg list processing routine for NVMe I/O is accessing the sg entry
directly for the length and address fields. On some IOMMU platforms,
contiguous mappings are compressed to the first sg entry with the sum of the
lengths set to the sg entry dma_length field. The length fields are left
for later use by the unmap call. As such, the driver didn't see the actual
dma_length value, just the first entry's length value. Drivers are to use
the sg_dma_len() and sg_dma_address() macros to reference the sg
entry. The macros select the proper length field (dma_length or length) to
reference.

Fix the offending code to use the sg_dma_xxx macros.
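
The mismatch can be modelled with a toy scatterlist. The struct and helpers below are hypothetical stand-ins, not the kernel's struct scatterlist or its macros; they only show why reading the CPU-side length of the first entry under-reports a coalesced mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a scatterlist entry. */
struct sg_sketch {
	uint32_t length;      /* CPU-side length, kept for the unmap call */
	uint64_t dma_address; /* device-visible address after mapping */
	uint32_t dma_length;  /* device-visible length after mapping */
};

/* Accessor in the spirit of sg_dma_len(): always the mapped length. */
static uint32_t sg_dma_len_sketch(const struct sg_sketch *sg)
{
	return sg->dma_length;
}

/* Total bytes the device will transfer: walk only the mapped entries. */
static uint64_t mapped_bytes(const struct sg_sketch *sgl, int mapped_nents)
{
	uint64_t total = 0;

	assert(sgl != NULL && mapped_nents >= 0);
	for (int i = 0; i < mapped_nents; i++)
		total += sg_dma_len_sketch(&sgl[i]);
	return total;
}
```

If three 4096-byte entries are coalesced into one mapping, the first entry's dma_length is 12288 while its length is still 4096, so code reading length directly would describe only a third of the transfer.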

Link: https://lore.kernel.org/r/20220506035519.50908-12-jsmart2021@gmail.com
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a14396b6

scsi: lpfc: Alter FPIN stat accounting logic · e6f51041

James Smart authored May 05, 2022

When configuring CMF management based on signals instead of FPINs, FPIN
alarm and warning statistics are not tracked.

Change the behavior so that FPIN alarms and warnings are always tracked
regardless of the configured mode.

Similar changes are made in the CMF signal stat accounting logic. Upon
receipt of a signal, only track signaled alarms and warnings. FPIN stats
should not be incremented upon receipt of a signal.

Link: https://lore.kernel.org/r/20220506035519.50908-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e6f51041

scsi: lpfc: Rework FDMI initialization after link up · de3ec318

James Smart authored May 05, 2022

After a link up, it's possible for the switch to change FDMI support (e.g.
FDMI1 vs FDMI2 vs SmartSAN). If the switch reverts to FDMI1, then the
revert is currently not detected.

Additionally, when NPIV is configured, it's possible the physical port's
RHBA is unprocessed by the switch before receipt of an NPIV port issued
RPRT. This causes some switch vendors to reject the NPIV's RPRT.

Fix by reinitializing base FDMI mode on link up, and defer FDMI vport RPRT
submission until after confirming physical port's RHBA is completed.

Link: https://lore.kernel.org/r/20220506035519.50908-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

de3ec318

scsi: lpfc: Change VMID registration to be based on fabric parameters · 5099478e

James Smart authored May 05, 2022

Currently, VMID registration is configured via module parameters. This
could lead to VMID compatibility issues if two ports are connected to
different brands of switches, as the two brands implement VMID differently.

Make logical changes so that VMID registration is based on common service
parameters from FLOGI_ACC with fabric rather than module parameters.

Link: https://lore.kernel.org/r/20220506035519.50908-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5099478e

scsi: lpfc: Decrement outstanding gidft_inp counter if lpfc_err_lost_link() · dc8a71bd

James Smart authored May 05, 2022

During large NPIV port testing, it was sometimes seen that not all vports
would log back in to the target device.

There are instances when the fabric is slow to respond to a spam of GID_PT
requests and as a result the SLI PORT may abort the GID_PT request because
the fabric takes so long. lpfc_cmpl_ct_cmd_gid_pt() would enter the
lpfc_err_lost_link() logic and attempt to lpfc_els_flush_rscn(), which is
fine, but forgets to decrement the gidft_inp counter. This results in
vport->gidft_inp never reaching 0 and discovery never restarting.

Decrement vport->gidft_inp if lpfc_err_lost_link() is true for both
lpfc_cmpl_ct_cmd_gid_pt() and lpfc_cmpl_ct_cmd_gid_ft().

Increase logging info during RSCN timeout and lpfc_err_lost_link() events.

Link: https://lore.kernel.org/r/20220506035519.50908-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

dc8a71bd

scsi: lpfc: Use list_for_each_entry_safe() in rscn_recovery_check() · 4a0f4aff

James Smart authored May 05, 2022

In GID_PT mode with lpfc_ns_query=1, a race condition between iterating the
vport->fc_nodes list in lpfc_rscn_recovery_check() and cleanup of an ndlp
can trigger a crash while processing the RSCN of another initiator from the
same zone.

During iteration of the vport->fc_nodes list, an ndlp is cleaned up and
released. lpfc_dequeue_node() is called from lpfc_cleanup_node() leading to
a bad ndlp dereference in lpfc_rscn_recovery_check().

Change list_for_each_entry() to list_for_each_entry_safe() in
lpfc_rscn_recovery_check() to protect against removal of an initiator ndlp,
while walking the vport->fc_nodes list.
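
The idea behind the _safe iterator is to save the next pointer before the current entry can be unlinked and freed. The sketch below shows that pattern on a plain singly linked list in userspace; struct node, push(), remove_matching() and count() are hypothetical and much simpler than the kernel's list_head machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

/* Prepend a node and return the new head. */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->id = id;
	n->next = head;
	return n;
}

/* Free every node with a matching id; returns the new head. */
static struct node *remove_matching(struct node *head, int id)
{
	struct node **link = &head;
	struct node *cur, *next;

	for (cur = head; cur; cur = next) {
		next = cur->next; /* saved before cur may be freed */
		if (cur->id == id) {
			*link = next;
			free(cur);
		} else {
			link = &cur->next;
		}
	}
	return head;
}

static int count(const struct node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

An unsafe loop would advance with cur = cur->next after free(cur), which is the same kind of bad dereference the commit describes. Saving next only protects against removal of the current entry, not of the saved next entry.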

Link: https://lore.kernel.org/r/20220506035519.50908-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4a0f4aff

scsi: lpfc: Fix dmabuf ptr assignment in lpfc_ct_reject_event() · 596fc8ad

James Smart authored May 05, 2022

Upon driver receipt of a CT cmd for type = 0xFA (Management Server) and
subtype = 0x11 (Fabric Device Management Interface), the driver is
responding with garbage CT cmd data when it should send a properly formed
RJT.

The __lpfc_prep_xmit_seq64_s4() routine was using the wrong buffer for the
reject.

Fix by converting the routine to use the buffer specified in the bde within
the wqe rather than the ill-set bmp element.

Link: https://lore.kernel.org/r/20220506035519.50908-6-jsmart2021@gmail.com
Fixes: 61910d6a ("scsi: lpfc: SLI path split: Refactor CT paths")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

596fc8ad

scsi: lpfc: Inhibit aborts if external loopback plug is inserted · ead76d4c

James Smart authored May 05, 2022

After running a short external loopback test, when the external loopback is
removed and a normal cable inserted that is directly connected to a target
device, the system oopses in the lpfc_set_rrq_active() routine.

When the loopback was inserted an FLOGI was transmitted. As we're looped back,
we receive the FLOGI request. The FLOGI is ABTS'd as we recognize the same
wwpn and thus understand it's a loopback. However, as the ABTS sends address
information the port is not set to (fffffe), the ABTS is dropped on the
wire. A short 1 frame loopback test is run and completes before the ABTS
times out. The loopback is unplugged and the new cable plugged in, and
an FLOGI to the new device occurs and completes. Due to a mixup in ref
counting the completion of the new FLOGI releases the fabric ndlp. Then the
original ABTS completes and references the released ndlp generating the
oops.

Correct by no-op'ing the ABTS when in loopback mode (it will be dropped
anyway). Added a flag to track the mode to recognize when it should be
no-op'd.

Link: https://lore.kernel.org/r/20220506035519.50908-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ead76d4c

scsi: lpfc: Fix ndlp put following a LOGO completion · b7e952cb

James Smart authored May 05, 2022

During testing with repeated asynchronous resets of the target, an issue
was found when the driver issues a LOGO to disconnect its login and recover
all exchanges. The LOGO command takes a node reference but neglects to
remove it, keeping the node reference count artificially high.

Add a call to lpfc_nlp_put() to lpfc_nlp_logo_unreg() and move the mempool
free call to the routine exit along with the needed put. This is always
safe as this will not be the last reference removed as lpfc_unreg_rpi()
ensures there is an additional reference on the ndlp.

Link: https://lore.kernel.org/r/20220506035519.50908-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b7e952cb

scsi: lpfc: Fill in missing ndlp kref puts in error paths · ba3d58a1

James Smart authored May 05, 2022

Code review, following every lpfc_nlp_get() call vs calls during error
handling, discovered cases of missing put calls.

Correct by adding ndlp kref puts in the respective error paths.
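
The pattern being fixed is a reference taken before submitting a command that is normally dropped by the completion handler, but leaks when submission fails and no completion runs. The sketch below uses a plain integer counter and hypothetical names (struct ndlp_sketch, node_get(), node_put(), issue_cmd()); it is not the lpfc kref code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with a bare reference counter. */
struct ndlp_sketch {
	int refcount;
};

static void node_get(struct ndlp_sketch *n)
{
	assert(n != NULL);
	n->refcount++;
}

static void node_put(struct ndlp_sketch *n)
{
	assert(n != NULL && n->refcount > 0);
	n->refcount--;
}

/*
 * Take a reference for the in-flight command. On success the completion
 * path (not shown) is expected to drop it; on failure it is dropped here.
 */
static int issue_cmd(struct ndlp_sketch *n, int submit_ok)
{
	node_get(n);
	if (!submit_ok) {
		node_put(n); /* error path: no completion will ever run */
		return -1;
	}
	return 0;
}
```

Without the put in the error branch, every failed submission would leave the count one higher than it should be and the node could never be released.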

Also added comments to several of the error paths to record relationships
to reference counts.

Link: https://lore.kernel.org/r/20220506035519.50908-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ba3d58a1

scsi: lpfc: Fix element offset in __lpfc_sli_release_iocbq_s4() · 84c6f99e

James Smart authored May 05, 2022

The prior commit that moved from iocb elements to explicit wqe elements
missed a name change.

Correct __lpfc_sli_release_iocbq_s4() to reference wqe rather than iocb.

Link: https://lore.kernel.org/r/20220506035519.50908-2-jsmart2021@gmail.com
Fixes: a680a929 ("scsi: lpfc: SLI path split: Refactor lpfc_iocbq")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

84c6f99e

scsi: ufs: ufshpb: Clean up ufshpb_suspend()/resume() · 18ebe239

Bean Huo authored May 05, 2022

ufshpb_resume() is only called when the HPB state is HPB_SUSPEND, so the
check statement for "ufshpb_get_state(hpb) != HPB_PRESENT" is useless.

Link: https://lore.kernel.org/r/20220505134707.35929-7-huobean@gmail.com
Reviewed-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

18ebe239

scsi: ufs: ufshpb: Add handling of device reset regions in HPB device mode · 32d6eab3

Bean Huo authored May 05, 2022

In UFS HPB Spec JESD220-3A,

"5.8. Active and inactive information upon power cycle
...
When the device is powered off by the host, the device may restore L2P map
data upon power up or build from the host's HPB READ command. In case
device powered up and lost HPB information, device can signal to the host
through HPB Sense data, by setting HPB Operation as '2' which will inform
the host that device reset HPB information."

Therefore, for HPB device control mode, if the UFS device is reset via the
RST_N pin, the active region information in the device will be reset. If
the host side receives this notification from the device side, it is
recommended to inactivate all active regions in the host's HPB cache.

Link: https://lore.kernel.org/r/20220505134707.35929-6-huobean@gmail.com
Reviewed-by: Keoseong Park <keosung.park@samsung.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

32d6eab3