Commits · b4efbec4c2a75b619fae4e8768be379e88c78687 · Kirill Smelkov / linux

26 Apr, 2022 3 commits

scsi: mpt3sas: Fix writel() use · b4efbec4

Damien Le Moal authored Mar 08, 2022

writel() internally executes cpu_to_le32() to convert the value being
written to little endian. The caller should thus not use this conversion
function for the value passed to writel(). Remove the cpu_to_le32() calls
in _base_put_smid_scsi_io_atomic(), _base_put_smid_fast_path_atomic(),
_base_put_smid_hi_priority_atomic() _base_put_smid_default_atomic() and
_base_handshake_req_reply_wait().

Link: https://lore.kernel.org/r/20220307234854.148145-3-damien.lemoal@opensource.wdc.comSigned-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b4efbec4

scsi: mpt3sas: Fix _ctl_set_task_mid() TaskMID check · dceaef94

Damien Le Moal authored Mar 08, 2022

The TaskMID field of struct Mpi2SCSITaskManagementRequest_t is a 16-bit
little endian value. Fix the search loop in _ctl_set_task_mid() to add a
cpu_to_le16() conversion before checking the value of TaskMID to avoid
sparse warnings. While at it, simplify the search loop code to remove an
unnecessarily complicated if condition.

Link: https://lore.kernel.org/r/20220307234854.148145-2-damien.lemoal@opensource.wdc.comSigned-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

dceaef94

scsi: qla2xxx: Remove free_sg command flag · ad14649f

Gleb Chesnokov authored Apr 15, 2022

The use of the free_sg command flag was dropped in commit 2c39b5ca
("qla2xxx: Remove SRR code"). Hence remove this flag and its check.

Link: https://lore.kernel.org/r/AS8PR10MB4952747D20B76DC8FE793CCA9DEE9@AS8PR10MB4952.EURPRD10.PROD.OUTLOOK.COMReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Gleb Chesnokov <Chesnokov.G@raidix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ad14649f

19 Apr, 2022 31 commits

scsi: core: Increase max device queue_depth to 4096 · f9bdac31

Sumit Saxena authored Apr 14, 2022

The maximum SCSI device queue depth of 1024 is not sufficient for RAID
volumes configured behind Broadcom RAID controllers.  For a 16-drive RAID
volume with a device queue depth limit of 1024, only 64 I/Os (1024/16) can
be issued per drive. That is not sufficient to saturate the device.

Link: https://lore.kernel.org/r/20220414103601.140687-1-sumit.saxena@broadcom.com
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f9bdac31

scsi: fcoe: Simplify if-if to if-else · 65db22e5

Yihao Han authored Apr 08, 2022

Replace 'if (!is_zero_ether_addr(mac))' with 'else' for simplification and
add curly brackets according to the kernel coding style:

"Do not unnecessarily use braces where a single statement will do."

...

"This does not apply if only one branch of a conditional statement is a
single statement; in the latter case use braces in both branches"

Please refer to:
https://www.kernel.org/doc/html/v5.17-rc8/process/coding-style.html

Link: https://lore.kernel.org/r/20220408081237.14037-1-hanyihao@vivo.comSigned-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

65db22e5

scsi: pmcraid: Remove unneeded semicolon · 21a023ce

Jiapeng Chong authored Apr 01, 2022

Fix the following coccicheck warning:

./drivers/scsi/pmcraid.c:4593:2-3: Unneeded semicolon.

Link: https://lore.kernel.org/r/20220401030640.28246-1-jiapeng.chong@linux.alibaba.comReported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

21a023ce

scsi: lpfc: Copyright updates for 14.2.0.2 patches · 66c20a97

James Smart authored Apr 12, 2022

Update copyrights to 2022 for files modified in the 14.2.0.2 patch set.

Link: https://lore.kernel.org/r/20220412222008.126521-27-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

66c20a97

scsi: lpfc: Update lpfc version to 14.2.0.2 · 4af4d0e2

James Smart authored Apr 12, 2022

Update lpfc version to 14.2.0.2.

Link: https://lore.kernel.org/r/20220412222008.126521-26-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4af4d0e2

scsi: lpfc: Expand setting ELS_ID field in ELS_REQUEST64_WQE · fd4a0c6d

James Smart authored Apr 12, 2022

ELS_ID field for ELS_REQUEST64_WQE is not filled out when FIP is not
supported by the HBA.

Move setting ELS_ID logic into __lpfc_sli_prep_els_req_rsp_s4(), and remove
ELS_ID FIP dependency logic from lpfc_sli_prep_wqe().

Introduce PLOGI ELS_ID and as a result update wqe_els_id_MASK because PLOGI
ELS_ID = 0x4 occupies up to 3 bits.

While in __lpfc_sli_prep_els_req_rsp_s4() routine, remove SLI3-isms.

Link: https://lore.kernel.org/r/20220412222008.126521-25-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fd4a0c6d

scsi: lpfc: Update stat accounting for READ_STATUS mbox command · f4fbf4ac

James Smart authored Apr 12, 2022

READ_STATUS tx/rx byte count fields are now expanded to 64 bit wide
counters. This patch updates logic for the READ_STATUS mbox command when
displaying tx_word and rx_word statistics in sysfs.

Link: https://lore.kernel.org/r/20220412222008.126521-24-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f4fbf4ac

scsi: lpfc: Change FA-PWWN detection methodology · 1b6f71f7

James Smart authored Apr 12, 2022

Do not rely on vendor version field of the CSPs to determine if we are in a
FA-PWWN environment. Instead, use the following procedure:

First, during HBA initialization, driver does a READ_CONFIG to determine if
FA-PWWN is configured on the HBA. A LPFC_FAWWPN_CONFIG hba_flag is set
accordingly.

Next, when the link comes up before the driver gets a link up event, the
firmware logs into the fabric with FA-PWWN. If the fabric port does not
support FA-PWWN, the driver will get a Misconfigured FA-WWN async event
before the link up. A LPFC_FAWWPN_FABRIC hba_flag will be set accordingly.

Finally, if the fabric supports FA-PWWN, the firmware will replace its CSPs
WWN with the Fabric Assigned ones. Then after link up, the driver will
retrieve the Fabric Assigned WWN when it does a READ_SPARAM mbox command.

Link: https://lore.kernel.org/r/20220412222008.126521-23-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1b6f71f7

scsi: lpfc: Refactor cleanup of mailbox commands · ef47575f

James Smart authored Apr 12, 2022

The intention of this patch is to refactor mailbox memory allocation and
cleanup steps in one routine respectively to prevent memory leaks or memory
errors related to mailbox commands. There are trivial localized fixes as
well.

Provide lpfc_mbox_rsrc_prep() - this routine allocates the dmabuf and the
mbuf associated with it. It also catches allocation errors and returns
status.

Provide lpfc_mbox_rsrc_cleanup() - this routine verifies a dmabuf exists
and if so releases the associated mbuf and the dmabuf memory. It then sets
the ctx_buf to NULL and releases the mailbox memory to the mailbox pool.

Link: https://lore.kernel.org/r/20220412222008.126521-22-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ef47575f

scsi: lpfc: Fix field overload in lpfc_iocbq data structure · d51cf5bd

James Smart authored Apr 12, 2022

The lpfc_iocbq data structure has void * pointers that are overloaded to be
as many as 8 different data types and the driver translates the void * by
casting.  This patch removes the void * pointers by declaring the specific
types needed by the driver.  It also expands the context_un to include more
seldom used pointer types to save structure bytes.  It also groups the u8
types together to pack the 8 bytes needed.  This work allows the lpfc_iocbq
data structure to be more strongly typed and keeps it from being allocated
from the 512 byte slab.

[mkp: rolled in zeroday fix]

Link: https://lore.kernel.org/r/20220412222008.126521-21-jsmart2021@gmail.comReported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d51cf5bd

scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion · 1045592f

James Smart authored Apr 12, 2022

During an NVMe target reboot, the target may initialize itself as FCP only
during the first RSCN and shortly after trigger a second RSCN claiming NVMe
support. The timing of these RSCNs occur before FCP-PRLI for the first
RSCN completes leading discovery issues over NVMe.

Change RSCN and NVME-PRLI send logic based on a new FC_RSCN_MEMENTO flag
that signals when lpfc_end_rscn() is completed and serves as a memento that
discovery was started from RSCN.

Link: https://lore.kernel.org/r/20220412222008.126521-20-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1045592f

scsi: lpfc: Register for Application Services FC-4 type in Fabric topology · 6c983d32

James Smart authored Apr 12, 2022

Add new FC-4 type 0x60 Application Services for fabric registration when
VMID is enabled.

Modified rft struture to indicate __be format. Removed redundant ipReg
variable as it was not used.

Link: https://lore.kernel.org/r/20220412222008.126521-19-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6c983d32

scsi: lpfc: Remove false FDMI NVMe FC-4 support for NPIV ports · 6c8a3ce6

James Smart authored Apr 12, 2022

FDMI FC-4 Active Type for vports mistakenly shows NVMe support.

Add a check to only set the NVMe support bit for the physical port.

Link: https://lore.kernel.org/r/20220412222008.126521-18-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6c8a3ce6

scsi: lpfc: Revise FDMI reporting of supported port speed for trunk groups · c364c453

James Smart authored Apr 12, 2022

Trunk port FDMI supported port speed shows single port supported speed
rather than the trunked port speed.

Modify supported port speed logic calculation during registration.

Link: https://lore.kernel.org/r/20220412222008.126521-17-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c364c453

scsi: lpfc: Fix call trace observed during I/O with CMF enabled · d6d45f67

James Smart authored Apr 12, 2022

The following was seen with CMF enabled:

BUG: using smp_processor_id() in preemptible
code: systemd-udevd/31711
kernel: caller is lpfc_update_cmf_cmd+0x214/0x420  [lpfc]
kernel: CPU: 12 PID: 31711 Comm: systemd-udevd
kernel: Call Trace:
kernel: <TASK>
kernel: dump_stack_lvl+0x44/0x57
kernel: check_preemption_disabled+0xbf/0xe0
kernel: lpfc_update_cmf_cmd+0x214/0x420 [lpfc]
kernel: lpfc_nvme_fcp_io_submit+0x23b4/0x4df0 [lpfc]

this_cpu_ptr() calls smp_processor_id() in a preemptible context.

Fix by using per_cpu_ptr() with raw_smp_processor_id() instead.

Link: https://lore.kernel.org/r/20220412222008.126521-16-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d6d45f67

scsi: lpfc: Correct CRC32 calculation for congestion stats · 5295d19d

James Smart authored Apr 12, 2022

lpfc_cgn_calc_crc32() is returning 32 bits, and lpfc_cgn_update_stat() was
using u16 to store the crc32 value. Correct by redeclaring the local
variable to u32.

Link: https://lore.kernel.org/r/20220412222008.126521-15-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5295d19d

scsi: lpfc: Move MI module parameter check to handle dynamic disable · 39a1a86b

James Smart authored Apr 12, 2022

lpfc_refresh_params() can be called for an async event handler. This could
potentially override the value initialized by lpfc_cmf_setup().

Move module parameter check to lpfc_refresh_params().

Link: https://lore.kernel.org/r/20220412222008.126521-14-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

39a1a86b

scsi: lpfc: Remove unnecessary NULL pointer assignment for ELS_RDF path · d531d987

James Smart authored Apr 12, 2022

The command IOCB ndlp pointer is overwritten in lpfc_issue_els_rdf(), and
the original ndlp pointer is stored ahead of time.

This null ptr assignment can be safely removed.

Link: https://lore.kernel.org/r/20220412222008.126521-13-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d531d987

scsi: lpfc: Transition to NPR state upon LOGO cmpl if link down or aborted · 76395c88

James Smart authored Apr 12, 2022

In P2P topology, a target controller reboot sometimes results in not
reestablishing a login because the ndlp is stuck in LOGO state.

Fix by transitioning to NPR state if we get link down before LOGO
completes.

Link: https://lore.kernel.org/r/20220412222008.126521-12-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

76395c88

scsi: lpfc: Update fc_prli_sent outstanding only after guaranteed IOCB submit · 31e88786

James Smart authored Apr 12, 2022

If lpfc_sli_issue_iocb() fails, then the fc_prli_sent is never decremented.

Move the fc_prli_sent++ to after a guaranteed IOCB submit.

Link: https://lore.kernel.org/r/20220412222008.126521-11-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

31e88786

scsi: lpfc: Protect memory leak for NPIV ports sending PLOGI_RJT · 672d1cb4

James Smart authored Apr 12, 2022

There is a potential memory leak in lpfc_ignore_els_cmpl() and
lpfc_els_rsp_reject() that was allocated from NPIV PLOGI_RJT
(lpfc_rcv_plogi()'s login_mbox).

Check if cmdiocb->context_un.mbox was allocated in lpfc_ignore_els_cmpl(),
and then free it back to phba->mbox_mem_pool along with mbox->ctx_buf for
service parameters.

For lpfc_els_rsp_reject() failure, free both the ctx_buf for service
parameters and the login_mbox.

Link: https://lore.kernel.org/r/20220412222008.126521-10-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

672d1cb4

scsi: lpfc: Fix null pointer dereference after failing to issue FLOGI and PLOGI · 577a942d

James Smart authored Apr 12, 2022

If lpfc_issue_els_flogi() fails and returns non-zero status, the node
reference count is decremented to trigger the release of the nodelist
structure. However, if there is a prior registration or dev-loss-evt work
pending, the node may be released prematurely. When dev-loss-evt
completes, the released node is referenced causing a use-after-free null
pointer dereference.

Similarly, when processing non-zero ELS PLOGI completion status in
lpfc_cmpl_els_plogi(), the ndlp flags are checked for a transport
registration before triggering node removal. If dev-loss-evt work is
pending, the node may be released prematurely and a subsequent call to
lpfc_dev_loss_tmo_handler() results in a use after free ndlp dereference.

Add test for pending dev-loss before decrementing the node reference count
for FLOGI, PLOGI, PRLI, and ADISC handling.

Link: https://lore.kernel.org/r/20220412222008.126521-9-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

577a942d

scsi: lpfc: Clear fabric topology flag before initiating a new FLOGI · 3483a44b

James Smart authored Apr 12, 2022

Previous topologies may no longer be in fabric mode, so clear FC_FABRIC in
fc_flag for every new FLOGI.

Link: https://lore.kernel.org/r/20220412222008.126521-8-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

3483a44b

scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock · 03cbbd7c

James Smart authored Apr 12, 2022

During stress I/O tests with 500+ vports, hard LOCKUP call traces are
observed.

CPU A:
 native_queued_spin_lock_slowpath+0x192
 _raw_spin_lock_irqsave+0x32
 lpfc_handle_fcp_err+0x4c6
 lpfc_fcp_io_cmd_wqe_cmpl+0x964
 lpfc_sli4_fp_handle_cqe+0x266
 __lpfc_sli4_process_cq+0x105
 __lpfc_sli4_hba_process_cq+0x3c
 lpfc_cq_poll_hdler+0x16
 irq_poll_softirq+0x76
 __softirqentry_text_start+0xe4
 irq_exit+0xf7
 do_IRQ+0x7f

CPU B:
 native_queued_spin_lock_slowpath+0x5b
 _raw_spin_lock+0x1c
 lpfc_abort_handler+0x13e
 scmd_eh_abort_handler+0x85
 process_one_work+0x1a7
 worker_thread+0x30
 kthread+0x112
 ret_from_fork+0x1f

Diagram of lockup:

CPUA                            CPUB
----                            ----
lpfc_cmd->buf_lock
                            phba->hbalock
                            lpfc_cmd->buf_lock
phba->hbalock

Fix by reordering the taking of the lpfc_cmd->buf_lock and phba->hbalock in
lpfc_abort_handler routine so that it tries to take the lpfc_cmd->buf_lock
first before phba->hbalock.

Link: https://lore.kernel.org/r/20220412222008.126521-7-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

03cbbd7c

scsi: lpfc: Requeue SCSI I/O to upper layer when fw reports link down · b6474465

James Smart authored Apr 12, 2022

During heavy I/O stress tests with 100+ vports and cable pulls, it may take
a while before the vport logs back into the fabric to resume I/O.

Currently, the driver immediately fails the I/O with DID_ERROR.

Change behavior to return DID_REQUEUE, and rely on SCSI layer's max retry
of 5 before erroring out the I/O.

Link: https://lore.kernel.org/r/20220412222008.126521-6-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b6474465

scsi: lpfc: Zero SLI4 fcp_cmnd buffer's fcpCntl0 field · 787d0580

James Smart authored Apr 12, 2022

It's possible that the fcpCntl0 reserved field is allocated non-zero.

For certain target storage arrays this could cause problems expecting
reserved fields to be all zero.

SLI3 path already allocates fcp_cmnd buffer with dma_pool_zalloc() in
lpfc_new_scsi_buf_s3. The fcpCntl0 field itself is never proactively set
throughout the SCSI I/O path. Thus, we only change the SLI4 fcp_cmnd
buffer allocation to dma_pool_zalloc.

Link: https://lore.kernel.org/r/20220412222008.126521-5-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

787d0580

scsi: lpfc: Fix diagnostic fw logging after a function reset · a6de9a2f

James Smart authored Apr 12, 2022

The lpfc_sli4_ras_setup() routine is only called from the
lpfc_pci_probe_one_s4() routine, which means diagnostic fw logging
initialization only occurs during probing.

Thus, any path involving a reset of the HBA that restarts the state of the
SLI port does not reinitialize diagnostic fw logging.

Move lpfc_sli4_ras_setup() into lpfc_sli4_hba_setup() so that the
LOWLEVEL_SET_DIAG_LOG_OPTIONS mailbox command can be sent after a function
reset.

Link: https://lore.kernel.org/r/20220412222008.126521-4-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a6de9a2f

scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() · e294647b

James Smart authored Apr 12, 2022

In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard
lockup call trace hangs the system.

Call Trace:
 _raw_spin_lock_irqsave+0x32/0x40
 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc]
 lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc]
 lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc]
 lpfc_els_flush_cmd+0x43c/0x670 [lpfc]
 lpfc_els_flush_all_cmd+0x37/0x60 [lpfc]
 lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc]
 lpfc_do_work+0x1485/0x1d70 [lpfc]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x40
Kernel panic - not syncing: Hard LOCKUP

The same CPU tries to claim the phba->port_list_lock twice.

Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and
lpfc_printf_log() macros before calling lpfc_dmp_dbg().  There is no need
to take the phba->port_list_lock within lpfc_dmp_dbg().

Link: https://lore.kernel.org/r/20220412222008.126521-3-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e294647b

scsi: lpfc: Tweak message log categories for ELS/FDMI/NVMe rescan · b83a8c21

James Smart authored Apr 12, 2022

Several log message categories were updated:

 - Enable msg 4623 (Xmit of ECD) to display for ELS logging.

 - Change msg 0220 (FDMI cmd failed) to display for ELS logging.

 - Change msg 6460 (FDMI RPA failure) to be warning not hard error.

 - Change msg 6172 (NVME rescan of DID) to be logged under NVMe discovery.

Link: https://lore.kernel.org/r/20220412222008.126521-2-jsmart2021@gmail.comCo-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b83a8c21

Merge branch '5.18/scsi-fixes' into 5.19/scsi-staging · 08c84a75

Martin K. Petersen authored Apr 18, 2022

Pull in 5.18 fixes branch which contains a bunch of fixes required for
the lpfc driver update.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

08c84a75

scsi: ufs: core: Remove redundant HPB unmap · 25a0bf21

Po-Wen Kao authored Apr 12, 2022

Since the HPB mapping is already reset in ufshpb_init() by setting flag
QUERY_FLAG_IDN_HPB_RESET, there is no need doing so again in
ufshpb_hpb_lu_prepared().

This also resolves the issue where HPB WRITE BUFFER is issued before UAC is
cleared.

Link: https://lore.kernel.org/r/20220412073131.10644-1-powen.kao@mediatek.comAcked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

25a0bf21

12 Apr, 2022 6 commits

scsi: iscsi: MAINTAINERS: Add Mike Christie as co-maintainer · 70a3baee

Mike Christie authored Apr 07, 2022

I've been doing a lot of iscsi patches because Oracle is paying me to work
on iSCSI again. It was supposed to be temp assignment, but my co-worker
that was working on iscsi moved to a new group so it looks like I'm back on
this code again. After talking to Chris and Lee this patch adds me back as
co-maintainer, so I can help them and people remember to cc me on issues.

Link: https://lore.kernel.org/r/20220408001314.5014-11-michael.christie@oracle.comTested-by: Manish Rangankar <mrangankar@marvell.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

70a3baee

scsi: qedi: Fix failed disconnect handling · 857b0652

Mike Christie authored Apr 07, 2022

We set the qedi_ep state to EP_STATE_OFLDCONN_START when the ep is
created. Then in qedi_set_path we kick off the offload work. If userspace
times out the connection and calls ep_disconnect, qedi will only flush the
offload work if the qedi_ep state has transitioned away from
EP_STATE_OFLDCONN_START. If we can't connect we will not have transitioned
state and will leave the offload work running, and we will free the qedi_ep
from under it.

This patch just has us init the work when we create the ep, then always
flush it.

Link: https://lore.kernel.org/r/20220408001314.5014-10-michael.christie@oracle.comTested-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

857b0652

scsi: iscsi: Fix NOP handling during conn recovery · 44ac9710

Mike Christie authored Apr 07, 2022

If a offload driver doesn't use the xmit workqueue, then when we are doing
ep_disconnect libiscsi can still inject PDUs to the driver. This adds a
check for if the connection is bound before trying to inject PDUs.

Link: https://lore.kernel.org/r/20220408001314.5014-9-michael.christie@oracle.comTested-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

44ac9710

scsi: iscsi: Merge suspend fields · 5bd85625

Mike Christie authored Apr 07, 2022

Move the tx and rx suspend fields into one flags field.

Link: https://lore.kernel.org/r/20220408001314.5014-8-michael.christie@oracle.comTested-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5bd85625

scsi: iscsi: Fix unbound endpoint error handling · 03690d81

Mike Christie authored Apr 07, 2022

If a driver raises a connection error before the connection is bound, we
can leave a cleanup_work queued that can later run and disconnect/stop a
connection that is logged in. The problem is that drivers can call
iscsi_conn_error_event for endpoints that are connected but not yet bound
when something like the network port they are using is brought down.
iscsi_cleanup_conn_work_fn will check for this and exit early, but if the
cleanup_work is stuck behind other works, it might not get run until after
userspace has done ep_disconnect. Because the endpoint is not yet bound
there was no way for ep_disconnect to flush the work.

The bug of leaving stop_conns queued was added in:

Commit 23d6fefb ("scsi: iscsi: Fix in-kernel conn failure handling")

and:

Commit 0ab71045 ("scsi: iscsi: Perform connection failure entirely in
kernel space")

was supposed to fix it, but left this case.

This patch moves the conn state check to before we even queue the work so
we can avoid queueing.

Link: https://lore.kernel.org/r/20220408001314.5014-7-michael.christie@oracle.com
Fixes: 0ab71045 ("scsi: iscsi: Perform connection failure entirely in kernel space")
Tested-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Lee Duncan <lduncan@@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

03690d81

scsi: iscsi: Fix conn cleanup and stop race during iscsid restart · 7c6e99c1

Mike Christie authored Apr 07, 2022

If iscsid is doing a stop_conn at the same time the kernel is starting
error recovery we can hit a race that allows the cleanup work to run on a
valid connection. In the race, iscsi_if_stop_conn sees the cleanup bit set,
but it calls flush_work on the clean_work before iscsi_conn_error_event has
queued it. The flush then returns before the queueing and so the
cleanup_work can run later and disconnect/stop a conn while it's in a
connected state.

The patch:

Commit 0ab71045 ("scsi: iscsi: Perform connection failure entirely in
kernel space")

added the late stop_conn call bug originally, and the patch:

Commit 23d6fefb ("scsi: iscsi: Fix in-kernel conn failure handling")

attempted to fix it but only fixed the normal EH case and left the above
race for the iscsid restart case. For the normal EH case we don't hit the
race because we only signal userspace to start recovery after we have done
the queueing, so the flush will always catch the queued work or see it
completed.

For iscsid restart cases like boot, we can hit the race because iscsid will
call down to the kernel before the kernel has signaled any error, so both
code paths can be running at the same time. This adds a lock around the
setting of the cleanup bit and queueing so they happen together.

Link: https://lore.kernel.org/r/20220408001314.5014-6-michael.christie@oracle.com
Fixes: 0ab71045 ("scsi: iscsi: Perform connection failure entirely in kernel space")
Tested-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7c6e99c1