Commits · 7a8ff7d9854a1727435557184c8255bbbca60920 · Kirill Smelkov / linux

24 Aug, 2021 10 commits

scsi: qla2xxx: Fix NVMe session down detection · 7a8ff7d9

Quinn Tran authored Aug 16, 2021

When Target port transitions personality from one to another (NVMe <-->
FCP), there could be some overlap of the two where one layer is going down
while the other layer is coming up. This overlap can cause temporary I/O
error. Detect those errors/transitions and recover from them. Triggers
session tear down and allow relogin to re-drive the connection under the
following conditions:

 - NVMe command error

 - On PRLO + N2N (rida format 2)

Link: https://lore.kernel.org/r/20210817051315.2477-11-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7a8ff7d9

scsi: qla2xxx: Fix NVMe retry · f8844457

Quinn Tran authored Aug 16, 2021

For target port that register itself as both FCP + NVMe, initiator driver
will try to login one mode at a time. If the last mode did not succeed,
then driver will try the other mode.

When error is encountered, current code only flip to other mode one time
(NVMe->FCP) and remain on the last mode.  Driver wrongly assumed target
port does not support PRLI NVMe, instead it was not ready to receive PRLI.

This patch will alternate back and forth on every PRLI failure until login
retry count has depleted or it is succeeded.

Link: https://lore.kernel.org/r/20210817051315.2477-10-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f8844457

scsi: qla2xxx: Fix hang on NVMe command timeouts · 2cabf10d

Arun Easi authored Aug 16, 2021

The abort callback gets called only when it gets posted to firmware. The
refcounting is done properly in the callback. On internal errors, the
callback is not invoked leading to a hung I/O. Fix this by having separate
error code when command gets returned from firmware.

Link: https://lore.kernel.org/r/20210817051315.2477-9-njavali@marvell.comSigned-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2cabf10d

scsi: qla2xxx: Fix NVMe | FCP personality change · f6e327fc

Quinn Tran authored Aug 16, 2021

Currently driver saves the personality type (FCP|NVMe) at the start of
first discovery of the remote device. If the remote device personality do
change over time, then qla driver needs to present that to user to decide.

Link: https://lore.kernel.org/r/20210817051315.2477-8-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f6e327fc

scsi: qla2xxx: edif: Do secure PLOGI when auth app is present · 1dc64a36

Quinn Tran authored Aug 16, 2021

For initiator mode, always do secure login when authentication app started.
Also remove redundant flags to indicate secure connection.

Link: https://lore.kernel.org/r/20210817051315.2477-7-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1dc64a36

scsi: qla2xxx: edif: Add N2N support for EDIF · 4de067e5

Quinn Tran authored Aug 16, 2021

For EDIF + N2N to work, firmware 9.8 or later is required. The driver will
pause after PLOGI to allow app to authenticate. Once authentication
completes, app will tell driver to do PRLI.

Link: https://lore.kernel.org/r/20210817051315.2477-6-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4de067e5

scsi: qla2xxx: Fix hang during NVMe session tear down · 310e69ed

Arun Easi authored Aug 16, 2021

The following hung task call trace was seen:

    [ 1230.183294] INFO: task qla2xxx_wq:523 blocked for more than 120 seconds.
    [ 1230.197749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1230.205585] qla2xxx_wq      D    0   523      2 0x80004000
    [ 1230.205636] Workqueue: qla2xxx_wq qlt_free_session_done [qla2xxx]
    [ 1230.205639] Call Trace:
    [ 1230.208100]  __schedule+0x2c4/0x700
    [ 1230.211607]  schedule+0x38/0xa0
    [ 1230.214769]  schedule_timeout+0x246/0x2f0
    [ 1230.222651]  wait_for_completion+0x97/0x100
    [ 1230.226921]  qlt_free_session_done+0x6a0/0x6f0 [qla2xxx]
    [ 1230.232254]  process_one_work+0x1a7/0x360

...when device side port resets were done.

Abort threads were getting out without processing due to the "deleted"
flag check. The delete thread, meanwhile, could not proceed with a
logout (that would have cleared out pending requests) as the logout IOCB
work was not progressing. It appears like the hung qlt_free_session_done()
thread is causing the ha->wq works on hold. The qlt_free_session_done()
was hung waiting for nvme_fc_unregister_remoteport() + localport_delete cb
to be complete, which would only happen when all I/Os are released.

Fix this by allowing abort to progress until device delete is completely
done. This should make the qlt_free_session_done() proceed without hang and
thus clear up the deadlock.

Link: https://lore.kernel.org/r/20210817051315.2477-5-njavali@marvell.comSigned-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

310e69ed

scsi: qla2xxx: edif: Fix EDIF enable flag · d07b75ba

Quinn Tran authored Aug 16, 2021

edif_enabled is prematurely turned on if hardware is capable of handling
the feature. However, firmware also needs to support EDIF before enabling
this bit.

Link: https://lore.kernel.org/r/20210817051315.2477-4-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d07b75ba

scsi: qla2xxx: edif: Reject AUTH ELS on session down · 22547929

Quinn Tran authored Aug 16, 2021

Reject inflight AUTH ELS if driver is going through session recovery.

Link: https://lore.kernel.org/r/20210817051315.2477-3-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

22547929

scsi: qla2xxx: edif: Fix stale session · b15ce2f3

Quinn Tran authored Aug 16, 2021

When firmware indicates session has been torn down via UPDATE SA IOCB or
ELS Passthrough IOCB, the driver needs to also tear down the session.

Link: https://lore.kernel.org/r/20210817051315.2477-2-njavali@marvell.comSigned-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b15ce2f3

18 Aug, 2021 8 commits

scsi: sd: Do not exit sd_spinup_disk() quietly · 848ade90

Christian Loehle authored Aug 16, 2021

The sd_spinup_disk() function logs what is happening. Unfortunately this
output stops if the media was marked as removed in the meantime. Add a
print for this case too.

Link: https://lore.kernel.org/r/CWXP265MB26803209FD08A64222EEEA02C4FD9@CWXP265MB2680.GBRP265.PROD.OUTLOOK.COMReviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

848ade90

scsi: ibmvfc: Do not wait for initial device scan · 7a3795f2

Hannes Reinecke authored Aug 17, 2021

The initial device scan might take some time, and there really is no need
to wait for it during probe(). So return immediately from scsi_scan_host()
during probe() and avoid any udev stalls during booting.

Link: https://lore.kernel.org/r/20210817075306.11315-1-mwilck@suse.comAcked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7a3795f2

scsi: target: Fix sense key for invalid EXTENDED COPY request · 0394b504

Sergey Samoylenko authored Aug 03, 2021

TCM fails to pass the following tests in libiscsi:

  SCSI.ExtendedCopy.DescrType
  SCSI.ExtendedCopy.DescrLimits
  SCSI.ExtendedCopy.ParamHdr
  SCSI.ExtendedCopy.ValidSegDescr
  SCSI.ExtendedCopy.ValidTgtDescr

The xcopy code always returns the same NOT READY sense key for all detected
errors. Change the sense key for invalid requests to ILLEGAL REQUEST, and
for aborted transfers to COPY ABORTED.

Link: https://lore.kernel.org/r/20210803145410.80147-3-s.samoylenko@yadro.com
Fixes: d877d727 ("target: Fix a deadlock between the XCOPY code and iSCSI session shutdown")
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0394b504

scsi: target: Allows backend drivers to fail with specific sense codes · 44678553

Sergey Samoylenko authored Aug 03, 2021

Currently, backend drivers can fail I/O with SAM_STAT_CHECK_CONDITION which
gets us TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE.

Add a new helper that allows backend drivers to fail with specific sense
codes.

This is based on a patch from Mike Christie <michael.christie@oracle.com>.

Cc: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20210803145410.80147-2-s.samoylenko@yadro.comReviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

44678553

scsi: smartpqi: Replace one-element array with flexible-array member · 5f492a7a

Gustavo A. R. Silva authored Aug 10, 2021

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code
should always use "flexible array members"[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Refactor the code a bit according to the use of a flexible-array member in
struct pqi_event_config instead of a one-element array, and use the
struct_size() helper.

This helps with the ongoing efforts to globally enable -Warray-bounds and
get us closer to being able to tighten the FORTIFY_SOURCE routines on
memcpy().

This issue was found with the help of Coccinelle and audited and fixed,
manually.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Link: https://lore.kernel.org/r/20210810210741.GA58765@embeddedorSigned-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5f492a7a

scsi: target: pscsi: Fix possible null-pointer dereference in pscsi_complete_cmd() · 0f99792c

Tuo Li authored Aug 09, 2021

The return value of transport_kmap_data_sg() is assigned to the variable
buf:

  buf = transport_kmap_data_sg(cmd);

And then it is checked:

  if (!buf) {

This indicates that buf can be NULL. However, it is dereferenced in the
following statements:

  if (!(buf[3] & 0x80))
    buf[3] |= 0x80;
  if (!(buf[2] & 0x80))
    buf[2] |= 0x80;

To fix these possible null-pointer dereferences, dereference buf and call
transport_kunmap_data_sg() only when buf is not NULL.

Link: https://lore.kernel.org/r/20210810040414.248167-1-islituo@gmail.comReported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0f99792c

scsi: core: Remove scsi_cmnd.tag · 4c7b6ea3

John Garry authored Aug 13, 2021

It is never read, so get rid of it.

Link: https://lore.kernel.org/r/1628862553-179450-4-git-send-email-john.garry@huawei.comReviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4c7b6ea3

scsi: ibmvfc: Stop using scsi_cmnd.tag · 6a036ce0

John Garry authored Aug 17, 2021

Use scsi_cmd_to_rq(scsi_cmnd)->tag in preference to scsi_cmnd.tag.

Link: https://lore.kernel.org/r/1629207817-211936-1-git-send-email-john.garry@huawei.comSigned-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6a036ce0

16 Aug, 2021 5 commits

scsi: fnic: Stop setting scsi_cmnd.tag · e0aebd25

John Garry authored Aug 13, 2021

It is never read. Setting it and the request tag seems dodgy anyway.

Link: https://lore.kernel.org/r/1628862553-179450-3-git-send-email-john.garry@huawei.comReviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e0aebd25

scsi: wd719: Stop using scsi_cmnd.tag · e2a1dc57

John Garry authored Aug 13, 2021

Use scsi_cmd_to_rq(cmd)->tag instead.

Link: https://lore.kernel.org/r/1628862553-179450-2-git-send-email-john.garry@huawei.comReviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e2a1dc57

scsi: qedf: Fix error codes in qedf_alloc_global_queues() · ccc89737

Dan Carpenter authored Aug 10, 2021

This driver has some left over "return 1" on failure style code mixed with
"return negative error codes" style code. The caller doesn't care so we
should just convert everything to return negative error codes.

Then there was a problem that there were two variables used to store error
codes which just resulted in confusion. If qedf_alloc_bdq() returned a
negative error code, we accidentally returned success instead of
propagating the error code. So get rid of the "rc" variable and use
"status" every where.

Also remove the "status = 0" initialization so that these sorts of bugs
will be detected by the compiler in the future.

Link: https://lore.kernel.org/r/20210810085023.GA23998@kili
Fixes: 61d8658b ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ccc89737

scsi: qedi: Fix error codes in qedi_alloc_global_queues() · 4dbe57d4

Dan Carpenter authored Aug 10, 2021

This function had some left over code that returned 1 on error instead
negative error codes.  Convert everything to use negative error codes.  The
caller treats all non-zero returns the same so this does not affect run
time.

A couple places set "rc" instead of "status" so those error paths ended up
returning success by mistake.  Get rid of the "rc" variable and use
"status" everywhere.

Remove the bogus "status = 0" initialization, as a future proofing measure
so the compiler will warn about uninitialized error codes.

Link: https://lore.kernel.org/r/20210810084753.GD23810@kili
Fixes: ace7f46b ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4dbe57d4

scsi: smartpqi: Fix an error code in pqi_get_raid_map() · d1f6581a

Dan Carpenter authored Aug 10, 2021

Return -EINVAL on failure instead of success.

Link: https://lore.kernel.org/r/20210810084613.GB23810@kili
Fixes: a91aaae0 ("scsi: smartpqi: allow for larger raid maps")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d1f6581a

13 Aug, 2021 1 commit

scsi: mpi3mr: Use the proper SCSI midlayer interfaces for PI · 92cc94ad

Martin K. Petersen authored Jun 08, 2021

Use the SCSI midlayer interfaces to query protection interval, reference
tag, and per-command DIX flags

Link: https://lore.kernel.org/r/20210806040023.5355-4-martin.petersen@oracle.com
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

92cc94ad

12 Aug, 2021 16 commits

scsi: qla2xxx: Update version to 10.02.06.100-k · bd19573e

Nilesh Javali authored Aug 09, 2021

Link: https://lore.kernel.org/r/20210810043720.1137-15-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

bd19573e

scsi: qla2xxx: Sync queue idx with queue_pair_map idx · c8fadf01

Saurav Kashyap authored Aug 09, 2021

The first invocation of function find_first_zero_bit will return 0 and
queue_id gets set to 0.

An index of queue_pair_map also gets set to 0.

	qpair_id = find_first_zero_bit(ha->qpair_qid_map, ha->max_qpairs);

        set_bit(qpair_id, ha->qpair_qid_map);
        ha->queue_pair_map[qpair_id] = qpair;

In the alloc_queue callback driver checks the map, if queue is already
allocated:

	ha->queue_pair_map[qidx]

This works fine as long as max_qpairs is greater than nvme_max_hw_queues(8)
since the size of the queue_pair_map is equal to max_qpair. In case nr_cpus
is less than 8, max_qpairs is less than 8. This creates wrong value
returned as qpair.

[ 1572.353669] qla2xxx [0000:24:00.3]-2121:6: Returning existing qpair of 4e00000000000000 for idx=2
[ 1572.354458] general protection fault: 0000 [#1] SMP PTI
[ 1572.354461] CPU: 1 PID: 44 Comm: kworker/1:1H Kdump: loaded Tainted: G          IOE    --------- -  - 4.18.0-304.el8.x86_64 #1
[ 1572.354462] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 03/01/2013
[ 1572.354467] Workqueue: kblockd blk_mq_run_work_fn
[ 1572.354485] RIP: 0010:qla_nvme_post_cmd+0x92/0x760 [qla2xxx]
[ 1572.354486] Code: 84 24 5c 01 00 00 00 00 b8 0a 74 1e 66 83 79 48 00 0f 85 a8 03 00 00 48 8b 44 24 08 48 89 ee 4c 89 e7 8b 50 24 e8 5e 8e 00 00 <f0> 41 ff 47 04 0f ae f0 41 f6 47 24 04 74 19 f0 41 ff 4f 04 b8 f0
[ 1572.354487] RSP: 0018:ffff9c81c645fc90 EFLAGS: 00010246
[ 1572.354489] RAX: 0000000000000001 RBX: ffff8ea3e5070138 RCX: 0000000000000001
[ 1572.354490] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8ea4c866b800
[ 1572.354491] RBP: ffff8ea4c866b800 R08: 0000000000005010 R09: ffff8ea4c866b800
[ 1572.354492] R10: 0000000000000001 R11: 000000069d1ca3ff R12: ffff8ea4bc460000
[ 1572.354493] R13: ffff8ea3e50702b0 R14: ffff8ea4c4c16a58 R15: 4e00000000000000
[ 1572.354494] FS:  0000000000000000(0000) GS:ffff8ea4dfd00000(0000) knlGS:0000000000000000
[ 1572.354495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1572.354496] CR2: 000055884504fa58 CR3: 00000005a1410001 CR4: 00000000000606e0
[ 1572.354497] Call Trace:
[ 1572.354503]  ? check_preempt_curr+0x62/0x90
[ 1572.354506]  ? dma_direct_map_sg+0x72/0x1f0
[ 1572.354509]  ? nvme_fc_start_fcp_op.part.32+0x175/0x460 [nvme_fc]
[ 1572.354511]  ? blk_mq_dispatch_rq_list+0x11c/0x730
[ 1572.354515]  ? __switch_to_asm+0x35/0x70
[ 1572.354516]  ? __switch_to_asm+0x41/0x70
[ 1572.354518]  ? __switch_to_asm+0x35/0x70
[ 1572.354519]  ? __switch_to_asm+0x41/0x70
[ 1572.354521]  ? __switch_to_asm+0x35/0x70
[ 1572.354522]  ? __switch_to_asm+0x41/0x70
[ 1572.354523]  ? __switch_to_asm+0x35/0x70
[ 1572.354525]  ? entry_SYSCALL_64_after_hwframe+0xb9/0xca
[ 1572.354527]  ? __switch_to_asm+0x41/0x70
[ 1572.354529]  ? __blk_mq_sched_dispatch_requests+0xc6/0x170
[ 1572.354531]  ? blk_mq_sched_dispatch_requests+0x30/0x60
[ 1572.354532]  ? __blk_mq_run_hw_queue+0x51/0xd0
[ 1572.354535]  ? process_one_work+0x1a7/0x360
[ 1572.354537]  ? create_worker+0x1a0/0x1a0
[ 1572.354538]  ? worker_thread+0x30/0x390
[ 1572.354540]  ? create_worker+0x1a0/0x1a0
[ 1572.354541]  ? kthread+0x116/0x130
[ 1572.354543]  ? kthread_flush_work_fn+0x10/0x10
[ 1572.354545]  ? ret_from_fork+0x35/0x40

Fix is to use index 0 for admin and first IO queue.

Link: https://lore.kernel.org/r/20210810043720.1137-14-njavali@marvell.com
Fixes: e84067d7 ("scsi: qla2xxx: Add FC-NVMe F/W initialization and transport registration")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c8fadf01

scsi: qla2xxx: Changes to support kdump kernel for NVMe BFS · 4a0a542f

Saurav Kashyap authored Aug 09, 2021

The MSI-X and MSI calls fails in kdump kernel. Because of this
qla2xxx_create_qpair() fails leading to .create_queue callback failure.
The fix is to return existing qpair instead of allocating new one and
allocate a single hw queue.

[   19.975838] qla2xxx [0000:d8:00.1]-00c7:11: MSI-X: Failed to enable support,
giving   up -- 16/-28.
[   19.984885] qla2xxx [0000:d8:00.1]-0037:11: Falling back-to MSI mode --
ret=-28.
[   19.992278] qla2xxx [0000:d8:00.1]-0039:11: Falling back-to INTa mode --
ret=-28.
..
..
..
[   21.141518] qla2xxx [0000:d8:00.0]-2104:2: qla_nvme_alloc_queue: handle
00000000e7ee499d, idx =1, qsize 32
[   21.151166] qla2xxx [0000:d8:00.0]-0181:2: FW/Driver is not multi-queue capable.
[   21.158558] qla2xxx [0000:d8:00.0]-2122:2: Failed to allocate qpair
[   21.164824] nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (-22)
[   21.171612] nvme nvme0: NVME-FC{0}: Reconnect attempt in 2 seconds

Link: https://lore.kernel.org/r/20210810043720.1137-13-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4a0a542f

scsi: qla2xxx: Changes to support kdump kernel · 62e0dec5

Saurav Kashyap authored Aug 09, 2021

Avoid allocating firmware dump and only allocate a single queue for a kexec
kernel.

Link: https://lore.kernel.org/r/20210810043720.1137-12-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

62e0dec5

scsi: qla2xxx: Suppress unnecessary log messages during login · a5741427

Arun Easi authored Aug 09, 2021

Suppress logging of retryable errors. These can still be seen if extended
logging is enabled.

Link: https://lore.kernel.org/r/20210810043720.1137-11-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a5741427

scsi: qla2xxx: Fix NPIV create erroneous error · a5721444

Quinn Tran authored Aug 09, 2021

When user creates multiple NPIVs, the switch capabilities field is checked
before a vport is allowed to be created. This field is being toggled if a
switch scan is in progress. This creates erroneous reject of vport create.

Link: https://lore.kernel.org/r/20210810043720.1137-10-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a5721444

scsi: qla2xxx: Fix unsafe removal from linked list · 0c9a5f3e

Quinn Tran authored Aug 09, 2021

On NPIV delete, the VPort is taken off a linked list in an unsafe manner.
The check for VPort refcount should be done behind lock before taking off
the element.

[ 2733.016907] general protection fault: 0000 [#1] SMP NOPTI
[ 2733.016908] qla2xxx [0000:22:00.1]-7088:27: VP[4] deleted.
[ 2733.016912] CPU: 22 PID: 23481 Comm: qla2xxx_15_dpc Kdump: loaded Tainted:
G           OE KX    5.3.18-47-default #1 SLE15-SP3
[ 2733.016914] Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.1.4 02/17/2021
[ 2733.016929] RIP: 0010:qla2x00_abort_isp+0x90/0x850 [qla2xxx]
[ 2733.016933] RSP: 0018:ffffb9cfc91efe98 EFLAGS: 00010087
[ 2733.016935] RAX: 0000000000000292 RBX: dead000000000100 RCX: 0000000000000000
[ 2733.016936] RDX: 0000000000000001 RSI: ffff944bfeb99558 RDI: ffff944bfc4b4488
[ 2733.016937] RBP: ffff944bfc4b2868 R08: 00000000000187a2 R09: 0000000000000001
[ 2733.016937] R10: ffffb9cfc91efcc8 R11: 0000000000000001 R12: ffff944bfc4b4000
[ 2733.016938] R13: ffff944bfc4b4870 R14: ffff944bfc4b4488 R15: ffff944bda895c80
[ 2733.016939] FS:  0000000000000000(0000) GS:ffff944bfeb80000(0000) knlGS:0000000000000000
[ 2733.016940] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2733.016940] CR2: 00007fc173e74458 CR3: 0000001ff57de000 CR4: 0000000000350ee0
[ 2733.016941] Call Trace:
[ 2733.016951]   qla2xxx_pci_error_detected+0x190/0x190 [qla2xxx]
[ 2733.016957]   qla2x00_do_dpc+0x560/0xa10 [qla2xxx]
[ 2733.016962]   kthread+0x10d/0x130
[ 2733.016963]   kthread_park+0xa0/0xa0
[ 2733.016966]   ret_from_fork+0x22/0x40

Link: https://lore.kernel.org/r/20210810043720.1137-9-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0c9a5f3e

scsi: qla2xxx: Fix port type info · 01c97f2d

Quinn Tran authored Aug 09, 2021

Over time, fcport->port_type became a flag field. The flags within this
field were not defined properly. This caused external tools to read wrong
info.

Link: https://lore.kernel.org/r/20210810043720.1137-8-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

01c97f2d

scsi: qla2xxx: Add debug print of 64G link speed · 85818882

Quinn Tran authored Aug 09, 2021

Add debug print of 64G link speed.

Link: https://lore.kernel.org/r/20210810043720.1137-7-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

85818882

scsi: qla2xxx: Show OS name and version in FDMI-1 · 137316ba

Arun Easi authored Aug 09, 2021

To be consistent with other OS drivers, register OS name and version in
FDMI-1 fabric registration.

Link: https://lore.kernel.org/r/20210810043720.1137-6-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

137316ba

scsi: qla2xxx: Changes to support FCP2 Target · 44c57f20

Saurav Kashyap authored Aug 09, 2021

Add changes to support FCP2 Target.

Link: https://lore.kernel.org/r/20210810043720.1137-5-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

44c57f20

scsi: qla2xxx: Adjust request/response queue size for 28xx · ade660d4

Quinn Tran authored Aug 09, 2021

Adjust request/respond queue size for 28xx to match 27xx adapter.

Link: https://lore.kernel.org/r/20210810043720.1137-4-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ade660d4

scsi: qla2xxx: Add host attribute to trigger MPI hang · 4c15442d

Arun Easi authored Aug 09, 2021

Add a mechanism to trigger MPI pause for debugging purposes.

Link: https://lore.kernel.org/r/20210810043720.1137-2-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4c15442d

scsi: qedi: Add support for fastpath doorbell recovery · 9757f8af

Shai Malin authored Aug 05, 2021

Driver fastpath employs doorbells to indicate to the device that work is
available. Each doorbell translates to a message sent to the device over
PCI. These messages are queued by the doorbell queue HW block, and handled
by the HW.

If a sufficient amount of CPU cores are sending messages at a sufficient
rate, the queue can overflow, and messages can be dropped. There are many
entities in the driver which can send doorbell messages.  When overflow
happens, a fatal HW attention is indicated, and the Doorbell HW block stops
accepting new doorbell messages until recovery procedure is done.

When overflow occurs, all doorbells are dropped. Since doorbells are
aggregatives, if more doorbells are sent nothing has to be done.  But if
the "last" doorbell is dropped, the doorbelling entity doesn’t know this
happened, and may wait forever for the device to perform the action.  The
doorbell recovery mechanism addresses just that - it sends the last
doorbell of every entity.

[mkp: fix missing brackets reported by Guenter Roeck]

Link: https://lore.kernel.org/r/20210804221412.5048-1-smalin@marvell.comCo-developed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9757f8af

Merge branch '5.14/scsi-fixes' into 5.15/scsi-staging · 31548020

Martin K. Petersen authored Aug 11, 2021

Resolve mpt3sas conflict between 5.14/scsi-fixes and 5.15/scsi-staging
reported by sfr.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

31548020

scsi: isci: Use the proper SCSI midlayer interfaces for PI · 4cc0096e

Martin K. Petersen authored Aug 06, 2021

Use scsi_prot_ref_tag() instead of scsi_get_lba() to get the reference tag
for a given I/O.

Link: https://lore.kernel.org/r/20210806040023.5355-3-martin.petersen@oracle.comReviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4cc0096e