Commits · 27ba3e8ff3ab86449e63d38a8d623053591e65fa · nexedi / linux

16 Sep, 2020 1 commit

scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks · 27ba3e8f

Damien Le Moal authored Sep 15, 2020

When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as
regular disks. In this case, ensure that command completion is correctly
executed by changing sd_zbc_complete() to return good_bytes instead of 0
and causing a hang during device probe (endless retries).

When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to
have partitions, it will be used as a regular disk. In this case, make sure
to not do anything in sd_zbc_revalidate_zones() as that triggers warnings.

Since all these different cases result in subtle settings of the disk queue
zoned model, introduce the block layer helper function
blk_queue_set_zoned() to generically implement setting up the effective
zoned model according to the disk type, the presence of partitions on the
disk and CONFIG_BLK_DEV_ZONED configuration.

Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
Fixes: b7205307 ("block: allow partitions on host aware zone devices")
Cc: <stable@vger.kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

27ba3e8f

15 Sep, 2020 1 commit

scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported · 7f04839e

James Smart authored Sep 11, 2020

Initial FLOGIs are failing with the following message:

 lpfc 0000:13:00.1: 1:(0):0820 FLOGI Failed (x300). BBCredit Not Supported

In a prior patch, the READ_SPARAM command was re-ordered to post after
CONFIG_LINK as the driver is expected to update the driver's copy of the
service parameters for the FLOGI payload. If the bb-credit recovery feature
is enabled, this is fine. But on adapters were bb-credit recovery isn't
enabled, it would cause the FLOGI to fail.

Fix by restoring the original command order (READ_SPARAM before
CONFIG_LINK), and after issuing CONFIG_LINK, detect bb-credit recovery
support and reissuing READ_SPARAM to obtain the updated service parameters
(effectively adding in the fix command order).

[mkp: corrected SHA]

Link: https://lore.kernel.org/r/20200911200147.110826-1-james.smart@broadcom.com
Fixes: 835214f5 ("scsi: lpfc: Fix broken Credit Recovery after driver load")
CC: <stable@vger.kernel.org> # v5.7+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7f04839e

10 Sep, 2020 1 commit

scsi: libsas: Fix error path in sas_notify_lldd_dev_found() · 244359c9

Dan Carpenter authored Sep 05, 2020

In sas_notify_lldd_dev_found(), if we can't allocate the necessary
resources, then it seems like the wrong thing to mark the device as found
and to increment the reference count.  None of the callers ever drop the
reference in that situation.

[mkp: tweaked commit desc based on feedback from John]

Link: https://lore.kernel.org/r/20200905125836.GF183976@mwanda
Fixes: 735f7d2f ("[SCSI] libsas: fix domain_device leak")
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

244359c9

03 Sep, 2020 3 commits

scsi: mpt3sas: Don't call disable_irq from IRQ poll handler · b614d55b

Tomas Henzl authored Sep 01, 2020

disable_irq() might sleep, replace it with disable_irq_nosync(). For
synchronisation 'irq_poll_scheduled' is sufficient

Fixes: 320e77ac scsi: mpt3sas: Irq poll to avoid CPU hard lockups
Link: https://lore.kernel.org/r/20200901145026.12174-1-thenzl@redhat.comSigned-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b614d55b

scsi: megaraid_sas: Don't call disable_irq from process IRQ poll · d2af3914

Tomas Henzl authored Aug 27, 2020

disable_irq() might sleep. Replace it with disable_irq_nosync() which is
sufficient as irq_poll_scheduled protects against concurrently running
complete_cmd_fusion() from megasas_irqpoll() and megasas_isr_fusion().

Link: https://lore.kernel.org/r/20200827165332.8432-1-thenzl@redhat.com
Fixes: a6ffd5bf scsi: megaraid_sas: Call disable_irq from process IRQ poll
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d2af3914

scsi: target: iscsi: Fix hang in iscsit_access_np() when getting tpg->np_login_sem · ed43ffea

Hou Pu authored Jul 29, 2020

The iSCSI target login thread might get stuck with the following stack:

cat /proc/`pidof iscsi_np`/stack
[<0>] down_interruptible+0x42/0x50
[<0>] iscsit_access_np+0xe3/0x167
[<0>] iscsi_target_locate_portal+0x695/0x8ac
[<0>] __iscsi_target_login_thread+0x855/0xb82
[<0>] iscsi_target_login_thread+0x2f/0x5a
[<0>] kthread+0xfa/0x130
[<0>] ret_from_fork+0x1f/0x30

This can be reproduced via the following steps:

1. Initiator A tries to log in to iqn1-tpg1 on port 3260. After finishing
   PDU exchange in the login thread and before the negotiation is finished
   the the network link goes down. At this point A has not finished login
   and tpg->np_login_sem is held.

2. Initiator B tries to log in to iqn2-tpg1 on port 3260. After finishing
   PDU exchange in the login thread the target expects to process remaining
   login PDUs in workqueue context.

3. Initiator A' tries to log in to iqn1-tpg1 on port 3260 from a new
   socket. A' will wait for tpg->np_login_sem with np->np_login_timer
   loaded to wait for at most 15 seconds. The lock is held by A so A'
   eventually times out.

4. Before A' got timeout initiator B gets negotiation failed and calls
   iscsi_target_login_drop()->iscsi_target_login_sess_out().  The
   np->np_login_timer is canceled and initiator A' will hang forever.
   Because A' is now in the login thread, no new login requests can be
   serviced.

Fix this by moving iscsi_stop_login_thread_timer() out of
iscsi_target_login_sess_out(). Also remove iscsi_np parameter from
iscsi_target_login_sess_out().

Link: https://lore.kernel.org/r/20200729130343.24976-1-houpu@bytedance.com
Cc: stable@vger.kernel.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Hou Pu <houpu@bytedance.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ed43ffea

02 Sep, 2020 2 commits

scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATA · 53de092f

Luo Jiaxing authored Aug 26, 2020

It was discovered that sdparm will fail when attempting to disable write
cache on a SATA disk connected via libsas.

In the ATA command set the write cache state is controlled through the SET
FEATURES operation. This is roughly corresponds to MODE SELECT in SCSI and
the latter command is what is used in the SCSI-ATA translation layer. A
subtle difference is that a MODE SELECT carries data whereas SET FEATURES
is defined as a non-data command in ATA.

Set the DMA data direction to DMA_NONE if the requested ATA command is
identified as non-data.

[mkp: commit desc]

Fixes: fa1c1e8f ("[SCSI] Add SATA support to libsas")
Link: https://lore.kernel.org/r/1598426666-54544-1-git-send-email-luojiaxing@huawei.comReviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

53de092f

scsi: target: iscsi: Fix data digest calculation · 5528d031

Varun Prakash authored Aug 25, 2020

Current code does not consider 'page_off' in data digest calculation. To
fix this, add a local variable 'first_sg' and set first_sg.offset to
sg->offset + page_off.

Link: https://lore.kernel.org/r/1598358910-3052-1-git-send-email-varun@chelsio.com
Fixes: e48354ce ("iscsi-target: Add iSCSI fabric support for target v4.1")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Christie <michael.christie@oralce.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5528d031

01 Sep, 2020 7 commits

scsi: lpfc: Update lpfc version to 12.8.0.4 · bc534069

James Smart authored Aug 28, 2020

Update lpfc version to 12.8.0.4

Link: https://lore.kernel.org/r/20200828175332.130300-5-james.smart@broadcom.comCo-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

bc534069

scsi: lpfc: Extend the RDF FPIN Registration descriptor for additional events · 441f6b5b

James Smart authored Aug 28, 2020

Currently the driver registers for Link Integrity events only.

This patch adds registration for the following FPIN types:

 - Delivery Notifications
 - Congestion Notification
 - Peer Congestion Notification

Link: https://lore.kernel.org/r/20200828175332.130300-4-james.smart@broadcom.comCo-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

441f6b5b

scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discovery · 7b08e89f

James Smart authored Aug 28, 2020

The driver is unable to successfully login with remote device. During pt2pt
login, the driver completes its FLOGI request with the remote device having
WWN precedence. The remote device issues its own (delayed) FLOGI after
accepting the driver's and, upon transmitting the FLOGI, immediately
recognizes it has already processed the driver's FLOGI thus it transitions
to sending a PLOGI before waiting for an ACC to its FLOGI.

In the driver, the FLOGI is received and an ACC sent, followed by the PLOGI
being received and an ACC sent. The issue is that the PLOGI reception
occurs before the response from the adapter from the FLOGI ACC is
received. Processing of the PLOGI sets state flags to perform the REG_RPI
mailbox command and proceed with the rest of discovery on the port. The
same completion routine used by both FLOGI and PLOGI is generic in
nature. One of the things it does is clear flags, and those flags happen to
drive the rest of discovery. So what happened was the PLOGI processing set
the flags, the FLOGI ACC completion cleared them, thus when the PLOGI ACC
completes it doesn't see the flags and stops.

Fix by modifying the generic completion routine to not clear the rest of
discovery flag (NLP_ACC_REGLOGIN) unless the completion is also associated
with performing a mailbox command as part of its handling. For things such
as FLOGI ACC, there isn't a subsequent action to perform with the adapter,
thus there is no mailbox cmd ptr. PLOGI ACC though will perform REG_RPI
upon completion, thus there is a mailbox cmd ptr.

Link: https://lore.kernel.org/r/20200828175332.130300-3-james.smart@broadcom.comCo-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7b08e89f

scsi: lpfc: Fix setting IRQ affinity with an empty CPU mask · 7ac836eb

James Smart authored Aug 28, 2020

Some systems are reporting the following log message during driver unload
or system shutdown:

  ics_rtas_set_affinity: No online cpus in the mask

A prior commit introduced the writing of an empty affinity mask in calls to
irq_set_affinity_hint() when disabling interrupts or when there are no
remaining online CPUs to service an eq interrupt. At least some ppc64
systems are checking whether affinity masks are empty or not.

Do not call irq_set_affinity_hint() with an empty CPU mask.

Fixes: dcaa2136 ("scsi: lpfc: Change default IRQ model on AMD architectures")
Link: https://lore.kernel.org/r/20200828175332.130300-2-james.smart@broadcom.com
Cc: <stable@vger.kernel.org> # v5.5+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7ac836eb

scsi: qla2xxx: Fix regression on sparc64 · 2a87d485

René Rebe authored Aug 27, 2020

Commit 98aee70d ("qla2xxx: Add endianizer to max_payload_size
modifier.") in 2014 broke qla2xxx on sparc64, e.g. as in the Sun Blade 1000
/ 2000. Unbreak by partial revert to fix endianness in nvram firmware
default initialization. Also mark the second frame_payload_size in nvram_t
__le16 to avoid new sparse warnings.

Link: https://lore.kernel.org/r/20200827.222729.1875148247374704975.rene@exactcode.com
Fixes: 98aee70d ("qla2xxx: Add endianizer to max_payload_size modifier.")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: René Rebe <rene@exactcode.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2a87d485

scsi: libfc: Fix for double free() · 5a5b80f9

Javed Hasan authored Aug 25, 2020

Fix for '&fp->skb' double free.

Link:
https://lore.kernel.org/r/20200825093940.19612-1-jhasan@marvell.comReported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5a5b80f9

scsi: pm8001: Fix memleak in pm8001_exec_internal_task_abort · ea403fde

Dinghao Liu authored Aug 23, 2020

When pm8001_tag_alloc() fails, task should be freed just like it is done in
the subsequent error paths.

Link: https://lore.kernel.org/r/20200823091453.4782-1-dinghao.liu@zju.edu.cnAcked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ea403fde

25 Aug, 2020 3 commits

scsi: scsi_debug: Remove superfluous close zone in resp_open_zone() · 75d46c6d

Niklas Cassel authored Aug 21, 2020

resp_open_zone() always calls zbc_open_zone() with parameter explicit set
to true.

If zbc_open_zone() is called with parameter explicit set to true, and the
current zone state is implicit open, it will call zbc_close_zone() on the
zone before proceeding.

Therefore, there is no need for resp_open_zone() to call zbc_close_zone()
on an implicitly open zone before calling zbc_open_zone().

Remove superfluous close zone in resp_open_zone().

Link: https://lore.kernel.org/r/20200821130007.39938-1-niklas.cassel@wdc.comReviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

75d46c6d

scsi: libcxgbi: Fix a use after free in cxgbi_conn_xmit_pdu() · 38660389

Dan Carpenter authored Aug 24, 2020

We accidentally move this logging printk after the free, but that leads to
a use after free.

Link: https://lore.kernel.org/r/20200824085933.GD208317@mwanda
Fixes: e33c2482 ("scsi: cxgb4i: Add support for iSCSI segmentation offload")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

38660389

scsi: qedf: Fix null ptr reference in qedf_stag_change_work · f308a35f

Ye Bin authored Aug 24, 2020

Link: https://lore.kernel.org/r/20200824033436.45570-1-yebin10@huawei.comAcked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f308a35f

18 Aug, 2020 22 commits

Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe" · dca93232

Quinn Tran authored Aug 06, 2020

FCP T10-PI and NVMe features are independent of each other. This patch
allows both features to co-exist.

This reverts commit 5da05a26.

Link: https://lore.kernel.org/r/20200806111014.28434-12-njavali@marvell.com
Fixes: 5da05a26 ("scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

dca93232

Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command" · de7e6194

Saurav Kashyap authored Aug 06, 2020

FCoE adapter initialization failed for ISP8021 with the following patch
applied. In addition, reproduction of the issue the patch originally tried
to address has been unsuccessful.

This reverts commit 3cb182b3.

Link: https://lore.kernel.org/r/20200806111014.28434-11-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

de7e6194

scsi: qla2xxx: Fix null pointer access during disconnect from subsystem · 83949613

Quinn Tran authored Aug 06, 2020

NVMEAsync command is being submitted to QLA while the same NVMe controller
is in the middle of reset. The reset path has deleted the association and
freed aen_op->fcp_req.private. Add a check for this private pointer before
issuing the command.

...
 6 [ffffb656ca11fce0] page_fault at ffffffff8c00114e
    [exception RIP: qla_nvme_post_cmd+394]
    RIP: ffffffffc0d012ba  RSP: ffffb656ca11fd98  RFLAGS: 00010206
    RAX: ffff8fb039eda228  RBX: ffff8fb039eda200  RCX: 00000000000da161
    RDX: ffffffffc0d4d0f0  RSI: ffffffffc0d26c9b  RDI: ffff8fb039eda220
    RBP: 0000000000000013   R8: ffff8fb47ff6aa80   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffb656ca11fdc8  R12: ffff8fb27d04a3b0
    R13: ffff8fc46dd98a58  R14: 0000000000000000  R15: ffff8fc4540f0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 7 [ffffb656ca11fe08] nvme_fc_start_fcp_op at ffffffffc0241568 [nvme_fc]
 8 [ffffb656ca11fe50] nvme_fc_submit_async_event at ffffffffc0241901 [nvme_fc]
 9 [ffffb656ca11fe68] nvme_async_event_work at ffffffffc014543d [nvme_core]
10 [ffffb656ca11fe98] process_one_work at ffffffff8b6cd437
11 [ffffb656ca11fed8] worker_thread at ffffffff8b6cdcef
12 [ffffb656ca11ff10] kthread at ffffffff8b6d3402
13 [ffffb656ca11ff50] ret_from_fork at ffffffff8c000255

--
PID: 37824  TASK: ffff8fb033063d80  CPU: 20  COMMAND: "kworker/u97:451"
 0 [ffffb656ce1abc28] __schedule at ffffffff8be629e3
 1 [ffffb656ce1abcc8] schedule at ffffffff8be62fe8
 2 [ffffb656ce1abcd0] schedule_timeout at ffffffff8be671ed
 3 [ffffb656ce1abd70] wait_for_completion at ffffffff8be639cf
 4 [ffffb656ce1abdd0] flush_work at ffffffff8b6ce2d5
 5 [ffffb656ce1abe70] nvme_stop_ctrl at ffffffffc0144900 [nvme_core]
 6 [ffffb656ce1abe80] nvme_fc_reset_ctrl_work at ffffffffc0243445 [nvme_fc]
 7 [ffffb656ce1abe98] process_one_work at ffffffff8b6cd437
 8 [ffffb656ce1abed8] worker_thread at ffffffff8b6cdb50
 9 [ffffb656ce1abf10] kthread at ffffffff8b6d3402
10 [ffffb656ce1abf50] ret_from_fork at ffffffff8c000255

Link: https://lore.kernel.org/r/20200806111014.28434-10-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

83949613

scsi: qla2xxx: Check if FW supports MQ before enabling · dffa1145

Saurav Kashyap authored Aug 06, 2020

OS boot during Boot from SAN was stuck at dracut emergency shell after
enabling NVMe driver parameter. For non-MQ support the driver was enabling
MQ. Add a check to confirm if FW supports MQ.

Link: https://lore.kernel.org/r/20200806111014.28434-9-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

dffa1145

scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba · 897d68eb

Arun Easi authored Aug 06, 2020

qla_nvme_register_hba() puts out a warning when there are not enough queue
pairs available for FC-NVME. Just fail the NVME registration rather than a
WARNING + call Trace.

Link: https://lore.kernel.org/r/20200806111014.28434-8-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

897d68eb

scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime · 49030003

Arun Easi authored Aug 06, 2020

ql2xextended_error_logging can now be set to 1 to get the default mask
value, as opposed to at module load time only.

Link: https://lore.kernel.org/r/20200806111014.28434-7-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

49030003

scsi: qla2xxx: Reduce noisy debug message · 81b9d1e1

Quinn Tran authored Aug 06, 2020

Update debug level and message for ELS IOCB done.

Link: https://lore.kernel.org/r/20200806111014.28434-6-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

81b9d1e1

scsi: qla2xxx: Fix login timeout · abb31aea

Quinn Tran authored Aug 06, 2020

Multipath errors were seen during failback due to login timeout. The
remote device sent LOGO, the local host tore down the session and did
relogin. The RSCN arrived indicates remote device is going through failover
after which the relogin is in a 20s timeout phase. At this point the
driver is stuck in the relogin process. Add a fix to delete the session as
part of abort/flush the login.

Link: https://lore.kernel.org/r/20200806111014.28434-5-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

abb31aea

scsi: qla2xxx: Indicate correct supported speeds for Mezz card · 4709272f

Quinn Tran authored Aug 06, 2020

Correct the supported speeds for 16G Mezz card.

Link: https://lore.kernel.org/r/20200806111014.28434-4-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4709272f

scsi: qla2xxx: Flush I/O on zone disable · a117579d

Quinn Tran authored Aug 06, 2020

Perform implicit logout to flush I/O on zone disable.

Link: https://lore.kernel.org/r/20200806111014.28434-3-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a117579d

scsi: qla2xxx: Flush all sessions on zone disable · 10ae30ba

Quinn Tran authored Aug 06, 2020

On Zone Disable, certain switches would ignore all commands. This causes
timeout for both switch scan command and abort of that command. On
detection of this condition, all sessions will be shutdown.

Link: https://lore.kernel.org/r/20200806111014.28434-2-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

10ae30ba

scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values · c314a014

Enzo Matsumiya authored Aug 05, 2020

Improves readability of qla_mbx.c.

Link: https://lore.kernel.org/r/20200805200546.22497-1-ematsumiya@suse.deReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c314a014

scsi: scsi_debug: Fix scp is NULL errors · 223f91b4

Douglas Gilbert authored Aug 13, 2020

John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were
mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The
problem was tracked down to a missing critical section on a "short circuit"
path. Namely, the time to process the current command so far has already
exceeded the requested command duration (i.e. the number of nanoseconds in
the ndelay parameter).

The random=1 parameter setting was pivotal in finding this error. The
failure scenario involved first taking that "short circuit" path (due to a
very short command duration) and then taking the more likely
hrtimer_start() path (due to a longer command duration). With random=1 each
command's duration is taken from the uniformly distributed [0..ndelay)
interval. The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).

Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.comReported-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

223f91b4

scsi: zfcp: Fix use-after-free in request timeout handlers · 2d9a2c5f

Steffen Maier authored Aug 13, 2020

Before v4.15 commit 75492a51 ("s390/scsi: Convert timers to use
timer_setup()"), we intentionally only passed zfcp_adapter as context
argument to zfcp_fsf_request_timeout_handler(). Since we only trigger
adapter recovery, it was unnecessary to sync against races between timeout
and (late) completion.  Likewise, we only passed zfcp_erp_action as context
argument to zfcp_erp_timeout_handler(). Since we only wakeup an ERP action,
it was unnecessary to sync against races between timeout and (late)
completion.

Meanwhile the timeout handlers get timer_list as context argument and do a
timer-specific container-of to zfcp_fsf_req which can have been freed.

Fix it by making sure that any request timeout handlers, that might just
have started before del_timer(), are completed by using del_timer_sync()
instead. This ensures the request free happens afterwards.

Space time diagram of potential use-after-free:

Basic idea is to have 2 or more pending requests whose timeouts run out at
almost the same time.

req 1 timeout     ERP thread        req 2 timeout
----------------  ----------------  ---------------------------------------
zfcp_fsf_request_timeout_handler
fsf_req = from_timer(fsf_req, t, timer)
adapter = fsf_req->adapter
zfcp_qdio_siosl(adapter)
zfcp_erp_adapter_reopen(adapter,...)
                  zfcp_erp_strategy
                  ...
                  zfcp_fsf_req_dismiss_all
                  list_for_each_entry_safe
                    zfcp_fsf_req_complete 1
                    del_timer 1
                    zfcp_fsf_req_free 1
                    zfcp_fsf_req_complete 2
                                    zfcp_fsf_request_timeout_handler
                    del_timer 2
                                    fsf_req = from_timer(fsf_req, t, timer)
                    zfcp_fsf_req_free 2
                                    adapter = fsf_req->adapter
                                              ^^^^^^^ already freed

Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com
Fixes: 75492a51 ("s390/scsi: Convert timers to use timer_setup()")
Cc: <stable@vger.kernel.org> #4.15+
Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2d9a2c5f

scsi: ufs: No need to send Abort Task if the task in DB was cleared · d87a1f6d

Bean Huo authored Aug 11, 2020

If the bit corresponding to a task in the Doorbell register has been
cleared, no need to poll the status of the task on the device side and to
send an Abort Task TM. Instead, let it directly goto cleanup.

In addition, to keep original debug output, move the goto below the debug
print.

Link: https://lore.kernel.org/r/20200811141859.27399-3-huobean@gmail.comReviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d87a1f6d

scsi: ufs: Clean up completed request without interrupt notification · b10178ee

Stanley Chu authored Aug 11, 2020

If somehow no interrupt notification is raised for a completed request and
its doorbell bit is cleared by host, UFS driver needs to cleanup its
outstanding bit in ufshcd_abort(). Otherwise, system may behave abnormally
in the following scenario:

After ufshcd_abort() returns, this request will be requeued by SCSI layer
with its outstanding bit set. Any future completed request will trigger
ufshcd_transfer_req_compl() to handle all "completed outstanding bits". At
this time the "abnormal outstanding bit" will be detected and the "requeued
request" will be chosen to execute request post-processing flow. This is
wrong because this request is still "alive".

Link: https://lore.kernel.org/r/20200811141859.27399-2-huobean@gmail.comReviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b10178ee

scsi: ufs: Improve interrupt handling for shared interrupts · 127d5f7c

Adrian Hunter authored Aug 11, 2020

For shared interrupts, the interrupt status might be zero, so check that
first.

Link: https://lore.kernel.org/r/20200811133936.19171-2-adrian.hunter@intel.comReviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

127d5f7c

scsi: ufs: Fix interrupt error message for shared interrupts · 6337f58c

Adrian Hunter authored Aug 11, 2020

The interrupt might be shared, in which case it is not an error for the
interrupt handler to be called when the interrupt status is zero, so don't
print the message unless there was enabled interrupt status.

Link: https://lore.kernel.org/r/20200811133936.19171-1-adrian.hunter@intel.com
Fixes: 9333d775 ("scsi: ufs: Fix irq return code")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6337f58c

scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL · 8da76f71

Adrian Hunter authored Aug 10, 2020

Intel EHL UFS host controller advertises auto-hibernate capability but it
does not work correctly. Add a quirk for that.

[mkp: checkpatch fix]

Link: https://lore.kernel.org/r/20200810141024.28859-1-adrian.hunter@intel.com
Fixes: 8c09d752 ("scsi: ufshdc-pci: Add Intel PCI IDs for EHL")
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8da76f71

scsi: ufs-mediatek: Fix incorrect time to wait link status · 215d3267

Stanley Chu authored Aug 09, 2020

Fix incorrect calculation of "ms" based waiting time in function
ufs_mtk_setup_clocks().

Link: https://lore.kernel.org/r/20200809055702.20140-1-stanley.chu@mediatek.com
Fixes: 9006e398 ("scsi: ufs-mediatek: Do not gate clocks if auto-hibern8 is not entered yet")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

215d3267

scsi: ufs: Fix possible infinite loop in ufshcd_hold · 93b6c5db

Stanley Chu authored Aug 09, 2020

In ufshcd_suspend(), after clk-gating is suspended and link is set
as Hibern8 state, ufshcd_hold() is still possibly invoked before
ufshcd_suspend() returns. For example, MediaTek's suspend vops may
issue UIC commands which would call ufshcd_hold() during the command
issuing flow.

Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled,
then ufshcd_hold() may enter infinite loops because there is no
clk-ungating work scheduled or pending. In this case, ufshcd_hold()
shall just bypass, and keep the link as Hibern8 state.

Link: https://lore.kernel.org/r/20200809050734.18740-1-stanley.chu@mediatek.comReviewed-by: Avri Altman <avri.altman@wdc.com>
Co-developed-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

93b6c5db

scsi: fcoe: Fix I/O path allocation · fa39ab51

Mike Christie authored Aug 07, 2020

ixgbe_fcoe_ddp_setup() can be called from the main I/O path and is called
with a spin_lock held, so we have to use GFP_ATOMIC allocation instead of
GFP_KERNEL.

Link: https://lore.kernel.org/r/1596831813-9839-1-git-send-email-michael.christie@oracle.com
cc: Hannes Reinecke <hare@suse.de>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fa39ab51