Commits · 94a68c814328836d022d1e7ced1b762834917bd2 · Kirill Smelkov / linux

08 Feb, 2022 9 commits

scsi: smartpqi: Quickly propagate path failures to SCSI midlayer · 94a68c81

Murthy Bhat authored Feb 01, 2022

Return DID_NO_CONNECT when a path failure is detected.

When a path fails during I/O and AIO path gets disabled for a multipath
device, the I/O was retried in the RAID path slowing down path fail
detection. Returning DID_NO_CONNECT allows multipath to switch paths more
quickly.

Link: https://lore.kernel.org/r/164375209313.440833.9992416628621839233.stgit@brunhilda.pdev.netReviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Sagar Biradar <sagar.biradar@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

94a68c81

scsi: smartpqi: Eliminate drive spin down on warm boot · 70ba20be

Sagar Biradar authored Feb 01, 2022

Avoid drive spin down during a warm boot.

Call the BMIC Flush Cache command (0xc2) to indicate the reason for the
flush cache (shutdown, hibernate, suspend, or restart).

Link: https://lore.kernel.org/r/164375208810.440833.11254644025650871435.stgit@brunhilda.pdev.netReviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Signed-off-by: Sagar Biradar <sagar.biradar@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

70ba20be

scsi: smartpqi: Enable SATA NCQ priority in sysfs · 2a47834d

Gilbert Wu authored Feb 01, 2022

Add device attribute 'sas_ncq_prio_enable' to enable SATA NCQ priority
support and provide I/O priority in SCSI command and pass priority
information to controller firmware.

This device attribute works only when device has NCQ priority support and
controller firmware can handle I/Os with NCQ priority attribute.

Link: https://lore.kernel.org/r/164375208306.440833.7392577382127815362.stgit@brunhilda.pdev.netReviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Signed-off-by: Gilbert Wu <Gilbert.Wu@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2a47834d

scsi: smartpqi: Add PCI IDs · c57ee4cc

Don Brace authored Feb 01, 2022

Add in new ZTE controllers:
                                    VID  / DID  / SVID / SDID
                                    ----   ----   ----   ----
ZTE SmartROC3100 RS241-18i 2G       9005 / 028F / 1CF2 / 5449
ZTE SmartROC3100 RS242-18i 4G       9005 / 028F / 1CF2 / 544A
ZTE SmartIOC2100 RS243-18i          9005 / 028F / 1CF2 / 544B
ZTE SmartROC3100 RM241B-18i 2G      9005 / 028F / 1CF2 / 544D
ZTE SmartROC3100 RM242B-18i 4G      9005 / 028F / 1CF2 / 544E
ZTE SmartIOC2100 RM243B-18i         9005 / 028F / 1CF2 / 544F

Add PCI ID for 1100-24i controller:
                                    VID  / DID  / SVID / SDID
                                    ----   ----   ----   ----
HBA 1100-24i                        9005 / 028F / 9005 / 1304

Add PCI IDs for HPE and Adaptec devices:
                                    VID  / DID  / SVID / SDID
                                    ----   ----   ----   ----
Adaptec Smart HBA 2200-8io /e       9005 / 028F / 9005 / 1463
Adaptec Smart HBA 2200-16io /e      9005 / 028F / 9005 / 14C2
HPE SR308i-p Gen11                  9005 / 028F / 1590 / 0382
HPE SR308i-o Gen11                  9005 / 028F / 1590 / 0383
HPE SR932i-p Gen11                  9005 / 028F / 1590 / 0381

Add PCI IDs for Inspur controllers:
                                    VID  / DID  / SVID / SDID
                                    ----   ----   ----   ----
INSPUR RS0800M5H24i                 9005 / 028F / 1BD4 / 006B
INSPUR RS0800M5E8I                  9005 / 028F / 1BD4 / 006C
INSPUR RS0800M5H8I                  9005 / 028F / 1BD4 / 006D
INSPUR RS0804M5R16i                 9005 / 028F / 1BD4 / 006F
INSPUR RS0800M5E16i                 9005 / 028F / 1BD4 / 0070
INSPUR RS0800M5H16i                 9005 / 028F / 1BD4 / 0071
INSPUR RS0800M5E16i                 9005 / 028F / 1BD4 / 0072
NT RAID 3100-24i                    9005 / 028F / 1F0C / 3161

Add HPE and Adaptec OROC PCI IDs:
                                    VID  / DID  / SVID / SDID
                                    ----   ----   ----   ----
HPE SR216i-o Gen11                  9005 / 028F / 9005 / 036F
Adaptec SmartRAID 3284-16io /e/uC   9005 / 028F / 9005 / 1473
Adaptec SmartRAID 3254-16io /e      9005 / 028F / 9005 / 1474

Add PCI IDs for new channel controllers:
                                    VID  / DID  / SVID / SDID
                                    ----   ----   ----   ----
Adaptec SmartRAID 3254-8i /e        9005 / 028F / 9005 / 14A4
Adaptec SmartRAID 3252-8i /e        9005 / 028F / 9005 / 14A5
Adaptec SmartRAID 3204-8i /e        9005 / 028F / 9005 / 14A6

Align PCI IDs with OOB driver. Easier to check for differences.

Link: https://lore.kernel.org/r/164375207802.440833.15947108153078495425.stgit@brunhilda.pdev.netReviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Co-developed-by: Mike McGowen <Mike.McGowen@microchip.com>
Signed-off-by: Mike McGowen <Mike.McGowen@microchip.com>
Co-developed-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Co-developed-by: Sagar Biradar <sagar.biradar@microchip.com>
Signed-off-by: Sagar Biradar <sagar.biradar@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c57ee4cc

scsi: smartpqi: Fix rmmod stack trace · c4ff687d

Don Brace authored Feb 01, 2022

Prevent "BUG: scheduling while atomic: rmmod" stack trace.

Stop setting spin_locks before calling OS functions to remove devices.

Link: https://lore.kernel.org/r/164375207296.440833.4996145011193819683.stgit@brunhilda.pdev.netReviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c4ff687d

scsi: mpt3sas: Convert to flexible arrays · d20b3dae

Kees Cook authored Feb 01, 2022

This converts to a flexible array instead of the old-style 1-element
arrays. The existing code already did the correct math for finding the size
of the resulting flexible array structure, so there is no binary
difference.

The other two structures converted to use flexible arrays appear to have no
users at all.

Link: https://lore.kernel.org/r/20220201223948.1455637-1-keescook@chromium.orgSigned-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d20b3dae

scsi: usb: storage: Complete the SCSI request directly · 23fe0755

Sebastian Andrzej Siewior authored Feb 01, 2022

The USB storage driver can complete its requests directly from a kernel
thread. Use scsi_done_direct() to avoid waking ksoftirqd.

Link: https://lore.kernel.org/r/20220201210954.570896-3-sebastian@breakpoint.ccSigned-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

23fe0755

scsi: core: Add scsi_done_direct() for immediate completion · b84b6ec0

Sebastian Andrzej Siewior authored Feb 03, 2022

Add scsi_done_direct() which behaves like scsi_done() except that it
invokes blk_mq_complete_request_direct() in order to complete the request.

Callers from process context can complete the request directly instead
waking ksoftirqd.

Link: https://lore.kernel.org/r/Yfw7JaszshmfYa1d@flowReviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b84b6ec0

scsi: core: Make "access_state" sysfs attribute always visible · 7cddf7e8

Martin Wilck authored Jan 27, 2022

If a SCSI device handler module is loaded after some SCSI devices have
already been probed (e.g. via request_module() by dm-multipath), the
"access_state" and "preferred_path" sysfs attributes remain invisible for
these devices, although the handler is attached and live. The reason is
that the visibility is only checked when the sysfs attribute group is first
created. This results in an inconsistent user experience depending on the
load order of SCSI low-level drivers vs. device handler modules.

This patch changes user space API: attempting to read the "access_state" or
"preferred_path" attributes will now result in -EINVAL rather than -ENODEV
for devices that have no device handler, and tests for the existence of
these attributes will have a different result.

Link: https://lore.kernel.org/r/20220127141351.30706-1-mwilck@suse.comReviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7cddf7e8

31 Jan, 2022 7 commits

scsi: lpfc: Remove redundant flush_workqueue() call · d1d87c33

Minghao Chi (CGEL ZTE) authored Jan 27, 2022

destroy_workqueue() already drains the queue before destroying it, so there
is no need to flush it explicitly.

Remove the redundant flush_workqueue() call.

Link: https://lore.kernel.org/r/20220127014330.1185114-1-chi.minghao@zte.com.cnReported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d1d87c33

scsi: qedi: Remove redundant flush_workqueue() calls · 0603be71

Minghao Chi (CGEL ZTE) authored Jan 27, 2022

destroy_workqueue() already drains the queue before destroying it, so there
is no need to flush it explicitly.

Remove the redundant flush_workqueue() calls.

Link: https://lore.kernel.org/r/20220127013934.1184923-1-chi.minghao@zte.com.cnReported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0603be71

scsi: bfa: Replace snprintf() with sysfs_emit() · 2245ea91

Yang Guang authored Jan 27, 2022

coccinelle report:
./drivers/scsi/bfa/bfad_attr.c:908:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:860:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:888:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:853:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:808:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:728:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:822:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:927:9-17:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:900:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:874:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:714:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:839:8-16:
WARNING: use scnprintf or sprintf

Use sysfs_emit() instead of scnprintf() or sprintf().

Link: https://lore.kernel.org/r/def83ff75faec64ba592b867a8499b1367bae303.1643181468.git.yang.guang5@zte.com.cnReported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2245ea91

scsi: mvsas: Replace snprintf() with sysfs_emit() · 0ad3867b

Yang Guang authored Jan 27, 2022

coccinelle report:
./drivers/scsi/mvsas/mv_init.c:699:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/mvsas/mv_init.c:747:8-16:
WARNING: use scnprintf or sprintf

Use sysfs_emit() instead of scnprintf() or sprintf().

Link: https://lore.kernel.org/r/c1711f7cf251730a8ceb5bdfc313bf85662b3395.1643182948.git.yang.guang5@zte.com.cnReported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0ad3867b

scsi: bnx2fc: Make use of the helper macro kthread_run() · 687ba48e

Yin Xiujiang authored Jan 26, 2022

Repalce kthread_create/wake_up_process() with kthread_run() to simplify the
code.

Link: https://lore.kernel.org/r/20220126014248.466806-1-yinxiujiang@kylinos.cnSigned-off-by: Yin Xiujiang <yinxiujiang@kylinos.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

687ba48e

scsi: bnx2fc: Fix typo in comments · dd84a4b0

Cai Huoqing authored Jan 28, 2022

Replace 'Offlaod' with 'Offload'.

Link: https://lore.kernel.org/r/20220128063101.6953-1-cai.huoqing@linux.devAcked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

dd84a4b0

scsi: ufs: Add checking lifetime attribute for WriteBooster · f681d107

Jinyoung Choi authored Jan 27, 2022

Because WB performs writes in SLC mode, it is not possible to use
WriteBooster indefinitely. Vendors can set a lifetime limit in the device.
If the lifetime exceeds this limit, the device ican disable the WB feature.

The feature is defined in the "bWriteBoosterBufferLifeTimeEst (IDN = 1E)"
attribute.

With lifetime exceeding the limit value, the current driver continuously
performs the following query:

	- Write Flag: WB_ENABLE / DISABLE
	- Read attr: Available Buffer Size
	- Read attr: Current Buffer Size

This patch recognizes that WriteBooster is no longer supported by the
device, and prevents unnecessary queries.

Link: https://lore.kernel.org/r/1891546521.01643252701746.JavaMail.epsvc@epcpadp3Reviewed-by: Asutosh Das <quic_asutoshd@quicinc.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f681d107

25 Jan, 2022 24 commits

scsi: scsi_debug: Add environmental reporting log subpage · 0790797a

Douglas Gilbert authored Jan 08, 2022

Log subpages are starting to appear in real devices (e.g. SSDs) so add
support for one. Adopt approach where all "wild" sub-pages are themselves
listed as long as there is at least one non-wild page or subpage for a
given page number.

Link: https://lore.kernel.org/r/20220109012853.301953-10-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0790797a

scsi: scsi_debug: Add no_rwlock parameter · 7109f370

Douglas Gilbert authored Jan 08, 2022

By default, this driver places a read lock around all user data fetches and
a write lock around all user data modifying operations (e.g. WRITE
commands). These locks have "per store" granularity. Other drivers that
have a similar function (e.g. null_blk) do not take this data integrity
step and run significantly faster in some tests.

In the common case of a (simulated) device to device copy (e.g. what dd
and its variants do) there should be no need for locks around data
accesses. So add the driver and sysfs parameter no_rwlock which is boolean
and when set does what its name suggests. The default is false for backward
comaptibility.

Link: https://lore.kernel.org/r/20220109012853.301953-7-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7109f370

scsi: scsi_debug: Divide power on reset UNIT ATTENTION · 500d0d24

Douglas Gilbert authored Jan 08, 2022

To distinguish between resets sent by the SCSI mid-level error
handling and newly introduced devices (LUs), this Unit Attention:

power on, reset, or bus reset occurred [0x29,0x0]

has been subdivided into that UA for the reset case and this new UA:

power on occurred [0x29,0x1]

for the new device (LU) case. This makes debug a little easier to follow
when it is turned on (e.g. 'echo 0x1 > opts').

Bump driver version number.

Link: https://lore.kernel.org/r/20220109012853.301953-6-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

500d0d24

scsi: scsi_debug: Refine sdebug_blk_mq_poll() · b05d4e48

Douglas Gilbert authored Jan 08, 2022

Refine the sdebug_blk_mq_poll() function so it only takes the spinlock on
the queue when it can see one or more requests with the in_use bitmap flag
set.

Link: https://lore.kernel.org/r/20220109012853.301953-5-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b05d4e48

scsi: scsi_debug: Use TASK SET FULL more · 7d5a129b

Douglas Gilbert authored Jan 08, 2022

When the internal in_use bit array in this driver is full returning
SCSI_MLQUEUE_HOST_BUSY leads to the mid-level reissuing the request which
is unhelpful. Previously TASK SET FULL status was only returned if ALL_TSF
[0x400] is placed in the opts variable (at load time or via sysfs). Now
ignore that setting and always return TASK SET FULL when in_use array is
full. Also set DID_ABORT together with TASK SET FULL so the mid-level gives
up immediately.

Aside: the situations addressed by this patch lead to lockups and
timeouts. They have only been detected when blk_poll() is used. That
mechanism is relatively new in the SCSI subsystem suggesting the mid-level
may need more work in that area.

Link: https://lore.kernel.org/r/20220109012853.301953-4-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7d5a129b

scsi: scsi_debug: Strengthen defer_t accesses · d9d23a5a

Douglas Gilbert authored Jan 08, 2022

Use READ_ONCE() and WRITE_ONCE() macros when accessing the
sdebug_defer::defer_t value.

Link: https://lore.kernel.org/r/20220109012853.301953-3-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d9d23a5a

scsi: scsi_debug: Address races following module load · 2aad3cd8

Douglas Gilbert authored Jan 08, 2022

When scsi_debug is loaded as a module with many (simulated) hosts, targets,
and devices (LUs), modprobe can take a long time to return.  Only a small
amount of this time is spent in the scsi_debug_init(); the rest is other
parts of the kernel reacting to to the appearance of new storage
devices. As soon as scsi_debug_init() has completed the user space may call
'rmmod scsi_debug' and this was found to cause race problems as outlined
here:

    https://bugzilla.kernel.org/show_bug.cgi?id=212337

To reliably generate this race a sysfs parameter called rm_all_hosts was
added and the code was strengthened in this area. The main change was to
make the count of scsi_debug hosts present an atomic.  Then it was found
that the handling of the existing add_host parameter needed the same
strengthening. Further: 'echo -9999 >
/sys/bus/pseudo/drivers/scsi_debug/add_host has the same effect as
rm_all_hosts so rm_all_hosts was not needed.

To inhibit a race between two invocations of writes to add_host, a mutex
was added. Also address a possible race when rmmod is called but LUs are
still being added.

The logic to remove (all) hosts is rather crude: it works backwards down a
linked lists of hosts. Any pending requests are terminated with
DID_NO_CONNECT as are any new requests. In the case where not all hosts are
being removed, the ones that remain may have lost requests as just
outlined. The lowest numbered host (id) hosts will remain.

Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220109012853.301953-2-dgilbert@interlog.comSigned-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2aad3cd8

scsi: qla2xxx: Update version to 10.02.07.300-k · 0dd392d1

Nilesh Javali authored Jan 09, 2022

Link: https://lore.kernel.org/r/20220110050218.3958-18-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0dd392d1

scsi: qla2xxx: Check for firmware dump already collected · cfbafad7

Joe Carnuccio authored Jan 09, 2022

While allocating firmware dump, check if dump is already collected and do
not re-allocate the buffer.

Link: https://lore.kernel.org/r/20220110050218.3958-17-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cfbafad7

scsi: qla2xxx: Add devids and conditionals for 28xx · 0d6a536c

Joe Carnuccio authored Jan 09, 2022

This is an update to the original 28xx adapter enablement. Add a bunch of
conditionals that are applicable for 28xx.

Link: https://lore.kernel.org/r/20220110050218.3958-16-njavali@marvell.com
Fixes: ecc89f25 ("scsi: qla2xxx: Add Device ID for ISP28XX")
Cc: stable@vger.kernel.org
Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0d6a536c

scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair() · a60447e7

Saurav Kashyap authored Jan 09, 2022

[   12.323788] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/1020
[   12.332297] caller is qla2xxx_create_qpair+0x32a/0x5d0 [qla2xxx]
[   12.338417] CPU: 7 PID: 1020 Comm: systemd-udevd Tainted: G          I      --------- ---  5.14.0-29.el9.x86_64 #1
[   12.348827] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.6.0 05/22/2018
[   12.356356] Call Trace:
[   12.358821]  dump_stack_lvl+0x34/0x44
[   12.362514]  check_preemption_disabled+0xd9/0xe0
[   12.367164]  qla2xxx_create_qpair+0x32a/0x5d0 [qla2xxx]
[   12.372481]  qla2x00_probe_one+0xa3a/0x1b80 [qla2xxx]
[   12.377617]  ? _raw_spin_lock_irqsave+0x19/0x40
[   12.384284]  local_pci_probe+0x42/0x80
[   12.390162]  ? pci_match_device+0xd7/0x110
[   12.396366]  pci_device_probe+0xfd/0x1b0
[   12.402372]  really_probe+0x1e7/0x3e0
[   12.408114]  __driver_probe_device+0xfe/0x180
[   12.414544]  driver_probe_device+0x1e/0x90
[   12.420685]  __driver_attach+0xc0/0x1c0
[   12.426536]  ? __device_attach_driver+0xe0/0xe0
[   12.433061]  ? __device_attach_driver+0xe0/0xe0
[   12.439538]  bus_for_each_dev+0x78/0xc0
[   12.445294]  bus_add_driver+0x12b/0x1e0
[   12.451021]  driver_register+0x8f/0xe0
[   12.456631]  ? 0xffffffffc07bc000
[   12.461773]  qla2x00_module_init+0x1be/0x229 [qla2xxx]
[   12.468776]  do_one_initcall+0x44/0x200
[   12.474401]  ? load_module+0xad3/0xba0
[   12.479908]  ? kmem_cache_alloc_trace+0x45/0x410
[   12.486268]  do_init_module+0x5c/0x280
[   12.491730]  __do_sys_init_module+0x12e/0x1b0
[   12.497785]  do_syscall_64+0x3b/0x90
[   12.503029]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   12.509764] RIP: 0033:0x7f554f73ab2e

Link: https://lore.kernel.org/r/20220110050218.3958-15-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a60447e7

scsi: qla2xxx: Fix T10 PI tag escape and IP guard options for 28XX adapters · 4c103a80

Joe Carnuccio authored Jan 09, 2022

28XX adapters are capable of detecting both T10 PI tag escape values as
well as IP guard. This was missed due to the adapter type missed in the
corresponding macros. Fix this by adding support for 28xx in those macros.

Link: https://lore.kernel.org/r/20220110050218.3958-14-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Tested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4c103a80

scsi: qla2xxx: edif: Fix clang warning · 73825fd7

Quinn Tran authored Jan 09, 2022

Silence compile warning due to unaligned memory access.

qla_edif.c:713:45: warning: taking address of packed member 'u' of class or
   structure 'auth_complete_cmd' may result in an unaligned pointer value
   [-Waddress-of-packed-member]
    fcport = qla2x00_find_fcport_by_pid(vha, &appplogiok.u.d_id);

Link: https://lore.kernel.org/r/20220110050218.3958-13-njavali@marvell.com
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

73825fd7

scsi: qla2xxx: Fix warning for missing error code · 14cb838d

Nilesh Javali authored Jan 09, 2022

Fix smatch-reported warning message:

drivers/scsi/qla2xxx/qla_target.c:3324 qlt_xmit_response() warn: missing error
code 'res'

Link: https://lore.kernel.org/r/20220110050218.3958-12-njavali@marvell.com
Fixes: 4a8f7101 ("scsi: qla2xxx: Fix unmap of already freed sgl")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

14cb838d

scsi: qla2xxx: Fix device reconnect in loop topology · 8ad4be3d

Arun Easi authored Jan 09, 2022

A device logout in loop topology initiates a device connection teardown
which loses the FW device handle. In loop topo, the device handle is not
regrabbed leading to device login failures and eventually to loss of the
device. Fix this by taking the main login path that does it.

Link: https://lore.kernel.org/r/20220110050218.3958-11-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8ad4be3d

scsi: qla2xxx: Add ql2xnvme_queues module param to configure number of NVMe queues · 65120de2

Shreyas Deodhar authored Jan 09, 2022

Add ql2xnvme_queues module parameter to configure number of NVMe queues

Usage:

Number of NVMe Queues that can be configured.

Final value will be min(ql2xnvme_queues, num_cpus, num_chip_queues),

  1 - Minimum number of queues supported
  8 - Default value
128 - Maximum number of queues supported

Link: https://lore.kernel.org/r/20220110050218.3958-10-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

65120de2

scsi: qla2xxx: Fix wrong FDMI data for 64G adapter · 1cfbbacb

Bikash Hazarika authored Jan 09, 2022

Corrected transmission speed mask values for FC.

Supported Speed: 16 32 20 Gb/s ===> Should be 64 instead of 20
Supported Speed: 16G 32G 48G   ===> Should be 64G instead of 48G

Link: https://lore.kernel.org/r/20220110050218.3958-9-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1cfbbacb

scsi: qla2xxx: Add retry for exec firmware · 355f5ffe

Quinn Tran authored Jan 09, 2022

Per FW request, Exec FW can fail due to temporary error resulting in driver
not attaching to the adapter. Add retry of this command up to 4 retries.

Link: https://lore.kernel.org/r/20220110050218.3958-8-njavali@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

355f5ffe

scsi: qla2xxx: Fix scheduling while atomic · afd438ff

Quinn Tran authored Jan 09, 2022

The driver makes a call into midlayer (fc_remote_port_delete) which can put
the thread to sleep. The thread that originates the call is in interrupt
context. The combination of the two trigger a crash. Schedule the call in
non-interrupt context where it is more safe.

kernel: BUG: scheduling while atomic: swapper/7/0/0x00010000
kernel: Call Trace:
kernel:  <IRQ>
kernel:  dump_stack+0x66/0x81
kernel:  __schedule_bug.cold.90+0x5/0x1d
kernel:  __schedule+0x7af/0x960
kernel:  schedule+0x28/0x80
kernel:  schedule_timeout+0x26d/0x3b0
kernel:  wait_for_completion+0xb4/0x140
kernel:  ? wake_up_q+0x70/0x70
kernel:  __wait_rcu_gp+0x12c/0x160
kernel:  ? sdev_evt_alloc+0xc0/0x180 [scsi_mod]
kernel:  synchronize_sched+0x6c/0x80
kernel:  ? call_rcu_bh+0x20/0x20
kernel:  ? __bpf_trace_rcu_invoke_callback+0x10/0x10
kernel:  sdev_evt_alloc+0xfd/0x180 [scsi_mod]
kernel:  starget_for_each_device+0x85/0xb0 [scsi_mod]
kernel:  ? scsi_init_io+0x360/0x3d0 [scsi_mod]
kernel:  scsi_init_io+0x388/0x3d0 [scsi_mod]
kernel:  device_for_each_child+0x54/0x90
kernel:  fc_remote_port_delete+0x70/0xe0 [scsi_transport_fc]
kernel:  qla2x00_schedule_rport_del+0x62/0xf0 [qla2xxx]
kernel:  qla2x00_mark_device_lost+0x9c/0xd0 [qla2xxx]
kernel:  qla24xx_handle_plogi_done_event+0x55f/0x570 [qla2xxx]
kernel:  qla2x00_async_login_sp_done+0xd2/0x100 [qla2xxx]
kernel:  qla24xx_logio_entry+0x13a/0x3c0 [qla2xxx]
kernel:  qla24xx_process_response_queue+0x306/0x400 [qla2xxx]
kernel:  qla24xx_msix_rsp_q+0x3f/0xb0 [qla2xxx]
kernel:  __handle_irq_event_percpu+0x40/0x180
kernel:  handle_irq_event_percpu+0x30/0x80
kernel:  handle_irq_event+0x36/0x60

Link: https://lore.kernel.org/r/20220110050218.3958-7-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

afd438ff

scsi: qla2xxx: Fix premature hw access after PCI error · e35920ab

Quinn Tran authored Jan 09, 2022

After a recoverable PCI error has been detected and recovered, qla driver
needs to check to see if the error condition still persist and/or wait
for the OS to give the resume signal.

Sep  8 22:26:03 localhost kernel: WARNING: CPU: 9 PID: 124606 at qla_tmpl.c:440
qla27xx_fwdt_entry_t266+0x55/0x60 [qla2xxx]
Sep  8 22:26:03 localhost kernel: RIP: 0010:qla27xx_fwdt_entry_t266+0x55/0x60
[qla2xxx]
Sep  8 22:26:03 localhost kernel: Call Trace:
Sep  8 22:26:03 localhost kernel: ? qla27xx_walk_template+0xb1/0x1b0 [qla2xxx]
Sep  8 22:26:03 localhost kernel: ? qla27xx_execute_fwdt_template+0x12a/0x160
[qla2xxx]
Sep  8 22:26:03 localhost kernel: ? qla27xx_fwdump+0xa0/0x1c0 [qla2xxx]
Sep  8 22:26:03 localhost kernel: ? qla2xxx_pci_mmio_enabled+0xfb/0x120
[qla2xxx]
Sep  8 22:26:03 localhost kernel: ? report_mmio_enabled+0x44/0x80
Sep  8 22:26:03 localhost kernel: ? report_slot_reset+0x80/0x80
Sep  8 22:26:03 localhost kernel: ? pci_walk_bus+0x70/0x90
Sep  8 22:26:03 localhost kernel: ? aer_dev_correctable_show+0xc0/0xc0
Sep  8 22:26:03 localhost kernel: ? pcie_do_recovery+0x1bb/0x240
Sep  8 22:26:03 localhost kernel: ? aer_recover_work_func+0xaa/0xd0
Sep  8 22:26:03 localhost kernel: ? process_one_work+0x1a7/0x360
..
Sep  8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-8041:22: detected PCI
disconnect.
Sep  8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-107ff:22:
qla27xx_fwdt_entry_t262: dump ram MB failed. Area 5h start 198013h end 198013h
Sep  8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-107ff:22: Unable to
capture FW dump
Sep  8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-1015:22: cmd=0x0,
waited 5221 msecs
Sep  8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-680d:22: mmio
enabled returning.
Sep  8 22:26:03 localhost kernel: qla2xxx [0000:42:00.2]-d04c:22: MBX
Command timeout for cmd 0, iocontrol=ffffffff jiffies=10140f2e5
mb[0-3]=[0xffff 0xffff 0xffff 0xffff]

Link: https://lore.kernel.org/r/20220110050218.3958-6-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e35920ab

scsi: qla2xxx: Fix warning message due to adisc being flushed · 64f24af7

Quinn Tran authored Jan 09, 2022

Fix warning message due to adisc being flushed.  Linux kernel triggered a
warning message where a different error code type is not matching up with
the expected type. Add additional translation of one error code type to
another.

WARNING: CPU: 2 PID: 1131623 at drivers/scsi/qla2xxx/qla_init.c:498
qla2x00_async_adisc_sp_done+0x294/0x2b0 [qla2xxx]
CPU: 2 PID: 1131623 Comm: drmgr Not tainted 5.13.0-rc1-autotest #1
..
GPR28: c000000aaa9c8890 c0080000079ab678 c00000140a104800 c00000002bd19000
NIP [c00800000790857c] qla2x00_async_adisc_sp_done+0x294/0x2b0 [qla2xxx]
LR [c008000007908578] qla2x00_async_adisc_sp_done+0x290/0x2b0 [qla2xxx]
Call Trace:
[c00000001cdc3620] [c008000007908578] qla2x00_async_adisc_sp_done+0x290/0x2b0 [qla2xxx] (unreliable)
[c00000001cdc3710] [c0080000078f3080] __qla2x00_abort_all_cmds+0x1b8/0x580 [qla2xxx]
[c00000001cdc3840] [c0080000078f589c] qla2x00_abort_all_cmds+0x34/0xd0 [qla2xxx]
[c00000001cdc3880] [c0080000079153d8] qla2x00_abort_isp_cleanup+0x3f0/0x570 [qla2xxx]
[c00000001cdc3920] [c0080000078fb7e8] qla2x00_remove_one+0x3d0/0x480 [qla2xxx]
[c00000001cdc39b0] [c00000000071c274] pci_device_remove+0x64/0x120
[c00000001cdc39f0] [c0000000007fb818] device_release_driver_internal+0x168/0x2a0
[c00000001cdc3a30] [c00000000070e304] pci_stop_bus_device+0xb4/0x100
[c00000001cdc3a70] [c00000000070e4f0] pci_stop_and_remove_bus_device+0x20/0x40
[c00000001cdc3aa0] [c000000000073940] pci_hp_remove_devices+0x90/0x130
[c00000001cdc3b30] [c0080000070704d0] disable_slot+0x38/0x90 [rpaphp] [
c00000001cdc3b60] [c00000000073eb4c] power_write_file+0xcc/0x180
[c00000001cdc3be0] [c0000000007354bc] pci_slot_attr_store+0x3c/0x60
[c00000001cdc3c00] [c00000000055f820] sysfs_kf_write+0x60/0x80 [c00000001cdc3c20]
[c00000000055df10] kernfs_fop_write_iter+0x1a0/0x290
[c00000001cdc3c70] [c000000000447c4c] new_sync_write+0x14c/0x1d0
[c00000001cdc3d10] [c00000000044b134] vfs_write+0x224/0x330
[c00000001cdc3d60] [c00000000044b3f4] ksys_write+0x74/0x130
[c00000001cdc3db0] [c00000000002df70] system_call_exception+0x150/0x2d0
[c00000001cdc3e10] [c00000000000d45c] system_call_common+0xec/0x278

Link: https://lore.kernel.org/r/20220110050218.3958-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

64f24af7

scsi: qla2xxx: Fix stuck session in gpdb · 725d3a0d

Quinn Tran authored Jan 09, 2022

Fix stuck sessions in get port database. When a thread is in the process of
re-establishing a session, a flag is set to prevent multiple threads /
triggers from doing the same task. This flag was left on, where any attempt
to relogin was locked out. Clear this flag, if the attempt has failed.

Link: https://lore.kernel.org/r/20220110050218.3958-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

725d3a0d

scsi: qla2xxx: Implement ref count for SRB · 31e6cdbe

Saurav Kashyap authored Jan 09, 2022

The timeout handler and the done function are racing. When
qla2x00_async_iocb_timeout() starts to run it can be preempted by the
normal response path (via the firmware?). qla24xx_async_gpsc_sp_done()
releases the SRB unconditionally. When scheduling back to
qla2x00_async_iocb_timeout() qla24xx_async_abort_cmd() will access an freed
sp->qpair pointer:

qla2xxx [0000:83:00.0]-2871:0: Async-gpsc timeout - hdl=63d portid=234500 50:06:0e:80:08:77:b6:21.
qla2xxx [0000:83:00.0]-2853:0: Async done-gpsc res 0, WWPN 50:06:0e:80:08:77:b6:21
qla2xxx [0000:83:00.0]-2854:0: Async-gpsc OUT WWPN 20:45:00:27:f8:75:33:00 speeds=2c00 speed=0400.
qla2xxx [0000:83:00.0]-28d8:0: qla24xx_handle_gpsc_event 50:06:0e:80:08:77:b6:21 DS 7 LS 6 rc 0 login 1|1 rscn 1|0 lid 5
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: qla24xx_async_abort_cmd+0x1b/0x1c0 [qla2xxx]

Obvious solution to this is to introduce a reference counter. One reference
is taken for the normal code path (the 'good' case) and one for the timeout
path. As we always race between the normal good case and the timeout/abort
handler we need to serialize it. Also we cannot assume any order between
the handlers. Since this is slow path we can use proper synchronization via
locks.

When we are able to cancel a timer (del_timer returns 1) we know there
can't be any error handling in progress because the timeout handler hasn't
expired yet, thus we can safely decrement the refcounter by one.

If we are not able to cancel the timer, we know an abort handler is
running. We have to make sure we call sp->done() in the abort handlers
before calling kref_put().

Link: https://lore.kernel.org/r/20220110050218.3958-3-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

31e6cdbe

scsi: qla2xxx: Refactor asynchronous command initialization · d4523bd6

Daniel Wagner authored Jan 09, 2022

Move common open-coded asynchronous command initializing code such as
setting up the timer and the done callback into one function. This is a
preparation step and allows us later on to change the low level error flow
handling at a central place.

Link: https://lore.kernel.org/r/20220110050218.3958-2-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d4523bd6