Commits · cc8ed1a9d65c537742b40658d94e0ebea9cb0e81 · Kirill Smelkov / linux

19 Feb, 2019 1 commit

scsi: dt-bindings: ufs: Add HI3670 UFS controller binding · cc8ed1a9

Manivannan Sadhasivam authored Jan 05, 2019

Add devicetree binding for HI3670 UFS controller. HI3760 SoC is very
similar to HI3660 SoC with almost same IPs. Only major difference in terms
of UFS is the PHY. HI3670 has 10nm PHY. But since the original driver
(HI3660 UFS) cannot make HI3670 UFS functional, a separate compatible
is added for HI3670 without any fallback.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Wei Li <liwei213@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cc8ed1a9

14 Feb, 2019 2 commits

scsi: lpfc: fix a handful of indentation issues · 258f84fa

Colin Ian King authored Feb 12, 2019

There are a handful of statements that are indented incorrectly. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

258f84fa

scsi: qlogicpti: Use of_node_name_eq for node name comparisons · 7f8e12f1

Rob Herring authored Feb 13, 2019

Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.

As prom_name is not used for anything else, remove it.

Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7f8e12f1

13 Feb, 2019 5 commits

scsi: sd: Fix typo in sd_first_printk() · df46cac3

Dietmar Hahn authored Feb 05, 2019

Commit b2bff6ce ("[SCSI] sd: Quiesce mode sense error messages")
added the macro sd_first_printk(). The macro takes "sdsk" as argument
but dereferences "sdkp". This hasn't caused any real issues since all
callers of sd_first_printk() have an sdkp. But fix the typo.

[mkp: Turned this into a real patch and tweaked commit description]
Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

df46cac3

scsi: megaraid_sas: driver version update · 0de05405

Shivasharan S authored Feb 08, 2019

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0de05405

scsi: megaraid_sas: Update structures for HOST_DEVICE_LIST DCMD · a3742d68

Shivasharan S authored Feb 08, 2019

Add padding to make the structure variables in MR_HOST_DEVICE_LIST_ENTRY
64-bit aligned.  Also, add reserved fields to MR_HOST_DEVICE_LIST for
future firmware usage.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a3742d68

scsi: lpfc: Fix error code if kcalloc() fails · fad28e3d

Dan Carpenter authored Feb 11, 2019

This should return -ENOMEM if kcalloc() fails, but it accidentally
returns success instead.

Fixes: 6a828b0f ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fad28e3d

scsi: ufs: fix a typo in comment · 2174b185

Chengguang Xu authored Feb 10, 2019

poitner -> pointer.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Pedro Sousa <pedrom.sousa@synopsys.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2174b185

12 Feb, 2019 7 commits

scsi: scsi_debug: Implement support for write protect · 9447b6ce

Martin K. Petersen authored Feb 08, 2019

Teach scsi_debug to honor SWP in the Control Mode Page and report the
resulting WP state in the Device-Specific Parameter field.

In check_device_access_params() verify that commands that will write
the medium are permitted to do so.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>

9447b6ce

scsi: core: Move resid from scsi_data_buffer to scsi_cmnd · 9fa505ad

Bart Van Assche authored Feb 08, 2019

This patch does not change any functionality but reduces the size of
struct scsi_cmnd.

Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9fa505ad

scsi: sd: Remove superfluous residual assignments · 80f82c16

Bart Van Assche authored Feb 08, 2019

Since commit 26e85fcd ("[SCSI] sd: Permit merged discard requests";
kernel v3.10) sd_done() sets the residual not only for failed special
requests but also for special requests that succeeded. Hence remove the
code from functions called by sd_init_command() that sets the residual.
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

80f82c16

scsi: uas: Use scsi_[gs]et_resid() where appropriate · 229531be

Bart Van Assche authored Feb 08, 2019

This patch does not change any functionality.

Cc: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

229531be

scsi: scsi_debug: Use scsi_[gs]et_resid() where appropriate · 42d387be

Bart Van Assche authored Feb 08, 2019

This patch does not change any functionality.

Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

42d387be

scsi: libiscsi: Use scsi_[gs]et_resid() where appropriate · 960bf87a

Bart Van Assche authored Feb 08, 2019

This patch does not change any functionality.

Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

960bf87a

scsi: scsi_debug: Fix a recently introduced regression · c208556a

Bart Van Assche authored Feb 08, 2019

A recent commit removed an element from opcode_info_arr[] but did not
modify opcode_ind_arr[] nor was SDEB_I_XDWRITEREAD removed. Remove
SDEB_I_XDWRITEREAD and bring the two arrays again in sync. This patch
avoids that the following is reported:

BUG: KASAN: null-ptr-deref in scsi_debug_queuecommand+0x60f/0xc90 [scsi_debug]
Read of size 1 at addr 0000000000000001 by task iscsi-test-cu/683
CPU: 3 PID: 683 Comm: iscsi-test-cu Not tainted 5.0.0-rc5-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 kasan_report.cold.3+0x5/0x3e
 __asan_load1+0x47/0x50
 scsi_debug_queuecommand+0x60f/0xc90 [scsi_debug]
 scsi_queue_rq+0xc17/0x12e0
 blk_mq_dispatch_rq_list+0x5fc/0xb10
 blk_mq_sched_dispatch_requests+0x2f7/0x300
 __blk_mq_run_hw_queue+0xd6/0x180
 __blk_mq_delay_run_hw_queue+0x25c/0x290
 blk_mq_run_hw_queue+0x119/0x1b0
 blk_mq_sched_insert_request+0x274/0x350
 blk_execute_rq_nowait+0x78/0x90
 blk_execute_rq+0xcc/0x140
 sg_io+0x30f/0x700
 scsi_cmd_ioctl+0x4d4/0x540
 scsi_cmd_blk_ioctl+0x7b/0x8b
 sd_ioctl+0xba/0x150
 blkdev_ioctl+0x6e1/0xea0
 block_ioctl+0x79/0x90
 do_vfs_ioctl+0x12b/0x9b0
 ksys_ioctl+0x41/0x80
 __x64_sys_ioctl+0x43/0x50
 do_syscall_64+0x71/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Cc: Christoph Hellwig <hch@lst.de>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Fixes: ae3d56d8 ("scsi: remove bidirectional command support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c208556a

08 Feb, 2019 8 commits

scsi: hisi_sas: Do some more tidy-up · 4a8bec88

John Garry authored Feb 06, 2019

Do some very minor tidy-up, for things like needlessly initing variable and
not leaving whitespace before quote endings.

Originally-from: Xiang Chen <chenxiang66@hisilicon.com>
Originally-from: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4a8bec88

scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental · 4fefe5bb

Xiang Chen authored Feb 06, 2019

For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.

Then it decreases the performance regression that fio and CQ interrupts are
processed on different node.

For user control irq affinity mode, keep it as before.

To realize it, also need to distinguish the usage of dq lock and sas_dev
lock.

We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

We're almost at the point where we can expose multiple queues to the upper
layer for SCSI MQ, but we need to sort out the per-HBA tags performance
issue.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4fefe5bb

scsi: hisi_sas: Issue internal abort on all relevant queues · 795f25a3

John Garry authored Feb 06, 2019

To support queue mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort is
processed for a single IO or a device after all the relevant command(s)
which it is attempting to abort have been processed by the controller.

Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue will
be processed in order, and we will not have a scenario where the internal
abort is racing against a command(s) which it is trying to abort.

To enqueue commands on queue mapped to a CPU, choosing a queue for an
command is based on the associated queue for the current CPU, so this is
not safe for internal abort since it would definitely not be guaranteed
that commands for the command devices are issued on the same queue.

To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.

So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.

For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have not side affect.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

795f25a3

scsi: hisi_sas: change queue depth from 512 to 4096 · 1273d65f

Xiang Chen authored Feb 06, 2019

If sending IOs to many disks from single queue, it is possible that the
queue may be full. To avoid the situation, change queue depth from 512 to
4096 which is the max number of IOs for v3 hw.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1273d65f

scsi: hisi_sas: Add manual trigger for debugfs dump · 7c5e1363

Luo Jiaxing authored Feb 06, 2019

Add an interface to manually trigger a debugfs dump.
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7c5e1363

scsi: hisi_sas: Add support for DIX feature for v3 hw · b3cce125

Xiang Chen authored Feb 06, 2019

This patch adds support for DIX to v3 hw driver.

For this, we build upon support for DIF, most significantly is adding new
DMA map and unmap paths.

Some pre-existing macro precedence issues are also tidied. They were
detected by checkpatch --strict.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b3cce125

scsi: dt-bindings: ufs: Fix the compatible string definition · 1ace9f00

Douglas Anderson authored Oct 12, 2018

If you look at the bindings for the UFS Host Controller it says:

- compatible: must contain "jedec,ufs-1.1" or "jedec,ufs-2.0", may
              also list one or more of the following:
                 "qcom,msm8994-ufshc"
                 "qcom,msm8996-ufshc"
                 "qcom,ufshc"

My reading of that is that it's fine to just have either of these:
1. "qcom,msm8996-ufshc", "jedec,ufs-2.0"
2. "qcom,ufshc", "jedec,ufs-2.0"

As far as I can tell neither of the above is actually a good idea.

For #1 it turns out that the driver currently only keys off the
compatible string "qcom,ufshc" so it won't actually probe.

For #2 the driver won't probe but it's not a good idea to keep the SoC
name out of the compatible string.

Let's update the compatible string to make it really explicit.  We'll
include a nod to the existing driver and the old binding and say that
we should always include the "qcom,ufshc" string in addition to the
SoC compatible string.

While we're at it we'll also include another example SoC known to have
UFS: sdm845.

Fixes: 47555a5c ("scsi: ufs: make the UFS variant a platform device")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1ace9f00

scsi: ata: Use unsigned int for cmd's type in ioctls in scsi_host_template · 6f4e626f

Nathan Chancellor authored Feb 07, 2019

Clang warns several times in the scsi subsystem (trimmed for brevity):

drivers/scsi/hpsa.c:6209:7: warning: overflow converting case value to
switch condition type (2147762695 to 18446744071562347015) [-Wswitch]
        case CCISS_GETBUSTYPES:
             ^
drivers/scsi/hpsa.c:6208:7: warning: overflow converting case value to
switch condition type (2147762694 to 18446744071562347014) [-Wswitch]
        case CCISS_GETHEARTBEAT:
             ^

The root cause is that the _IOC macro can generate really large numbers,
which don't fit into type 'int', which is used for the cmd parameter in
the ioctls in scsi_host_template. My research into how GCC and Clang are
handling this at a low level didn't prove fruitful. However, looking at
the rest of the kernel tree, all ioctls use an 'unsigned int' for the
cmd parameter, which will fit all of the _IOC values in the scsi/ata
subsystems.

Make that change because none of the ioctls expect a negative value for
any command, it brings the ioctls inline with the reset of the kernel,
and it removes ambiguity, which is never good when dealing with compilers.

Link: https://github.com/ClangBuiltLinux/linux/issues/85
Link: https://github.com/ClangBuiltLinux/linux/issues/154
Link: https://github.com/ClangBuiltLinux/linux/issues/157Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Bradley Grove <bgrove@attotech.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6f4e626f

06 Feb, 2019 17 commits

scsi: lpfc: Update lpfc version to 12.2.0.0 · 42fb055a

James Smart authored Jan 28, 2019

Update lpfc version to 12.2.0.0
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

42fb055a

scsi: lpfc: Update 12.2.0.0 file copyrights to 2019 · 0d041215

James Smart authored Jan 28, 2019

For files modified as part of 12.2.0.0 patches, update copyright to 2019
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0d041215

scsi: lpfc: Fix nvmet issues when link bounce under IO load · c160c0f8

James Smart authored Jan 28, 2019

Various null pointer dereference and general protection fault panics occur
when there is a link bounce under load. There are a large number of "error"
message 6413 indicating "bad release".

The issues resolve to list corruptions due to missing or inconsistent lock
protection. Lockups are due to nested locks in the unsolicited abort
path. The unsolicited abort path calls the wrong abort processing
routine. There was also duplicate context release while aborts were still
active in the hardware.

Removed duplicate locks and added lock protection around list item
removal. Commonized lock handling around the abort processing routines.
Prevent context release while still in ABTS list.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c160c0f8

scsi: lpfc: Correct upcalling nvmet_fc transport during io done downcall · 472e146d

James Smart authored Jan 28, 2019

When the transport calls into the lpfc target to release an IO job
structure, which corresponds to an exchange, and if the driver was waiting
for an exchange in order to post a previously received command to the
transport, the driver immediately takes the IO job and reuses the context
for the prior command and calls nvmet_fc_rcv_fcp_req() to tell the
transport about a newly received command.

Problem is, the execution of the IO job release may be in the context of
the back end driver and its bio completion handlers, thus it may be in a
irq context and protection code kicks in in the bio and request layers that
are subsequently called.

Rework lpfc so that instead of immediately upcalling, queue it to a
deferred work thread and have the thread make the upcall.

Took advantage of this change to remove duplicated code with the normal
command receive path that preps the IO job and upcalls nvmet_fc. Created a
common routine both paths use.

Also corrected some errors that were found during review of the context
freeing and reuse - basically unlocked operations and a somewhat disjoint
set of calls to release associated job elements. Cleaned up this path and
added locks for coherency.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

472e146d

scsi: lpfc: Fix default driver parameter collision for allowing NPIV support · f6e84790

James Smart authored Jan 28, 2019

The conversion to enable SCSI and NVME fc4 support ran into an issue with
NPIV support. With NVME, NPIV is not currently supported, but with SCSI it
was. The driver reverted to its lowest setting meaning NPIV with SCSI was
not allowed.

Convert the NPIV checks and implementation so that SCSI can continue to
allow NPIV support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f6e84790

scsi: lpfc: Rework locking on SCSI io completion · c2017260

James Smart authored Jan 28, 2019

A scsi host lock is taken on every io completion to check whether the abort
handler is waiting on the io completion. This is an expensive lock to take
on all completion when rarely in an abort condition.

Replace scsi host lock with command-specific lock. Synchronize completion
and abort paths by new cmd lock. Ensure all flag changing and nulling of
context pointers taken under lock. When adding lock to task management
abort, realized it was missing other synchronization locks. Added that
synchronization to match normal paths.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c2017260

scsi: lpfc: Enable SCSI and NVME fc4s by default · b1684a0b

James Smart authored Jan 28, 2019

Now that performance mods don't split resources by protocol and enable both
protocols by default, there's no reason not to enable concurrent SCSI and
NVME fc4 support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b1684a0b

scsi: lpfc: Resize cpu maps structures based on possible cpus · 222e9239

James Smart authored Jan 28, 2019

The work done to date utilized the number of present cpus when sizing
per-cpu structures. Structures should have been sized based on the max
possible cpu count.

Convert the driver over to possible cpu count for sizing allocation.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

222e9239

scsi: lpfc: Utilize new IRQ API when allocating MSI-X vectors · 75508a8b

James Smart authored Jan 28, 2019

Current driver uses the older IRQ API for MSIX allocation

Change driver to utilize pci_alloc_irq_vectors when allocating IRQ vectors.

Make lpfc_cpu_affinity_check use pci_irq_get_affinity to determine how the
kernel mapped all the IRQs.

Remove msix_entries from SLI4 structure, replaced with pci_irq_vector()
usage.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

75508a8b

scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing · 32517fc0

James Smart authored Jan 28, 2019

When driving high iop counts, auto_imax coalescing kicks in and drives the
performance to extremely small iops levels.

There are two issues:

 1) auto_imax is enabled by default. The auto algorithm, when iops gets
    high, divides the iops by the hdwq count and uses that value to
    calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether
    they have load or not. The EQ_delay is only manipulated every 5s (a
    long time). Thus there were large 5s swings of no interrupt delay
    followed by large/maximum delay, before repeating.

 2) When processing a CQ, the driver got mixed up on the rate of when
    to ring the doorbell to keep the chip appraised of the eqe or cqe
    consumption as well as how how long to sit in the thread and
    process queue entries. Currently, the driver capped its work at
    64 entries (very small) and exited/rearmed the CQ.  Thus, on heavy
    loads, additional overheads were taken to exit and re-enter the
    interrupt handler. Worse, if in the large/maximum coalescing
    windows,k it could be a while before getting back to servicing.

The issues are corrected by the following:

 - A change in defaults. Auto_imax is turned OFF and fcp_imax is set
   to 0. Thus all interrupts are immediate.

 - Cleanup of field names and their meanings. Existing names were
   non-intuitive or used for duplicate things.

 - Added max_proc_limit field, to control the length of time the
   handlers would service completions.

 - Reworked EQ handling:
    Added common routine that walks eq, applying notify interval and max
      processing limits. Use queue_claimed to claim ownership of the queue
      while processing. Always rearm the queue whenever the common routine
      is called.
    Rework queue element processing, namely to eliminate hba_index vs
      host_index. Only one index is necessary. The queue entry can be
      marked invalid and the host_index updated immediately after eqe
      processing.
    After rework, xx_release routines are now DB write functions. Renamed
      the routines as such.
    Moved lpfc_sli4_eq_flush(), which does similar action, to same area.
    Replaced the 2 individual loops that walk an eq with a call to the
      common routine.
    Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax.
    Added per-cpu counters to detect interrupt rates and scale
      interrupt coalescing values.

 - Reworked CQ handling:
    Added common routine that walks cq, applying notify interval and max
      processing limits. Use queue_claimed to claim ownership of the queue
      while processing. Always rearm the queue whenever the common routine
      is called.
    Rework queue element processing, namely to eliminate hba_index vs
      host_index. Only one index is necessary. The queue entry can be
      marked invalid and the host_index updated immediately after cqe
      processing.
    After rework, xx_release routines are now DB write functions.  Renamed
      the routines as such.
    Replaced the 3 individual loops that walk a cq with a call to the
      common routine.
    Redefined lpfc_sli4_sp_handle_mcqe() to commong handler definition with
      queue reference. Add increment for mbox completion to handler.

 - Added a new module/sysfs attribute: lpfc_cq_max_proc_limit To allow
   dynamic changing of the CQ max_proc_limit value being used.

Although this leaves an EQ as an immediate interrupt, that interrupt will
only occur if a CQ bound to it is in an armed state and has cqe's to
process.  By staying in the cq processing routine longer, high loads will
avoid generating more interrupts as they will only rearm as the processing
thread exits. The immediately interrupt is also beneficial to idle or
lower-processing CQ's as they get serviced immediately without being
penalized by sharing an EQ with a more loaded CQ.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

32517fc0

scsi: lpfc: cleanup: convert eq_delay to usdelay · cb733e35

James Smart authored Jan 28, 2019

Review of the eq coalescing logic showed the code was a bit fragmented.
Sometimes it would save/set via an interrupt max value, while in others it
would do so via a usdelay. There were also two places changing eq delay,
one place that issued mailbox commands, and another that changed via
register writes if supported.

Clean this up by:

 - Standardizing the operation of lpfc_modify_hba_eq_delay() routine so
   that it is always told of a us delay to impose. The routine then chooses
   the best way to set that - via register or via mbx.

 - Rather than two value types stored in eq->q_mode (usdelay if change via
   register, imax if change via mbox) - q_mode always contains usdelay.
   Before any value change, old vs new value is compared and only if
   different is a change done.

 - Revised the dmult calculation. dmult is not set based on overall imax
   divided by hardware queues - instead imax applies to a single cpu and
   the value will be replicated to all cpus.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cb733e35

scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues · 6a828b0f

James Smart authored Jan 28, 2019

So far MSIX vector allocation assumed it would be 1:1 with hardware
queues. However, there are several reasons why fewer MSIX vectors may be
allocated than hardware queues such as the platform being out of vectors or
adapter limits being less than cpu count.

This patch reworks the MSIX/EQ relationships with the per-cpu hardware
queues so they can function independently. MSIX vectors will be equitably
split been cpu sockets/cores and then the per-cpu hardware queues will be
mapped to the vectors most efficient for them.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6a828b0f

scsi: lpfc: Fix setting affinity hints to correlate with hardware queues · b3295c2a

James Smart authored Jan 28, 2019

The desired affinity for the hardware queue behavior is for hdwq 0 to be
affinitized with cpu 0, hdwq 1 to cpu 1, and so on.  The implementation so
far does not do this if the number of cpus is greater than the number of
hardware queues (e.g. hardware queue allocation was administratively
reduced or hardware queue resources could not scale to the cpu count).

Correct the queue affinitization logic when queue count is less than
cpu count.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b3295c2a

scsi: lpfc: Allow override of hardware queue selection policies · 45aa312e

James Smart authored Jan 28, 2019

Default behavior is to use the information from the upper IO stacks to
select the hardware queue to use for IO submission.  Which typically has
good cpu affinity.

However, the driver, when used on some variants of the upstream kernel, has
found queuing information to be suboptimal for FCP or IO completion locked
on particular cpus.

For command submission situations, the lpfc_fcp_io_sched module parameter
can be set to specify a hardware queue selection policy that overrides the
os stack information.

For IO completion situations, rather than queing cq processing based on the
cpu servicing the interrupting event, schedule the cq processing on the cpu
associated with the hardware queue's cq.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

45aa312e

scsi: lpfc: Adapt partitioned XRI lists to efficient sharing · c490850a

James Smart authored Jan 28, 2019

The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.

Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool. The private pool, when empty, pulls from a global pool from the
cpu. When the cpu's global pool is empty it will pull from other cpu's
global pool. As there many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a radomizing effect that minimizes the
number of cpu's that will contend with each other when the steal XRI's from
another cpu's global pool.

On io completion, a cpu will push the XRI back on to its private pool. A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRI's to the CPU global pool so that other cpu's may
allocate them.

On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pools is seen, it will be replenished before the XRI is placed on the cpu
private pool.

Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c490850a

scsi: lpfc: Synchronize hardware queues with SCSI MQ interface · ace44e48

James Smart authored Jan 28, 2019

Now that the lower half has much better per-cpu parallelization using the
hardware queues, the SCSI MQ support needs to be tied into it.

The involves the following mods:

 - Use the hardware queue info from the midlayer to help select the
   hardware queue to utilize. This required change to the get_scsi-buf_xxx
   routines.

 - Remove lpfc_sli4_scmd_to_wqidx_distr() routine. No longer needed.

 - Includes fix for SLI-3 that does not have multi queue parallelization.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ace44e48

scsi: lpfc: Convert ring number to hardware queue for nvme wqe posting. · 1fbf9742

James Smart authored Jan 28, 2019

SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
hardware. This should be indicating the hardware queue to use, not the ring
number.

Replace ring number with the hardware queue that should be used.

Note: SCSI avoided this issue as it utilized an older lfpc_issue_iocb
routine that properly adapts.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1fbf9742