Commits · 439ae285b9d9da12343e3d51a0dff1b473a6a6ac · Kirill Smelkov / linux

23 Mar, 2017 34 commits

scsi: ipr: Fix abort path race condition · 439ae285

Brian King authored Mar 15, 2017

This fixes a race condition in the error handlomg paths of ipr. While a
command is outstanding to the adapter, it is placed on a pending queue
for the hrrq it is associated with, while holding the HRRQ lock. When a
command is completed, it is removed from the pending queue, under HRRQ
lock, and placed on a local list. This list is then iterated through
without any locks and each command's done function is invoked, inside of
which, the command gets returned to the free list while grabbing the
HRRQ lock. This fixes two race conditions when commands have been
removed from the pending list but have not yet been added to the free
list. Both of these changes fix race conditions that could result in
returning success from eh_abort_handler and then later calling scsi_done
for the same request.

The first race condition is in ipr_cancel_op. It looks through each
pending queue to see if the command to be aborted is still outstanding
or not. Rather than looking on the pending queue, reverse the logic to
check to look for commands that are NOT on the free queue. The second
race condition can occur when in ipr_wait_for_ops where we are waiting
for responses for commands we've aborted.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

439ae285

scsi: ipr: Remove redundant initialization · 960e9648

Brian King authored Mar 15, 2017

Removes some code in __ipr_eh_dev_reset which was modifying the ipr_cmd
done function. This should have already been setup at command allocation
time and if its since been changed, it means we are in the ipr_erp*
functions and need to wait for them to complete and don't want to
override that here.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

960e9648

scsi: ipr: Fix missed EH wakeup · 66a0d59c

Brian King authored Mar 15, 2017

Following a command abort or device reset, ipr's EH handlers wait for
the commands getting aborted to get sent back from the adapter prior to
returning from the EH handler. This fixes up some cases where the
completion handler was not getting called, which would have resulted in
the EH thread waiting until it timed out, greatly extending EH time.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

66a0d59c

scsi: hisi_sas: add is_sata_phy_v2_hw() · 4935933e

Xiaofei Tan authored Mar 23, 2017

Add helper function is_sata_phy_v2_hw() to judge whether the attached
device is SATA disk for a root PHY.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4935933e

scsi: hisi_sas: use dev_is_sata to identify SATA or SAS disk · 6073b771

Xiang Chen authored Mar 23, 2017

When SMP IO is sent, sas_protocol_ata couldn't judge whether the disk is
SATA or SAS disk.  So use dev_is_sata to identify SATA or SAS disk.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6073b771

scsi: hisi_sas: check hisi_sas_lu_reset() error message · 14d3f397

John Garry authored Mar 23, 2017

Unless we actually get some sort of failure in hisi_sas_lu_reset(),
don't print a message.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

14d3f397

scsi: hisi_sas: release SMP slot in lldd_abort_task · ccbfe5a0

Xiang Chen authored Mar 23, 2017

When an SMP task timeouts, it will call lldd_abort_task to release the
associated slot, and then will release the sas_task.

Currently in lldd_abort_task, if we fail to internally abort IO, then
the slot of SMP IO is not released, but sas_task will still be later
released, so the slot's sas_task is NULL, which will cause NULL pointer
when hisi_sas_slot_task_free happens later.

To resolve, check the return value of internal abort, and release the
slot if it failed.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ccbfe5a0

scsi: hisi_sas: add hisi_sas_clear_nexus_ha() · 8b05ad6a

John Garry authored Mar 23, 2017

Add function for upper-layer to reset controller when all else fails.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8b05ad6a

scsi: hisi_sas: rename hisi_sas_link_timeout_{enable, disable}_link · 4df642db

John Garry authored Mar 23, 2017

For consistency, remove the "hisi_sas_" prefix.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4df642db

scsi: hisi_sas: handle PHY UP+DOWN simultaneous irq · 981843c6

Xiaofei Tan authored Mar 23, 2017

Handle the situation that PHY UP and DOWN irq happen simultaneously.
There is no mechanism of SoC HW to ensure this situation will never
happen. So, we add this handle just in case.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

981843c6

scsi: hisi_sas: some modifications to v2 hw reg init values · f1dc7518

John Garry authored Mar 23, 2017

This patch includes:
(1) Disable transport layer retry
(2) Support CQ time and count interrupt coal
(3) fix link FIFO full issue
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Zhao Nenglong <zhaonenglong@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f1dc7518

scsi: hisi_sas: process error codes according to their priority · 634a9585

Xiang Chen authored Mar 23, 2017

There are some rules to decide which error code has the high priority
when errors happen together:

(1) Error phase of CQ decides the error happens on RX or TX;

(2) For TX error, when DMA/TRANS TX error happen simultaneously, the
    priority of DMA TX error is higher than TRANS TX error, so for the
    priority of TX error: DW2 (DMA TX part) > DW0;

(3) For RX error, when TRANS/DMA/SIPC RX error happen simultaneously,
    the priority of TRANS RX error is higher than DMA and SIPC RX error,
    and we should also keep the rules (the priority of DW3 > DW2), so
    for the priority of RX error: DW1 > DW3 > DW2(SIPC RX part);

(4) There are also a priority we should keep in the same error type.

So, modify slot error code to handle this.

In addition to this, some some error codes are modified according to
recommendation from SoC designer.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

634a9585

scsi: hisi_sas: remove task free'ing for timeouts · 6fcdda80

John Garry authored Mar 23, 2017

When a TMF or internal abort times-out, do not free slot. We expect this
to be done upon later escalated error handling.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6fcdda80

scsi: hisi_sas: fix some sas_task.task_state_lock locking · 54c9dd2d

John Garry authored Mar 23, 2017

Some more locking needs to be added/modified for when
read-modify-writing sas_task.task_state_flags.

Note: since we can attempt to grab this lock in interrupt
      context we should use irq variant of spin_lock.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

54c9dd2d

scsi: hisi_sas: free slots after hardreset · 6131243a

Xiang Chen authored Mar 23, 2017

After hardreset, we clear up IOs of remote disks, so we need to free
those slots in LLDD.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6131243a

scsi: hisi_sas: check for SAS_TASK_STATE_ABORTED in slot complete · a305f337

John Garry authored Mar 23, 2017

Check in slot_complete_v2_hw() for whether a task has already been
completed by upper layer.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a305f337

scsi: hisi_sas: hardreset for SATA disk in LU reset · 055945df

John Garry authored Mar 23, 2017

When issuing an LU reset for a SATA target, issue an internal abort and
a hard reset.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

055945df

scsi: hisi_sas: modify hisi_sas_abort_task() for SSP · c35279f2

John Garry authored Mar 23, 2017

Currently an internal abort is executed regardless of the result of the
TMF. We should also check the result of the internal abort to see if we
should free the slot.

So change the status code STAT_IO_COMPLETE to TMF_RESP_FUNC_SUCC,
meaning the slot has been successfully aborted.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c35279f2

scsi: hisi_sas: modify error handling for v2 hw · fc866951

Xiang Chen authored Mar 23, 2017

For error codes which need abort-and-retry, simulate IO timeout and let
SCSI+ATA layers process those errors.

Previously for SSP, we should try to abort the IO in the LLDD and then
pass back to upper layer, but sometimes this would also error. So
Instead of adding special error handling for this scenario in the LLDD,
allow the upper layer to handle completely.

No performance hit is seen by taking this approach.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fc866951

scsi: hisi_sas: only reset link for PHY_FUNC_LINK_RESET · b4c67a6c

John Garry authored Mar 23, 2017

We currently do a hard reset for a link reset. Change this to do a link
reset only.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b4c67a6c

scsi: hisi_sas: error hisi_sas_task_prep() when port down · ddabca21

John Garry authored Mar 23, 2017

When sas_port is NULL, then return SAS_PHY_DOWN.

In addition, when the sas_dev is gone then explicitly return
SAS_PHY_DOWN.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ddabca21

scsi: hisi_sas: remove hisi_sas_port_deformed() · 405314df

John Garry authored Mar 23, 2017

Currently when a root PHY is deformed from a asd_sas_port we try to
release the slots in the LLDD, and fail.

Regardless, it is not right to release this early.

This patch removes the deformed function. As it was before, port
deformation is still done in hisi_sas_phy_down().

It would be nice to actually remove the hisi_sas_port_{de}formed() pair,
however we cannot as we need to know the asd_sas_port index libsas has
associated with an asd_sas_phy.

The hw does actually generate a port id for a PHY, but this seems to a
random number, so ignored for this purpose.

This patch also changes the code to link slots to the hisi_sas_device,
and not hisi_sas_port.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

405314df

scsi: hisi_sas: add softreset function for SATA disk · 7c594f04

Xiang Chen authored Mar 23, 2017

Add softreset to clear IO after internal abort device for SATA disk.

The SATA error handling for the controller is based on device internal
abort and softreset function.

The controller does not support internal abort for single IO, so we need
to execute internal abort for device.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7c594f04

scsi: hisi_sas: move PHY init to hisi_sas_scan_start() · 396b8044

John Garry authored Mar 23, 2017

Relocate the PHY init code from LLDD hw init path to
hisi_sas_scan_start().
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

396b8044

scsi: hisi_sas: add controller reset · 06ec0fb9

Xiang Chen authored Mar 23, 2017

There are some scenarios that we need to warm-reset to reset registers
of SAS controller. During reset we disable interrupts/DQs/PHYs, and
after reset we re-init the hardware and rescan the topology to see if
anything changed.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

06ec0fb9

scsi: hisi_sas: add to_hisi_sas_port() · 2e244f0f

John Garry authored Mar 23, 2017

Introduce function to get hisi_sas_port from asd_sas_port.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2e244f0f

scsi: fnic: bug fix for fip.fip_subcode in fnic_fcoe_send_vlan_req · b8e1aa3c

Satish Kharat authored Mar 22, 2017

This is a bug introduced when they moved the fip subcodes to central
place. Was sending FIP_SC_VL_NOTE in fip.fip_subcode for VLAN request in
fnic_fcoe_send_vlan_req. Change is to use FIP_SC_VL_REQ instead.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b8e1aa3c

scsi: fnic: Adding debug IO and Abort latency counter to fnic stats · 445d2960

Satish Kharat authored Feb 28, 2017

The IO and Abort latency counter counts the time taken to complete the
IO and abort command into broad buckets. This is not intended for
performance measurement, just a debug statistic.  current_max_io_time
tries to keep track of the maximum time an IO has taken to complete if
it is > 30sec.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

445d2960

scsi: fnic: Adding Check Condition counter to misc fnicstats · 39fcbbc0

Satish Kharat authored Feb 28, 2017

Just a simple counter of number of check conditions encountered on that
host.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

39fcbbc0

scsi: fnic: Avoid false out-of-order detection for aborted command · b9202b4a

Satish Kharat authored Feb 28, 2017

If SCSI-ML has already issued abort on a command i.e
FNIC_IOREQ_ABTS_PENDING is set and we get a IO completion, avoid this
being flagged as out-of-order completion by setting the FNIC_IO_DONE
flag in fnic_fcpio_icmnd_cmpl_handler
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b9202b4a

scsi: fnic: Fix for "Number of Active IOs" in fnicstats becoming negative · 7ef539c8

Satish Kharat authored Feb 28, 2017

Fixing the IO stats update (Active IOs and IO completion) to prevent
"Number of Active IOs" from becoming negative in the fnistats output.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7ef539c8

scsi: fnic: minor cleanup in fnic_fcpio_itmf_cmpl_handler, removing else case · ccc6d704

Satish Kharat authored Feb 28, 2017

Getting rid of else case to make the flow look bit simpler.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ccc6d704

scsi: fnic: Ratelimit printks to avoid flooding when vlan is not set by the switch.i · b43abcbb

Satish Kharat authored Feb 28, 2017

This is to avoid the log from being filled with vlan discovery messages
when there is no vlan configured on the switch.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b43abcbb

scsi: fnic: switch to pci_alloc_irq_vectors · cca678df

Christoph Hellwig authored Feb 01, 2017

Not a full cleanup for the IRQ code, for that we'd need to know if the
max number of the various CQ types is going to stay 1 forever.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cca678df

16 Mar, 2017 1 commit

scsi: esas2r: Remove redundant NULL check on buffer · 33bd712b

Colin Ian King authored Mar 15, 2017

Buffer is a pointer to the static char array event_buffer and therefore
can never be null, so the check is redundant. Remove it.

Detected by CoverityScan, CID#1077556 ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Bradley Grove <bgrove@attotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

33bd712b

15 Mar, 2017 5 commits

scsi: remove incorrect __exit markups · ba21222d

Dmitry Torokhov authored Mar 01, 2017

Even if bus is not hot-pluggable, devices can be unbound from the driver
via sysfs, so we should not be using __exit annotations on remove()
methods. The only exception is drivers registered with
platform_driver_probe() which specifically disables sysfs bind/unbind
attributes.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ba21222d

scsi: stex: Add S6 support · 61b745fa

Charles authored Feb 17, 2017

1. Add reboot notifier and register it in stex_probe for all supported
   device.

2. For all supported device in restart flow, we get a callback from
   notifier and set S6flag for stex_shutdown & stex_hba_stop to send
   restart command to FW.
Signed-off-by: Charles.Chiou <charles.chiou@tw.promise.com>
Signed-off-by: Paul.Lyu <paul.lyu@tw.promise.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

61b745fa

scsi: stex: Support Pegasus 3 product · d6570227

Charles authored Feb 17, 2017

Pegasus series is a RAID support product using Thunderbolt technology.
The newest product, Pegasus 3(P3) supports Thunderbolt 3 technology with
a different chip.

1. Change driver version.

2. Add P3 VID, DID and define it's device address.

3. P3 use msi interrupt, so stex_request_irq P3 type enable msi.

4. For hibernation, use msi_lock in stex_ss_handshake to prevent msi
   register write again when handshaking.

5. P3 doesn't need read() as flush.

6. In stex_ss_intr & stex_abort, P3 only clear interrupt register when
   getting vendor defined interrupt.
Signed-off-by: Charles.Chiou <charles.chiou@tw.promise.com>
Signed-off-by: Paul.Lyu <paul.lyu@tw.promise.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d6570227

scsi: libiscsi: qedi: convert iscsi_task.refcount from atomic_t to refcount_t · 6dc618cd

Elena Reshetova authored Mar 09, 2017

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter. This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6dc618cd

scsi: libfc: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t · 22c70d1a

Elena Reshetova authored Mar 09, 2017

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter. This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

22c70d1a