Commits · 10e5e37581fc5817d52ff5f8e2b3362f64eb04f4 · Kirill Smelkov / linux

18 May, 2018 40 commits

scsi: ufs: Add clock ungating to a separate workqueue · 10e5e375

Vijay Viswanath authored May 03, 2018

UFS driver can receive a request during memory reclaim by kswapd. So
when ufs driver puts the ungate work in queue, and if there are no idle
workers, kthreadd is invoked to create a new kworker. Since kswapd task
holds a mutex which kthreadd also needs, this can cause a deadlock
situation. So ungate work must be done in a separate work queue with
WQ_MEM_RECLAIM flag enabled. Such a workqueue will have a rescue thread
which will be called when the above deadlock condition is possible.
Signed-off-by: Vijay Viswanath <vviswana@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

10e5e375

scsi: ufs: make sure all interrupts are processed · 7f6ba4f1

Venkat Gopalakrishnan authored May 03, 2018

As multiple requests are submitted to the ufs host controller in
parallel there could be instances where the command completion interrupt
arrives later for a request that is already processed earlier as the
corresponding doorbell was cleared when handling the previous
interrupt. Read the interrupt status in a loop after processing the
received interrupt to catch such interrupts and handle it.
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7f6ba4f1

scsi: ufs: ufs-qcom: remove broken hci version quirk · 69a6fff0

Subhash Jadavani authored May 03, 2018

UFSHCD_QUIRK_BROKEN_UFS_HCI_VERSION is only applicable for QCOM UFS host
controller version 2.x.y and this has been fixed from version 3.x.y
onwards, hence this change removes this quirk for version 3.x.y onwards.

[mkp: applied by hand]
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

69a6fff0

scsi: ufs: add reference counting for scsi block requests · 38135535

Subhash Jadavani authored May 03, 2018

Currently we call the scsi_block_requests()/scsi_unblock_requests()
whenever we want to block/unblock scsi requests but as there is no
reference counting, nesting of these calls could leave us in undesired
state sometime. Consider following call flow sequence:

1. func1() calls scsi_block_requests() but calls func2() before
   calling scsi_unblock_requests()
2. func2() calls scsi_block_requests()
3. func2() calls scsi_unblock_requests()
4. func1() calls scsi_unblock_requests()

As there is no reference counting, we will have scsi requests unblocked
after #3 instead of it to be unblocked only after #4. Though we may not
have failures seen with this, we might run into some failures in future.
Better solution would be to fix this by adding reference counting.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

38135535

scsi: ufs: ufshcd: fix possible unclocked register access · b334456e

Subhash Jadavani authored May 03, 2018

Vendor specific setup_clocks ops may depend on clocks managed by ufshcd
driver so if the vendor specific setup_clocks callback is called when
the required clocks are turned off, it results into unclocked register
access.

This change make sure that required clocks are enabled before vendor
specific setup_clocks callback is called.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b334456e

scsi: ufs: fix exception event handling · 2e3611e9

Maya Erez authored May 03, 2018

The device can set the exception event bit in one of the response UPIU,
for example to notify the need for urgent BKOPs operation.  In such a
case, the host driver calls ufshcd_exception_event_handler to handle
this notification.  When trying to check the exception event status (for
finding the cause for the exception event), the device may be busy with
additional SCSI commands handling and may not respond within the 100ms
timeout.

To prevent that, we need to block SCSI commands during handling of
exception events and allow retransmissions of the query requests, in
case of timeout.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2e3611e9

scsi: dpt_i2o: Remove VLA usage · 60d6d22d

Kees Cook authored May 02, 2018

On the quest to remove all VLAs from the kernel[1] this moves the
sg_list variable off the stack, as already done for other allocated
buffers in adpt_i2o_passthru(). Additionally consolidates the error path
for kfree().

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comSigned-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

60d6d22d

scsi: ufs: Use freq table with devfreq · 092b4558

Bjorn Andersson authored May 17, 2018

devfreq requires that the client operates on actual frequencies, not only 0
and UMAX_INT and as such UFS brok with the introduction of f1d981ea ("PM
/ devfreq: Use the available min/max frequency").

This patch registers the frequencies of the first clock with devfreq and use
these to determine if we're trying to step up or down.

Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> [for devfreq & OPP part]
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

092b4558

scsi: ufs: Extract devfreq registration · deac444f

Bjorn Andersson authored May 17, 2018

Failing to register with devfreq leaves hba->devfreq assigned, which causes
the error path to dereference the ERR_PTR(). Rather than bolting on more
conditionals, move the call of devm_devfreq_add_device() into it's own
function and only update hba->devfreq once it's successfully registered.

The subsequent patch builds upon this to make UFS actually work again, as
it's been broken since f1d981ea ("PM / devfreq: Use the available
min/max frequency")

Also switch to use DEVFREQ_GOV_SIMPLE_ONDEMAND constant.
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

deac444f

scsi: storvsc: Avoid allocating memory for temp cpumasks · 1b25a8c4

Michael Kelley authored May 17, 2018

Current code allocates 240 Kbytes (in typical configs) for each synthetic
SCSI controller to use as temp cpumask variables. Recode to avoid needing
the temp cpumask variables and remove the memory allocation.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1b25a8c4

scsi: zfcp: enhance comments on fc_link_speed and supported_speed · 16dad279

Jens Remus authored May 17, 2018

The comment on fsf_qtcb_bottom_port.supported_speed did read as if the field
can only assume one of two possible values (i.e. 0x1 for 1 GBit/s or 0x2 for
2 GBit/s). This is not true for two reasons: first it is a flag field and
can thus assume any combination and second there are meanwhile more speeds.

Clarify comment on fsf_qtcb_bottom_port.supported_speed and add a comment to
fsf_qtcb_bottom_config.fc_link_speed.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

16dad279

scsi: zfcp: add port speed capabilities · 6e2e4900

Jens Remus authored May 17, 2018

Add port speed capabilities as defined in FC-LS RPSC ELS that have a
counterpart FC_PORTSPEED_* defined in scsi/scsi_transport_fc.h.
Suggested-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Fedor Loshakov <loshakov@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6e2e4900

scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger · 9e156c54

Jens Remus authored May 17, 2018

Otherwise iterating with list_for_each() over the adapter->erp_ready_head
and adapter->erp_running_head lists can lead to an infinite loop. See commit
"zfcp: fix infinite iteration on erp_ready_head list".

The run-time check is only performed for debug kernels which have the kernel
lock validator enabled. Following is an example of the warning that is
reported, if the ERP lock is not held when calling zfcp_dbf_rec_trig():

WARNING: CPU: 0 PID: 604 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x172/0x188
Modules linked in: ...
CPU: 0 PID: 604 Comm: kworker/u128:3 Not tainted 4.16.0-... #1
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Workqueue: zfcp_q_0.0.1906 zfcp_scsi_rport_work
Krnl PSW : 00000000330fdbf9 00000000367e9728 (zfcp_dbf_rec_trig+0x172/0x188)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
Krnl GPRS: 00000000c57a5d99 3288200000000000 0000000000000000 000000006cc82740
           00000000009d09d6 0000000000000000 00000000000000ff 0000000000000000
           0000000000000000 0000000000e1b5fe 000000006de01d38 0000000076130958
           000000006cc82548 000000006de01a98 00000000009d09d6 000000006a6d3c80
Krnl Code: 00000000009d0ad2: eb7ff0b80004        lmg        %r7,%r15,184(%r15)
           00000000009d0ad8: c0f4000d7dd0        brcl       15,b80678
          #00000000009d0ade: a7f40001            brc        15,9d0ae0
          >00000000009d0ae2: a7f4ff7d            brc        15,9d09dc
           00000000009d0ae6: e340f0f00004        lg         %r4,240(%r15)
           00000000009d0aec: eb7ff0b80004        lmg        %r7,%r15,184(%r15)
           00000000009d0af2: 07f4                bcr        15,%r4
           00000000009d0af4: 0707                bcr        0,%r7
Call Trace:
([<00000000009d09d6>] zfcp_dbf_rec_trig+0x66/0x188)
 [<00000000009dd740>] zfcp_scsi_rport_work+0x98/0x190
 [<0000000000169b34>] process_one_work+0x3d4/0x6f8
 [<000000000016a08a>] worker_thread+0x232/0x418
 [<000000000017219e>] kthread+0x166/0x178
 [<0000000000b815ea>] kernel_thread_starter+0x6/0xc
 [<0000000000b815e4>] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u128:3/604:
 #0:  ((wq_completion)name){+.+.}, at: [<0000000082af1024>] process_one_work+0x1dc/0x6f8
 #1:  ((work_completion)(&port->rport_work)){+.+.}, at: [<0000000082af1024>] process_one_work+0x1dc/0x6f8
Last Breaking-Event-Address:
 [<00000000009d0ade>] zfcp_dbf_rec_trig+0x16e/0x188
---[ end trace b2f4020572e2c124 ]---
Suggested-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9e156c54

scsi: zfcp: cleanup indentation for posting FC events · 6919298c

Steffen Maier authored May 17, 2018

I just happened to see the function header indentation of
zfcp_fc_enqueue_event() and I picked some more from checkpatch:

$ checkpatch.pl --strict -f drivers/s390/scsi/zfcp_fc.c
...
CHECK: Alignment should match open parenthesis
 #113: FILE: drivers/s390/scsi/zfcp_fc.c:113:
+		fc_host_post_event(adapter->scsi_host, fc_get_event_number(),
+				event->code, event->data);

CHECK: Blank lines aren't necessary before a close brace '}'
 #118: FILE: drivers/s390/scsi/zfcp_fc.c:118:
+
+}
...

The change complements v2.6.36 commit 2d1e547f ("[SCSI] zfcp: Post
events through FC transport class").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6919298c

scsi: zfcp: support SCSI_ADAPTER_RESET via scsi_host sysfs attribute host_reset · 35e9111a

Steffen Maier authored May 17, 2018

Make use of feature introduced with v3.2 commit 29443691 ("[SCSI] scsi:
Added support for adapter and firmware reset").  The common code interface
was introduced for commit 95d31262 ("[SCSI] qla4xxx: Added support for
adapter and firmware reset").

$ echo adapter > /sys/class/scsi_host/host<N>/host_reset

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : scshr_y                SCSI sysfs host_reset yes
LUN            : 0xffffffffffffffff                     none (invalid)
WWPN           : 0x0000000000000000                     none (invalid)
D_ID           : 0x00000000                             none (invalid)
Adapter status : 0x4500050b
Port status    : 0x00000000                             none (invalid)
LUN status     : 0x00000000                             none (invalid)
Ready count    : 0x00000001
Running count  : 0x00000000
ERP want       : 0x04                   ZFCP_ERP_ACTION_REOPEN_ADAPTER
ERP need       : 0x04                   ZFCP_ERP_ACTION_REOPEN_ADAPTER

This is the common code equivalent to the zfcp-specific
&dev_attr_adapter_failed.attr in zfcp_sysfs_adapter_attrs.attrs[]:

$ echo 0 > /sys/bus/ccw/drivers/zfcp/<devbusid>/failed

The unsupported case returns EOPNOTSUPP:

$ echo firmware > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Operation not supported

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : scshr_n                        SCSI sysfs host_reset no
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0xffffffff                             none (invalid)
SCSI LUN       : 0xffffffff                             none (invalid)
SCSI LUN high  : 0xffffffff                             none (invalid)
SCSI result    : 0xffffffa1                     -EOPNOTSUPP==-95
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0xff                                   none (invalid)
FCP rsp IU     : 00000000 00000000 00000000 00000000    none (invalid)
                 00000000 00000000

For any other invalid value, common code returns EINVAL without invoking
our callback:

$ echo foo > /sys/class/scsi_host/host<N>/host_reset
-bash: echo: write error: Invalid argument
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

35e9111a

scsi: zfcp: explicitly support initiator in scsi_host_template · b24bf22d

Steffen Maier authored May 17, 2018

While the default did already correctly print "Initiator" let's make it
explicit and convert zfcp to the feature.

$ cat /sys/class/scsi_host/host0/supported_mode
Initiator

$ cat /sys/class/scsi_host/host0/active_mode
Initiator

The default worked, because not setting the field has it initialized to zero
== MODE_UNKNOWN. scsi_host_alloc() sets shost->active_mode = MODE_INITIATOR
in this case. The sysfs accessor function show_shost_supported_mode()
assumes MODE_INITIATOR in this case.  This default behavior was introduced
with v2.6.24 commit 7a39ac3f ("[SCSI] make supported_mode default to
initiator.").  The feature flag was introduced with v2.6.24 commit
5dc2b89e ("[SCSI] add supported_mode and active_mode attributes to the
host").  So there was no release where zfcp would have shown "unknown".
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b24bf22d

scsi: zfcp: remove unused return values of ERP trigger functions · 2fdd45fd

Steffen Maier authored May 17, 2018

Since v2.6.27 commit 553448f6 ("[SCSI] zfcp: Message cleanup"), none of
the callers has been interested any more.  Values were not returned
consistently in all ERP trigger functions.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2fdd45fd

scsi: zfcp: zfcp_erp_action_exists() does only check for running · 013af857

Steffen Maier authored May 17, 2018

Simplify its signature to return boolean and rename it to
zfcp_erp_action_is_running() to indicate its actual unmodified semantics.
It has always been used like this since v2.6.0 history commit ea127f97
("[PATCH] s390 (7/7): zfcp host adapter.").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

013af857

scsi: zfcp: remove unused ERP enum values · cd4a186a

Steffen Maier authored May 17, 2018

All constant defines were introduced with v2.6.0 history commit ea127f97
("[PATCH] s390 (7/7): zfcp host adapter.") and refactored into enums with
commit 287ac01a ("[SCSI] zfcp: Cleanup code in zfcp_erp.c").

ZFCP_STATUS_ERP_DISMISSING and ZFCP_ERP_STEP_FSF_XCONFIG were never used.

v2.6.27 commit 287ac01a ("[SCSI] zfcp: Cleanup code in zfcp_erp.c")
removed the use of ZFCP_ERP_ACTION_READY on refactoring
zfcp_erp_action_exists() to now only check adapter->erp_running_head but no
longer adapter->erp_ready_head. The same commit could have changed the
function return type from int to "enum zfcp_erp_act_state".
ZFCP_ERP_ACTION_READY was never used outside of zfcp_erp_action_exists().
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cd4a186a

scsi: zfcp: consistently use function name space prefix · d39eda54

Steffen Maier authored May 17, 2018

I've been mixing up
zfcp_task_mgmt_function() [SCSI] and
zfcp_fsf_fcp_task_mgmt()  [FSF]
so often lately that I wanted to fix this.

SCSI changes complement v2.6.27 commit f76af7d7 ("[SCSI] zfcp: Cleanup
of code in zfcp_scsi.c").

While at it, also fixup the other inconsistencies elsewhere.

ERP changes complement v2.6.27 commit 287ac01a ("[SCSI] zfcp: Cleanup
code in zfcp_erp.c") which introduced status_change_set().

FC changes complement v2.6.32 commit 6f53a2d2 ("[SCSI] zfcp: Apply
common naming conventions to zfcp_fc").  by renaming a leftover introduced
with v2.6.27 commit cc8c2829 ("[SCSI] zfcp: Automatically attach remote
ports").

FSF changes fixup v2.6.32 commit a4623c46 ("[SCSI] zfcp: Improve request
allocation through mempools").  which replaced zfcp_fsf_alloc_qtcb()
introduced with v2.6.27 commit c41f8cbd ("[SCSI] zfcp: zfcp_fsf
cleanup.").

SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b1
("[SCSI] zfcp: transport class adaptations").

SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392
("[SCSI] zfcp: Add port_state attribute to sysfs").

SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit
338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target
port is unavailable").
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d39eda54

scsi: zfcp: workqueue: set description for port work items with their WWPN as context · 5c750d58

Steffen Maier authored May 17, 2018

As a prerequisite, complement commit 3d1cb205 ("workqueue: include
workqueue info when printing debug dump of a worker task") to be usable with
kernel modules by exporting the symbol set_worker_desc().  Current built-in
user was introduced with commit ef3b1019 ("writeback: set worker desc to
identify writeback workers in task dumps").

Can help distinguishing work items which do not have adapter scope.
Description is printed out with task dump for debugging on WARN, BUG, panic,
or magic-sysrq [show-task-states(t)].

Example:
$ echo 0 >| /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/failed &
$ echo 't' >| /proc/sysrq-trigger
$ dmesg
sysrq: SysRq : Show State
  task                        PC stack   pid father
...
zfcp_q_0.0.1880 S14640  2165      2 0x02000000
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
 [<00000000009df57c>] schedule+0x94/0xc0
 [<0000000000168654>] rescuer_thread+0x33c/0x3a0
 [<000000000016f8be>] kthread+0x166/0x178
 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
 [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
no locks held by zfcp_q_0.0.1880/2165.
...
kworker/u512:2  D11280  2193      2 0x02000000
Workqueue: zfcp_q_0.0.1880 zfcp_scsi_rport_work [zfcp] (zrpd-50050763031bd327)
                                                        ^^^^^^^^^^^^^^^^^^^^^
Call Trace:
([<00000000009df464>] __schedule+0xbf4/0xc78)
 [<00000000009df57c>] schedule+0x94/0xc0
 [<00000000009e50c0>] schedule_timeout+0x488/0x4d0
 [<00000000001e425c>] msleep+0x5c/0x78                  >>test code only<<
 [<000003ff8008a21e>] zfcp_scsi_rport_work+0xbe/0x100 [zfcp]
 [<0000000000167154>] process_one_work+0x3b4/0x718
 [<000000000016771c>] worker_thread+0x264/0x408
 [<000000000016f8be>] kthread+0x166/0x178
 [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
 [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
2 locks held by kworker/u512:2/2193:
 #0:  (name){++++.+}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
 #1:  ((&(&port->rport_work)->work)){+.+.+.}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
...

=============================================
Showing busy workqueues and worker pools:
workqueue zfcp_q_0.0.1880: flags=0x2000a
  pwq 512: cpus=0-255 flags=0x4 nice=0 active=1/1
    in-flight: 2193:zfcp_scsi_rport_work [zfcp]
pool 512: cpus=0-255 flags=0x4 nice=0 hung=0s workers=4 idle: 5 2354 2311

Work items with adapter scope are already identified by the workqueue name
"zfcp_q_<devbusid>" and the work item function name.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5c750d58

scsi: zfcp: decouple our scsi_eh callbacks from scsi_cmnd · 674595d8

Steffen Maier authored May 17, 2018

Note: zfcp_scsi_eh_host_reset_handler() will be converted in a later patch.

zfcp_scsi_eh_device_reset_handler() now only depends on scsi_device.
zfcp_scsi_eh_target_reset_handler() now only depends on scsi_target.
All derive other objects from these intended callback arguments.

zfcp_scsi_eh_target_reset_handler() is special: The FCP channel requires a
valid LUN handle so we try to find ourselves a stand-in scsi_device as
suggested by Hannes Reinecke. If it cannot find a stand-in scsi device,
trace a record like the following (formatted with zfcpdbf from s390-tools):

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : tr_nosd        target reset, no SCSI device
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0x00000000     SCSI ID/target denoting scope
SCSI LUN       : 0xffffffff                             none (invalid)
SCSI LUN high  : 0xffffffff                             none (invalid)
SCSI result    : 0x00002003     field re-used for midlayer value: FAILED
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0xff                                   none (invalid)
FCP rsp IU     : 00000000 00000000 00000000 00000000    none (invalid)
                 00000000 00000000

Actually change the signature of zfcp_task_mgmt_function() used by
zfcp_scsi_eh_device_reset_handler() & zfcp_scsi_eh_target_reset_handler().
Since it was prepared in a previous patch, we only need to delete a local
auto variable which is now the intended argument.
Suggested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

674595d8

scsi: zfcp: decouple TMFs from scsi_cmnd by using fc_block_rport · 42afc652

Steffen Maier authored May 17, 2018

Intentionally retrieve the rport by walking SCSI common code objects
rather than zfcp_sdev->port->rport.

The latter is used for pairing the calls to fc_remote_port_add() and
fc_remote_port_delete(). [see v2.6.31 commit 379d6bf6 ("[SCSI] zfcp:
Add port only once to FC transport class")]

zfcp_scsi_rport_register() sets zfcp_port.rport to what
fc_remote_port_add() returned.
zfcp_scsi_rport_block() sets zfcp_port.rport = NULL after having called
fc_remote_port_delete().

Hence, while an rport is blocked (or in any subsequent state due to
scsi_transport_fc timeouts such as fast_io_fail_tmo or dev_loss_tmo),
zfcp_port.rport is NULL and cannot serve as argument to fc_block_rport().

During zfcp recovery, a just recovered zfcp_port can have the UNBLOCKED
status flag, but an async rport unblocking has only started via
zfcp_scsi_schedule_rport_register() in zfcp_erp_try_rport_unblock()
[see v4.10 commit 6f2ce1c6 ("scsi: zfcp: fix rport unblock race with
LUN recovery")] in zfcp_erp_action_cleanup(). Now zfcp_erp_wait() can
return. This would be sufficient to successfully send a TMF.
But the rport can still be blocked and zfcp_port.rport can still be NULL
until zfcp_port.rport_work was scheduled and has actually called
fc_remote_port_add() and assigned its return value to zfcp_port.rport.
We need an unblocked rport for a successful scsi_eh TUR.

Similarly, for a zfcp_port which has just lost its UNBLOCKED status flag,
the return of zfcp_erp_wait() can race with zfcp_port.rport_work queued
by zfcp_scsi_schedule_rport_block(). Therefore we cannot reliably access
zfcp_port.rport. However, we'd like to get fc_rport_block()'s opinion on
when fast_io_fail_tmo triggered. While we might use
flush_work(&port->rport_work) to sync with the work item, we can simply use
the other way to get an rport pointer.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

42afc652

scsi: zfcp: decouple SCSI setup of TMF from scsi_cmnd · 26f5fa9d

Steffen Maier authored May 17, 2018

Actually change the signature of zfcp_fsf_fcp_task_mgmt().
Since it was prepared in the previous patch, we only need to delete
a local auto variable which is now the intended argument.

Prepare zfcp_fsf_fcp_task_mgmt's caller zfcp_task_mgmt_function()
to have its function body only depend on a scsi_device and derived objects.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

26f5fa9d

scsi: zfcp: decouple FSF request setup of TMF from scsi_cmnd · 39abb11a

Steffen Maier authored May 17, 2018

In zfcp_fsf_fcp_task_mgmt() resolve the still old argument scsi_cmnd into
scsi_device very early and only depend on scsi_device and derived objects in
the function body.

This prepares to later change the function signature replacing the scsi_cmnd
argument with scsi_device.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

39abb11a

scsi: zfcp: split FCP_CMND IU setup between SCSI I/O and TMF again · e0116c91

Steffen Maier authored May 17, 2018

This reverts commit 2443c8b2 ("[SCSI] zfcp: Merge FCP task management
setup with regular FCP command setup"), because this introduced a
dependency on the unsuitable SCSI command for scsi_eh / TMF.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e0116c91

scsi: zfcp: decouple TMF response handler from scsi_cmnd · 266883f2

Steffen Maier authored May 17, 2018

Originally, I planned for TMF handling to have different context data in
fsf_req->data depending on the TMF scope in fcp_cmnd->fc_tm_flags:

 * scsi_device if FCP_TMF_LUN_RESET,
 * zfcp_port if FCP_TMF_TGT_RESET.

However, the FCP channel requires a valid LUN handle so we now use
scsi_device as context data with any TMF for the time being.

Regular SCSI I/O FCP requests continue using scsi_cmnd as req->data.

Hence, the callers of zfcp_fsf_fcp_handler_common() must resolve req->data
and pass scsi_device as common context.  While at it, remove the detour
zfcp_sdev->port->adapter and use the more direct req->adapter as elsewhere
in this function already.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

266883f2

scsi: zfcp: decouple SCSI traces for scsi_eh / TMF from scsi_cmnd · 82212118

Steffen Maier authored May 17, 2018

The SCSI command pointer passed to scsi_eh callbacks is just one arbitrary
command of potentially many that are in the eh queue to be processed.  The
command is only used to indirectly pass the TMF scope in terms of SCSI
ID/target and SCSI LUN for LUN reset.

Hence, zfcp had filled in SCSI trace record fields which do not really
belong to the TMF. This was confusing.

Therefore, refactor the TMF tracing to work without SCSI command.  Since the
FCP channel always requires a valid LUN handle, we use SCSI device as common
context for any TMF (even target reset).  To make it even clearer, we set
all bits to 1 for the fields, which do not belong to the TMF, to indicate
that these fields are invalid.

The old zfcp_dbf_scsi() became zfcp_dbf_scsi_common() to now handle both
SCSI commands and TMFs. The old argument scsi_cmnd is now optional and can
be NULL with TMFs. The new argument scsi_device is mandatory to carry
context, as well as SCSI ID/target and SCSI LUN in case of TMFs.

New example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : [lt]r_....
Request ID     : 0x<reqid>              ID of FSF FCP request with TM flag
                 For cases without FSF request: 0x0 for none (invalid)
SCSI ID        : 0x<scsi_id>            SCSI ID/target denoting scope
SCSI LUN       : 0x<scsi_lun>           SCSI LUN denoting scope
SCSI LUN high  : 0x<scsi_lun_high>      SCSI LUN denoting scope
SCSI result    : 0xffffffff                             none (invalid)
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0x00                   FCP_RSP info code of TMF
FCP rsp IU     : 00000000 00000000 00000100 00000000 ext FCP_RSP IU
                 00000000 00000008                   ext FCP_RSP IU
FCP rsp IU len : 32                                  FCP_RSP IU length
Payload time   : ...
FCP rsp IU all : 00000000 00000000 00000100 00000000 full FCP_RSP IU
                 00000000 00000008 00000000 00000000 full FCP_RSP IU
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

82212118

scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread · 6a765508

Steffen Maier authored May 17, 2018

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : .......
LUN            : 0x...
WWPN           : 0x...
D_ID           : 0x...
Adapter status : 0x...
Port status    : 0x...
LUN status     : 0x...
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x0.                   ZFCP_ERP_ACTION_REOPEN_...
ERP need       : 0xc0                   ZFCP_ERP_ACTION_NONE
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6a765508

scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED · 8c3d20aa

Steffen Maier authored May 17, 2018

That other commit introduced an inconsistency because it would trace on
ERP_FAILED for all callers of port forced reopen triggers (not just
terminate_rport_io), but it would not trace on ERP_FAILED for all callers of
other ERP triggers such as adapter, port regular, LUN.

Therefore, generalize that other commit. zfcp_erp_action_enqueue() already
had two early outs which re-used the one zfcp_dbf_rec_trig() call.  All ERP
trigger functions finally run through zfcp_erp_action_enqueue().  So move
the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into
zfcp_erp_action_enqueue() and add another early out with new trace marker
for pseudo ERP need in this case. This removes all early returns from all
ERP trigger functions so we always end up at zfcp_dbf_rec_trig().

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : .......
LUN            : 0x...
WWPN           : 0x...
D_ID           : 0x...
Adapter status : 0x...
Port status    : 0x...
LUN status     : 0x...
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x0.                   ZFCP_ERP_ACTION_REOPEN_...
ERP need       : 0xe0                   ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8c3d20aa

scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED · d70aab55

Steffen Maier authored May 17, 2018

For problem determination we always want to see when we were invoked on the
terminate_rport_io callback whether we perform something or not.

Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec:

loose remote port

t   workqueue
[s] zfcp_q_<dev>       IRQ                 zfcperp<dev>

=== ================== =================== ============================

  0                    recv RSCN
                       q p.test_link_work
    block rport
     start fast_io_fail_tmo
    send ADISC ELS
  4                    recv ADISC fail
                       block zfcp_port
                                           port forced reopen
                                           send open port
 12                    recv open port fail
                                           q p.gid_pn_work
                                           zfcp_erp_wakeup
                                           (zfcp_erp_wait would return)
    GID_PN fail

Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail,
e.g. with the typical 5 sec setting.

    port.status |= ERP_FAILED

If fast_io_fail_tmo triggers after this point, we missed a SCSI trace.

    workqueue
    fc_dl_<host>
    ==================
 27 fc_timeout_fail_rport_io
    fc_terminate_rport_io
    zfcp_scsi_terminate_rport_io
    zfcp_erp_port_forced_reopen
    _zfcp_erp_port_forced_reopen
     if (port.status & ERP_FAILED)
      return;

Therefore, write a trace before above early return.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : sctrpi1                SCSI terminate rport I/O
LUN            : 0xffffffffffffffff                     none (invalid)
WWPN           : 0x<wwpn>
D_ID           : 0x<n_port_id>
Adapter status : 0x...
Port status    : 0x...
LUN status     : 0x00000000                             none (invalid)
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x03                   ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need       : 0xe0                   ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d70aab55

scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return · 96d92704

Steffen Maier authored May 17, 2018

get_device() and its internally used kobject_get() only return NULL if they
get passed NULL as argument. zfcp_get_port_by_wwpn() loops over
adapter->port_list so the iteration variable port is always non-NULL.
Struct device is embedded in struct zfcp_port so &port->dev is always
non-NULL. This is the argument to get_device().  However, if we get an
fc_rport in terminate_rport_io() for which we cannot find a match within
zfcp_get_port_by_wwpn(), the latter can return NULL.  v2.6.30 commit
70932935 ("[SCSI] zfcp: Fix oops when port disappears") introduced an
early return without adding a trace record for this case.  Even if we don't
need recovery in this case, for debugging we should still see that our
callback was invoked originally by scsi_transport_fc.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : sctrpin        SCSI terminate rport I/O, no zfcp port
LUN            : 0xffffffffffffffff                     none (invalid)
WWPN           : 0x<wwpn>               WWPN
D_ID           : 0x<n_port_id>          N_Port-ID
Adapter status : 0x...
Port status    : 0xffffffff             unknown (-1)
LUN status     : 0x00000000                             none (invalid)
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x03                   ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need       : 0xc0                   ZFCP_ERP_ACTION_NONE
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 70932935 ("[SCSI] zfcp: Fix oops when port disappears")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

96d92704

scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed · 512857a7

Steffen Maier authored May 17, 2018

If a SCSI device is deleted during scsi_eh host reset, we cannot get a
reference to the SCSI device anymore since scsi_device_get returns !=0 by
design. Assuming the recovery of adapter and port(s) was successful,
zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the
half-gone SCSI device. Unfortunately, it causes the following confusing
trace record which states that zfcp will do a LUN recovery as "ERP need" is
ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want".

Old example trace record formatted with zfcpdbf from s390-tools:

Tag:           : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN            : 0x<FCP_LUN>
WWPN           : 0x<WWPN>
D_ID           : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x40000000     ZFCP_STATUS_COMMON_RUNNING
                                but not ZFCP_STATUS_COMMON_UNBLOCKED as it
                                was closed on close part of adapter reopen
ERP want       : 0x01
ERP need       : 0x01           misleading

However, zfcp_erp_setup_act() returns NULL as it cannot get the reference.
Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery
actually happens.

We always do want the recovery trigger trace record even if no erp_action
could be enqueued as in this case. For other cases where we did not enqueue
an erp_action, 'need' has always been zero to indicate this. In order to
indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP
need" as 'not needed' but still keep the information which erp_action type,
that zfcp_erp_required_act() had decided upon, is needed.  0xc_ is chosen to
be visibly different from 0x0_ in "ERP want".

New example trace record formatted with zfcpdbf from s390-tools:

Tag:           : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN            : 0x<FCP_LUN>
WWPN           : 0x<WWPN>
D_ID           : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x40000000
ERP want       : 0x01
ERP need       : 0xc1           would need LUN ERP, but no action set up
                   ^

Before v2.6.38 commit ae0904f6 ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.") we could detect this case because the
"erp_action" field in the trace was NULL. The rework removed erp_action as
argument and field from the trace.

This patch here is for tracing. A fix to allow LUN recovery in the case at
hand is a topic for a separate patch.

See also commit fdbd1c5e ("[SCSI] zfcp: Allow running unit/LUN shutdown
without acquiring reference") for a similar case and background info.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: ae0904f6 ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

512857a7

scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF · 81979ae6

Steffen Maier authored May 17, 2018

We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to
zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of
our eh callback and an actual send/recv of an abort / TMF request.  In order
to see the temporal sequence including any abort / TMF send retries, add a
trace before the above two blocking functions.  This supports problem
determination with scsi_eh and parallel zfcp ERP.

No need to explicitly trace the beginning of our eh callback, since we
typically can send an abort / TMF and see its HBA response (in the worst
case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to
an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we
now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes
almost the beginning of the callback.

No need to explicitly trace the wakeup after the above two blocking
functions because the next retry loop causes another trace in any case and
that is sufficient.

Example trace records formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : abrt_wt        abort, before zfcp_erp_wait()
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0x<scsi_id>
SCSI LUN       : 0x<scsi_lun>
SCSI LUN high  : 0x<scsi_lun_high>
SCSI result    : 0x<scsi_result_of_cmd_to_be_aborted>
SCSI retries   : 0x<retries_of_cmd_to_be_aborted>
SCSI allowed   : 0x<allowed_retries_of_cmd_to_be_aborted>
SCSI scribble  : 0x<req_id_of_cmd_to_be_aborted>
SCSI opcode    : <CDB_of_cmd_to_be_aborted>
FCP rsp inf cod: 0x..                                   none (invalid)
FCP rsp IU     : ...                                    none (invalid)

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_wait        LUN reset, before zfcp_erp_wait()
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0x<scsi_id>
SCSI LUN       : 0x<scsi_lun>
SCSI LUN high  : 0x<scsi_lun_high>
SCSI result    : 0x...                                  unrelated
SCSI retries   : 0x..                                   unrelated
SCSI allowed   : 0x..                                   unrelated
SCSI scribble  : 0x...                                  unrelated
SCSI opcode    : ...                                    unrelated
FCP rsp inf cod: 0x..                                   none (invalid)
FCP rsp IU     : ...                                    none (invalid)
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 63caf367 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Fixes: af4de36d ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

81979ae6

scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler · df307816

Steffen Maier authored May 17, 2018

For problem determination we need to see whether and why we were successful
or not. This allows deduction of scsi_eh escalation.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : schrh_r        SCSI host reset handler result
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0xffffffff                             none (invalid)
SCSI LUN       : 0xffffffff                             none (invalid)
SCSI LUN high  : 0xffffffff                             none (invalid)
SCSI result    : 0x00002002     field re-used for midlayer value: SUCCESS
                                or in other cases: 0x2009 == FAST_IO_FAIL
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0xff                                   none (invalid)
FCP rsp IU     : 00000000 00000000 00000000 00000000    none (invalid)
                 00000000 00000000

v2.6.35 commit a1dbfddd ("[SCSI] zfcp: Pass return code from
fc_block_scsi_eh to scsi eh") introduced the first return with something
other than the previously hardcoded single SUCCESS return path.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: a1dbfddd ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

df307816

scsi: cxlflash: Isolate external module dependencies · cd43c221

Uma Krishnan authored May 11, 2018

Depending on the underlying transport, cxlflash has a dependency on either
the CXL or OCXL drivers, which are enabled via their Kconfig option.
Instead of having a module wide dependency on these config options, it is
better to isolate the object modules that are dependent on the CXL and OCXL
drivers and adjust the module dependencies accordingly.

This commit isolates the object files that are dependent on CXL and/or
OCXL. The cxl/ocxl fops used in the core driver are tucked under an ifdef to
avoid compilation errors.
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cd43c221

scsi: cxlflash: Abstract hardware dependent assignments · de5d35af

Uma Krishnan authored May 11, 2018

As a staging cleanup to support transport specific builds of the cxlflash
module, relocate device dependent assignments to header files. This will
avoid littering the core driver with conditional compilation logic.
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

de5d35af

scsi: cxlflash: Add include guards to backend.h · 5e12397a

Uma Krishnan authored May 11, 2018

The new header file, backend.h, that was recently added is missing the
include guards. This commit adds the guards.
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5e12397a

scsi: cxlflash: Use local mutex for AFU serialization · e63a8d88

Matthew R. Ochs authored May 11, 2018

AFUs can only process a single AFU command at a time. This is enforced with
a global mutex situated within the AFU send routine. As this mutex has a
global scope, it has the potential to unnecessarily block commands destined
for other AFUs.

Instead of using a global mutex, transition the mutex to be per-AFU. This
will allow commands to only be blocked by siblings of the same AFU.
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e63a8d88

scsi: cxlflash: Acquire semaphore before invoking ioctl services · 32a9ae41

Uma Krishnan authored May 11, 2018

When a superpipe process that makes use of virtual LUNs is terminated or
killed abruptly, there is a possibility that the cxlflash driver could hang
and deprive other operations on the adapter.

The release fop registered to be invoked on a context close, detaches every
LUN associated with the context. The underlying service to detach the LUN
assumes it has been called with the read semaphore held, and releases the
semaphore before any operation that could be time consuming.

When invoked without holding the read semaphore, an opportunity is created
for the semaphore's count to become negative when it is temporarily released
during one of these potential lengthy operations. This negative count
results in subsequent acquisition attempts taking forever, leading to the
hang.

To support the current design point of holding the semaphore on the ioctl()
paths, the release fop should acquire it before invoking any ioctl services.
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

32a9ae41