Commits · 36a7e993eedb2c3f11de3b686b351f75e1edbbb5 · nexedi / linux

05 Jun, 2015 2 commits

NVMe: Memory barrier before queue_count is incremented · 36a7e993

Jon Derrick authored May 27, 2015

Protects against reordering and/or preemption that would allow the
kthread to access the queue descriptor before it is set up.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

36a7e993
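
The ordering requirement in the NVMe barrier commit above can be illustrated with a small user-space sketch. It uses C11 release/acquire atomics as a stand-in for the kernel's barrier primitives; the names `publish_queue`, `total_depth`, `MAX_QUEUES` and `struct queue` are hypothetical and are not taken from the driver.

```c
#include <stdatomic.h>

#define MAX_QUEUES 8            /* hypothetical limit for this sketch */

struct queue {
    int depth;
};

static struct queue *queues[MAX_QUEUES];
static atomic_uint queue_count;  /* number of fully initialized slots */

/* Writer (a single writer is assumed): set up the slot first, then publish
 * it by storing the new count with release ordering.
 * Returns 0 on success, -1 if the table is full. */
int publish_queue(struct queue *q, int depth)
{
    unsigned int idx = atomic_load_explicit(&queue_count, memory_order_relaxed);

    if (idx >= MAX_QUEUES)
        return -1;
    q->depth = depth;
    queues[idx] = q;
    atomic_store_explicit(&queue_count, idx + 1, memory_order_release);
    return 0;
}

/* Reader (stands in for the polling kthread): the acquire load pairs with
 * the release store, so every slot below the observed count is set up. */
int total_depth(void)
{
    unsigned int n = atomic_load_explicit(&queue_count, memory_order_acquire);
    int total = 0;

    for (unsigned int i = 0; i < n; i++)
        total += queues[i]->depth;
    return total;
}
```

If the count were incremented before the slot was written, or the two stores were reordered, a reader could dereference a slot that is not yet valid. That is the hazard the commit message describes.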

NVMe: add sysfs and ioctl controller reset · 4cc06521

Keith Busch authored Jun 05, 2015

We need the ability to perform an nvme controller reset as discussed on
the mailing list thread:

  http://lists.infradead.org/pipermail/linux-nvme/2015-March/001585.html

This adds a sysfs entry that, when written to, will perform an NVMe
controller reset if the controller was successfully initialized in the
first place.

This also adds locking around resetting the device in the async probe
method so the driver can't schedule two resets.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Brandon Schultz <brandon.schulz@hgst.com>
Cc: David Sariel <david.sariel@pmcs.com>

Updated by Jens to:

1) Merge this with the ioctl reset patch from David Sariel. The ioctl
   path now shares the reset code from the sysfs path.

2) Don't flush work if we fail issuing the reset.
Signed-off-by: Jens Axboe <axboe@fb.com>

4cc06521
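
A rough user-space sketch of the idea in the reset commit above: the sysfs and ioctl entry points share one reset path, and a guard keeps a second reset from being scheduled while one is outstanding. Everything here is an assumption for illustration: the `struct ctrl` layout, the function names, the `-ENODEV`/`-EBUSY` return values, and the use of a C11 `atomic_flag` in place of the driver's actual locking.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ctrl {
    bool initialized;            /* controller came up successfully */
    atomic_flag reset_scheduled; /* set while a reset is outstanding */
    int resets_run;
};

void ctrl_init(struct ctrl *c, bool initialized)
{
    c->initialized = initialized;
    atomic_flag_clear(&c->reset_scheduled);
    c->resets_run = 0;
}

/* Shared reset path: refuse if the controller never initialized, or if
 * another reset has already been scheduled. */
int request_reset(struct ctrl *c)
{
    if (!c->initialized)
        return -ENODEV;
    if (atomic_flag_test_and_set(&c->reset_scheduled))
        return -EBUSY;
    return 0;
}

/* Called when the scheduled reset has completed. */
void finish_reset(struct ctrl *c)
{
    c->resets_run++;
    atomic_flag_clear(&c->reset_scheduled);
}

/* Both entry points funnel into the same code. */
int sysfs_reset_store(struct ctrl *c) { return request_reset(c); }
int ioctl_reset(struct ctrl *c)       { return request_reset(c); }
```

In this sketch a second request made before `finish_reset()` fails with `-EBUSY` no matter which entry point issued it.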

02 Jun, 2015 2 commits

null_blk: restart request processing on completion handler · 8b70f45e

Akinobu Mita authored Jun 02, 2015

When irqmode=2 (IRQ completion handler is timer) and queue_mode=1
(Block interface to use is rq), the completion handler should restart
request handling for any pending requests on a queue, because request
processing stops once more than hw_queue_depth commands are queued
(null_rq_prep_fn returns BLKPREP_DEFER).

Without this change, the following command cannot finish.

	# modprobe null_blk irqmode=2 queue_mode=1 hw_queue_depth=1
	# fio --name=t --rw=read --size=1g --direct=1 \
	  --ioengine=libaio --iodepth=64 --filename=/dev/nullb0
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8b70f45e
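
The stall described in the null_blk commit above can be reproduced in a toy model. This is a simplified simulation, not driver code: `struct sim_queue` and its functions are hypothetical, and "deferring" is modelled by leaving requests in a pending counter.

```c
#include <stdbool.h>

struct sim_queue {
    int hw_queue_depth;         /* max commands in flight */
    int in_flight;
    int pending;                /* deferred requests waiting to be issued */
    int completed;
    bool restart_on_completion; /* true models the behaviour after the fix */
};

/* Issue pending requests until the hardware depth is reached. */
void dispatch(struct sim_queue *q)
{
    while (q->pending > 0 && q->in_flight < q->hw_queue_depth) {
        q->pending--;
        q->in_flight++;
    }
}

void submit(struct sim_queue *q, int n)
{
    q->pending += n;
    dispatch(q);
}

/* Completion handler: finish one command and, with the fix, restart
 * processing of whatever was deferred. */
void complete_one(struct sim_queue *q)
{
    if (q->in_flight == 0)
        return;
    q->in_flight--;
    q->completed++;
    if (q->restart_on_completion)
        dispatch(q);
}

/* Run completions until nothing is in flight; returns the total completed. */
int drain(struct sim_queue *q)
{
    while (q->in_flight > 0)
        complete_one(q);
    return q->completed;
}
```

With a depth of 1 and 64 submitted requests, the model completes a single request when `restart_on_completion` is false and leaves 63 pending forever. With the flag set, all 64 complete.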

null_blk: prevent timer handler running on a different CPU where started · 419c21a3

Akinobu Mita authored Jun 02, 2015

When irqmode=2 (IRQ completion handler is timer), timer handler should
be called on the same CPU where the timer has been started.

Since completion_queues are per-cpu and the completion handler only
touches the completion_queue for the local CPU, we need to prevent the
handler from running on a CPU other than the one where the timer was
started.
Otherwise, the IO cannot be completed until another completion handler
is executed on that CPU.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

419c21a3
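
To see why the handler has to run on the CPU that armed the timer, here is a minimal model of per-CPU completion queues. `NR_SIM_CPUS`, `queue_completion` and `timer_handler` are hypothetical names, and a CPU is just an array index in this sketch.

```c
#define NR_SIM_CPUS 4   /* hypothetical CPU count for this sketch */

struct sim_cq {
    int pending;        /* completions queued on this CPU */
};

static struct sim_cq cqs[NR_SIM_CPUS];

/* Called on the submitting CPU: queue a completion locally
 * (the real driver would also arm a timer at this point). */
void queue_completion(int cpu)
{
    cqs[cpu].pending++;
}

/* Timer handler: only ever drains the queue of the CPU it runs on.
 * Returns the number of completions it processed. */
int timer_handler(int running_cpu)
{
    int done = cqs[running_cpu].pending;

    cqs[running_cpu].pending = 0;
    return done;
}
```

If a completion is queued on CPU 1 but the handler fires on CPU 2, it finds nothing to do. The entry on CPU 1 then waits until some later handler happens to run there.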

01 Jun, 2015 3 commits

NVMe: Remove hctx reliance for multi-namespace · 42483228

Keith Busch authored Jun 01, 2015

The driver needs to track shared tags to support multiple namespaces
that may be dynamically allocated or deleted. Relying on the first
request_queue's hctxs is not appropriate, as we cannot clear outstanding
tags for all namespaces using this handle, nor can the driver easily track
every request_queue's hctxs as namespaces are attached/detached. Instead,
this patch uses the nvme_dev's tagset to get the shared tag resources
rather than going through a request_queue hctx.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

42483228

Merge branch 'for-4.2/core' into for-4.2/drivers · 843e8ddb

Jens Axboe authored Jun 01, 2015

843e8ddb

blk-mq: Shared tag enhancements · f26cdc85

Keith Busch authored Jun 01, 2015

Storage controllers may expose multiple block devices that share hardware
resources managed by blk-mq. This patch enhances the shared tags so a
low-level driver can access the shared resources not tied to the unshared
h/w contexts. This way the LLD can dynamically add and delete disks and
request queues without having to track all the request_queue hctx's to
iterate outstanding tags.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f26cdc85

29 May, 2015 4 commits

block: don't honor chunk sizes for data-less IO · e548ca4e

Jens Axboe authored May 29, 2015

We don't need to honor chunk sizes for IO that doesn't carry any
data.
Signed-off-by: Jens Axboe <axboe@fb.com>

e548ca4e

block: only honor SG gap prevention for merges that contain data · beefa6ba

Jens Axboe authored May 29, 2015

We can safely merge anything that won't generate an SG list entry,
so if the bio is data-less (discard), don't look at potential
SG gaps.
Signed-off-by: Jens Axboe <axboe@fb.com>

beefa6ba

NVMe: End sync requests immediately on failure · 75619bfa

Keith Busch authored May 28, 2015

Do not retry failed sync commands so the original status may be seen
without issuing unnecessary retries.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

75619bfa

NVMe: Use requested sync command timeout · f4ff414a

Keith Busch authored May 28, 2015

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f4ff414a

26 May, 2015 1 commit

block: fix returnvar.cocci warnings · f6454b04

Julia Lawall authored May 26, 2015

Remove unneeded variable used to store return value.

Generated by: scripts/coccinelle/misc/returnvar.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>

f6454b04

22 May, 2015 12 commits

NVMe: Fix obtaining command result · a0a931d6

Keith Busch authored May 22, 2015

Replace the use of req->sense_len, which is not owned by the LLD, with
req->special to hold the command result for driver-created commands,
and set the result unconditionally on completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Fixes: d29ec824 ("nvme: submit internal commands through the block layer")
Signed-off-by: Jens Axboe <axboe@fb.com>

a0a931d6

block, dm: don't copy bios for request clones · 5f1b670d

Christoph Hellwig authored May 22, 2015

Currently dm-multipath has to clone the bios for every request sent
to the lower devices, which wastes cpu cycles and ties down memory.

This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
to not complete bios attached to a request, which we set on clone
requests similar to bios in a flush sequence.  With this change I/O
errors on a path failure only get propagated to dm-multipath, which
can then either resubmit the I/O or complete the bios on the original
request.

I've done some basic testing of this on a Linux target with ALUA support,
and it survives path failures during I/O nicely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5f1b670d

block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb

Mike Snitzer authored May 22, 2015

Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
 1) saving a bio's original bi_end_io
 2) wiring up an intermediate bi_end_io
 3) restoring the original bi_end_io from intermediate bi_end_io
 4) calling bio_endio() to execute the restored original bi_end_io

The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called.  For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0.  When bio_inc_remaining() occurred during step
3 it brought it to a value of 2.  When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).

Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining.  For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.

Also, the bio_inc_remaining() interface has been moved local to bio.c.

Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

326e1dbb
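
The counter arithmetic in the bi_remaining commit above is easier to follow in a small model. This is a simplified user-space sketch of the behaviour the commit message describes, not the kernel code: `struct sim_bio` and the `sim_*` helpers are hypothetical, and the atomics are replaced by plain integers.

```c
#include <stdbool.h>

struct sim_bio {
    int remaining;                  /* models __bi_remaining, starts at 1 */
    bool chained;                   /* models the BIO_CHAIN flag */
    void (*end_io)(struct sim_bio *);
    void (*saved_end_io)(struct sim_bio *);
    bool inc_on_restore;            /* true: old caller pattern; false: fixed */
    int original_calls;
};

static void sim_inc_remaining(struct sim_bio *bio)
{
    bio->chained = true;
    bio->remaining++;
}

/* The decrement is skipped entirely unless the chain flag is set. */
static bool sim_remaining_done(struct sim_bio *bio)
{
    if (!bio->chained)
        return true;
    if (--bio->remaining == 0) {
        bio->chained = false;
        return true;
    }
    return false;
}

void sim_bio_endio(struct sim_bio *bio)
{
    if (!sim_remaining_done(bio))
        return;
    if (bio->end_io)
        bio->end_io(bio);
}

static void original_end_io(struct sim_bio *bio)
{
    bio->original_calls++;
}

/* Step 3: restore the original handler; step 4: call endio again. */
static void intermediate_end_io(struct sim_bio *bio)
{
    bio->end_io = bio->saved_end_io;
    if (bio->inc_on_restore)
        sim_inc_remaining(bio);     /* 1 -> 2, because step 2 never decremented */
    sim_bio_endio(bio);
}

/* Returns how many times the original bi_end_io ended up being called. */
int run_pattern(bool inc_on_restore)
{
    struct sim_bio bio = {
        .remaining = 1,
        .end_io = original_end_io,
        .inc_on_restore = inc_on_restore,
    };

    bio.saved_end_io = bio.end_io;      /* step 1: save the original */
    bio.end_io = intermediate_end_io;   /* step 2: wire up the intermediate */
    sim_bio_endio(&bio);                /* first bio_endio() */
    return bio.original_calls;
}
```

In this model `run_pattern(true)` never reaches the original handler, because the counter goes 1, 2, 1 and never hits zero. `run_pattern(false)`, which drops the extra increment as the fix does, calls it exactly once.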

nvme: submit internal commands through the block layer · d29ec824

Christoph Hellwig authored May 22, 2015

Use block layer queues with an internal cmd_type to submit internally
generated NVMe commands. This both simplifies the code a lot and allows
for a better structure. For example, the LightNVM code can now construct
commands without knowing the details of the underlying I/O descriptors.
A future NVMe over network target could also inject commands, and the
SCSI translation and ioctl code could be reused for such a beast.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

d29ec824

nvme: fail SCSI read/write command with unsupported protection bit · 772ce435

Christoph Hellwig authored May 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

772ce435

nvme: report the DPOFUA in MODE_SENSE · 90851768

Christoph Hellwig authored May 22, 2015

NVMe devices always support the FUA bit, and the SCSI translation
accepts the DPO bit, which doesn't have much of a meaning for us.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

90851768

nvme: simplify and cleanup the READ/WRITE SCSI CDB parsing code · cbbb7a2e

Christoph Hellwig authored May 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

cbbb7a2e

nvme: first round at deobsfucating the SCSI translation code · 3726897e

Christoph Hellwig authored May 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

3726897e

nvme: fix scsi translation error handling · e61b0a86

Christoph Hellwig authored May 22, 2015

Error handling for the SCSI translation was completely broken, as there
were two different positive error number spaces overlapping.  Fix this
up by removing one of them, and centralizing the generation of the other
positive values in a single place.  Also fix up a few places that didn't
handle the NVMe error codes properly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e61b0a86

nvme: split nvme_trans_send_fw_cmd · b90c48d0

Christoph Hellwig authored May 22, 2015

This function handles two totally different opcodes, so split it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

b90c48d0

nvme: store a struct device pointer in struct nvme_dev · e75ec752

Christoph Hellwig authored May 22, 2015

Most users want the generic device, so store that in struct nvme_dev
instead of the pci_dev.  This also happens to be a nice step towards
making some code reusable for non-PCI transports.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e75ec752

nvme: consolidate synchronous command submission helpers · f705f837

Christoph Hellwig authored May 22, 2015

Note that we keep the unused timeout argument, but allow callers to
pass 0 instead of a timeout if they want the default.  This will allow
adding a timeout to the pass through path later on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

f705f837
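
The "pass 0 for the default" convention mentioned in the commit above might look like the following. This is only a sketch of the convention: `DEFAULT_TIMEOUT_MS` and `effective_timeout` are hypothetical names, and the default value is made up for illustration.

```c
#define DEFAULT_TIMEOUT_MS 60000u   /* hypothetical default, not the driver's value */

/* A requested timeout of 0 means "use the default"; any other value wins. */
unsigned int effective_timeout(unsigned int requested_ms)
{
    return requested_ms ? requested_ms : DEFAULT_TIMEOUT_MS;
}
```

Callers that do not care about the timeout pass 0. A pass-through path can later forward a caller-supplied value without changing the helper's signature.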

20 May, 2015 9 commits

loop: remove (now) unused 'out' label · 6a927007

Jens Axboe authored May 20, 2015

gcc, rightfully, complains:

drivers/block/loop.c:1369:1: warning: label 'out' defined but not used [-Wunused-label]

Kill it.
Signed-off-by: Jens Axboe <axboe@fb.com>

6a927007

s390/block/dasd: remove obsolete while -EBUSY loop · a05e5780

Jarod Wilson authored May 06, 2015

With the mutex_trylock bit gone from blkdev_reread_part(), the retry logic
in dasd_scan_partitions() shouldn't be necessary.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>