- 27 Jan, 2017 9 commits
-
-
Omar Sandoval authored
These statistics _might_ be useful to userspace, but it's better not to commit to an ABI for these yet. Also, the dispatched file in sysfs couldn't be cleared, so make it clearable like the others in debugfs. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
These can be used to debug issues like tag leaks and stuck requests. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
These are very tied to the blk-mq tag implementation, so exposing them to sysfs isn't a great idea. Move the debugging information to debugfs and add basic entries for the number of tags and the number of reserved tags to sysfs. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
This is useful for debugging problems where we've gotten stuck with requests in the software queues. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
This is useful debugging information that will be used in the blk-mq debugfs directory. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Changed 'weight' to 'busy'. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
The request pointers by themselves aren't super useful. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
These lists are only useful for debugging; they definitely don't belong in sysfs. Putting them in debugfs also removes the limitation of a single page of output. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
hctx->state could come in handy for bugs where the hardware queue gets stuck in the stopped state, and hctx->flags is just useful to know. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 26 Jan, 2017 2 commits
-
-
Jens Axboe authored
We don't trigger this from the normal IO path, since we always use blocking allocations from there. But Bart saw it testing multipath dm, since that is a heavy user of atomic request allocations in the map and clone path. Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
If we come in from blk_mq_alloc_request() with NOWAIT set in flags, we must ensure that we don't later overwrite that in blk_mq_sched_get_request(). Initialize alloc_data->flags before passing it in. Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
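The flag-loss pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: `ALLOC_NOWAIT`, `struct alloc_data`, and the three functions below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the NOWAIT allocation flag. */
#define ALLOC_NOWAIT (1u << 0)

/* Hypothetical stand-in for the allocation context passed down the stack. */
struct alloc_data {
    unsigned int flags;
};

/* Buggy pattern: the callee resets the whole structure, dropping NOWAIT. */
static unsigned int sched_get_request_buggy(struct alloc_data *data)
{
    assert(data != NULL);
    memset(data, 0, sizeof(*data));   /* the caller's flags are lost here */
    return data->flags;
}

/* Fixed pattern: the callee keeps whatever flags the caller initialized. */
static unsigned int sched_get_request_fixed(struct alloc_data *data)
{
    assert(data != NULL);
    return data->flags;               /* flags set by the caller survive */
}

/* Caller: initialize flags before handing the structure to the callee. */
static unsigned int alloc_request(unsigned int flags, int use_fixed)
{
    struct alloc_data data = { .flags = flags };

    return use_fixed ? sched_get_request_fixed(&data)
                     : sched_get_request_buggy(&data);
}
```

With the fixed callee, `alloc_request(ALLOC_NOWAIT, 1)` still reports the NOWAIT bit. With the buggy callee it comes back as 0, so a caller that asked for a non-blocking allocation could end up sleeping.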
-
- 25 Jan, 2017 1 commit
-
-
Jens Axboe authored
If we have a scheduler attached, we have two sets of tags. We don't want to apply our active queue throttling for the scheduler side of tags, that only applies to driver tags since that's the resource we need to dispatch an IO. Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 23 Jan, 2017 3 commits
-
-
Markus Elfring authored
The script "checkpatch.pl" reported the following: ERROR: do not use assignment in if condition. Fix the affected source code place. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Markus Elfring authored
The script "checkpatch.pl" reported the following: ERROR: do not use assignment in if condition. Fix the affected source code places. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
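For readers unfamiliar with this checkpatch.pl error, a small userspace sketch of the flagged style and the preferred style follows. The helper names `dup_flagged` and `dup_preferred` are made up for illustration and do not come from the patched files.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Style flagged by checkpatch.pl: assignment inside the if condition. */
static char *dup_flagged(const char *s)
{
    char *copy;

    assert(s != NULL);
    if ((copy = malloc(strlen(s) + 1)) == NULL)
        return NULL;
    strcpy(copy, s);
    return copy;
}

/* Preferred style: assign first, then test the result on its own line. */
static char *dup_preferred(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);
    if (!copy)
        return NULL;
    strcpy(copy, s);
    return copy;
}
```

Both functions behave identically. The change is purely stylistic and makes the assignment harder to mistake for a comparison.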
-
Alexander Potapenko authored
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in cfq_init_cfqq():

    ==================================================================
    BUG: KMSAN: use of unitialized memory
    ...
    Call Trace:
     [< inline >] __dump_stack lib/dump_stack.c:15
     [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
     [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
     [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
     [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754
     [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
    ...
    origin:
     [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
     [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
     [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
     [< inline >] allocate_slab mm/slub.c:1627
     [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
     [< inline >] new_slab_objects mm/slub.c:2407
     [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
     [< inline >] __slab_alloc mm/slub.c:2606
     [< inline >] slab_alloc_node mm/slub.c:2669
     [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
     [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
    ...
    ==================================================================

(the line numbers are relative to 4.8-rc6, but the bug persists upstream)

The uninitialized struct cfq_queue is created by kmem_cache_alloc_node() and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class before it's initialized.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
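The bug class reported here can be sketched in userspace C. Everything below is a simplified, hypothetical stand-in (`struct queue_sketch`, `alloc_poisoned`, `init_queue`, `get_queue`, and the class constants are not the kernel's definitions). Setting the field before calling the init helper is an assumed shape for the fix, since the message above does not spell the change out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the I/O priority classes. */
enum { CLASS_NONE = 0, CLASS_IDLE = 3 };

/* Hypothetical, simplified stand-in for struct cfq_queue. */
struct queue_sketch {
    int ioprio_class;
    int idle_window;
};

/* Mimics an allocator that returns non-zeroed memory by filling a pattern. */
static struct queue_sketch *alloc_poisoned(void)
{
    struct queue_sketch *q = malloc(sizeof(*q));

    if (q)
        memset(q, 0xAA, sizeof(*q));
    return q;
}

/* Like the init helper in the report, this reads ioprio_class on entry. */
static void init_queue(struct queue_sketch *q)
{
    assert(q != NULL);
    q->idle_window = (q->ioprio_class != CLASS_IDLE);
}

/* Assumed fix: give the field a defined value before the init helper runs. */
static struct queue_sketch *get_queue(void)
{
    struct queue_sketch *q = alloc_poisoned();

    if (!q)
        return NULL;
    q->ioprio_class = CLASS_NONE;
    init_queue(q);
    return q;
}
```

Without the assignment in `get_queue()`, `init_queue()` would branch on whatever bytes the allocator left behind, which is the kind of read KMSAN flags.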
-
- 20 Jan, 2017 1 commit
-
-
Jens Axboe authored
Add support for growing the tags associated with a hardware queue, for the scheduler tags. Currently we only support resizing within the limits of the original depth, change that so we can grow it as well by allocating and replacing the existing scheduler tag set. This is similar to how we could increase the software queue depth with the legacy IO stack and schedulers. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
- 19 Jan, 2017 3 commits
-
-
Jens Axboe authored
The run handler we register for the delayed work requires that the queue be stopped, yet we leave that up to the caller. Let's move it into blk_mq_delay_queue() itself, so that the API is sane. This fixes a stall with SCSI, where it calls blk_mq_delay_queue() without having stopped the queue. Hence the queue is never run. Reported-by: Hannes Reinecke <hare@suse.com> Fixes: 70f4db63 ("blk-mq: add blk_mq_delay_queue") Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
We used to pass in NULL for hctx for reserved tags, but we don't do that anymore. Hence the check for whether hctx is NULL or not is now redundant, kill it. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation") Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
We already checked that e is NULL, so no point in calling elevator_put() to free it. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers") Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 Jan, 2017 4 commits
-
-
Jens Axboe authored
There's no potential harm in quiescing the queue, but it also doesn't buy us anything. And we can't run the queue async for policy deactivate, since we could be in the path of tearing the queue down. If we schedule an async run of the queue at that time, we're racing with queue teardown AFTER we've already torn most of it down. Reported-by: Omar Sandoval <osandov@fb.com> Fixes: 4d199c6f ("blk-cgroup: ensure that we clear the stop bit on quiesced queues") Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
When we resize a struct sbitmap_queue, we update the wakeup batch size, but we don't update the wait count in the struct sbq_wait_states. If we resized down from a size which could use a bigger batch size, these counts could be too large and cause us to miss necessary wakeups. To fix this, update the wait counts when we resize (ensuring some careful memory ordering so that it's safe w.r.t. concurrent clears). This also fixes a theoretical issue where two threads could end up bumping the wait count up by the batch size, which could also potentially lead to hangs. Reported-by: Martin Raiber <martin@urbackup.org> Fixes: e3a2b3f9 ("blk-mq: allow changing of queue depth through sysfs") Fixes: 2971c35f ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
We always do an atomic clear_bit() right before we call sbq_wake_up(), so we can use smp_mb__after_atomic(). While we're here, comment the memory barriers in here a little more. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
If we call blk_mq_quiesce_queue() on a queue, we must remember to pair that with something that clears the stopped bit on the queues later on. Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 17 Jan, 2017 12 commits
-
-
Jens Axboe authored
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't used for normal IO. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
This is basically identical to deadline-iosched, except it registers as a MQ capable scheduler. This is still a single queue design. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interface, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independently of device queue depth. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
This is in preparation for having two sets of tags available. For that we need a static index, and a dynamically assignable one. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
No functional change in this patch, just in preparation for having two types of tags available to the block layer for a single request. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
Prep patch for adding an extra tag map for scheduler requests. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
This is in preparation for having another tag set available. Cleanup the parameters, and allow passing in of tags for blk_mq_put_tag(). Signed-off-by: Jens Axboe <axboe@fb.com> [hch: even more cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
It's only used in blk-mq, kill it from the main exported header and kill the symbol export as well. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
We want to use it outside of blk-core.c. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
Prep patch for adding MQ ops as well, since doing anon unions with named initializers doesn't work on older compilers. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Alden Tondettar authored
If a GUID Partition Table claims to have more than 2**25 entries, the calculation of the partition table size in alloc_read_gpt_entries() will overflow a 32-bit integer and not enough space will be allocated for the table. Nothing seems to get written out of bounds, but later efi_partition() will read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing information to /proc/partitions and uevents. The problem exists on both 64-bit and 32-bit platforms. Fix the overflow and also print a meaningful debug message if the table size is too large. Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com>
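The arithmetic behind this overflow can be reproduced in a few lines of C. This is a sketch under stated assumptions, not the code in alloc_read_gpt_entries(): the function names and the `max_bytes` cap are hypothetical, and the 128-byte entry size is used only as an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit arithmetic, as in the overflowing calculation: the product wraps. */
static uint32_t table_size_32(uint32_t num_entries, uint32_t entry_size)
{
    return num_entries * entry_size;
}

/*
 * Widen to 64 bits before multiplying and reject tables above a cap.
 * max_bytes is a hypothetical limit chosen by the caller.
 * Returns 0 and stores the size in *out, or -1 if the table is too large.
 */
static int table_size_checked(uint32_t num_entries, uint32_t entry_size,
                              uint64_t max_bytes, uint64_t *out)
{
    uint64_t size = (uint64_t)num_entries * entry_size;

    assert(out != NULL);
    if (size > max_bytes)
        return -1;
    *out = size;
    return 0;
}
```

With 128-byte entries, `table_size_32((1u << 25) + 1, 128)` wraps to 128, so a table claiming over 33 million entries would get a buffer sized for a single entry. The widened version computes the true size and lets the caller refuse it.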
-
- 12 Jan, 2017 4 commits
-
-
Josef Bacik authored
The old maintainer's email is bouncing and I've rewritten most of this driver in recent months. Also add linux-block to the mailing list and remove the old tree; I will send patches through the linux-block tree. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
We never change it, make that clear. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
-
Ming Lei authored
If the last bvec of the 1st bio and the 1st bvec of the next bio are physically contiguous, and the latter can be merged into the last segment of the 1st bio, we should treat them as not violating the sg gap (or virt boundary) limit. Both Vitaly and Dexuan reported that lots of unmergeable small bios are observed when running mkfs on Hyper-V virtual storage, and performance becomes quite low. This patch fixes that performance issue. The same issue should exist on NVMe, since it sets virt boundary too. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Cc: Keith Busch <keith.busch@intel.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Vlastimil Babka authored
The raw_cmd_copyin() function does a kmalloc() with GFP_USER, although the allocated structure is obviously not mapped to userspace, just copied from/to. In this case GFP_KERNEL is more appropriate, so let's use it, although in the current implementation this does not manifest as any error. Reported-by: Matthew Wilcox <mawilcox@linuxonhyperv.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 08 Jan, 2017 1 commit
-
-
Linus Torvalds authored
-