- 18 Jun, 2017 2 commits
-
-
Christoph Hellwig authored
Merge three functions only tail-called by blk_mq_free_request into blk_mq_free_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 May, 2017 1 commit
-
-
Omar Sandoval authored
Preparation for adding more declarations.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 26 Apr, 2017 3 commits
-
-
Bart Van Assche authored
Since the blk_mq_debugfs_*register_hctxs() functions register and unregister all attributes under the "mq" directory, rename these into blk_mq_debugfs_*register_mq().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Bart Van Assche authored
A later patch will move the call of blk_mq_debugfs_register() to a function to which the queue name is not passed as an argument. To avoid having to add a 'name' argument to multiple callers, let blk_mq_debugfs_register() look up the queue name.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
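A hedged sketch of what such a name lookup could look like once the parent kobject is reliably set, assuming the mainline blk_debugfs_root dentry and a debugfs_dir member on the request queue (the error handling here is illustrative):

/* Sketch: derive the debugfs directory name from the queue's parent kobject. */
int blk_mq_debugfs_register(struct request_queue *q)
{
	if (!blk_debugfs_root)
		return -ENOENT;

	/* q->kobj.parent is the gendisk's kobject, so its name is the disk name. */
	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
					    blk_debugfs_root);
	if (!q->debugfs_dir)
		return -ENOMEM;

	return 0;
}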
-
Bart Van Assche authored
A later patch in this series will modify blk_mq_debugfs_register() such that it uses q->kobj.parent to determine the name of a request queue. Hence make sure that that pointer is initialized before blk_mq_debugfs_register() is called. To avoid lock inversion, protect sysfs / debugfs registration with the queue sysfs_lock instead of the global mutex all_q_mutex.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 14 Apr, 2017 1 commit
-
-
Omar Sandoval authored
Wire up the sbitmap_get_shallow() operation to the tag code so that a caller can limit the number of tags available to it.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
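As a rough illustration of the wiring, the tag allocation path can choose between the full-depth and depth-limited sbitmap operations; the sbitmap helpers below exist in mainline, while the surrounding blk-mq plumbing is a simplified assumption:

/* Sketch: pick between a full-depth and a depth-limited tag allocation. */
static int sketch_get_tag(struct sbitmap_queue *bt, unsigned int shallow_depth)
{
	/*
	 * A shallow_depth of 0 means "no limit"; otherwise only the first
	 * shallow_depth bits of each sbitmap word may be allocated from,
	 * which caps how many tags this caller can hold.
	 */
	if (shallow_depth)
		return __sbitmap_queue_get_shallow(bt, shallow_depth);
	return __sbitmap_queue_get(bt);
}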
-
- 07 Apr, 2017 1 commit
-
-
Omar Sandoval authored
While dispatching requests, if we fail to get a driver tag, we mark the hardware queue as waiting for a tag and put the requests on a hctx->dispatch list to be run later when a driver tag is freed. However, blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware queues if using a single-queue scheduler with a multiqueue device.

If blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we are processing. This means we end up using the hardware queue of the previous request, which may or may not be the same as that of the current request. If it isn't, the wrong hardware queue will end up waiting for a tag, and the requests will be on the wrong dispatch list, leading to a hang.

The fix is twofold:

1. Make sure we save which hardware queue we were trying to get a request for in blk_mq_get_driver_tag() regardless of whether it succeeds or not.
2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a blk_mq_hw_queue to make it clear that it must handle multiple hardware queues, since I've already messed this up on a couple of occasions.

This didn't appear in testing with nvme and mq-deadline because nvme has more driver tags than the default number of scheduler tags. However, with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 21 Mar, 2017 1 commit
-
-
Omar Sandoval authored
Currently, statistics are gathered in ~0.13s windows, and users grab the statistics whenever they need them. This is not ideal for both in-tree users:

1. Writeback throttling wants its own dynamically sized window of statistics. Since the blk-stats statistics are reset after every window and the wbt windows don't line up with the blk-stats windows, wbt doesn't see every I/O.
2. Polling currently grabs the statistics on every I/O. Again, depending on how the window lines up, we may miss some I/Os. It's also unnecessary overhead to get the statistics on every I/O; the hybrid polling heuristic would be just as happy with the statistics from the previous full window.

This reworks the blk-stats infrastructure to be callback-based: users register a callback that they want called at a given time with all of the statistics from the window during which the callback was active. Users can dynamically bucketize the statistics. wbt and polling both currently use read vs. write, but polling can be extended to further subdivide based on request size.

The callbacks are kept on an RCU list, and each callback has percpu stats buffers. There will only be a few users, so the overhead on the I/O completion side is low. The stats flushing is also simplified considerably: since the timer function is responsible for clearing the statistics, we don't have to worry about stale statistics.

wbt is a trivial conversion. After the conversion, the windowing problem mentioned above is fixed. For polling, we register an extra callback that caches the previous window's statistics in the struct request_queue for the hybrid polling heuristic to use.

Since we no longer have a single stats buffer for the request queue, this also removes the sysfs and debugfs stats entries. To replace those, we add a debugfs entry for the poll statistics.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
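A hedged sketch of a consumer of the callback-based interface. The blk_stat_* helper names follow the mainline blk-stat API introduced here, but the exact prototypes have shifted over time, so treat the signatures and the two-bucket read/write split as assumptions:

/* Sketch: bucket completed requests by direction (read vs. write). */
static int sketch_bucket_fn(const struct request *rq)
{
	return op_is_write(req_op(rq)) ? 1 : 0;
}

/* Runs once per window with the accumulated per-bucket statistics. */
static void sketch_timer_fn(struct blk_stat_callback *cb)
{
	/* cb->stat[0] would hold read latencies, cb->stat[1] write latencies. */
}

static int sketch_register_stats(struct request_queue *q)
{
	struct blk_stat_callback *cb;

	cb = blk_stat_alloc_callback(sketch_timer_fn, sketch_bucket_fn, 2, q);
	if (!cb)
		return -ENOMEM;

	blk_stat_add_callback(q, cb);
	blk_stat_activate_msecs(cb, 100);	/* gather stats for a ~100ms window */
	return 0;
}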
-
- 08 Mar, 2017 2 commits
-
-
Ming Lei authored
Currently, from the kobject point of view, both q->mq_kobj and ctx->kobj can be released during one cycle of blk_mq_register_dev() and blk_mq_unregister_dev(). Actually, a sw queue's lifetime is the same as its request queue's, which is covered by request_queue->kobj. So we don't need to call kobject_put() for the two kinds of kobject in __blk_mq_unregister_dev(); instead we do that in the release handler of the request queue.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
Both q->mq_kobj and the sw queues' kobjects should be initialized only once, instead of in each add_disk context. This patch also removes the clearing of ctx in blk_mq_init_cpu_queues(), because the percpu allocator zero-fills the allocated variable.

This fixes one issue[1] reported by Omar.

[1] kernel warning when doing unbind/bind on one scsi-mq device

[ 19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
[ 19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
[ 19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[ 19.350920] Workqueue: events_unbound async_run_entry_fn
[ 19.350920] Call Trace:
[ 19.350920] dump_stack+0x63/0x83
[ 19.350920] kobject_init+0x77/0x90
[ 19.350920] blk_mq_register_dev+0x40/0x130
[ 19.350920] blk_register_queue+0xb6/0x190
[ 19.350920] device_add_disk+0x1ec/0x4b0
[ 19.350920] sd_probe_async+0x10d/0x1c0 [sd_mod]
[ 19.350920] async_run_entry_fn+0x48/0x150
[ 19.350920] process_one_work+0x1d0/0x480
[ 19.350920] worker_thread+0x48/0x4e0
[ 19.350920] kthread+0x101/0x140
[ 19.350920] ? process_one_work+0x480/0x480
[ 19.350920] ? kthread_create_on_node+0x60/0x60
[ 19.350920] ret_from_fork+0x2c/0x40

Cc: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 02 Mar, 2017 1 commit
-
-
Omar Sandoval authored
Nothing is using it anymore.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
-
- 02 Feb, 2017 1 commit
-
-
Omar Sandoval authored
When I added the blk-mq debugging information to debugfs, I didn't notice that blktrace also creates a "block" directory in debugfs. Make them use the same dentry, now created in the core block code. Based on a patch from Jens.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
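A minimal sketch of the shared-root idea, assuming the dentry is created once at block core init and exported so that both blk-mq-debugfs and blktrace attach their entries beneath it (the init hook name is illustrative):

/* Sketch: a single debugfs "block" root shared by blk-mq debugfs and blktrace. */
struct dentry *blk_debugfs_root;

static int __init sketch_blk_dev_init(void)
{
	blk_debugfs_root = debugfs_create_dir("block", NULL);
	return 0;
}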
-
- 27 Jan, 2017 4 commits
-
-
Omar Sandoval authored
This fixes a couple of problems:

1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at all.

Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.

Fixes: 07e4fead ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <osandov@fb.com>

Augment Kconfig description.

Signed-off-by: Jens Axboe <axboe@fb.com>
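A sketch of what the new option could look like in block/Kconfig; the symbol name matches the commit, while the default, dependency, and help text are assumptions:

config BLK_DEBUG_FS
	bool "Block layer debugging information in debugfs"
	default y
	depends on DEBUG_FS
	---help---
	  Include block layer debugging information in debugfs. This is
	  mostly useful for kernel developers.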
-
Jens Axboe authored
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request().

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
If we have both multiple hardware queues and a shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
-
Omar Sandoval authored
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs:

# tree -d /sys/kernel/debug/block
/sys/kernel/debug/block
|-- nvme0n1
|   `-- mq
|       |-- 0
|       |   `-- cpu0
|       |-- 1
|       |   `-- cpu1
|       |-- 2
|       |   `-- cpu2
|       `-- 3
|           `-- cpu3
`-- vda
    `-- mq
        `-- 0
            |-- cpu0
            |-- cpu1
            |-- cpu2
            `-- cpu3

Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 17 Jan, 2017 4 commits
-
-
Jens Axboe authored
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interface, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independently of device queue depth.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
Prep patch for adding an extra tag map for scheduler requests.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Jens Axboe authored
This is in preparation for having another tag set available. Cleanup the parameters, and allow passing in of tags for blk_mq_put_tag().

Signed-off-by: Jens Axboe <axboe@fb.com>
[hch: even more cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
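After a change of this kind, the put path takes the tag map explicitly instead of deriving it from the hardware queue; a hedged sketch (field names follow mainline struct blk_mq_tags, the exact parameter order is an assumption):

/* Sketch: release a tag back to an explicitly passed tag map. */
void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
		    struct blk_mq_ctx *ctx, unsigned int tag)
{
	if (tag >= tags->nr_reserved_tags)
		sbitmap_queue_clear(&tags->bitmap_tags,
				    tag - tags->nr_reserved_tags, ctx->cpu);
	else
		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
}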
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
- 09 Dec, 2016 1 commit
-
-
Jens Axboe authored
Takes a list of requests and dispatches them. Moves any residual requests to the dispatch list.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
-
- 10 Nov, 2016 1 commit
-
-
Jens Axboe authored
For legacy block, we simply track them in the request queue. For blk-mq, we track them on a per-sw queue basis, which we can then sum up through the hardware queues and finally to a per device state. The stats are tracked in, roughly, 0.1s interval windows.

Add sysfs files to display the stats. The feature is off by default, to avoid any extra overhead. In-kernel users of it can turn it on by setting QUEUE_FLAG_STATS in the queue flags. We currently don't turn it on if someone just reads any of the stats files, that is something we could add as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
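For an in-kernel consumer, turning the stats on could look roughly like this; queue_flag_set_unlocked() is the flag helper of that era and is assumed here (later kernels use different helpers):

/* Sketch: opt a request queue into latency statistics tracking. */
static void sketch_enable_queue_stats(struct request_queue *q)
{
	/* Stats are off by default; in-kernel users set QUEUE_FLAG_STATS. */
	if (!test_bit(QUEUE_FLAG_STATS, &q->queue_flags))
		queue_flag_set_unlocked(QUEUE_FLAG_STATS, q);
}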
-
- 08 Nov, 2016 1 commit
-
-
Christoph Hellwig authored
This will allow SCSI to have a single blk_mq_ops structure that either lets the LLDD map the queues to PCIe MSI-X vectors or uses the default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
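The shape of such an optional hook, as a hedged sketch; the .map_queues member mirrors the mainline blk_mq_ops field of that period, and the fall-back-to-default behaviour when the op is left NULL is the assumption being illustrated:

/* Sketch: an optional queue-mapping hook in the driver's blk_mq_ops. */
static int sketch_map_queues(struct blk_mq_tag_set *set)
{
	/* e.g. derive the CPU-to-hw-queue map from MSI-X vector affinity. */
	return blk_mq_map_queues(set);	/* or fall back to the default mapping */
}

static const struct blk_mq_ops sketch_mq_ops = {
	/* ... ->queue_rq() and the other mandatory ops ... */
	.map_queues	= sketch_map_queues,	/* optional; core default used if unset */
};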
-
- 02 Nov, 2016 1 commit
-
-
Bart Van Assche authored
Multiple functions test the BLK_MQ_S_STOPPED bit so introduce a helper function that performs this test.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
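A helper of this kind is essentially a one-line wrapper around test_bit(); a minimal sketch, assuming it is named blk_mq_hctx_stopped() as in mainline:

/* Sketch: centralize the BLK_MQ_S_STOPPED test behind one helper. */
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
{
	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
}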
-
- 22 Sep, 2016 1 commit
-
-
Thomas Gleixner authored
Replace the block-mq notifier list management with the multi instance facility in the cpu hotplug state machine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-block@vger.kernel.org
Cc: rt@linutronix.de
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
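In multi-instance terms, one hotplug state is registered once for the block layer and each hardware queue hangs an instance node off it; a hedged sketch of that pattern (the state constant and helpers follow later mainline code rather than this exact patch):

/* Sketch: one multi-instance hotplug state shared by all hardware queues. */
static int __init sketch_blk_mq_hotplug_init(void)
{
	return cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead",
				       NULL, blk_mq_hctx_notify_dead);
}

/* Per-hctx: add this queue's node as an instance of the shared state. */
static void sketch_blk_mq_hctx_hotplug_add(struct blk_mq_hw_ctx *hctx)
{
	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
}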
-
- 17 Sep, 2016 2 commits
-
-
Omar Sandoval authored
Allocating your own per-cpu allocation hint separately makes for an awkward API. Instead, allocate the per-cpu hint as part of the struct sbitmap_queue. There's no point for a struct sbitmap_queue without the cache, but you can still use a bare struct sbitmap.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
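Structurally, the change amounts to embedding the hint in the queue wrapper; a reduced sketch showing only the fields relevant here (other members omitted):

/* Sketch: sbitmap_queue carries its own per-cpu allocation hint. */
struct sbitmap_queue_sketch {
	struct sbitmap sb;			/* the underlying scalable bitmap */
	unsigned int __percpu *alloc_hint;	/* per-cpu hint, owned by the queue */
	/* ... wait queues and wake batching state ... */
};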
-
Omar Sandoval authored
This is a generally useful data structure, so make it available to anyone else who might want to use it. It's also a nice cleanup separating the allocation logic from the rest of the tag handling logic. The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only selected by CONFIG_BLOCK for now. This should be a complete noop functionality-wise.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
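The Kconfig side of such a split is small; a sketch of the assumed shape, with a hidden library symbol that users select rather than a user-visible option:

# lib/Kconfig (sketch): hidden symbol, enabled only via select
config SBITMAP
	bool

# block/Kconfig (sketch): the block layer pulls the library in
menuconfig BLOCK
	bool "Enable the block layer"
	select SBITMAP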
-
- 15 Sep, 2016 2 commits
-
-
Christoph Hellwig authored
This allows drivers to specify their own queue mapping by overriding the setup-time function that builds the mq_map. This can be used, for example, to build the map based on the MSI-X vector mapping provided by the core interrupt layer for PCI devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
All drivers use the default, so provide an inline version of it. If we ever need other queue mappings we can add an optional method back, although supporting that will also require major changes to the queue setup code. This provides better code generation, and better debuggability as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 Feb, 2016 1 commit
-
-
Keith Busch authored
The hardware's provided queue count may change at runtime with resource provisioning. This patch allows a block driver to alter the number of h/w queues available when its resource count changes.

The main part is a new blk-mq API to request a new number of h/w queues for a given live tag set. The new API freezes all queues using that set, then adjusts the allocated count prior to remapping these to CPUs.

The bulk of the rest just shifts where h/w contexts and all their artifacts are allocated and freed. The number of max h/w contexts is capped to the number of possible cpus since there is no use for more than that. As such, all pre-allocated memory for pointers needs to account for the max possible rather than the initial number of queues. A side effect of this is that blk-mq will proceed successfully as long as it can allocate at least one h/w context. Previously it would fail request queue initialization if less than the requested number was allocated.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
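The new API boils down to a single entry point on the tag set; a hedged sketch of how a driver might call it after learning about a new queue count (the wrapper is illustrative, blk_mq_update_nr_hw_queues() matches the mainline name):

/* Sketch: resize a live tag set to the newly provisioned hardware queue count. */
static void sketch_hw_queue_count_changed(struct blk_mq_tag_set *set,
					  int new_nr_hw_queues)
{
	/*
	 * Freezes every request queue sharing this tag set, adjusts the
	 * hardware context count, and remaps software queues to CPUs.
	 */
	blk_mq_update_nr_hw_queues(set, new_nr_hw_queues);
}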
-
- 01 Dec, 2015 1 commit
-
-
Christoph Hellwig authored
We already have the reserved flag, and a nowait flag awkwardly encoded as a gfp_t. Add a real flags argument to make the scheme more extensible and allow for a nicer calling convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
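In mainline this became the BLK_MQ_REQ_* flags accepted by blk_mq_alloc_request(); a hedged usage sketch (the flag name is taken from the current API and assumed to match this patch):

/* Sketch: allocate a non-blocking, non-reserved request with explicit flags. */
static struct request *sketch_alloc_nowait(struct request_queue *q, int op)
{
	/* BLK_MQ_REQ_NOWAIT: fail instead of sleeping when no tag is available. */
	return blk_mq_alloc_request(q, op, BLK_MQ_REQ_NOWAIT);
}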
-
- 11 Nov, 2015 1 commit
-
-
Jens Axboe authored
It's no longer used outside of blk-mq core.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 Oct, 2015 1 commit
-
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 29 Sep, 2015 1 commit
-
-
Akinobu Mita authored
Notifier callbacks for the CPU_ONLINE action can run on a CPU other than the one that was just onlined. So it is possible for a process running on the just-onlined CPU to insert requests and run the hw queue before the new mapping is established by blk_mq_queue_reinit_notify().

This can cause a problem when the CPU is onlined for the first time since the request queue was initialized. At that point ctx->index_hw for the CPU, which is the index in hctx->ctxs[] for this ctx, is still zero before blk_mq_queue_reinit_notify() is called by the notifier callbacks for the CPU_ONLINE action.

For example, there is a single hw queue (hctx) and two CPU queues (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and a request is inserted into ctx1->rq_list, and bit0 is set in the pending bitmap as ctx1->index_hw is still zero. Then, while running the hw queue, flush_busy_ctxs() finds bit0 set in the pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list. But hctx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is ignored.

Fix it by ensuring that the new mapping is established before the onlined CPU starts running.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 29 Jan, 2015 1 commit
-
-
Ming Lei authored
The kobject memory inside blk-mq hctx/ctx shouldn't have been freed before the kobject is released, because driver core can access it freely before its release.

We can't do that in all ctx/hctx/mq_kobj release handlers because they can be run before blk_cleanup_queue(). Given mq_kobj shouldn't have been introduced, this patch simply moves mq's release into blk_release_queue().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 31 Dec, 2014 1 commit
-
-
Jens Axboe authored
If it's dying, we can't expect new requests to complete and come in and wake up other tasks waiting for requests. So after we have marked it as dying, wake up everybody currently waiting for a request. Once they wake, they will retry their allocation and fail appropriately due to the state of the queue.

Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 Dec, 2014 1 commit
-
-
Ming Lei authored
When one hardware queue has no mapped software queues, it shouldn't have been scheduled. Otherwise a WARNING or OOPS can be triggered. The blk_mq_hw_queue_mapped() helper is introduced to fix the problem.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
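The check itself is tiny; a minimal sketch, assuming the mainline definition in which a hardware queue counts as mapped once it has software queues and tags attached:

/* Sketch: a hardware queue is "mapped" if software queues point at it. */
static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}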
-
- 25 Sep, 2014 2 commits
-
-
Ming Lei authored
These two temporary functions are introduced for holding flush initialization and de-initialization, so that we can introduce the 'flush queue' more easily in the following patch. And once the 'flush queue' and its allocation/free functions are ready, they will be removed for the sake of code readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
It is reasonable to allocate flush req in blk_mq_init_flush().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-