- 14 May, 2018 14 commits
-
-
Kent Overstreet authored
Found a bug (with ASAN) where we were passing a bio to bio_copy_data() with bi_next not NULL, when it should have been - a driver had left bi_next set to something after calling bio_endio(). Since the normal case is only copying single bios, split out bio_list_copy_data() to avoid more bugs like this in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Add versions that take bvec_iter args instead of using bio->bi_iter - to be used by bcachefs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Minor optimization - remove a pointer indirection when using fs_bio_set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Similarly to mempool_init()/mempool_exit(), take a pointer indirection out of allocation/freeing by allowing biosets to be embedded in other structs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Minor performance improvement by getting rid of pointer indirections from allocation/freeing fastpaths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Allows mempools to be embedded in other structs, getting rid of a pointer indirection from allocation fastpaths. mempool_exit() is safe to call on an uninitialized but zeroed mempool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we have multiple callers of sbq_wake_up(), we can end up in a situation where the wait_cnt will continually go more and more negative. Consider the case where our wake batch is 1, hence wait_cnt will start out as 1. wait_cnt == 1 CPU0 CPU1 atomic_dec_return(), cnt == 0 atomic_dec_return(), cnt == -1 cmpxchg(-1, 0) (succeeds) [wait_cnt now 0] cmpxchg(0, 1) (fails) This ends up with wait_cnt being 0, we'll wakeup immediately next time. Going through the same loop as above again, and we'll have wait_cnt -1. For the case where we have a larger wake batch, the only difference is that the starting point will be higher. We'll still end up with continually smaller batch wakeups, which defeats the purpose of the rolling wakeups. Always reset the wait_cnt to the batch value. Then it doesn't matter who wins the race. But ensure that whomever does win the race is the one that increments the ws index and wakes up our batch count, loser gets to call __sbq_wake_up() again to account his wakeups towards the next active wait state index. Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We just can't do I/O when doing block layer requests allocations, so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_old_get_request already has it at hand, and in blk_queue_bio, which is the fast path, it is constant. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Always GFP_KERNEL, and keeping it would cause serious complications for the next change. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Fixes: 7c2d748e ("memstick: don't call blk_queue_bounce_limit") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 May, 2018 7 commits
-
-
Christoph Hellwig authored
The ps3disk driver already kmaps all pages when copying from/to the internal bounce buffer, so it can accept highmem pages just fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Just kmap the bio single page payload before processing it. (and yes, now highmem on sparc32 anyway, but kmap_(un)map atomic are nops, so this gives the right example) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use kmap_atomic when copying out of a bio_vec. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Just kmap the single payload page before passing it on to the FTL. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All in-tree host drivers set up a proper dma mask and use the dma-mapping helpers. This means they will be able to deal with any address that we are throwing at them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
DAC960 just sets the block bounce limit to the dma mask, which means that the iommu or swiotlb already take care of the bounce buffering, and the block bouncing can be removed. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
mtip32xx just sets the block bounce limit to the dma mask, which means that the iommu or swiotlb already take care of the bounce buffering, and the block bouncing can be removed. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 May, 2018 10 commits
-
-
Omar Sandoval authored
Make sure the user passed the right value to sbitmap_queue_min_shallow_depth(). Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We don't expect the async depth to be smaller than the wake batch count for sbitmap, but just in case, inform sbitmap of what shallow depth kyber may use. Acked-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If our shallow depth is smaller than the wake batching of sbitmap, we can introduce hangs. Ensure that sbitmap knows how low we'll go. Acked-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
The sbitmap queue wake batch is calculated such that once allocations start blocking, all of the bits which are already allocated must be enough to fulfill the batch counters of all of the waitqueues. However, the shallow allocation depth can break this invariant, since we block before our full depth is being utilized. Add sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth the sbq will use, and update sbq_calc_wake_batch() to take it into account. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
bfqd->sb_shift was attempted used as a cache for the sbitmap queue shift, but we don't need it, as it never changes. Kill it with fire. Acked-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
It doesn't change, so don't put it in the per-IO hot path. Acked-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Reserved tags are used for error handling, we don't need to care about them for regular IO. The core won't call us for these anyway. Acked-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
It's not useful, they are internal and/or error handling recovery commands. Acked-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
When invoked for an I/O request rq, the prepare_request hook of bfq increments reference counters in the destination bfq_queue for rq. In this respect, after this hook has been invoked, rq may still be transformed into a request with no icq attached, i.e., for bfq, a request not associated with any bfq_queue. No further hook is invoked to signal this tranformation to bfq (in general, to the destination elevator for rq). This leads bfq into an inconsistent state, because bfq has no chance to correctly lower these counters back. This inconsistency may in its turn cause incorrect scheduling and hangs. It certainly causes memory leaks, by making it impossible for bfq to free the involved bfq_queue. On the bright side, no transformation can still happen for rq after rq has been inserted into bfq, or merged with another, already inserted, request. Exploiting this fact, this commit addresses the above issue by delaying the preparation of an I/O request to when the request is inserted or merged. This change also gives a performance bonus: a lock-contention point gets removed. To prepare a request, bfq needs to hold its scheduler lock. After postponing request preparation to insertion or merging, no lock needs to be grabbed any longer in the prepare_request hook, while the lock already taken to perform insertion or merging is used to preparare the request as well. Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christophe JAILLET authored
Branch to the right label in the error handling path in order to keep it logical. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 May, 2018 8 commits
-
-
SeongJae Park authored
This commit sets QUEUE_FLAG_NONROT and clears up QUEUE_FLAG_ADD_RANDOM to mark the ramdisks as non-rotational device. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
Currently, struct request has four timestamp fields: - A start time, set at get_request time, in jiffies, used for iostats - An I/O start time, set at start_request time, in ktime nanoseconds, used for blk-stats (i.e., wbt, kyber, hybrid polling) - Another start time and another I/O start time, used for cfq and bfq These can all be consolidated into one start time and one I/O start time, both in ktime nanoseconds, shaving off up to 16 bytes from struct request depending on the kernel config. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
We want this next to blk_account_io_done() for the next change so that we can call ktime_get() only once for both. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
cfq and bfq have some internal fields that use sched_clock() which can trivially use ktime_get_ns() instead. Their timestamp fields in struct request can also use ktime_get_ns(), which resolves the 8 year old comment added by commit 28f4197e ("block: disable preemption before using sched_clock()"). Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
struct blk_issue_stat squashes three things into one u64: - The time the driver started working on a request - The original size of the request (for the io.low controller) - Flags for writeback throttling It turns out that on x86_64, we have a 4 byte hole in struct request which we can fill with the non-timestamp fields from blk_issue_stat, simplifying things quite a bit. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even use the blk-stats interface, so we can provide a separate implementation specific for bios. The helpers work the same way as the blk-stats helpers. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
issue_stat is going to go away, so first make writeback throttling take the containing request, update the internal wbt helpers accordingly, and change rwb->sync_cookie to be the request pointer instead of the issue_stat pointer. No functional change. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Omar Sandoval authored
A few helpers are only used from blk-wbt.c, so move them there, and put wbt_track() behind the CONFIG_BLK_WBT typedef. This is in preparation for changing how the wbt flags are tracked. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 May, 2018 1 commit
-
-
Jens Axboe authored
Throttle discards like we would any background write. Discards should be background activity, so if they are impacting foreground IO, then we will throttle them down. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-