Commits · ff005a066240efb73ae29a2bb9269ae726bc2eae · nexedi / linux

14 May, 2018 4 commits

block: sanitize blk_get_request calling conventions · ff005a06

Christoph Hellwig authored May 09, 2018

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ff005a06

block: fix __get_request documentation · a9a14d36

Christoph Hellwig authored May 09, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a9a14d36

scsi/osd: remove the gfp argument to osd_start_request · ac613e45

Christoph Hellwig authored May 09, 2018

Always GFP_KERNEL, and keeping it would cause serious complications for
the next change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ac613e45

memstick: remove unused variables · 058147bc

Christoph Hellwig authored May 14, 2018

Fixes: 7c2d748e ("memstick: don't call blk_queue_bounce_limit")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

058147bc

11 May, 2018 7 commits

ps3disk: handle highmem pages · e4f0e0cb

Christoph Hellwig authored May 09, 2018

The ps3disk driver already kmaps all pages when copying from/to the
internal bounce buffer, so it can accept highmem pages just fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e4f0e0cb

jsflash: handle highmem pages · 37a5b5c6

Christoph Hellwig authored May 09, 2018

Just kmap the bio single page payload before processing it.

(and yes, now highmem on sparc32 anyway, but kmap_(un)map atomic are nops,
so this gives the right example)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

37a5b5c6

aoe: handle highmem pages · ad180f6f

Christoph Hellwig authored May 09, 2018

Use kmap_atomic when copying out of a bio_vec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ad180f6f

mtd_blkdevs: handle highmem pages · 34ab96e6

Christoph Hellwig authored May 09, 2018

Just kmap the single payload page before passing it on to the FTL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

34ab96e6

memstick: don't call blk_queue_bounce_limit · 7c2d748e

Christoph Hellwig authored May 09, 2018

All in-tree host drivers set up a proper dma mask and use the dma-mapping
helpers.  This means they will be able to deal with any address that we
are throwing at them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7c2d748e

DAC960: don't use block layer bounce buffers · 00f0a51f

Christoph Hellwig authored May 09, 2018

DAC960 just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

00f0a51f

mtip32xx: don't use block layer bounce buffers · 5c26e050

Christoph Hellwig authored May 09, 2018

mtip32xx just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5c26e050

10 May, 2018 10 commits

sbitmap: warn if using smaller shallow depth than was setup · 61445b56

Omar Sandoval authored May 09, 2018

Make sure the user passed the right value to
sbitmap_queue_min_shallow_depth().
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

61445b56

kyber-iosched: update shallow depth when setting up hardware queue · 28820640

Jens Axboe authored May 09, 2018

We don't expect the async depth to be smaller than the wake batch
count for sbitmap, but just in case, inform sbitmap of what shallow
depth kyber may use.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28820640

bfq-iosched: update shallow depth to smallest one used · 483b7bf2

Jens Axboe authored May 09, 2018

If our shallow depth is smaller than the wake batching of sbitmap,
we can introduce hangs. Ensure that sbitmap knows how low we'll go.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

483b7bf2

sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow() · a3275539

Omar Sandoval authored May 09, 2018

The sbitmap queue wake batch is calculated such that once allocations
start blocking, all of the bits which are already allocated must be
enough to fulfill the batch counters of all of the waitqueues. However,
the shallow allocation depth can break this invariant, since we block
before our full depth is being utilized. Add
sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
the sbq will use, and update sbq_calc_wake_batch() to take it into
account.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a3275539

bfq-iosched: remove unused variable · bd7d4ef6

Jens Axboe authored May 09, 2018

bfqd->sb_shift was attempted used as a cache for the sbitmap queue
shift, but we don't need it, as it never changes. Kill it with fire.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bd7d4ef6

bfq: calculate shallow depths at init time · f0635b8a

Jens Axboe authored May 09, 2018

It doesn't change, so don't put it in the per-IO hot path.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f0635b8a

bfq-iosched: don't worry about reserved tags in limit_depth · 55141366

Jens Axboe authored May 09, 2018

Reserved tags are used for error handling, we don't need to
care about them for regular IO. The core won't call us for these
anyway.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

55141366

blk-mq: don't call into depth limiting for reserved tags · 17a51199

Jens Axboe authored May 09, 2018

It's not useful, they are internal and/or error handling recovery
commands.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

17a51199

block, bfq: postpone rq preparation to insert or merge · 18e5a57d

Paolo Valente authored May 04, 2018

When invoked for an I/O request rq, the prepare_request hook of bfq
increments reference counters in the destination bfq_queue for rq. In
this respect, after this hook has been invoked, rq may still be
transformed into a request with no icq attached, i.e., for bfq, a
request not associated with any bfq_queue. No further hook is invoked
to signal this tranformation to bfq (in general, to the destination
elevator for rq). This leads bfq into an inconsistent state, because
bfq has no chance to correctly lower these counters back. This
inconsistency may in its turn cause incorrect scheduling and hangs. It
certainly causes memory leaks, by making it impossible for bfq to free
the involved bfq_queue.

On the bright side, no transformation can still happen for rq after rq
has been inserted into bfq, or merged with another, already inserted,
request. Exploiting this fact, this commit addresses the above issue
by delaying the preparation of an I/O request to when the request is
inserted or merged.

This change also gives a performance bonus: a lock-contention point
gets removed. To prepare a request, bfq needs to hold its scheduler
lock. After postponing request preparation to insertion or merging, no
lock needs to be grabbed any longer in the prepare_request hook, while
the lock already taken to perform insertion or merging is used to
preparare the request as well.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18e5a57d

mtip32xx: Fix an error handling path in 'mtip_pci_probe()' · 8e3c283f

Christophe JAILLET authored May 10, 2018

Branch to the right label in the error handling path in order to keep it
logical.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8e3c283f

09 May, 2018 8 commits

brd: Mark as non-rotational · 316ba573

SeongJae Park authored May 03, 2018

This commit sets QUEUE_FLAG_NONROT and clears up QUEUE_FLAG_ADD_RANDOM
to mark the ramdisks as non-rotational device.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

316ba573

block: consolidate struct request timestamp fields · 522a7775

Omar Sandoval authored May 09, 2018

Currently, struct request has four timestamp fields:

- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
  used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq

These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

522a7775

block: move blk_stat_add() to __blk_mq_end_request() · 4bc6339a

Omar Sandoval authored May 09, 2018

We want this next to blk_account_io_done() for the next change so that
we can call ktime_get() only once for both.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4bc6339a

block: use ktime_get_ns() instead of sched_clock() for cfq and bfq · 84c7afce

Omar Sandoval authored May 09, 2018

cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e ("block: disable preemption before
using sched_clock()").
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

84c7afce

block: get rid of struct blk_issue_stat · 544ccc8d

Omar Sandoval authored May 09, 2018

struct blk_issue_stat squashes three things into one u64:

- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling

It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

544ccc8d

block: replace bio->bi_issue_stat with bio-specific type · 5238dcf4

Omar Sandoval authored May 09, 2018

struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
specific for bios. The helpers work the same way as the blk-stats
helpers.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5238dcf4

block: pass struct request instead of struct blk_issue_stat to wbt · a8a45941

Omar Sandoval authored May 09, 2018

issue_stat is going to go away, so first make writeback throttling take
the containing request, update the internal wbt helpers accordingly, and
change rwb->sync_cookie to be the request pointer instead of the
issue_stat pointer. No functional change.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a8a45941

block: move some wbt helpers to blk-wbt.c · 934031a1

Omar Sandoval authored May 09, 2018

A few helpers are only used from blk-wbt.c, so move them there, and put
wbt_track() behind the CONFIG_BLK_WBT typedef. This is in preparation
for changing how the wbt flags are tracked.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

934031a1

08 May, 2018 4 commits

blk-wbt: throttle discards like background writes · 782f5697

Jens Axboe authored May 07, 2018

Throttle discards like we would any background write. Discards should
be background activity, so if they are impacting foreground IO, then
we will throttle them down.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

782f5697

blk-wbt: pass in enum wbt_flags to get_rq_wait() · 8bea6090

Jens Axboe authored May 07, 2018

This is in preparation for having more write queues, in which
case we would have needed to pass in more information than just
a simple 'is_kswapd' boolean.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8bea6090

blk-wbt: account any writing command as a write · 825843b0

Jens Axboe authored May 03, 2018

We currently special case WRITE and FLUSH, but we should really
just include any command with the write bit set. This ensures
that we account DISCARD.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

825843b0

block: break discard submissions into the user defined size · af097f5d

Jens Axboe authored May 08, 2018

Don't build discards bigger than what the user asked for, if the
user decided to limit the size by writing to 'discard_max_bytes'.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

af097f5d

07 May, 2018 7 commits

loop: remember whether sysfs_create_group() was done · d3349b6b

Tetsuo Handa authored May 04, 2018

syzbot is hitting WARN() triggered by memory allocation fault
injection [1] because loop module is calling sysfs_remove_group()
when sysfs_create_group() failed.
Fix this by remembering whether sysfs_create_group() succeeded.

[1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90bSigned-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+9f03168400f56df89dbc6f1751f4458fe739ff29@syzkaller.appspotmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Renamed sysfs_ready -> sysfs_inited.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d3349b6b

block: Shorten interrupt disabled regions · 50864670

Thomas Gleixner authored May 04, 2018

Commit 9c40cef2 ("sched: Move blk_schedule_flush_plug() out of
__schedule()") moved the blk_schedule_flush_plug() call out of the
interrupt/preempt disabled region in the scheduler. This allows to replace
local_irq_save/restore(flags) by local_irq_disable/enable() in
blk_flush_plug_list().

But it makes more sense to disable interrupts explicitly when the request
queue is locked end reenable them when the request to is unlocked. This
shortens the interrupt disabled section which is important when the plug
list contains requests for more than one queue. The comment which claims
that disabling interrupts around the loop is misleading as the called
functions can reenable interrupts unconditionally anyway and obfuscates the
scope badly:

 local_irq_save(flags);
   spin_lock(q->queue_lock);
   ...
   queue_unplugged(q...);
     scsi_request_fn();
       spin_unlock_irq(q->queue_lock);

-------------------^^^ ????

       spin_lock_irq(q->queue_lock);
     spin_unlock(q->queue_lock);
 local_irq_restore(flags);

Aside of that the detached interrupt disabling is a constant pain for
PREEMPT_RT as it requires patching and special casing when RT is enabled
while with the spin_*_irq() variants this happens automatically.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.deSigned-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

50864670

block: Remove redundant WARN_ON() · 656cb6d0

Anna-Maria Gleixner authored May 04, 2018

Commit 2fff8a92 ("block: Check locking assumptions at runtime") added a
lockdep_assert_held(q->queue_lock) which makes the WARN_ON() redundant
because lockdep will detect and warn about context violations.

The unconditional WARN_ON() does not provide real additional value, so it
can be removed.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

656cb6d0

block: don't disable interrupts during kmap_atomic() · f3a1075e

Sebastian Andrzej Siewior authored May 04, 2018

bounce_copy_vec() disables interrupts around kmap_atomic(). This is a
leftover from the old kmap_atomic() implementation which relied on fixed
mapping slots, so the caller had to make sure that the same slot could not
be reused from an interrupting context.

kmap_atomic() was changed to dynamic slots long ago and commit 1ec9c5dd
("include/linux/highmem.h: remove the second argument of k[un]map_atomic()")
removed the slot assignements, but the callers were not checked for now
redundant interrupt disabling.

Remove the conditional interrupt disable.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f3a1075e

Fix typo in comment. · f142f08b

Florian La Roche authored May 06, 2018

CONFIG_PRREMPT -> CONFIG_PREEMPT
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f142f08b

Merge tag 'devicetree-fixes-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 76787cf4

Linus Torvalds authored May 07, 2018

Pull DeviceTree fixes from Rob Herring:

 - fix path to display timing binding

 - fix some typos in interrupt-names and clock-names

 - fix a resource leak on overlay removal

 - add missing documentation for R8A77965 DMA, serial, and net

 - cleanup sunxi pinctrl description

 - add Kieback & Peter GmbH vendor prefix

* tag 'devicetree-fixes-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: panel: lvds: Fix path to display timing bindings
  dt-bindings: mvebu-uart: DT fix s/interrupts-names/interrupt-names/
  dt-bindings: meson-uart: DT fix s/clocks-names/clock-names/
  of: overlay: Stop leaking resources on overlay removal
  dtc: checks: drop warning for missing PCI bridge bus-range
  dt-bindings: dmaengine: rcar-dmac: document R8A77965 support
  dt-bindings: serial: sh-sci: Add support for r8a77965 (H)SCIF
  dt-bindings: net: ravb: Add support for r8a77965 SoC
  dt-bindings: pinctrl: sunxi: Fix reference to driver
  doc: Add vendor prefix for Kieback & Peter GmbH

76787cf4

Linux 4.17-rc4 · 75bc37fe
Linus Torvalds authored May 06, 2018

75bc37fe