1. 08 Dec, 2018 20 commits
    • blk-mq: re-build queue map in case of kdump kernel · 59388702
      Ming Lei authored
      Now almost all .map_queues() implementations based on managed irq
      affinity don't update the queue mapping; they just retrieve the
      previously built mapping. So if nr_hw_queues is changed, the mapping
      table includes stale entries, and only blk_mq_map_queues() may
      rebuild the mapping table.

      One case is that we limit .nr_hw_queues to 1 in case of kdump kernel.
      However, drivers often build the queue mapping via
      pci_alloc_irq_vectors_affinity() before allocating the tagset, while
      set->nr_hw_queues can be set to 1 in case of kdump kernel, so the wrong
      queue mapping is used and a kernel panic[1] is observed during booting.

      This patch fixes the kernel panic triggered on nvme by rebuilding the
      mapping table via blk_mq_map_queues().
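
      As a rough illustration of the failure mode (not the kernel code),
      the sketch below models a CPU-to-queue table in userspace. The names
      toy_set, toy_map_queues() and toy_map_is_valid() are hypothetical, and
      the round-robin spread is only a simplification of what
      blk_mq_map_queues() does.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for a tag set: a queue count plus a CPU-to-queue table. */
struct toy_set {
	unsigned int nr_hw_queues;
	unsigned int mq_map[NR_CPUS];
};

/* Rebuild the table by spreading CPUs round-robin over the current queues. */
static void toy_map_queues(struct toy_set *set)
{
	assert(set->nr_hw_queues > 0);
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		set->mq_map[cpu] = cpu % set->nr_hw_queues;
}

/* Return 1 if every entry names an existing queue, 0 if the table is stale. */
static int toy_map_is_valid(const struct toy_set *set)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (set->mq_map[cpu] >= set->nr_hw_queues)
			return 0;
	return 1;
}
```

      Building the table for four queues and then clamping nr_hw_queues to 1
      leaves entries that point past the last queue; calling
      toy_map_queues() again makes the table valid.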
      
      [1] kernel panic log
      [    4.438371] nvme nvme0: 16/0/0 default/read/poll queues
      [    4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      [    4.444681] PGD 0 P4D 0
      [    4.445367] Oops: 0000 [#1] SMP NOPTI
      [    4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459
      [    4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
      [    4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [    4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222
      [    4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6
      [    4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286
      [    4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001
      [    4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001
      [    4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002
      [    4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538
      [    4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001
      [    4.469220] FS:  0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000
      [    4.471554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0
      [    4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    4.477061] PKRU: 55555554
      [    4.477464] Call Trace:
      [    4.478731]  blk_mq_init_allocated_queue+0x36a/0x3ad
      [    4.479595]  blk_mq_init_queue+0x32/0x4e
      [    4.480178]  nvme_validate_ns+0x98/0x623 [nvme_core]
      [    4.480963]  ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core]
      [    4.481685]  ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core]
      [    4.482601]  nvme_scan_work+0x23a/0x29b [nvme_core]
      [    4.483269]  ? _raw_spin_unlock_irqrestore+0x25/0x38
      [    4.483930]  ? try_to_wake_up+0x38d/0x3b3
      [    4.484478]  ? process_one_work+0x179/0x2fc
      [    4.485118]  process_one_work+0x1d3/0x2fc
      [    4.485655]  ? rescuer_thread+0x2ae/0x2ae
      [    4.486196]  worker_thread+0x1e9/0x2be
      [    4.486841]  kthread+0x115/0x11d
      [    4.487294]  ? kthread_park+0x76/0x76
      [    4.487784]  ret_from_fork+0x3a/0x50
      [    4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables
      [    4.489428] Dumping ftrace buffer:
      [    4.489939]    (ftrace buffer empty)
      [    4.490492] CR2: 0000000000000098
      [    4.491052] ---[ end trace 03cd268ad5a86ff7 ]---
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: linux-nvme@lists.infradead.org
      Cc: David Milburn <dmilburn@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: put back rcu lock in blkcg_bio_issue_check() · 4705de73
      Dennis Zhou authored
      I was a little overzealous in removing the rcu_read_lock() call from
      blkcg_bio_issue_check() and it broke blk-throttle. Put it back.
      
      Fixes: e35403a034bf ("blkcg: associate blkg when associating a device")
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: convert io-latency to use rq_qos_wait · d3fcdff1
      Josef Bacik authored
      Now that we have this common helper, convert io-latency over to use it
      as well.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: convert wbt_wait() to use rq_qos_wait() · b6c7b58f
      Josef Bacik authored
      Now that we have rq_qos_wait() in place, convert wbt_wait() over to
      using it with its specific callbacks.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: add rq_qos_wait to rq_qos · 84f60324
      Josef Bacik authored
      Originally when I split out the common code from blk-wbt into rq_qos I
      left the wbt_wait() where it was and simply copied and modified it
      slightly to work for io-latency.  However they are both basically the
      same thing, and as time has gone on wbt_wait() has ended up much smarter
      and kinder than it was when I copied it into io-latency, which means
      io-latency has lost out on these improvements.
      
      Since they are the same thing essentially except for a few minor things,
      create rq_qos_wait() that replicates what wbt_wait() currently does with
      callbacks that can be passed in for the snowflakes to do their own thing
      as appropriate.
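
      The callback idea can be sketched in userspace as follows. This is
      not the kernel helper: toy_rq_wait, toy_qos_wait() and both acquire
      callbacks are hypothetical names, and the bounded polling loop stands
      in for sleeping on a wait queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: number of requests currently in flight. */
struct toy_rq_wait {
	int inflight;
};

/* Policy hook: try to take one in-flight slot, return true on success. */
typedef bool (*toy_acquire_cb)(struct toy_rq_wait *rqw, void *priv);

/*
 * Shared helper: the retry logic lives here once, the policy decides
 * whether a slot may be taken. Gives up after max_polls attempts.
 */
static bool toy_qos_wait(struct toy_rq_wait *rqw, void *priv,
			 toy_acquire_cb acquire, int max_polls)
{
	assert(acquire != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (acquire(rqw, priv))
			return true;
		/* A real helper would sleep here until a completion wakes it. */
	}
	return false;
}

/* Policy A: fixed depth of two requests, ignores the private data. */
static bool fixed_depth_acquire(struct toy_rq_wait *rqw, void *priv)
{
	(void)priv;
	if (rqw->inflight >= 2)
		return false;
	rqw->inflight++;
	return true;
}

/* Policy B: depth limit supplied through the private data. */
struct toy_limit {
	int max_depth;
};

static bool dynamic_depth_acquire(struct toy_rq_wait *rqw, void *priv)
{
	const struct toy_limit *limit = priv;

	if (rqw->inflight >= limit->max_depth)
		return false;
	rqw->inflight++;
	return true;
}
```

      Both policies go through the same toy_qos_wait() loop, so an
      improvement to the waiting logic would reach each of them at once.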
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: rename blkg_try_get() to blkg_tryget() · 7754f669
      Dennis Zhou authored
      blkg reference counting now uses percpu_ref rather than atomic_t. Let's
      make this consistent with css_tryget. This renames blkg_try_get to
      blkg_tryget and now returns a bool rather than the blkg or %NULL.
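
      A minimal sketch of the bool-returning tryget convention, assuming a
      plain C11 atomic counter in place of percpu_ref. toy_obj,
      toy_tryget() and toy_put() are hypothetical names; only the calling
      convention is illustrated, not the kernel's reference counting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
struct toy_obj {
	atomic_int refcnt;
};

/* Take a reference only if the object is still live (count above zero). */
static bool toy_tryget(struct toy_obj *obj)
{
	int old = atomic_load(&obj->refcnt);

	while (old > 0) {
		/* On failure, old is reloaded and the liveness check repeats. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; the caller must own one. */
static void toy_put(struct toy_obj *obj)
{
	int old = atomic_fetch_sub(&obj->refcnt, 1);

	assert(old > 0);
	(void)old;
}
```

      Because the result is a bool, a caller can write
      "if (toy_tryget(obj))" and keep using the pointer it already holds.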
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: change blkg reference counting to use percpu_ref · 7fcf2b03
      Dennis Zhou authored
      Every bio is now associated with a blkg, putting blkg_get, blkg_try_get,
      and blkg_put on the hot path. Switch over the refcnt in blkg to use
      percpu_ref.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: remove bio_disassociate_task() · 6f70fb66
      Dennis Zhou authored
      Now that a bio only holds a blkg reference, cleanup is simply
      putting back that reference. Remove bio_disassociate_task(), as it just
      calls bio_disassociate_blkg(), and call the latter directly.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: remove additional reference to the css · fc5a828b
      Dennis Zhou authored
      The previous patch in this series removed carrying around a pointer to
      the css in blkg. However, the blkg association logic still relied on
      taking a reference on the css to ensure we wouldn't fail in getting a
      reference for the blkg.
      
      Here the implicit dependency on the css is removed. The association
      continues to rely on the tryget logic walking up the blkg tree. This
      streamlines the three ways that association can happen: normal, swap,
      and writeback.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: remove bio->bi_css and instead use bio->bi_blkg · db6638d7
      Dennis Zhou authored
      Prior patches ensured that any bio that interacts with a request_queue
      is properly associated with a blkg. This makes bio->bi_css unnecessary
      as blkg maintains a reference to blkcg already.
      
      This removes the bio field bi_css and transfers corresponding uses to
      access via bi_blkg.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: associate writeback bios with a blkg · fd42df30
      Dennis Zhou authored
      One of the goals of this series is to remove a separate reference to
      the css of the bio. This can and should be accessed via bio_blkcg(). In
      this patch, wbc_init_bio() now requires a bio to have a device
      associated with it.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: associate a blkg for pages being evicted by swap · 6a7f6d86
      Dennis Zhou authored
      A prior patch in this series added blkg association to bios issued by
      cgroups. There are two other paths that we want to attribute work back
      to the appropriate cgroup: swap and writeback. Here we modify the way
      swap tags bios to include the blkg. Writeback will be tackled in the next
      patch.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: consolidate bio_issue_init() to be a part of core · e439bedf
      Dennis Zhou authored
      bio_issue_init among other things initializes the timestamp for an IO.
      Rather than have this logic handled by policies, this consolidates it to
      be on the init paths (normal, clone, bounce clone).
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: associate blkg when associating a device · 5cdf2e3f
      Dennis Zhou authored
      Previously, blkg association was handled by controller specific code in
      blk-throttle and blk-iolatency. However, because a blkg represents a
      relationship between a blkcg and a request_queue, it makes sense to keep
      the blkg->q and bio->bi_disk->queue consistent.
      
      This patch moves association into the bio_set_dev() macro. This should
      cover the majority of cases where the device is set/changed keeping the
      two pointers consistent. Fallback code is added to
      blkcg_bio_issue_check() to catch any missing paths.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dm: set the static flush bio device on demand · 892ad71f
      Dennis Zhou authored
      The next patch changes the macro bio_set_dev() to associate a bio with a
      blkg based on the device set. However, dm creates a static bio to be
      used as the basis for cloning empty flush bios on creation. The
      bio_set_dev() call in alloc_dev() will cause problems with the next
      patch adding association to bio_set_dev() because the call is before the
      bdev is associated with a gendisk (bd_disk is %NULL). To get around
      this, set the device on the static bio every time and use that to clone
      to the other bios.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: introduce common blkg association logic · 2268c0fe
      Dennis Zhou authored
      There are 3 ways blkg association can happen: association with the
      current css, with the page css (swap), or from the wbc css (writeback).
      
      This patch handles how association is done for the first case where we
      are associating based on the current css. If there is already a blkg
      associated, the css will be reused and association will be redone as the
      request_queue may have changed.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: convert blkg_lookup_create() to find closest blkg · beea9da0
      Dennis Zhou authored
      There are several scenarios where blkg_lookup_create() can fail, such as
      the blkcg dying, the request_queue dying, or simply being OOM. Most callers
      handle this by simply falling back to the q->root_blkg and calling it a
      day.
      
      This patch implements the notion of closest blkg. During
      blkg_lookup_create(), if it fails to create, return the closest blkg
      found or the q->root_blkg. blkg_try_get_closest() is introduced and used
      during association so a bio is always attached to a blkg.
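
      The "closest blkg" walk can be sketched in userspace like this. The
      toy_blkg and toy_bio types and both functions are hypothetical; a
      boolean "alive" flag and a plain counter stand in for the kernel's
      reference counting, and the root is assumed to be always alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in the group hierarchy. */
struct toy_blkg {
	struct toy_blkg *parent;	/* NULL for the root */
	bool alive;			/* false once the group is going away */
	int refs;			/* references handed out so far */
};

/* Hypothetical I/O descriptor that must always point at some group. */
struct toy_bio {
	struct toy_blkg *blkg;
};

/* Walk towards the root until a live group is found, then reference it. */
static struct toy_blkg *toy_tryget_closest(struct toy_blkg *blkg)
{
	while (blkg && !blkg->alive)
		blkg = blkg->parent;
	if (blkg)
		blkg->refs++;
	return blkg;
}

/* Attach the bio to the wanted group or its closest live ancestor. */
static void toy_associate(struct toy_bio *bio, struct toy_blkg *wanted)
{
	struct toy_blkg *closest = toy_tryget_closest(wanted);

	assert(closest != NULL);	/* the root is assumed to be always alive */
	bio->blkg = closest;
}
```

      With a dead leaf and a dead intermediate group, toy_associate() ends
      up attaching the bio to the root instead of failing.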
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: update blkg_lookup_create() to do locking · b978962a
      Dennis Zhou authored
      To know when to create a blkg, the general pattern is to do a
      blkg_lookup() and if that fails, lock and do the lookup again, and if
      that fails finally create. It doesn't make much sense for everyone who
      wants to do creation to write this themselves.
      
      This changes blkg_lookup_create() to do locking and implement this
      pattern. The old blkg_lookup_create() is renamed to
      __blkg_lookup_create().  If a call site wants to do its own error
      handling or already owns the queue lock, they can use
      __blkg_lookup_create(). This will be used in upcoming patches.
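
      The lookup, lock, lookup again, create pattern described above can be
      sketched in userspace as follows. Everything here is hypothetical
      (toy_table and the toy_* functions), and a boolean flag stands in for
      the queue lock, so the sketch is single-threaded by construction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 8	/* hypothetical table capacity */

struct toy_table {
	bool locked;		/* stand-in for the queue lock */
	int keys[TOY_MAX];
	int nr;
	int creations;		/* how many entries were created */
};

/* Lockless lookup: return the entry for key, or NULL if absent. */
static int *toy_lookup(struct toy_table *t, int key)
{
	for (int i = 0; i < t->nr; i++)
		if (t->keys[i] == key)
			return &t->keys[i];
	return NULL;
}

/* Locked variant: the caller already holds the lock. */
static int *toy_lookup_create_locked(struct toy_table *t, int key)
{
	int *entry;

	assert(t->locked);
	entry = toy_lookup(t, key);	/* re-check under the lock */
	if (entry)
		return entry;
	if (t->nr == TOY_MAX)
		return NULL;		/* creation failed: table is full */
	t->keys[t->nr] = key;
	t->creations++;
	return &t->keys[t->nr++];
}

/* Convenience wrapper: fast path without the lock, slow path with it. */
static int *toy_lookup_create(struct toy_table *t, int key)
{
	int *entry = toy_lookup(t, key);

	if (entry)
		return entry;
	t->locked = true;
	entry = toy_lookup_create_locked(t, key);
	t->locked = false;
	return entry;
}
```

      Callers that do not care about locking use toy_lookup_create(); a
      caller that already holds the lock, or wants its own error handling,
      calls toy_lookup_create_locked() directly.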
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: fix ref count issue with bio_blkcg() using task_css · 0fe061b9
      Dennis Zhou authored
      The bio_blkcg() function turns out to be inconsistent and consequently
      dangerous to use. The first part returns a blkcg where a reference is
      owned by the bio meaning it does not need to be rcu protected. However,
      the third case, the last line, is problematic:
      
      	return css_to_blkcg(task_css(current, io_cgrp_id));
      
      This can race against task migration and the cgroup dying. It is also
      semantically different as it must be called rcu protected and is
      susceptible to failure when trying to get a reference to it.
      
      This patch adds association ahead of calling bio_blkcg() rather than
      after. This makes association a required and explicit step along the
      code paths for calling bio_blkcg(). In blk-iolatency, association is
      moved above the bio_blkcg() call to ensure it will not return %NULL.
      
      BFQ uses the old bio_blkcg() function, but I do not want to address it
      in this series due to the complexity. I have created a private version
      documenting the inconsistency and noting not to use it.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: remove QUEUE_FLAG_POLL from default MQ flags · 6e0de611
      Jens Axboe authored
      We only support polling if we have poll queues now, but the flag is
      being set by default. Remove the default QUEUE_FLAG_POLL setting; we'll
      set it in blk_mq_init_allocated_queue() if we have poll queues available
      for this device.
      
      Fixes: 6544d229 ("block: enable polling by default if a poll map is initalized")
      Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 04 Dec, 2018 16 commits
  3. 03 Dec, 2018 1 commit
  4. 02 Dec, 2018 3 commits
    • Linux 4.20-rc5 · 25956467
      Linus Torvalds authored
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 6a512726
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Volume is a little higher than usual due to a set of gpio fixes for
        Davinci platforms that's been around a while, still seemed appropriate
        to not hold off until next merge window.
      
        Besides that it's the usual mix of minor fixes, mostly corrections of
        small stuff in device trees.
      
        Major stability-related one is the removal of a regulator from DT on
        Rock960, since DVFS caused undervoltage. I expect it'll be restored
        once they figure out the underlying issue"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
        MAINTAINERS: Remove unused Qualcomm SoC mailing list
        ARM: davinci: dm644x: set the GPIO base to 0
        ARM: davinci: da830: set the GPIO base to 0
        ARM: davinci: dm355: set the GPIO base to 0
        ARM: davinci: dm646x: set the GPIO base to 0
        ARM: davinci: dm365: set the GPIO base to 0
        ARM: davinci: da850: set the GPIO base to 0
        gpio: davinci: restore a way to manually specify the GPIO base
        ARM: davinci: dm644x: define gpio interrupts as separate resources
        ARM: davinci: dm355: define gpio interrupts as separate resources
        ARM: davinci: dm646x: define gpio interrupts as separate resources
        ARM: davinci: dm365: define gpio interrupts as separate resources
        ARM: davinci: da8xx: define gpio interrupts as separate resources
        ARM: dts: at91: sama5d2: use the divided clock for SMC
        ARM: dts: imx51-zii-rdu1: Remove EEPROM node
        ARM: dts: rockchip: Remove @0 from the veyron memory node
        arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
        arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP
        arm64: dts: sdm845-mtp: Reserve reserved gpios
        arm64: dts: ti: k3-am654: Fix wakeup_uart reg address
        ...
    • Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 292974c5
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - A revert of a previous commit as it is no longer necessary and has
         shown to cause problems in some memory hotplug cases.
      
       - Some small fixes and a minor cleanup.
      
       - A patch for adding better diagnostic data in a very rare failure
         case.
      
      * tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        pvcalls-front: fixes incorrect error handling
        Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
        xen: xlate_mmu: add missing header to fix 'W=1' warning
        xen/x86: add diagnostic printout to xen_mc_flush() in case of error
        x86/xen: cleanup includes in arch/x86/xen/spinlock.c