1. 12 Jun, 2023 18 commits
  2. 09 Jun, 2023 1 commit
  3. 07 Jun, 2023 14 commits
  4. 05 Jun, 2023 7 commits
    • blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost · 8d211554
      Li Nan authored
      adjust_inuse_and_calc_cost() uses spin_lock_irq(), so IRQs are
      unconditionally enabled on unlock. A deadlock can occur if the caller
      already holds other locks and has disabled IRQs before invoking it.
      
      Fix it by using spin_lock_irqsave() instead, which restores the IRQ state
      that was in effect before the lock was taken.
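
The difference between the two lock variants can be illustrated with a small user-space model. This is only a sketch: a single boolean stands in for the CPU's IRQ-enable state, and every `model_*` name is hypothetical, not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only: one flag stands in for the CPU's IRQ-enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): unlock always enables IRQs. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(): unlock restores
 * whatever state was saved when the lock was taken. */
static void model_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}

/* The caller has already disabled IRQs (as an outer lock holder would).
 * Returns the IRQ state after the nested critical section. */
static bool nested_with_irq(void)
{
	irqs_enabled = false;      /* outer context: IRQs off */
	model_lock_irq();
	model_unlock_irq();
	return irqs_enabled;       /* IRQs are now on, although the outer lock is still held */
}

static bool nested_with_irqsave(void)
{
	bool flags;

	irqs_enabled = false;      /* outer context: IRQs off */
	model_lock_irqsave(&flags);
	model_unlock_irqrestore(flags);
	return irqs_enabled;       /* IRQs stay off, as the outer context expects */
}

/* Small usage example for the model above. */
static void example_usage(void)
{
	assert(nested_with_irq() == true);
	assert(nested_with_irqsave() == false);
}
```

In the model, the `irq` variant leaves IRQs enabled while the outer context still assumes they are off, which is the window in which an interrupt handler could try to take the same outer lock.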
      
        ================================
        WARNING: inconsistent lock state
        5.10.0-02758-g8e5f91fd772f #26 Not tainted
        --------------------------------
        inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
        kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes:
        ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
        ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
        {IN-HARDIRQ-W} state was registered at:
          __lock_acquire+0x3d7/0x1070
          lock_acquire+0x197/0x4a0
          __raw_spin_lock_irqsave
          _raw_spin_lock_irqsave+0x3b/0x60
          bfq_idle_slice_timer_body
          bfq_idle_slice_timer+0x53/0x1d0
          __run_hrtimer+0x477/0xa70
          __hrtimer_run_queues+0x1c6/0x2d0
          hrtimer_interrupt+0x302/0x9e0
          local_apic_timer_interrupt
          __sysvec_apic_timer_interrupt+0xfd/0x420
          run_sysvec_on_irqstack_cond
          sysvec_apic_timer_interrupt+0x46/0xa0
          asm_sysvec_apic_timer_interrupt+0x12/0x20
        irq event stamp: 837522
        hardirqs last  enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore
        hardirqs last  enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40
        hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq
        hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50
        softirqs last  enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec
        softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&bfqd->lock);
          <Interrupt>
            lock(&bfqd->lock);
      
         *** DEADLOCK ***
      
        3 locks held by kworker/2:3/388:
         #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0
         #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0
         #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
         #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
      
        stack backtrace:
        CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        Workqueue: kthrotld blk_throtl_dispatch_work_fn
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x107/0x167
         print_usage_bug
         valid_state
         mark_lock_irq.cold+0x32/0x3a
         mark_lock+0x693/0xbc0
         mark_held_locks+0x9e/0xe0
         __trace_hardirqs_on_caller
         lockdep_hardirqs_on_prepare.part.0+0x151/0x360
         trace_hardirqs_on+0x5b/0x180
         __raw_spin_unlock_irq
         _raw_spin_unlock_irq+0x24/0x40
         spin_unlock_irq
         adjust_inuse_and_calc_cost+0x4fb/0x970
         ioc_rqos_merge+0x277/0x740
         __rq_qos_merge+0x62/0xb0
         rq_qos_merge
         bio_attempt_back_merge+0x12c/0x4a0
         blk_mq_sched_try_merge+0x1b6/0x4d0
         bfq_bio_merge+0x24a/0x390
         __blk_mq_sched_bio_merge+0xa6/0x460
         blk_mq_sched_bio_merge
         blk_mq_submit_bio+0x2e7/0x1ee0
         __submit_bio_noacct_mq+0x175/0x3b0
         submit_bio_noacct+0x1fb/0x270
         blk_throtl_dispatch_work_fn+0x1ef/0x2b0
         process_one_work+0x83e/0x13f0
         process_scheduled_works
         worker_thread+0x7e3/0xd80
         kthread+0x353/0x470
         ret_from_fork+0x1f/0x30
      
      Fixes: b0853ab4 ("blk-iocost: revamp in-period donation snapbacks")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20230527091904.3001833-1-linan666@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: mark early_lookup_bdev as __init · 2577f53f
      Christoph Hellwig authored
      early_lookup_bdev is now only used by the early boot code, as it should
      be, so mark it __init to avoid wasting runtime memory on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230531125535.676098-25-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mtd: block2mtd: don't call early_lookup_bdev after the system is running · 8d03187e
      Christoph Hellwig authored
      early_lookup_bdev is supposed to only be called from the early boot
      code, but mdtblock_early_get_bdev is called as a general fallback when
      lookup_bdev fails, which is problematic because early_lookup_bdev
      bypasses all normal path based permission checking, and might cause
      problems with certain container environments renaming devices.
      
      Switch to only call early_lookup_bdev when block2mtd is built-in and the
      system state is not running yet.
      
      Note that this strictly speaking changes the kernel ABI as the PARTUUID=
      and PARTLABEL= style syntax is now not available on a running system.
      They never were intended for that, but if this breaks things we'll have
      to figure out a way to make them available again.  But if avoidable in
      any way I'd rather avoid that.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/r/20230531125535.676098-24-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mtd: block2mtd: factor the early block device open logic into a helper · b2baa574
      Christoph Hellwig authored
      Simplify add_device a bit by splitting out the cumbersome early boot logic
      into a separate helper.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/r/20230531125535.676098-23-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • PM: hibernate: don't use early_lookup_bdev in resume_store · 1e8c813b
      Christoph Hellwig authored
      resume_store is a sysfs attribute written during normal kernel runtime,
      and it should not use the early_lookup_bdev API that bypasses all normal
      path based permission checking, and might cause problems with certain
      container environments renaming devices.
      
      Switch to lookup_bdev, which does a normal path lookup instead, and fall
      back to trying to parse a numeric dev_t just like early_lookup_bdev did.
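
The lookup-then-parse order described above can be sketched in user space. This is an illustrative model under stated assumptions, not the code from the commit: `model_lookup_path` is a hypothetical stub that only knows one path, `struct model_dev` is a hypothetical stand-in for dev_t, and only the decimal `major:minor` form is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a device number; user-space model only. */
struct model_dev {
	unsigned int major;
	unsigned int minor;
};

/* Hypothetical stub for a normal path lookup: it knows exactly one path. */
static int model_lookup_path(const char *name, struct model_dev *dev)
{
	if (strcmp(name, "/dev/sda1") == 0) {
		dev->major = 8;
		dev->minor = 1;
		return 0;
	}
	return -1;
}

/* Try the path lookup first, then fall back to a numeric "major:minor". */
static int model_resolve(const char *name, struct model_dev *dev)
{
	unsigned int maj, min;
	char trailing;

	if (model_lookup_path(name, dev) == 0)
		return 0;

	/* Accept exactly two numbers separated by ':' with nothing after them. */
	if (sscanf(name, "%u:%u%c", &maj, &min, &trailing) == 2) {
		dev->major = maj;
		dev->minor = min;
		return 0;
	}
	return -1;
}

/* Small usage example for the model above. */
static void example_usage(void)
{
	struct model_dev dev;

	assert(model_resolve("/dev/sda1", &dev) == 0 && dev.major == 8);
	assert(model_resolve("259:3", &dev) == 0 && dev.minor == 3);
	assert(model_resolve("PARTUUID=abcd", &dev) == -1);
}
```

In this model a `PARTUUID=` string is rejected because neither the path lookup nor the numeric parse accepts it, which mirrors the ABI change the commit message goes on to describe.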
      
      Note that this strictly speaking changes the kernel ABI as the PARTUUID=
      and PARTLABEL= style syntax is now not available on a running system.
      They never were intended for that, but if this breaks things we'll have
      to figure out a way to make them available again.  But if avoidable in
      any way I'd rather avoid that.
      
      Fixes: 421a5fa1 ("PM / hibernate: use name_to_dev_t to parse resume")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Rafael J. Wysocki <rafael@kernel.org>
      Link: https://lore.kernel.org/r/20230531125535.676098-22-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dm: only call early_lookup_bdev from early boot context · 7a126d5b
      Christoph Hellwig authored
      early_lookup_bdev is supposed to only be called from the early boot
      code, but dm_get_device calls it as a general fallback when lookup_bdev
      fails, which is problematic because early_lookup_bdev bypasses all normal
      path based permission checking, and might cause problems with certain
      container environments renaming devices.
      
      Switch to only call early_lookup_bdev when dm is built-in and the system
      state is not running yet.  This means it is still available when tables
      are constructed by dm-init.c from the kernel command line, but not
      otherwise.
      
      Note that this strictly speaking changes the kernel ABI as the PARTUUID=
      and PARTLABEL= style syntax is now not available on a running system.
      They never were intended for that, but if this breaks things we'll have
      to figure out a way to make them available again.  But if avoidable in
      any way I'd rather avoid that.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Mike Snitzer <snitzer@kernel.org>
      Link: https://lore.kernel.org/r/20230531125535.676098-21-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
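
The gating condition in the paragraph above can be written as a tiny predicate. This is a user-space sketch only: the enum and function names are hypothetical, and the boolean `built_in` stands in for what would be a compile-time distinction between built-in and modular code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the boot phase; these are not kernel identifiers. */
enum model_system_state {
	MODEL_STATE_BOOTING,
	MODEL_STATE_RUNNING,
};

/* The early lookup is allowed only for built-in code that runs before
 * the system has reached the running state. */
static bool model_may_use_early_lookup(bool built_in,
				       enum model_system_state state)
{
	return built_in && state < MODEL_STATE_RUNNING;
}

/* Small usage example for the model above. */
static void example_usage(void)
{
	assert(model_may_use_early_lookup(true, MODEL_STATE_BOOTING));
	assert(!model_may_use_early_lookup(true, MODEL_STATE_RUNNING));
	assert(!model_may_use_early_lookup(false, MODEL_STATE_BOOTING));
}
```

Only the first case corresponds to tables built from the kernel command line during boot; a table loaded later, or by modular code, would have to pass the normal path lookup.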
    • dm: remove dm_get_dev_t · d4a28d7d
      Christoph Hellwig authored
      Open code dm_get_dev_t in the only remaining caller, and propagate the
      exact error code from lookup_bdev and early_lookup_bdev.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230531125535.676098-20-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>