1. 09 Oct, 2020 9 commits
  2. 08 Oct, 2020 10 commits
  3. 07 Oct, 2020 1 commit
  4. 06 Oct, 2020 6 commits
  5. 05 Oct, 2020 5 commits
  6. 29 Sep, 2020 1 commit
  7. 28 Sep, 2020 1 commit
    • blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps() · 8229cca8
      Xianting Tian authored
      We found that blk_mq_alloc_rq_maps() takes a long time in kernel space
      when testing nvme device hot-plugging. The test and analysis are below.
      
      Debug code:
      1. blk_mq_alloc_rq_maps():
              u64 start, end;
              depth = set->queue_depth;
              start = ktime_get_ns();
              pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
                              current->pid, current->comm, current->nvcsw, current->nivcsw,
                              set->queue_depth, set->nr_hw_queues);
              do {
                      err = __blk_mq_alloc_rq_maps(set);
                      if (!err)
                              break;
      
                      set->queue_depth >>= 1;
                      if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
                              err = -ENOMEM;
                              break;
                      }
              } while (set->queue_depth);
              end = ktime_get_ns();
              pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
                              current->pid, current->comm,
                              current->nvcsw, current->nivcsw, end - start);
      
      2. __blk_mq_alloc_rq_maps():
              u64 start, end;
              for (i = 0; i < set->nr_hw_queues; i++) {
                      start = ktime_get_ns();
                      if (!__blk_mq_alloc_rq_map(set, i))
                              goto out_unwind;
                      end = ktime_get_ns();
                      pr_err("hw queue %d init cost time %lld ns\n", i, end - start);
              }
      
      Testing nvme hot-plugging with the above debug code, we found that
      allocating rqs for all 16 hw queues with depth 1023 costs more than 3ms
      in kernel space without being scheduled out, with each hw queue costing
      about 140-250us. The cost grows as the number of hw queues and the
      queue depth increase. In an extreme case, if __blk_mq_alloc_rq_maps()
      returns -ENOMEM, it retries with "queue_depth >>= 1" and consumes even
      more time.
      	[  428.428771] nvme nvme0: pci function 10000:01:00.0
      	[  428.428798] nvme 10000:01:00.0: enabling device (0000 -> 0002)
      	[  428.428806] pcieport 10000:00:00.0: can't derive routing for PCI INT A
      	[  428.428809] nvme 10000:01:00.0: PCI INT A: no GSI
      	[  432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1
      	[  432.593404] hw queue 0 init cost time 22883 ns
      	[  432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns
      	[  432.595953] nvme nvme0: 16/0/0 default/read/poll queues
      	[  432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16
      	[  432.596203] hw queue 0 init cost time 242630 ns
      	[  432.596441] hw queue 1 init cost time 235913 ns
      	[  432.596659] hw queue 2 init cost time 216461 ns
      	[  432.596877] hw queue 3 init cost time 215851 ns
      	[  432.597107] hw queue 4 init cost time 228406 ns
      	[  432.597336] hw queue 5 init cost time 227298 ns
      	[  432.597564] hw queue 6 init cost time 224633 ns
      	[  432.597785] hw queue 7 init cost time 219954 ns
      	[  432.597937] hw queue 8 init cost time 150930 ns
      	[  432.598082] hw queue 9 init cost time 143496 ns
      	[  432.598231] hw queue 10 init cost time 147261 ns
      	[  432.598397] hw queue 11 init cost time 164522 ns
      	[  432.598542] hw queue 12 init cost time 143401 ns
      	[  432.598692] hw queue 13 init cost time 148934 ns
      	[  432.598841] hw queue 14 init cost time 147194 ns
      	[  432.598991] hw queue 15 init cost time 148942 ns
      	[  432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns
      	[  432.602611]  nvme0n1: p1
      
      So this patch triggers a reschedule between each hw queue init to avoid
      other threads getting stuck. __blk_mq_alloc_rq_maps() does not execute
      in atomic context, so it is safe to call cond_resched().
      Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 25 Sep, 2020 7 commits
    • iocost: consider iocgs with active delays for debt forgiveness · bec02dbb
      Tejun Heo authored
      An iocg may have 0 debt but non-zero delay. The current debt forgiveness
      logic doesn't act on such iocgs. This can lead to unexpected behaviors -
      an iocg with a little bit of debt will have its delay canceled through
      debt forgiveness, but one without any debt but with an active delay has
      to wait until its delay decays away.
      
      This patch updates the debt handling logic so that it treats delays the same
      as debts. If either debt or delay is active, debt forgiveness logic kicks in
      and acts on both the same way.
      
      Also, avoid turning the debt and delay directly to zero as that can confuse
      state transitions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • iocost: add iocg_forgive_debt tracepoint · c5a6561b
      Tejun Heo authored
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • iocost: reimplement debt forgiveness using average usage · c7af2a00
      Tejun Heo authored
      Debt forgiveness logic was counting the number of consecutive !busy
      periods as the trigger condition. While this usually works, it can
      easily be thrown off by temporary fluctuations, especially on
      configurations with short periods.
      
      This patch reimplements debt forgiveness so that:
      
      * Use the average usage over the forgiveness period instead of counting
        consecutive periods.
      
      * Debt is reduced at around the target rate (1/2 every 100ms) regardless of
        ioc period duration.
      
      * Usage threshold is raised to 50%. Combined with the preceding changes
        and the switch to average usage, this makes debt forgiveness a lot
        more effective at reducing the amount of unnecessary idleness.
      
      * Constants are renamed with DFGV_ prefix.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • iocost: recalculate delay after debt reduction · d9517841
      Tejun Heo authored
      Debt sets the initial delay duration, which decays over time. The
      current debt reduction halved the debt but didn't change the delay.
      future debts from increasing delay but didn't do anything to lower the
      existing delay, limiting the mechanism's ability to reduce unnecessary
      idling.
      
      Reset iocg->delay to 0 after debt reduction so that iocg_kick_waitq()
      recalculates new delay value based on the reduced debt amount.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • iocost: replace nr_shortages cond in ioc_forgive_debts() with busy_level one · 33a1fe6d
      Tejun Heo authored
      Debt reduction was blocked if any iocg was short on budget in the past
      period to avoid reducing debts while some iocgs are saturated. However, this
      ends up unnecessarily blocking debt reduction due to temporary local
      imbalances when the device is generally being underutilized, while also
      failing to block when the underlying device is overwhelmed and the usage
      becomes low from high latency.
      
      Given that debt accumulation mostly happens with swapout bursts which can
      significantly deteriorate the underlying device's latency response, the
      current logic is not great.
      
      Let's replace it with an ioc->busy_level based condition so that debt
      reduction is blocked when the underlying device is saturated. The
      ioc_forgive_debts() call is moved after busy_level determination.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • iocost: factor out ioc_forgive_debts() · ab8df828
      Tejun Heo authored
      Debt reduction logic is going to be improved and expanded. Factor it out
      into ioc_forgive_debts() and generalize the comment a bit. No functional
      change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dm: add support for REQ_NOWAIT and enable it for linear target · 6abc4946
      Konstantin Khlebnikov authored
      Add the DM target feature flag DM_TARGET_NOWAIT, which advertises that
      the target works with REQ_NOWAIT bios.
      
      Add dm_table_supports_nowait() and update dm_table_set_restrictions()
      to set/clear QUEUE_FLAG_NOWAIT accordingly.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>