1. 09 May, 2020 19 commits
  2. 07 May, 2020 2 commits
  3. 05 May, 2020 1 commit
    • iocost: protect iocg->abs_vdebt with iocg->waitq.lock · 0b80f986
      Tejun Heo authored
      abs_vdebt is an atomic64_t which tracks how much over budget a given cgroup
      is and controls the activation of use_delay mechanism. Once a cgroup goes
      over budget from forced IOs, it has to pay it back with its future budget.
      The progress guarantee on debt paying comes from the iocg being active -
      active iocgs are processed by the periodic timer, which ensures that as time
      passes the debts dissipate and the iocg returns to normal operation.
      
      However, both iocg activation and vdebt handling are asynchronous and a
      sequence like the following may happen.
      
      1. The iocg is in the process of being deactivated by the periodic timer.
      
      2. A bio enters ioc_rqos_throttle() and calls iocg_activate(), which returns
         without doing anything because it still sees the iocg as active.
      
      3. The iocg is deactivated.
      
      4. The bio from #2 is over budget but needs to be forced. It increases
         abs_vdebt and goes over the threshold and enables use_delay.
      
      5. IO control is enabled for the iocg's subtree and now IOs are attributed
         to the descendant cgroups and the iocg itself no longer issues IOs.
      
      This leaves the iocg with stuck abs_vdebt - it has debt but is inactive and
      sees no further IOs which can activate it. This can end up unduly punishing
      all the descendant cgroups.
      
      The usual throttling path has the same issue - the iocg must be active while
      throttled to ensure that a future event will wake it up - and solves the
      problem by synchronizing the throttling path with a spinlock. abs_vdebt
      handling is another form of overage handling and shares a lot of
      characteristics including the fact that it isn't in the hottest path.
      
      This patch fixes the above and other possible races by strictly
      synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Vlad Dmitriev <vvd@fb.com>
      Cc: stable@vger.kernel.org # v5.4+
      Fixes: e1518f63 ("blk-iocost: Don't let merges push vtime into the future")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
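The invariant described above (debt must never land on an inactive iocg) can be illustrated with a toy model. This is a minimal sketch in Python, not the kernel patch: the `Iocg` class, its `threshold` parameter, and the method names are hypothetical, and a `threading.Lock` merely plays the role of iocg->waitq.lock.

```python
import threading


class Iocg:
    """Toy stand-in for an iocg; field names echo the commit message."""

    def __init__(self, threshold):
        self.lock = threading.Lock()  # plays the role of iocg->waitq.lock
        self.active = False
        self.abs_vdebt = 0
        self.use_delay = False
        self.threshold = threshold    # hypothetical overage threshold

    def _sync_delay(self):
        # Caller must hold self.lock: use_delay always mirrors the debt.
        self.use_delay = self.abs_vdebt > self.threshold

    def incur_debt(self, cost):
        """Forced IO: activate and charge the debt in one critical section."""
        with self.lock:
            self.active = True
            self.abs_vdebt += cost
            self._sync_delay()

    def pay_debt(self, amount):
        """Periodic timer: pay back some debt from the budget."""
        with self.lock:
            self.abs_vdebt = max(0, self.abs_vdebt - amount)
            self._sync_delay()

    def try_deactivate(self):
        """Periodic timer: refuse to deactivate while debt is outstanding."""
        with self.lock:
            if self.abs_vdebt > 0:
                return False
            self.active = False
            return True
```

Because activation, debt accounting, and deactivation all take the same lock, the interleaving in steps 1 to 4 cannot leave debt on an inactive object in this model.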
  4. 04 May, 2020 7 commits
  5. 30 Apr, 2020 6 commits
    • iocost_monitor: drop string wrap around numbers when outputting json · 21f3cfea
      Tejun Heo authored
      Wrapping numbers in strings is used by some to work around bit-width issues in
      some environments. The problem isn't innate to json and the workaround seems to
      cause more integration problems than it solves. Let's drop the string wrapping.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
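To show the difference a consumer sees, here is a small sketch using Python's standard `json` module. The field names and values in `sample` are made up for illustration and are not taken from iocost_monitor's actual output.

```python
import json

# Hypothetical record; keys and values are illustrative only.
sample = {"period_ms": 50, "busy_level": 0, "vrate_pct": 100.0}

# Old style: every number wrapped in a string.
wrapped = json.dumps({key: str(value) for key, value in sample.items()})

# New style: numbers emitted as plain JSON numbers.
plain = json.dumps(sample)

# A consumer of the wrapped form gets strings back and must convert each
# field by hand; the plain form round-trips to int/float directly.
wrapped_back = json.loads(wrapped)
plain_back = json.loads(plain)
```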
    • iocost_monitor: exit successfully if interval is zero · f4fe3ea6
      Tejun Heo authored
      This is to help external tools to decide whether iocost_monitor has all its
      requirements met or not based on the exit status of an -i0 run.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
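An external tool could act on the exit status roughly as follows. This is a sketch only: `requirements_met` is a hypothetical helper, and the two probe commands are stand-ins that simply exit with a fixed status. A real tool would pass its own iocost_monitor invocation with -i0 instead.

```python
import subprocess
import sys


def requirements_met(cmd):
    """Return True if the probe command exits with status 0."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


# Stand-in probes that exit with a known status (not real monitor runs).
ok_probe = [sys.executable, "-c", "raise SystemExit(0)"]
bad_probe = [sys.executable, "-c", "raise SystemExit(1)"]
```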
    • blk-iocost: account for IO size when testing latencies · cd006509
      Tejun Heo authored
      On each IO completion, iocost decides whether the IO met or missed its latency
      target. Currently, the targets are fixed numbers per IO type. While this can be
      good enough for loose latency targets way higher than typical completion
      latencies, the effect of IO size makes it difficult to tighten the latency
      target - a target adequate for 4k IOs might be too tight for 512k IOs and
      vice-versa.
      
      iocost already has all the necessary information to account for different IO
      sizes when testing whether the latency target is met as iocost can calculate the
      size vtime cost of a given IO. This patch updates the completion path to
      calculate the size vtime cost of the IO, deduct the nsec equivalent from the
      observed latency and use the adjusted value to decide whether the target is met.
      
      This makes latency targets independent from IO size and enables determining
      adequate latency targets with fixed size fio runs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andy Newell <newella@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
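The size adjustment can be sketched as simple arithmetic. The function name and all numbers below are hypothetical, and clamping the adjusted latency at zero is an assumption of this sketch rather than something stated in the commit message.

```python
def met_latency_target(observed_ns, size_cost_ns, target_ns):
    """Compare the size-adjusted latency against the target.

    size_cost_ns is the nsec equivalent of the IO's size vtime cost.
    """
    # Assumption: an IO whose size cost exceeds its observed latency
    # is treated as having zero adjusted latency.
    adjusted_ns = max(0, observed_ns - size_cost_ns)
    return adjusted_ns <= target_ns


TARGET_NS = 100_000  # hypothetical 100 usec target

# Hypothetical 4k IO: small size cost, observed latency under target.
small_io_met = met_latency_target(90_000, 5_000, TARGET_NS)

# Hypothetical 512k IO: observed latency is well over the raw target, but
# most of it is explained by the transfer size, so the adjusted value passes.
large_io_met = met_latency_target(450_000, 400_000, TARGET_NS)

# The same large IO judged without the size adjustment misses the target.
large_io_unadjusted = met_latency_target(450_000, 0, TARGET_NS)
```

With the adjustment, one target can serve both IO sizes, which is the property the commit message relies on for tuning with fixed-size runs.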
    • blk-iocost: switch to fixed non-auto-decaying use_delay · 54c52e10
      Tejun Heo authored
      The use_delay mechanism was introduced by blk-iolatency to hold memory
      allocators accountable for the reclaim and other shared IOs they cause. The
      duration of the delay is dynamically balanced between iolatency increasing the
      value on each target miss and it auto-decaying as time passes and threads get
      delayed on it.
      
      While this works well for iolatency, iocost's control model isn't compatible
      with it. There are no repeated "violation" events which can be balanced against
      auto-decaying. iocost instead knows how much a given cgroup is over budget and
      wants to prevent that cgroup from issuing IOs while over budget. Until now,
      iocost has been adding the cost of force-issued IOs. However, this doesn't
      reflect the amount which is already over budget and is simply not enough to
      counter the auto-decaying, allowing an anon-memory-leaking low-priority cgroup
      to go over its allotted share of IOs.
      
      As auto-decaying doesn't make much sense for iocost, this patch introduces a
      different mode of operation for use_delay - when blkcg_set_delay() is used
      instead of blkcg_add/use_delay(), the delay duration is not auto-decayed until it
      is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the
      delay duration synchronized to the budget overage amount.
      
      With this change, iocost can effectively police cgroups which generate a
      significant amount of force-issued IOs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
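The two modes of operation can be contrasted with a toy model. This is a sketch under stated assumptions: the `DelayState` class and its methods are hypothetical and only echo the names blkcg_add_delay(), blkcg_set_delay() and blkcg_clear_delay(); the halving decay step is invented for illustration and is not the kernel's decay rule.

```python
class DelayState:
    """Toy model of a delay value with auto-decaying and fixed modes."""

    def __init__(self):
        self.delay_ns = 0
        self.fixed = False

    def add_delay(self, ns):
        # Auto-decaying mode: each violation adds to the delay.
        self.fixed = False
        self.delay_ns += ns

    def set_delay(self, ns):
        # Fixed mode: the delay tracks the value given by the caller.
        self.fixed = True
        self.delay_ns = ns

    def clear_delay(self):
        # Explicit clear is the only way out of fixed mode.
        self.fixed = False
        self.delay_ns = 0

    def tick(self):
        # Assumed decay step: halve the delay unless it is fixed.
        if not self.fixed:
            self.delay_ns //= 2
```

In fixed mode the caller can keep the delay equal to the current budget overage, re-setting it as the overage changes and clearing it once the debt is gone.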
    • block: remove the bd_openers checks in blk_drop_partitions · 10c70d95
      Christoph Hellwig authored
      When replacing the bd_super check with a bd_openers check I followed a
      logical conclusion, which turns out to be utterly wrong.  When a block
      device has bd_super set, it has a mounted file system on it (although not
      every mounted file system sets bd_super), but that also implies it doesn't
      even have partitions to start with.
      
      So instead of trying to come up with a logical check for all openers,
      just remove the check entirely.
      
      Fixes: d3ef5536 ("block: fix busy device checking in blk_drop_partitions")
      Fixes: cb6b771b ("block: fix busy device checking in blk_drop_partitions again")
      Reported-by: Michal Koutný <mkoutny@suse.com>
      Reported-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'nvme-5.7' of git://git.infradead.org/nvme into block-5.7 · 47ed39e0
      Jens Axboe authored
      Pull NVMe fix from Christoph.
      
      * 'nvme-5.7' of git://git.infradead.org/nvme:
        nvme: prevent double free in nvme_alloc_ns() error handling
  6. 29 Apr, 2020 5 commits