1. 12 Oct, 2017 1 commit
  2. 10 Oct, 2017 5 commits
  3. 09 Oct, 2017 3 commits
    • block, bfq: fix unbalanced decrements of burst size · 99fead8d
      Paolo Valente authored
      The commit "block, bfq: decrease burst size when queues in burst
      exit" introduced the decrement of burst_size on the removal of a
      bfq_queue from the burst list. Unfortunately, because of unbalanced
      decrements, this decrement may be performed even when burst_size is
      already equal to 0. What follows describes the cause of these
      unbalanced decrements, namely a wrong assumption, and how this wrong
      assumption leads to them.
      
      The wrong assumption is that a bfq_queue can exit only if the process
      associated with the bfq_queue has exited. This is false, because a
      bfq_queue, say Q, may also exit as a consequence of a merge with
      another bfq_queue. In this case, Q exits because the I/O of its
      associated process has been redirected to another bfq_queue.
      
      The unbalanced decrement occurs because Q may then be re-created after
      a split and added back to the current burst list *without*
      incrementing burst_size. burst_size is not incremented because Q is
      not a new bfq_queue added to the burst list, but a bfq_queue only
      temporarily removed from the list, and, before the commit "block,
      bfq: decrease burst size when queues in burst exit", burst_size was
      not decremented when Q was removed.
      
      This commit addresses this issue by checking whether the exiting
      bfq_queue is a merged bfq_queue and, in that case, not decrementing
      burst_size. Unfortunately, this still leaves room for unbalanced
      decrements in the following, rarer case: on a split, the bfq_queue
      happens to be inserted into a different burst list than the one it
      was removed from when merged. If this happens, the number of elements
      in the new burst list becomes higher than burst_size (by one). When
      the bfq_queue then exits, it is no longer in a merged state, so
      burst_size is decremented, which results in an unbalanced decrement.
      To handle this sporadic, unlucky case in a simple way, this commit
      also checks that burst_size is larger than 0 before decrementing it.
      
      Finally, this commit removes a useless extra check: the check that
      the bfq_queue is sync, performed before checking whether the bfq_queue
      is in the burst list. This check is redundant, because only sync
      bfq_queues can be inserted into the burst list.
      
      Fixes: 7cb04004 ("block, bfq: decrease burst size when queues in burst exit")
      Reported-by: Philip Müller <philm@manjaro.org>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Philip Müller <philm@manjaro.org>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
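A minimal userspace sketch of the exit-path logic this commit describes, assuming a heavily simplified model: `struct sim_bfqd`, `struct sim_bfqq` and the fields `in_burst_list` and `merged` are hypothetical stand-ins, not BFQ's real data structures. It shows the two guards (no decrement for a merged queue, never go below zero) and the absence of a separate sync check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified scheduler state: only the burst counter. */
struct sim_bfqd {
	int burst_size;           /* members counted in the current burst */
};

/* Hypothetical, simplified queue: names are illustrative only. */
struct sim_bfqq {
	struct sim_bfqd *bfqd;
	bool in_burst_list;       /* stands in for "queue is on the burst list" */
	bool merged;              /* queue is exiting because of a merge */
};

/* Called when a queue exits. */
void sim_bfqq_exit(struct sim_bfqq *bfqq)
{
	/* No separate sync check: only sync queues ever enter the list. */
	if (!bfqq->in_burst_list)
		return;

	bfqq->in_burst_list = false;

	/*
	 * A merged queue may be re-added after a split without
	 * re-incrementing burst_size, so skip the decrement for it.
	 * The > 0 test covers the rarer case of a split that lands the
	 * queue in a different burst list.
	 */
	if (!bfqq->merged && bfqq->bfqd->burst_size > 0)
		bfqq->bfqd->burst_size--;

	/* Invariant the two guards maintain. */
	assert(bfqq->bfqd->burst_size >= 0);
}
```

Either guard alone leaves a hole: skipping merged queues alone still underflows in the different-burst-list case, and the `> 0` check alone would silently drift the counter in the common merge case.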
    • block,bfq: Disable writeback throttling · b5dc5d4d
      Luca Miccio authored
      Like CFQ, BFQ has its own write-throttling heuristics, and it is
      better not to combine them with further write-throttling heuristics
      of a different nature. So this commit disables writeback throttling
      for a device if BFQ is used as the I/O scheduler for that device.
      Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • writeback: schedule periodic writeback with sysctl · 94af5846
      Yafang Shao authored
      After periodic writeback has been disabled by writing 0 to
      dirty_writeback_centisecs, the handler wb_workfn() is not entered
      again until the dirty background limit is reached, a sync syscall is
      executed, free memory runs short, or vmscan is triggered.

      So periodic writeback cannot be re-enabled just by writing a non-zero
      value to dirty_writeback_centisecs. Since it can be disabled through
      sysctl, it should be possible to enable it through sysctl as well.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
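The sketch below models why writing a non-zero value after 0 had no effect, and one way a sysctl handler can fix it: kick the flusher when the interval changes to a non-zero value. All names (`sim_*`, `writeback_interval_cs`, `periodic_work_queued`) are hypothetical, and the wakeup-on-change policy is an assumption for illustration; the commit text above does not spell out the exact code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of dirty_writeback_centisecs (default 500 = 5 s). */
unsigned int writeback_interval_cs = 500;
/* Hypothetical: is the periodic work function currently armed? */
bool periodic_work_queued = true;

/* Models the tail of wb_workfn(): re-arm only for a non-zero interval. */
void sim_wb_workfn(void)
{
	periodic_work_queued = false;
	/* ... write back old dirty data here ... */
	if (writeback_interval_cs)
		periodic_work_queued = true;
}

/* Models waking the flusher, which runs the work function once. */
static void sim_wakeup_flusher(void)
{
	sim_wb_workfn();
}

/*
 * Models the sysctl write handler. Without the wakeup, a non-zero write
 * after 0 would leave the work unarmed until some unrelated event
 * (background limit, sync, memory pressure) ran the work function.
 */
void sim_set_writeback_interval(unsigned int new_cs)
{
	unsigned int old_cs = writeback_interval_cs;

	writeback_interval_cs = new_cs;
	if (new_cs && new_cs != old_cs)
		sim_wakeup_flusher();
}
```

Usage: write 0, let the work run once (it does not re-arm), then write 500; the handler's wakeup re-arms the periodic work immediately.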
  4. 06 Oct, 2017 3 commits
  5. 05 Oct, 2017 1 commit
  6. 04 Oct, 2017 4 commits
    • sysctl: remove /proc/sys/vm/nr_pdflush_threads · b35bd0d9
      Jens Axboe authored
      This tunable has been obsolete since 2.6.32, and writes to the
      file have been failing and complaining in dmesg since then:
      
      nr_pdflush_threads exported in /proc is scheduled for removal
      
      That was 8 years ago. Remove the file's obsolete-ABI notice and
      the /proc/sys file itself.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • writeback: eliminate work item allocation in bd_start_writeback() · 85009b4f
      Jens Axboe authored
      Handle start-all writeback like we do periodic or kupdate-style
      writeback: by marking the bdi_writeback as needing a full flush,
      and simply waking the thread. This eliminates the need to allocate
      and queue a specific work item just for this purpose.

      After this change, we truly only ever have one start-all request
      pending or running at any point in time. We mark the need to start
      all flushes, and the writeback thread will clear it once it has
      processed the request.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
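A sketch of the flag-instead-of-work-item pattern described in this commit, using C11 atomics as an assumed stand-in for the kernel's bit operations on the bdi_writeback state. `struct sim_wb` and its fields are hypothetical. Only the caller that flips the flag from false to true wakes the thread, so any number of requests collapse into one until the thread clears the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a bdi_writeback with one "start all" flag. */
struct sim_wb {
	atomic_bool start_all;     /* full flush requested, not yet done */
	int wakeups;               /* times the flusher thread was woken */
	int full_flushes;          /* full flushes actually performed */
};

/* Caller side: no allocation, just mark the need and wake the thread. */
void sim_start_writeback(struct sim_wb *wb)
{
	/* Cheap read first; only the false -> true transition wakes. */
	if (atomic_load(&wb->start_all) ||
	    atomic_exchange(&wb->start_all, true))
		return;
	wb->wakeups++;             /* stands in for waking the flusher */
}

/* Flusher-thread side: process the request, then clear the flag. */
void sim_writeback_thread_run(struct sim_wb *wb)
{
	if (!atomic_load(&wb->start_all))
		return;
	wb->full_flushes++;        /* ... write back all dirty data ... */
	atomic_store(&wb->start_all, false);
}
```

The plain load before the exchange is a common design choice: when the flag is already set, callers only read the shared word instead of writing it, which avoids cache-line bouncing under frequent callers such as reclaim.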
    • blk-mq: document the need to have STARTED and COMPLETED share a byte · fc13457f
      Jens Axboe authored
      For memory ordering guarantees on stores, we need to ensure that
      the STARTED and COMPLETED bits share the same byte of storage in the
      unsigned long. Add a comment explaining why, and a BUILD_BUG_ON() to
      ensure that we don't violate this requirement.
      Suggested-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
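A compile-time check in the spirit of the BUILD_BUG_ON() this commit adds, written with C11 `_Static_assert`. The bit numbers in `enum sim_rq_atom_flags` are hypothetical. The idea is only that dividing a bit number by `CHAR_BIT` gives the index of the byte holding it (counting from the least significant byte), so two bits share a byte exactly when those quotients match.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bit numbers within an unsigned long flags word. */
enum sim_rq_atom_flags {
	SIM_REQ_ATOM_COMPLETE = 0,
	SIM_REQ_ATOM_STARTED  = 1,
};

/* True when two bit numbers fall within the same byte of storage. */
#define SIM_SAME_BYTE(a, b) ((a) / CHAR_BIT == (b) / CHAR_BIT)

/*
 * Fails the build if someone later renumbers the flags so that they
 * land in different bytes, like BUILD_BUG_ON() in the kernel.
 */
_Static_assert(SIM_SAME_BYTE(SIM_REQ_ATOM_STARTED, SIM_REQ_ATOM_COMPLETE),
	       "STARTED and COMPLETE must share a byte");

/* Index of the byte holding a bit, counted from the low-order byte. */
unsigned int sim_flag_byte(unsigned int bit)
{
	return bit / CHAR_BIT;
}
```

Making the requirement a build failure, rather than only a comment, means a later reshuffle of the flag enum cannot silently break the ordering assumption.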
    • blk-mq: attempt to fix atomic flag memory ordering · a7af0af3
      Peter Zijlstra authored
      Attempt to untangle the ordering in blk-mq. The patch introducing the
      single smp_mb__before_atomic() is obviously broken in that it doesn't
      clearly specify a pairing barrier and an obtained guarantee.
      
      The comment is further misleading in that it hints that the
      deadline store and the COMPLETE store also need to be ordered, but
      AFAICT there is no such dependency. However, what does appear to be
      important is that the clear happens _after_ the store, and that
      worked by pure accident.
      
      This clarifies blk_mq_start_request() (we should not get there with
      STARTED set), which simplifies the code and makes the barrier usage
      sane: the old code could be read to allow not having _any_ atomic
      after the barrier, in which case the barrier hasn't got anything to
      order. We then also introduce the missing pairing barrier for it.
      
      Also downgrade the barrier to smp_wmb(); this is cheaper on
      PowerPC/ARM and costs nothing extra on x86.

      It also documents the STARTED vs COMPLETE ordering. However, I have
      not been entirely successful in reverse engineering the blk-mq state
      machine, so there might still be more funnies around timeout vs
      requeue.
      
      If I got anything wrong, feel free to educate me by adding comments to
      clarify things ;-)
      
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Fixes: 538b7534 ("blk-mq: request deadline must be visible before marking rq as started")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
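A userspace analogy of what "a pairing barrier and an obtained guarantee" means here, assuming C11 fences as stand-ins: the release fence plays the role of `smp_wmb()` on the writer side and the acquire fence the role of the pairing read barrier. C11 fences are somewhat stronger than the kernel primitives, and `struct sim_rq` and both functions are hypothetical, so this illustrates the pairing idea rather than reproducing the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request: a plain deadline guarded by a STARTED flag. */
struct sim_rq {
	unsigned long deadline;
	atomic_bool started;
};

/* Writer side, loosely modeled on starting a request. */
void sim_start_request(struct sim_rq *rq, unsigned long deadline)
{
	rq->deadline = deadline;
	/* Orders the deadline store before the flag store (cf. smp_wmb()). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&rq->started, true, memory_order_relaxed);
}

/* Reader side, loosely modeled on a timeout check: the pairing barrier. */
bool sim_check_expired(struct sim_rq *rq, unsigned long now)
{
	if (!atomic_load_explicit(&rq->started, memory_order_relaxed))
		return false;
	/*
	 * Pairs with the release fence: having seen started == true, the
	 * reader is guaranteed to see the deadline written before it.
	 */
	atomic_thread_fence(memory_order_acquire);
	return now >= rq->deadline;
}
```

The obtained guarantee is explicit: a reader that observes the flag also observes the deadline. A barrier with no partner on the other side gives no such guarantee, which is the complaint the commit makes about the original single barrier.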
  7. 03 Oct, 2017 17 commits
  8. 01 Oct, 2017 1 commit
  9. 30 Sep, 2017 2 commits
  10. 26 Sep, 2017 3 commits
    • block: fix a build error · 0b508bc9
      Shaohua Li authored
      The code is only for blkcg, not for all cgroups.
      
      Fixes: d4478e92 ("block/loop: make loop cgroup aware")
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: cryptoloop - Fix build warning · 9979d545
      Corentin Labbe authored
      This patch fixes the following build warning:
      drivers/block/cryptoloop.c:46:8: warning: variable 'cipher' set but not used [-Wunused-but-set-variable]
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block/loop: make loop cgroup aware · d4478e92
      Shaohua Li authored
      The loop block device handles IO in a separate thread. The IO it
      dispatches is not cloned from the IO the loop device received, so
      the dispatched IO loses the cgroup context.
      
      I'm ignoring the buffered IO case for now, which is quite complicated.
      Making the loop thread aware of the cgroup context doesn't really help
      there: the loop device only writes to a single file, and in the
      current writeback cgroup implementation a file can only belong to one
      cgroup.
      
      For the direct IO case, we could in theory work around the issue. For
      example, say we assign cgroup1 5M/s of bandwidth on the loop device
      and cgroup2 10M/s. We can create a special cgroup for the loop thread
      and assign it at least 15M/s on the underlying disk. In this way, we
      correctly throttle the two cgroups. But this is tricky to set up.
      
      This patch tries to address the issue. We record the bio's css in the
      loop command. When the loop thread handles the command, we use the API
      provided in patch 1 to set the css for the current task. The bio layer
      then uses that css for new IO (from patch 3).
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
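A single-threaded model of the approach in "block/loop: make loop cgroup aware": the loop command records the originating bio's css, and the worker temporarily associates that css with itself so that newly created IO inherits it. Every name here is hypothetical; the per-task association API from patch 1 and the bio-layer change from patch 3 are both modeled as a plain per-thread pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: a css is just a tag naming a blkcg here. */
struct sim_css { const char *name; };

/* Hypothetical bio carrying its cgroup association. */
struct sim_bio { struct sim_css *css; };

/* Loop command: records the css of the bio that produced it. */
struct sim_loop_cmd { struct sim_css *css; };

/* Hypothetical per-task association of the (single) loop worker. */
static struct sim_css *sim_current_css;

/* Submission path: remember the originating cgroup in the command. */
void sim_loop_queue(struct sim_loop_cmd *cmd, const struct sim_bio *bio)
{
	cmd->css = bio->css;
}

/* Bio layer: new IO inherits the css associated with the current task. */
struct sim_bio sim_new_bio(void)
{
	struct sim_bio bio = { .css = sim_current_css };
	return bio;
}

/* Loop worker: associate the command's css, issue IO, then reset. */
struct sim_bio sim_loop_handle_cmd(const struct sim_loop_cmd *cmd)
{
	struct sim_bio out;

	sim_current_css = cmd->css;
	out = sim_new_bio();       /* IO issued to the backing file */
	sim_current_css = NULL;
	return out;
}
```

Resetting the association after each command keeps one cgroup's context from leaking into the next command the worker handles.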