1. 24 Jul, 2018 5 commits
  2. 23 Jul, 2018 10 commits
  3. 22 Jul, 2018 2 commits
    • blk-mq: fail the request in case issue failure · 8824f622
      Ming Lei authored
      Inside blk_mq_try_issue_list_directly(), if issuing a request fails, we
      should fail the request rather than try to issue it again; otherwise the
      warning in blk_mq_start_request() will be triggered. This change aligns
      this path with the behaviour of the other request issue and dispatch paths.
      
      Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: LKP <lkp@01.org>
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
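A minimal userspace sketch of the rule described above, assuming (as a simplification) that only a "resource busy" status is worth retrying: requests are issued in order, a busy result stops the walk and keeps the remaining requests for a later dispatch, and any other failure ends the request with its error instead of issuing it a second time. `enum status`, `struct req`, `issue_directly()` and `try_issue_list()` are hypothetical names, not the blk-mq API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on blk_status_t. */
enum status { ST_OK, ST_RESOURCE, ST_IOERR };

struct req {
    enum status issue_result; /* what the simulated driver returns */
    bool issued;              /* accepted by the driver */
    bool failed;              /* ended with an error, never re-issued */
    bool requeued;            /* kept for a later dispatch attempt */
};

/* Simulated direct issue: report the pre-programmed result. */
static enum status issue_directly(const struct req *rq)
{
    return rq->issue_result;
}

/*
 * Issue each request at most once. A resource-busy result stops the walk
 * and keeps this and all later requests for another dispatch; any other
 * error fails the request instead of trying to issue it again.
 * Returns the number of requests the driver accepted.
 */
static size_t try_issue_list(struct req *reqs, size_t n)
{
    size_t issued = 0;

    for (size_t i = 0; i < n; i++) {
        enum status ret = issue_directly(&reqs[i]);

        if (ret == ST_OK) {
            reqs[i].issued = true;
            issued++;
        } else if (ret == ST_RESOURCE) {
            for (size_t j = i; j < n; j++)
                reqs[j].requeued = true;
            break;
        } else {
            reqs[i].failed = true; /* complete with error, no retry */
        }
    }
    return issued;
}
```

Because a hard error marks the request as failed exactly once, nothing in this sketch ever "starts" the same request twice, which is the situation the blk_mq_start_request() warning guards against.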
    • blk-rq-qos: make depth comparisons unsigned · 22f17952
      Josef Bacik authored
      With the change to use UINT_MAX I broke the depth check, as any value of
      inflight (i.e. 0) would be greater than (int)UINT_MAX.  Fix this by
      changing everything to unsigned int to match the depth.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
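To see why the signed comparison breaks, here is a hypothetical pair of "take a slot if inflight is below the limit" helpers, one taking an `int` limit and one taking `unsigned int`. The helper names are illustrative, not the blk-rq-qos functions. Converting UINT_MAX to int is implementation-defined in C; on the two's-complement targets that GCC and Clang support it yields -1, so even an inflight count of 0 is already at or above the limit.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper with the buggy signed limit. */
static bool inc_below_signed(int *inflight, int limit)
{
    if (*inflight >= limit)
        return false;           /* treated as "queue full" */
    (*inflight)++;
    return true;
}

/* Same helper with everything unsigned, matching the depth type. */
static bool inc_below_unsigned(unsigned int *inflight, unsigned int limit)
{
    if (*inflight >= limit)
        return false;
    (*inflight)++;
    return true;
}

/* Can a request get a slot when the depth is "unlimited" (UINT_MAX)? */
static bool signed_unlimited_ok(void)
{
    int inflight = 0;
    /* Implementation-defined conversion; -1 with GCC/Clang. */
    return inc_below_signed(&inflight, (int)UINT_MAX);
}

static bool unsigned_unlimited_ok(void)
{
    unsigned int inflight = 0;
    return inc_below_unsigned(&inflight, UINT_MAX);
}
```

With the signed version no request can ever get a slot, while the unsigned version behaves as an effectively unlimited depth.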
  4. 18 Jul, 2018 6 commits
    • blkcg: Track DISCARD statistics and output them in cgroup io.stat · 636620b6
      Tejun Heo authored
      Add tracking of REQ_OP_DISCARD ios to the per-cgroup io.stat.  Two
      fields are added, dbytes and dios, which count the total bytes and the
      number of discards, respectively.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andy Newell <newella@fb.com>
      Cc: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
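Reading the new fields from userspace could look like the following sketch. It assumes each io.stat line has the cgroup v2 form `MAJ:MIN key=value ...`; `struct io_stat` and `parse_io_stat_line()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Per-device counters from one io.stat line. */
struct io_stat {
    unsigned int major, minor;
    unsigned long long rbytes, wbytes, rios, wios, dbytes, dios;
};

/*
 * Parse "MAJ:MIN key=value ...". Keys are matched by name, so unknown keys
 * are skipped and a kernel without dbytes/dios leaves those fields at 0.
 * Returns 0 on success, -1 if the "MAJ:MIN" prefix is missing.
 */
static int parse_io_stat_line(const char *line, struct io_stat *st)
{
    int off = 0;

    memset(st, 0, sizeof(*st));
    if (sscanf(line, "%u:%u%n", &st->major, &st->minor, &off) != 2)
        return -1;

    const char *p = line + off;
    char key[32];
    unsigned long long val;
    int used;

    /* Each iteration consumes optional whitespace plus one key=value. */
    while (sscanf(p, " %31[^=]=%llu%n", key, &val, &used) == 2) {
        if (!strcmp(key, "rbytes"))      st->rbytes = val;
        else if (!strcmp(key, "wbytes")) st->wbytes = val;
        else if (!strcmp(key, "rios"))   st->rios = val;
        else if (!strcmp(key, "wios"))   st->wios = val;
        else if (!strcmp(key, "dbytes")) st->dbytes = val;
        else if (!strcmp(key, "dios"))   st->dios = val;
        p += used;
    }
    return 0;
}
```

Matching keys by name rather than by position keeps such a reader working whether or not the kernel prints the discard fields.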
    • block: Track DISCARD statistics and output them in stat and diskstat · bdca3c87
      Michael Callahan authored
      Add tracking of REQ_OP_DISCARD ios to the partition statistics and
      append them to the various stat files in /sys as well as
      /proc/diskstats.  These are tracked with the same four stats as reads
      and writes:
      
      Number of discard ios completed.
      Number of discard ios merged.
      Number of discard sectors completed.
      Milliseconds spent on discard requests.
      
      This is done via adding a new STAT_DISCARD define to genhd.h and then
      using it to index that stat field for discard requests.
      
      tj: Refreshed on top of v4.17 and other previous updates.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andy Newell <newella@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
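A userspace reader for the extended /proc/diskstats line might look like this sketch. It assumes the layout major, minor, device name, the 11 original counters, then the four discard counters in the order listed above (completed, merged, sectors, milliseconds). `struct diskstat` and `parse_diskstats_line()` are hypothetical names.

```c
#include <stdio.h>

struct diskstat {
    unsigned int major, minor;
    char name[32];
    unsigned long long rd_ios, rd_merges, rd_sectors, rd_ms;
    unsigned long long wr_ios, wr_merges, wr_sectors, wr_ms;
    unsigned long long in_flight, io_ms, weighted_ms;
    unsigned long long dc_ios, dc_merges, dc_sectors, dc_ms;
};

/*
 * Returns the number of counters parsed: 11 for a line without discard
 * stats (discard fields are zeroed), 15 with them, or -1 if malformed.
 * Columns after the discard counters, if any, are ignored.
 */
static int parse_diskstats_line(const char *line, struct diskstat *d)
{
    int n = sscanf(line,
                   "%u %u %31s %llu %llu %llu %llu %llu %llu %llu %llu"
                   " %llu %llu %llu %llu %llu %llu %llu",
                   &d->major, &d->minor, d->name,
                   &d->rd_ios, &d->rd_merges, &d->rd_sectors, &d->rd_ms,
                   &d->wr_ios, &d->wr_merges, &d->wr_sectors, &d->wr_ms,
                   &d->in_flight, &d->io_ms, &d->weighted_ms,
                   &d->dc_ios, &d->dc_merges, &d->dc_sectors, &d->dc_ms);

    if (n == 14) {              /* 3 id fields + 11 original counters */
        d->dc_ios = d->dc_merges = d->dc_sectors = d->dc_ms = 0;
        return 11;
    }
    if (n == 18)                /* ... + 4 discard counters */
        return 15;
    return -1;
}
```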
    • block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3
      Michael Callahan authored
      Add and use a new op_stat_group() function for indexing partition stat
      fields rather than indexing them by rq_data_dir() or bio_data_dir().
      This function works similarly to op_is_sync() in that it takes the
      request::cmd_flags or bio::bi_opf flags and determines which stats
      should get updated.
      
      In addition, the second parameter to generic_start_io_acct() and
      generic_end_io_acct() is now a REQ_OP rather than simply a read or
      write bit and it uses op_stat_group() on the parameter to determine
      the stat group.
      
      Note that the partition in_flight counts are not part of the per-cpu
      statistics and as such are not indexed via this function.  They are now
      indexed by op_is_write().
      
      tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
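A standalone sketch of the op-to-stat-group mapping, written with local stand-ins for the REQ_OP_* values and stat groups (the numeric values are assumed to mirror the kernel headers of that era, and `EXAMPLE_FLAG` is a hypothetical flag bit). It already includes the STAT_DISCARD branch that the discard-statistics commit builds on; at this commit only the read/write split would apply. Because the op sits in the low bits of cmd_flags/bi_opf, flag bits above it do not change the group.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's op codes and stat groups. */
enum req_op { REQ_OP_READ = 0, REQ_OP_WRITE = 1, REQ_OP_DISCARD = 3 };
#define REQ_OP_BITS  8
#define REQ_OP_MASK  ((1u << REQ_OP_BITS) - 1)
#define EXAMPLE_FLAG (1u << REQ_OP_BITS)   /* hypothetical flag bit */

enum stat_group { STAT_READ, STAT_WRITE, STAT_DISCARD, NR_STAT_GROUPS };

/* Odd op codes move data toward the device. */
static bool op_is_write(unsigned int op)
{
    return op & 1;
}

static bool op_is_discard(unsigned int op)
{
    return (op & REQ_OP_MASK) == REQ_OP_DISCARD;
}

/* Discards get their own group; everything else is read vs. write. */
static enum stat_group op_stat_group(unsigned int op)
{
    if (op_is_discard(op))
        return STAT_DISCARD;
    return op_is_write(op) ? STAT_WRITE : STAT_READ;
}
```

Note that op_is_write(REQ_OP_DISCARD) is true in this model, which is why indexing by a plain read/write bit would have folded discards into the write counters.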
    • block: Define and use STAT_READ and STAT_WRITE · dbae2c55
      Michael Callahan authored
      Add defines for STAT_READ and STAT_WRITE for indexing the partition
      stat entries. This clarifies some fs/ code which has hardcoded 1 for
      STAT_WRITE and will make it easier to extend the stats with additional
      fields.
      
      tj: Refreshed on top of v4.17.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Add part_stat_read_accum to read across field entries. · 59767fbd
      Michael Callahan authored
      Add a part_stat_read_accum macro to genhd.h to read and sum across
      field entries, for example to sum up the number of read and write
      sectors completed.  In addition to being a reasonable cleanup by
      itself, this will make it easier to add new stat fields in the future.
      
      tj: Refreshed on top of v4.17.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
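A simplified sketch of the accumulate-across-entries pattern. `struct disk_stats_sim`, `struct part_sim` and `total_sectors()` are hypothetical, and unlike the kernel (which keeps per-CPU copies that part_stat_read() sums) this model keeps a single copy of the counters; only the macro shape is meant to match the description.

```c
/* Stat groups used to index each counter array. */
enum stat_group { STAT_READ, STAT_WRITE, NR_STAT_GROUPS };

struct disk_stats_sim {
    unsigned long sectors[NR_STAT_GROUPS];
    unsigned long ios[NR_STAT_GROUPS];
};

/* Simplified partition holding one copy of the counters. */
struct part_sim {
    struct disk_stats_sim dkstats;
};

/* Read one entry, e.g. part_stat_read(p, sectors[STAT_WRITE]). */
#define part_stat_read(part, field) ((part)->dkstats.field)

/* Sum a field across all of its stat-group entries. */
#define part_stat_read_accum(part, field)           \
    (part_stat_read(part, field[STAT_READ]) +        \
     part_stat_read(part, field[STAT_WRITE]))

static unsigned long total_sectors(const struct part_sim *p)
{
    return part_stat_read_accum(p, sectors);
}
```

Callers that want a total no longer spell out every group, so adding a group later means touching the macro rather than each call site.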
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Tejun Heo authored
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @op with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 17 Jul, 2018 2 commits
  6. 16 Jul, 2018 2 commits
    • blk-iolatency: truncate our current time · 71e9690b
      Josef Bacik authored
      In our longer tests we noticed that some boxes would degrade to the
      point of uselessness.  This is because we truncate the current time when
      saving it in our bio as the issue time, but I was subtracting that
      truncated value from the raw, untruncated current time.  So once the box
      had been up a certain amount of time it would appear as if our IOs were
      taking several years to complete.  Fix this by truncating the current
      time so it matches the issue time.  Verified this worked by running
      with this patch for a week on our test tier.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
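The arithmetic behind the bug can be sketched as follows. The stored width `ISSUE_TIME_BITS` is a hypothetical value chosen for illustration (the real bio field layout may differ), and skipping a sample when the truncated clock has wrapped is this sketch's own choice, not necessarily what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of time bits kept when the issue time is stored. */
#define ISSUE_TIME_BITS 51
#define ISSUE_TIME_MASK ((UINT64_C(1) << ISSUE_TIME_BITS) - 1)

static uint64_t truncate_time(uint64_t ns)
{
    return ns & ISSUE_TIME_MASK;
}

/* Buggy: truncated issue time subtracted from the raw clock. Once uptime
 * exceeds the mask, every latency gains roughly 2^ISSUE_TIME_BITS ns. */
static uint64_t latency_buggy(uint64_t issue_trunc, uint64_t now)
{
    return now - issue_trunc;
}

/* Fixed: truncate "now" the same way so both values share one base.
 * Returns false (no sample) if the truncated clock wrapped in between. */
static bool latency_fixed(uint64_t issue_trunc, uint64_t now, uint64_t *lat)
{
    uint64_t now_trunc = truncate_time(now);

    if (now_trunc <= issue_trunc)
        return false;
    *lat = now_trunc - issue_trunc;
    return true;
}
```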
    • blk-iolatency: don't change the latency window · d607eefa
      Josef Bacik authored
      Early versions of these patches had us waiting for seconds at a time
      during submission, so we had to adjust the timing window we monitored
      for latency.  Now we don't do things like that so this is unnecessary
      code.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 13 Jul, 2018 11 commits
  8. 12 Jul, 2018 2 commits
    • block: skd: Use %pad printk format for dma_addr_t values · ea870bb2
      Helge Deller authored
      Use the existing %pad printk format to print dma_addr_t values.
      This avoids the following warnings when compiling on the parisc64 platform:
      
      drivers/block/skd_main.c: In function 'skd_preop_sg_list':
      drivers/block/skd_main.c:660:4: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 6 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
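In kernel code the fix is to pass a pointer to the value, e.g. `printk("%pad", &addr)`, and let printk handle whatever width dma_addr_t has in that build. `%pad` exists only inside the kernel, so this runnable userspace sketch uses a different, plainly swapped-in technique to show the same principle: make the format agree with the argument by widening to a fixed-width type and using the matching `PRIx64` conversion. `dma_addr_sim_t` and `format_dma()` are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in: 32 bits wide, as in the warning above,
 * where "%llx" would expect a 64-bit argument. */
typedef uint32_t dma_addr_sim_t;

/* Widen explicitly so the conversion matches on every configuration. */
static int format_dma(char *buf, size_t len, dma_addr_sim_t addr)
{
    return snprintf(buf, len, "0x%" PRIx64, (uint64_t)addr);
}
```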
    • bsg: remove read/write support · 28519c89
      Christoph Hellwig authored
      The code poses a security risk due to user memory access in ->release
      and had an API that can't be used reliably.  As far as we know it was
      never used for real, but if that turns out wrong we'll have to revert
      this commit and come up with a band aid.
      
      Jann Horn looked through software archives for users of this interface,
      and the only users found were example code in sg3_utils, and optional
      support in an optional module of the tgt user space iscsi target,
      which looks like a proof of concept extension of the /dev/sg
      read/write support.
      
      Tony Battersby chimes in that the code is basically unsafe to use in
      general:
      
        The read/write interface on /dev/bsg is impossible to use safely
        because the list of completed commands is per-device (bd->done_list)
        rather than per-fd like it is with /dev/sg.  So if program A and
        program B are both using the write/read interface on the same bsg
        device, then their command responses will get mixed up, and program
        A will read() some command results from program B and vice versa.
        So no, I don't use read/write on /dev/bsg.  From a security standpoint,
        it should definitely be fixed or removed.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
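The mix-up Tony Battersby describes can be modelled with a toy completion queue. This is a hypothetical model, not bsg code: `struct done_list`, `complete_cmd()` and `read_completion()` are illustrative names, and each entry just records which file descriptor issued the command. With one list per device, whichever reader calls read() first gets the oldest completion, even if another program issued it; with one list per fd, that cannot happen.

```c
#include <stddef.h>

#define MAX_DONE 16

/* FIFO of completed commands, each tagged with its issuing fd. */
struct done_list {
    int owner_fd[MAX_DONE];
    size_t head, tail;
};

/* A command issued through fd completes and is queued on the list. */
static void complete_cmd(struct done_list *dl, int fd)
{
    if (dl->tail < MAX_DONE)
        dl->owner_fd[dl->tail++] = fd;
}

/* read(): pop the oldest completion on this list, whoever issued it.
 * Returns the issuing fd, or -1 if nothing is pending. */
static int read_completion(struct done_list *dl)
{
    if (dl->head == dl->tail)
        return -1;
    return dl->owner_fd[dl->head++];
}
```

If program A (fd 3) and program B (fd 4) share one device-wide list and B's command completes first, A's read() returns B's result.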