1. 27 Jul, 2018 5 commits
    • bcache: fix I/O significant decline while backend devices registering · 94f71c16
      Tang Junhui authored
      I attached several backend devices to the same cache set and produced a
      large amount of dirty data by running small random writes for a long
      time. While I/O kept running on the other cached devices, I stopped one
      cached device and, after a while, registered it again. The running I/O
      on the other cached devices then dropped significantly, sometimes even
      to zero.

      In the current code, bcache traverses every key and btree node to count
      the dirty data while holding the btree read lock, so writer threads
      cannot take the btree write lock. When the registering device has many
      keys and btree nodes, this can last several seconds, and the write I/O
      on the other cached devices is blocked and declines significantly.

      With this patch, when a device registers with a cache set that already
      has other cached devices with running I/O, the amount of dirty data on
      the registering device is counted incrementally, so the other cached
      devices are not blocked the whole time (a simplified sketch follows this
      entry).
      
      Patch v2: Rename some variables and macro names as Coly suggested.
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
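      A minimal userspace sketch of the incremental counting idea, not the
      bcache code itself: the flat key array, the pthread rwlock and the
      made-up COUNT_BATCH constant stand in for the btree, its lock and the
      real batch size. The read lock is held only per batch, so writers can
      get in between batches.

      #include <pthread.h>
      #include <stddef.h>
      #include <stdint.h>

      #define COUNT_BATCH 512   /* illustrative batch size, not bcache's value */

      struct key {
              uint64_t sectors;
              int dirty;
      };

      /* Count dirty sectors in batches, dropping the read lock between
       * batches so that writers are never locked out for the whole walk. */
      uint64_t count_dirty_incremental(pthread_rwlock_t *tree_lock,
                                       const struct key *keys, size_t nkeys)
      {
              uint64_t dirty = 0;
              size_t i = 0;

              while (i < nkeys) {
                      size_t end = (nkeys - i > COUNT_BATCH) ? i + COUNT_BATCH : nkeys;

                      pthread_rwlock_rdlock(tree_lock);
                      for (; i < end; i++)
                              if (keys[i].dirty)
                                      dirty += keys[i].sectors;
                      pthread_rwlock_unlock(tree_lock);   /* writers can run here */
              }
              return dirty;
      }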
    • bcache: calculate the number of incremental GC nodes according to the total of btree nodes · 7f4a59de
      Tang Junhui authored
      This patch is based on "[PATCH] bcache: finish incremental GC".

      Since incremental GC pauses for 100 ms whenever front-side I/O arrives,
      GC would take a very long time if it only processed a constant number
      (100) of nodes per pass when there are many btree nodes. The front-side
      I/O would then run out of buckets (no new bucket can be allocated during
      GC), and I/O would be blocked again.

      So GC should not process a constant number of nodes, but a number that
      varies with the total number of btree nodes. In this patch, GC is
      divided into a constant number (100) of passes: when there are many
      btree nodes, GC processes more nodes per pass; otherwise it processes
      fewer nodes per pass, but never fewer than MIN_GC_NODES (a sketch of
      this rule follows this entry).
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
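      A sketch of the sizing rule described above, assuming the constant
      names from the message (treat them as illustrative rather than the
      exact kernel definitions): the total GC work is split into roughly
      MAX_GC_TIMES passes, with a floor of MIN_GC_NODES nodes per pass.

      #define MAX_GC_TIMES 100   /* GC is divided into this many passes */
      #define MIN_GC_NODES 100   /* never process fewer nodes than this per pass */

      /* Scale the per-pass batch with the total number of btree nodes. */
      static unsigned int gc_nodes_per_pass(unsigned int total_btree_nodes)
      {
              unsigned int n = total_btree_nodes / MAX_GC_TIMES;

              return (n < MIN_GC_NODES) ? MIN_GC_NODES : n;
      }

      With 1,000,000 btree nodes this gives 10,000 nodes per pass; with only
      5,000 nodes it falls back to the MIN_GC_NODES floor of 100.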
    • bcache: finish incremental GC · 5c25c4fc
      Tang Junhui authored
      The GC thread records the latest GC key in gc_done, which is meant to
      support incremental GC, but the current code never actually implements
      it. While GC runs, front-side I/O is blocked until GC finishes, which
      can take a long time when there are many btree nodes.

      This patch implements incremental GC. The main idea is that, when
      front-side I/O arrives, GC stops after processing a batch of nodes
      (100), releases the btree node lock, and lets the front-side I/O run
      for a while (100 ms) before resuming GC (a simplified model follows
      this entry).

      With this patch, I/O is no longer blocked for the whole duration of GC,
      and the obvious drops of I/O to zero no longer occur.
      
      Patch v2: Rename some variables and macro names as Coly suggested.
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
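      A simplified userspace model of the loop described above, not the
      kernel code: front_io_pending() and gc_one_node() are hypothetical
      helpers, and a plain mutex stands in for the btree lock.

      #include <pthread.h>
      #include <stdbool.h>
      #include <unistd.h>

      #define GC_BATCH_NODES 100   /* nodes processed before checking for front I/O */
      #define GC_SLEEP_MS    100   /* back-off while front-side I/O runs */

      extern bool front_io_pending(void);   /* hypothetical: is foreground I/O waiting? */
      extern bool gc_one_node(void);        /* hypothetical: false once GC is finished */

      void incremental_gc(pthread_mutex_t *btree_lock)
      {
              bool more = true;

              while (more) {
                      int done;

                      pthread_mutex_lock(btree_lock);
                      for (done = 0; done < GC_BATCH_NODES && more; done++)
                              more = gc_one_node();
                      pthread_mutex_unlock(btree_lock);   /* foreground I/O can take the lock */

                      if (more && front_io_pending())
                              usleep(GC_SLEEP_MS * 1000);  /* pause GC for ~100 ms */
              }
      }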
    • bcache: simplify the calculation of the total amount of flash dirty data · 99a27d59
      Tang Junhui authored
      Currently the total amount of dirty data on flash-only devices is
      calculated by summing the dirty data of every flash-only device while
      holding the register lock, which is very inefficient.

      This patch adds a member flash_dev_dirty_sectors to struct cache_set to
      track the total amount of flash-only device dirty data in real time, so
      the total no longer needs to be recalculated (see the sketch after this
      entry).
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
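      The bookkeeping change can be sketched like this (only the
      flash_dev_dirty_sectors name comes from the message; the struct and
      helpers are illustrative): the running total is adjusted whenever a
      flash-only volume's dirty count changes, so reading it is O(1) instead
      of a walk over every device under the register lock.

      #include <stdatomic.h>
      #include <stdint.h>

      struct cache_set_model {
              /* running total of dirty sectors on flash-only volumes */
              atomic_uint_least64_t flash_dev_dirty_sectors;
      };

      /* Called whenever a flash-only volume's dirty sectors change; delta is
       * negative when data is cleaned. */
      static void flash_dev_dirty_adjust(struct cache_set_model *c, int64_t delta)
      {
              atomic_fetch_add(&c->flash_dev_dirty_sectors, (uint_least64_t)delta);
      }

      /* O(1) read; no need to sum every device under the register lock. */
      static uint64_t flash_dev_total_dirty(struct cache_set_model *c)
      {
              return atomic_load(&c->flash_dev_dirty_sectors);
      }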
    • readahead: stricter check for bdi io_pages · dc30b96a
      Markus Stockhausen authored
      ondemand_readahead() checks bdi->io_pages to cap the maximum number of
      pages to be processed. This works until the readit section: if we do an
      async-only readahead (async size = sync size) and the target is at the
      beginning of the window, the request is expanded by another
      get_next_ra_size() pages. blktrace for large reads shows that the
      kernel always issues a doubled-size read at the beginning of
      processing. Add an additional check for io_pages in the lower part of
      the function (a toy model follows this entry). The fix helps devices
      that hard-limit bio pages and rely on proper handling of
      max_hw_read_sectors (e.g. older FusionIO cards), so it could qualify
      for stable.
      
      Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
      Cc: stable@vger.kernel.org
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
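      The shape of the fix as a toy model; this mirrors the description only,
      not the actual ondemand_readahead() code, and next_window_size() is a
      stand-in for get_next_ra_size(). The page count is clamped to io_pages
      once more after the async-window expansion.

      /* Stand-in for get_next_ra_size(): grow the window for the next read. */
      static unsigned long next_window_size(unsigned long cur)
      {
              return cur * 2;
      }

      static unsigned long min_ul(unsigned long a, unsigned long b)
      {
              return a < b ? a : b;
      }

      /* Toy model: how many pages to read ahead for one request. */
      static unsigned long readahead_pages(unsigned long req_pages,
                                           unsigned long io_pages,
                                           int async_at_window_start)
      {
              unsigned long ra = min_ul(req_pages, io_pages);  /* existing cap */

              if (async_at_window_start)
                      ra = next_window_size(ra);  /* expansion that escaped the cap */

              return min_ul(ra, io_pages);        /* the additional check the patch adds */
      }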
  2. 26 Jul, 2018 2 commits
  3. 25 Jul, 2018 2 commits
    • xen/blkfront: remove unused macros · d3df0ac0
      Juergen Gross authored
      Remove some macros not used anywhere.
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block · eca53cb6
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "Highlights:
      
       - massively improved tracepoints (Keith Busch)
       - support for larger inline data in the RDMA host and target
         (Steve Wise)
       - RDMA setup/teardown path fixes and refactor (Sagi Grimberg)
       - Command Supported and Effects log support for the NVMe target
         (Chaitanya Kulkarni)
       - buffered I/O support for the NVMe target (Chaitanya Kulkarni)
      
       plus the usual set of cleanups and small enhancements."
      
      * 'nvme-4.19' of git://git.infradead.org/nvme:
        nvmet: don't use uuid_le type
        nvmet: check fileio lba range access boundaries
        nvmet: fix file discard return status
        nvme-rdma: centralize admin/io queue teardown sequence
        nvme-rdma: centralize controller setup sequence
        nvme-rdma: unquiesce queues when deleting the controller
        nvme-rdma: mark expected switch fall-through
        nvme: add disk name to trace events
        nvme: add controller name to trace events
        nvme: use hw qid in trace events
        nvme: cache struct nvme_ctrl reference to struct nvme_request
        nvmet-rdma: add an error flow for post_recv failures
        nvmet-rdma: add unlikely check in the fast path
        nvmet-rdma: support max(16KB, PAGE_SIZE) inline data
        nvme-rdma: support up to 4 segments of inline data
        nvmet: add buffered I/O support for file backed ns
        nvmet: add commands supported and effects log page
        nvme: move init of keep_alive work item to controller initialization
        nvme.h: resync with nvme-cli
  4. 24 Jul, 2018 18 commits
  5. 23 Jul, 2018 10 commits
  6. 22 Jul, 2018 2 commits
    • blk-mq: fail the request in case issue failure · 8824f622
      Ming Lei authored
      Inside blk_mq_try_issue_list_directly(), if issuing a request fails, we
      should not try to issue it again; otherwise the warning in
      blk_mq_start_request() is triggered. This aligns the behaviour with the
      other request issue & dispatch paths (see the schematic after this
      entry).
      
      Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: LKP <lkp@01.org>
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
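      Schematically, the dispatch rule described above looks like the sketch
      below; the enum and helper names are hypothetical, not the blk-mq API.
      A request whose direct issue fails outright is completed with an error
      instead of being issued again, while a busy result is simply requeued.

      enum issue_result { ISSUE_OK, ISSUE_BUSY, ISSUE_ERROR };

      struct request;                                       /* opaque in this sketch */

      extern enum issue_result try_issue_directly(struct request *rq);  /* hypothetical */
      extern void requeue_request(struct request *rq);                  /* hypothetical */
      extern void fail_request(struct request *rq);                     /* hypothetical */

      void issue_list_directly(struct request **rqs, int nr)
      {
              for (int i = 0; i < nr; i++) {
                      switch (try_issue_directly(rqs[i])) {
                      case ISSUE_OK:
                              break;
                      case ISSUE_BUSY:
                              /* resource pressure: keep the request, retry later */
                              requeue_request(rqs[i]);
                              break;
                      case ISSUE_ERROR:
                              /* issuing it again would trip the start-twice
                               * warning, so end it with an error instead */
                              fail_request(rqs[i]);
                              break;
                      }
              }
      }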
    • blk-rq-qos: make depth comparisons unsigned · 22f17952
      Josef Bacik authored
      With the change to use UINT_MAX, I broke the depth check: any value of
      inflight (i.e. 0) would compare as less than (int)UINT_MAX. Fix this by
      changing everything to unsigned int to match the depth (a small
      demonstration of the pitfall follows this entry).
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
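      A small standalone demonstration of the pitfall; the exact expression
      in blk-rq-qos differs, this only shows how mixed signedness flips the
      outcome of the same depth check once the limit is UINT_MAX.

      #include <limits.h>
      #include <stdio.h>

      int main(void)
      {
              unsigned int limit = UINT_MAX;   /* "unlimited" depth */
              unsigned int inflight = 0;

              /* all-unsigned compare: 0 < UINT_MAX, the check passes as intended */
              printf("unsigned: inflight < limit -> %d\n", inflight < limit);

              /* truncating the limit to int yields -1 on two's-complement
               * machines, so the very same check now fails */
              printf("signed:   inflight < limit -> %d\n", (int)inflight < (int)limit);

              return 0;
      }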
  7. 18 Jul, 2018 1 commit