1. 06 Oct, 2014 7 commits
    • dm bufio: update last_accessed when relinking a buffer · eb76faf5
      Joe Thornber authored
      The 'last_accessed' member of the dm_buffer structure was only set when
      the buffer was created.  This led to each buffer being discarded
      after dm_bufio_max_age time even if it was used recently.  In practice
      this resulted in all thinp metadata being evicted soon after being read
      -- this is particularly problematic for metadata intensive workloads
      like multithreaded small random IO.
      
      'last_accessed' is now updated each time the buffer is moved to the head
      of the LRU list, so the buffer is now properly discarded if it was not
      used in dm_bufio_max_age time.
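
      A minimal sketch of the shape of the fix, assuming the relink happens
      in a helper like __relink_lru() (the surrounding function body is an
      assumption based on the description above):

      	static void __relink_lru(struct dm_buffer *b, int dirty)
      	{
      		struct dm_bufio_client *c = b->c;

      		/* Refresh the timestamp on every relink, not just at
      		 * creation, so recently used buffers survive the
      		 * dm_bufio_max_age sweep. */
      		b->last_accessed = jiffies;

      		list_move(&b->lru_list, &c->lru[dirty]);
      	}
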
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.2+
    • dm raid: add discard support for RAID levels 4, 5 and 6 · 48cf06bc
      Heinz Mauelshagen authored
      In the case of RAID levels 4, 5 and 6 we have to verify each RAID
      member's ability to zero data on discards to avoid stripe data
      corruption -- if discard_zeroes_data is not set for each RAID member,
      discard support must be disabled.  But given the uncertainty of
      whether or not a RAID member properly supports zeroing data on
      discard, we require the user to explicitly allow discard support on
      RAID levels 4, 5, and 6 by setting a dm-raid module parameter, e.g.:
      dm-raid.devices_handle_discard_safely=Y
      Otherwise, discards could cause data corruption on RAID4/5/6.
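
      A hedged sketch of how such an opt-in module parameter is typically
      wired up; the parameter name comes from the message above, while the
      declaration details and the raid456 flag are assumptions:

      	static bool devices_handle_discard_safely = false;
      	module_param(devices_handle_discard_safely, bool, 0644);
      	MODULE_PARM_DESC(devices_handle_discard_safely,
      			 "Set to Y if all devices in each array reliably return zeroes on discarded regions");

      	/* In the discard setup path: leave RAID4/5/6 discard support
      	 * disabled unless the user has explicitly opted in. */
      	if (raid456 && !devices_handle_discard_safely)
      		return;
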
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: add discard support for RAID levels 1 and 10 · 75b8e04b
      Heinz Mauelshagen authored
      Discard support is not enabled for RAID levels 4, 5, and 6 at this time
      due to concerns about unreliable discard_zeroes_data support on some
      hardware; on such hardware, discards could cause stripe data corruption
      (a classic case of a few bad apples spoiling the bunch).
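
      A hedged sketch of the per-member check implied above, using block
      layer helpers of that era; the loop and the raid456 flag are
      assumptions:

      	/* Enable discard only if every member supports it; RAID4/5/6
      	 * would additionally need zeroing on discard to be reliable. */
      	for (i = 0; i < rs->md.raid_disks; i++) {
      		struct request_queue *q = bdev_get_queue(rs->dev[i].rdev.bdev);

      		if (!q || !blk_queue_discard(q))
      			return;
      		if (raid456 && !q->limits.discard_zeroes_data)
      			return;
      	}
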
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: allow active and inactive tables to share dm_devs · 86f1152b
      Benjamin Marzinski authored
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were
      unusable (failed).  For example, if all paths had failed and you added
      a new, working path to the table, you couldn't use it: the table load
      would fail because the table still contained failed paths.  Now a
      reload of a multipath table can include failed devices, and when those
      devices become active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code in
      dm-table.c, but it's very close (aside from some variable renames).  One
      subtle difference is that find_table_device for the table_devices list
      will only match devices with the same name and mode.  This is because we
      don't want to upgrade a device's mode in the active table when an
      inactive table is loaded.
      
      Access to the mapped_device structure's table_devices list requires a
      mutex (table_devices_lock), so that tables cannot be created and
      destroyed concurrently.
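
      A hedged sketch of the shared-device bookkeeping described above; the
      struct layout and field names are assumptions based on this message:

      	struct table_device {
      		struct list_head list;	/* linked into md->table_devices */
      		atomic_t count;		/* tables referencing this device */
      		struct dm_dev dm_dev;	/* the single open of the device */
      	};

      	static struct table_device *find_table_device(struct list_head *l,
      						      dev_t dev, fmode_t mode)
      	{
      		struct table_device *td;

      		/* Match on device and mode, so loading an inactive table
      		 * never upgrades the mode of a device in the active table. */
      		list_for_each_entry(td, l, list)
      			if (td->dm_dev.bdev->bd_dev == dev &&
      			    td->dm_dev.mode == mode)
      				return td;
      		return NULL;
      	}
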
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm mpath: stop queueing IO when no valid paths exist · 1f271972
      Benjamin Marzinski authored
      'queue_io' is set so that IO is queued while paths are being
      initialized.  Clear queue_io in __choose_pgpath if there are no valid
      paths, since there are obviously no paths that can be initialized.
      Otherwise IOs to the device will back up.
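
      A minimal sketch of the change's shape in __choose_pgpath(); the
      surrounding control flow is an assumption:

      	failed:
      		/* No priority group yielded a usable path, so there is
      		 * nothing left to initialize: stop queueing IO instead of
      		 * letting it back up. */
      		m->current_pgpath = NULL;
      		m->current_pg = NULL;
      		m->queue_io = 0;
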
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: use bioset_create_nobvec() · 3d8aab2d
      Junichi Nomura authored
      Since DM core uses bio_clone_fast() for both bio-based and request-based
      DM devices there is no need for DM's bioset to have a bvec mempool.
      
      With this patch, on an arch with 4KB pages for example, memory usage is
      reduced by 64KB for each bio-based DM device and 1MB for each
      request-based DM device.
      
      For example, when you create 10,000 bio-based DM devices and 1,000
      request-based DM devices, memory usage of biovec under no load is:
        # grep biovec /proc/slabinfo
      
        biovec-256        418068 418068   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      With this patch series applied, the usage becomes:
        # grep biovec /proc/slabinfo
      
        biovec-256           116    116   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      So 4096 * (418068 - 116) bytes, or about 1.6GB of memory, is saved in
      this example.
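
      A hedged sketch of the call-site change in DM core; the pool sizing
      variables are assumptions:

      	/* No bvec mempool is needed: DM only clones via bio_clone_fast(),
      	 * which shares the parent bio's bvecs instead of copying them. */
      	md->bs = bioset_create_nobvec(pool_size, front_pad);
      	if (!md->bs)
      		goto bad;
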
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove nr_iovecs parameter from alloc_tio() · 99778273
      Junichi Nomura authored
      alloc_tio() uses bio_alloc_bioset() to allocate a clone-bio for a bio.
      alloc_tio() takes the number of bvecs to allocate for the clone-bio.
      However, with v3.14's immutable biovec changes DM now uses
      __bio_clone_fast() and no longer needs to allocate bvecs.
      
      In practice, the 'nr_iovecs' passed to alloc_tio() is always effectively
      0.  __clone_and_map_simple_bio() looked like it was passing non-zero
      nr_iovecs, but its value was always within the range of inline bvecs and
      no allocation actually happened.  If allocation happened, the BUG_ON() in
      __bio_clone_fast() would've triggered.
      
      Remove the nr_iovecs parameter from alloc_tio() to prevent possible
      future bio_alloc_bioset() mis-use of a new bioset interface that will no
      longer allow bvecs to be allocated.
      
      Also fix extra whitespace before the __bio_clone_fast() call in
      __clone_and_map_simple_bio().
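
      A hedged sketch of the simplified helper; the surrounding details are
      assumptions:

      	static struct dm_target_io *alloc_tio(struct clone_info *ci,
      					      struct dm_target *ti,
      					      unsigned target_bio_nr)
      	{
      		struct dm_target_io *tio;
      		struct bio *clone;

      		/* Zero bvecs: __bio_clone_fast() shares the parent's. */
      		clone = bio_alloc_bioset(GFP_NOIO, 0, ci->md->bs);
      		tio = container_of(clone, struct dm_target_io, clone);

      		tio->io = ci->io;
      		tio->ti = ti;
      		tio->target_bio_nr = target_bio_nr;

      		return tio;
      	}
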
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 03 Oct, 2014 2 commits
  3. 01 Oct, 2014 1 commit
  4. 30 Sep, 2014 1 commit
  5. 27 Sep, 2014 14 commits
  6. 25 Sep, 2014 10 commits
  7. 22 Sep, 2014 5 commits
    • scsi: move blk_mq_start_request call earlier · fe052529
      Christoph Hellwig authored
      Some ATA drivers need the dma drain size workaround, and thus need to
      call blk_mq_start_request before the S/G mapping.
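
      A hedged sketch of the new ordering in the queue_rq path; the
      scsi_setup_sg() helper is hypothetical and stands in for the S/G
      mapping step:

      	/* Start the request before S/G mapping, so the dma drain size
      	 * workaround applied during mapping sees a started request. */
      	blk_mq_start_request(req);

      	if (scsi_setup_sg(cmd))	/* hypothetical S/G mapping step */
      		goto out_err;
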
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: fix blk_abort_request on blk-mq · 90415837
      Christoph Hellwig authored
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      
      Moved blk_mq_rq_timed_out() definition to the private blk-mq.h header.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-timeout: fix blk_add_timer · 5e940aaa
      Ming Lei authored
      Commit 8cb34819cdd5d ("blk-mq: unshared timeout handler") introduced
      blk-mq's own timeout handler and removed the following line:
      
      	blk_queue_rq_timed_out(q, blk_mq_rq_timed_out);
      
      which then causes blk_add_timer() to bypass adding the timer,
      since blk-mq no longer has q->rq_timed_out_fn defined.
      
      This patch fixes the problem by bypassing the check for blk-mq,
      so that both request deadlines are still set and the rolling
      timer updated.
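
      A minimal sketch of the fix's shape, assuming the check sits at the
      top of blk_add_timer():

      	void blk_add_timer(struct request *req)
      	{
      		struct request_queue *q = req->q;

      		/* Only the legacy path needs q->rq_timed_out_fn; blk-mq
      		 * has its own handler, so don't bail out for it. */
      		if (!q->mq_ops && !q->rq_timed_out_fn)
      			return;

      		/* ... set req->deadline and update the rolling timer ... */
      	}
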
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: limit memory consumption if a crash dump is active · aedcd72f
      Jens Axboe authored
      It's not uncommon for crash dump kernels to be limited to 128MB or
      some similarly small amount of memory.  This is normally not a problem
      for devices as we don't use that much memory, but for some shared SCSI
      setups with huge queue depths, it can potentially fill most of memory
      with tons of request allocations.  blk-mq does scale back when it fails
      to allocate memory, but it scales back just enough for its own
      allocations to succeed.  This could still leave the system with not
      enough memory to make any real progress.
      
      Check if we are in a kdump environment and limit the hardware
      queues and tag depth.
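
      A hedged sketch of the clamp in blk_mq_alloc_tag_set(); the exact
      minimum depth is an assumption:

      	#include <linux/crash_dump.h>

      	/* In a kdump kernel, keep blk-mq's footprint small: one hardware
      	 * queue and a shallow tag depth are enough to write the dump. */
      	if (is_kdump_kernel()) {
      		set->nr_hw_queues = 1;
      		set->queue_depth = min(64U, set->queue_depth);
      	}
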
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: remove unnecessary blk_clear_rq_complete() · 2edd2c74
      Ming Lei authored
      This patch removes two unnecessary blk_clear_rq_complete() calls.
      The REQ_ATOM_COMPLETE flag is already cleared inside
      blk_mq_start_request(), so (see the sketch after this list):
      
      	- The blk_clear_rq_complete() in blk_flush_restore_request() isn't
      	needed because the request will be freed later, and clearing the
      	flag here may open a small race window with the timeout handler.
      
      	- The blk_clear_rq_complete() in blk_mq_requeue_request() isn't
      	necessary either: even though REQ_ATOM_STARTED is cleared in
      	__blk_mq_requeue_request(), clearing REQ_ATOM_COMPLETE here could
      	in theory still open a small race window with the timeout handler,
      	since the two clear_bit() calls may be reordered.
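
      A hedged sketch of why both removals are safe: the flag is re-cleared
      when the request is started again (ordering details simplified):

      	void blk_mq_start_request(struct request *rq)
      	{
      		/* ... set rq->deadline and arm the timer ... */

      		/* Make ->deadline visible before flipping the flags, then
      		 * mark the request started and not-complete in one place. */
      		smp_mb__before_atomic();
      		set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
      		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
      	}
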
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>