1. 13 Jan, 2011 40 commits
    • dm: raid456 basic support · 9d09e663
      NeilBrown authored
      This patch is the skeleton for the DM target that will be
      the bridge from DM to MD (initially RAID456 and later RAID1).  It
      provides a way to use device-mapper interfaces to the MD RAID456
      drivers.
      
      As with all device-mapper targets, the nominal public interfaces are the
      constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
      and STATUSTYPE_TABLE).  The CTR table looks like the following:
      
      1: <s> <l> raid \
      2:	<raid_type> <#raid_params> <raid_params> \
      3:	<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>
      
      Line 1 contains the standard first three arguments to any device-mapper
      target - the start, length, and target type fields.  The target type in
      this case is "raid".
      
      Line 2 contains the arguments that define the particular raid
      type/personality/level, the required arguments for that raid type, and
      any optional arguments.  Possible raid types include: raid4, raid5_la,
      raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc.  (again, raid1 is
      planned for the future.)  The list of required and optional parameters
      is the same for all the current raid types.  The required parameters are
      positional, while the optional parameters are given as key/value pairs.
      The possible parameters are as follows:
       <chunk_size>		Chunk size in sectors.
       [[no]sync]		Force/Prevent RAID initialization
       [rebuild <idx>]	Rebuild the drive indicated by the index
       [daemon_sleep <ms>]	Time between bitmap daemon work to clear bits
       [min_recovery_rate <kB/sec/disk>]	Throttle RAID initialization
       [max_recovery_rate <kB/sec/disk>]	Throttle RAID initialization
       [max_write_behind <value>]		See '--write-behind=' (man mdadm)
       [stripe_cache <sectors>]		Stripe cache size for higher RAIDs
      
      Line 3 contains the list of devices that compose the array in
      metadata/data device pairs.  If the metadata is stored separately, a '-'
      is given for the metadata device position.  If a drive has failed or is
      missing at creation time, a '-' can be given for both the metadata and
      data drives for a given position.
      
      Examples:
      # RAID4 - 4 data drives, 1 parity
      # No metadata devices specified to hold superblock/bitmap info
      # Chunk size of 1MiB
      # (Lines separated for easy reading)
      0 1960893648 raid \
      	raid4 1 2048 \
      	5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
      
      # RAID4 - 4 data drives, 1 parity (no metadata devices)
      # Chunk size of 1MiB, force RAID initialization,
      #	min recovery rate at 20 kiB/sec/disk
      0 1960893648 raid \
              raid4 4 2048 min_recovery_rate 20 sync \
              5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
      
      Performing a 'dmsetup table' should display the CTR table used to
      construct the mapping (with possible reordering of optional
      parameters).
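
The table layout described above can be assembled mechanically. The following is a minimal sketch, not part of the patch: `build_raid_table` and its argument names are hypothetical, and it only illustrates how `<#raid_params>` and `<#raid_devs>` follow from the rest of the line.

```python
def build_raid_table(start, length, raid_type, chunk_size, devices, optional=()):
    """Assemble a one-line CTR table for the "raid" target.

    devices  -- list of (meta_dev, data_dev) pairs; "-" marks an absent device
    optional -- flat sequence of optional parameter tokens,
                e.g. ["min_recovery_rate", "20", "sync"]
    """
    # <#raid_params> counts every token after it: the chunk size plus all
    # optional keys and values.
    raid_params = [str(chunk_size)] + [str(tok) for tok in optional]
    fields = [str(start), str(length), "raid",
              raid_type, str(len(raid_params))] + raid_params
    fields.append(str(len(devices)))
    for meta_dev, data_dev in devices:
        fields += [meta_dev, data_dev]
    return " ".join(fields)


# The two RAID4 examples from the commit message, joined onto one line.
devs = [("-", "8:%d" % minor) for minor in (17, 33, 49, 65, 81)]
plain = build_raid_table(0, 1960893648, "raid4", 2048, devs)
forced = build_raid_table(0, 1960893648, "raid4", 2048, devs,
                          optional=["min_recovery_rate", "20", "sync"])
print(plain)
print(forced)
```

The resulting string could be passed to 'dmsetup create --table'; the sketch does not validate raid types or device names.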
      
      Performing a 'dmsetup status' will yield information on the state and
      health of the array.  The output is as follows:
      1: <s> <l> raid \
      2:	<raid_type> <#devices> <1 health char for each dev> <resync_ratio>
      
      Line 1 is standard DM output.  Line 2 is best shown by example:
      	0 1960893648 raid raid4 5 AAAAA 2/490221568
      Here we can see the RAID type is raid4, there are 5 devices - all of
      which are 'A'live, and the array is 2/490221568 complete with recovery.
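
A monitoring script might split the status line as follows. This is a sketch under the assumption that the line has exactly the seven fields shown above; `RaidStatus` and `parse_raid_status` are hypothetical names.

```python
from typing import NamedTuple


class RaidStatus(NamedTuple):
    start: int
    length: int
    raid_type: str
    num_devices: int
    health: str       # one character per device, 'A' meaning alive
    synced: int       # numerator of <resync_ratio>
    total: int        # denominator of <resync_ratio>


def parse_raid_status(line):
    """Parse one 'dmsetup status' line of a "raid" target."""
    start, length, target, raid_type, ndevs, health, ratio = line.split()
    if target != "raid":
        raise ValueError("not a raid target: %r" % target)
    if len(health) != int(ndevs):
        raise ValueError("expected one health character per device")
    synced, total = ratio.split("/")
    return RaidStatus(int(start), int(length), raid_type, int(ndevs),
                      health, int(synced), int(total))


status = parse_raid_status("0 1960893648 raid raid4 5 AAAAA 2/490221568")
print(status.raid_type, status.health.count("A"), status.synced / status.total)
```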
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: per target unplug callback support · 99d03c14
      NeilBrown authored
      Add per-target unplug callback support.
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: introduce target callbacks and congestion callback · 9d357b07
      NeilBrown authored
      DM currently implements congestion checking by checking on congestion
      in each component device.  For raid456 we need to also check if the
      stripe cache is congested.
      
      Add per-target congestion checker callback support.
      
      Extending the target_callbacks structure with additional callback
      functions allows for establishing multiple callbacks per-target (a
      callback is also needed for unplug).
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: delay activate_path retry on SCSI_DH_RETRY · 4e2d19e4
      Chandra Seetharaman authored
      This patch adds a user-configurable 'pg_init_delay_msecs' feature.  Use
      this feature to specify the number of milliseconds to delay before
      retrying scsi_dh_activate, when SCSI_DH_RETRY is returned.
      
      SCSI Device Handlers return SCSI_DH_IMM_RETRY if we could retry
      activation immediately and SCSI_DH_RETRY in cases where it is better to
      retry after some delay.
      
      Currently we immediately retry scsi_dh_activate irrespective of
      SCSI_DH_IMM_RETRY and SCSI_DH_RETRY.
      
      The 'pg_init_delay_msecs' feature may be provided during table create or
      load, e.g.:
          dmsetup create --table "0 20971520 multipath 3 queue_if_no_path \
      	pg_init_delay_msecs 2500 ..." mpatha
      
      The default for 'pg_init_delay_msecs' is 2000 milliseconds.
      Maximum configurable delay is 60000 milliseconds.  Specifying a
      'pg_init_delay_msecs' of 0 will cause immediate retry.
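
The retry rules above can be summarised in a short sketch. It is illustrative only: the function names are hypothetical, the handler results are modelled as plain strings, and rejecting out-of-range values (rather than clamping them) is an assumption, since the commit message only states the limits.

```python
DEFAULT_DELAY_MSECS = 2000   # default given in the commit message
MAX_DELAY_MSECS = 60000      # maximum given in the commit message


def parse_pg_init_delay(features):
    """Return the retry delay (ms) from a list of multipath feature tokens."""
    if "pg_init_delay_msecs" not in features:
        return DEFAULT_DELAY_MSECS
    value = int(features[features.index("pg_init_delay_msecs") + 1])
    # Assumption: out-of-range values are rejected rather than clamped.
    if not 0 <= value <= MAX_DELAY_MSECS:
        raise ValueError("pg_init_delay_msecs must be 0..%d" % MAX_DELAY_MSECS)
    return value


def retry_wait_msecs(handler_result, delay_msecs):
    """How long to wait before calling scsi_dh_activate again."""
    if handler_result == "SCSI_DH_IMM_RETRY":
        return 0                # handler says an immediate retry is fine
    if handler_result == "SCSI_DH_RETRY":
        return delay_msecs      # 0 still means "retry immediately"
    raise ValueError("not a retry result: %r" % handler_result)


delay = parse_pg_init_delay(["queue_if_no_path", "pg_init_delay_msecs", "2500"])
print(retry_wait_msecs("SCSI_DH_RETRY", delay))
```

The three tokens passed to `parse_pg_init_delay` correspond to the "3 queue_if_no_path pg_init_delay_msecs 2500" feature list in the dmsetup example.
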
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Acked-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: remove superfluous irq disablement in dm_request_fn · 052189a2
      Kiyoshi Ueda authored
      This patch changes spin_lock_irq() to spin_lock() in dm_request_fn().
      It is purely a clean-up and makes no functional change.
      
      The spin_lock_irq() was leftover from the early request-based dm code,
      where map_request() used to enable interrupts.
      Since current map_request() never enables interrupts, we can change it
      to spin_lock() to match the prior spin_unlock().
      
      Auditing the dm and block-layer code called from map_request(), I
      confirmed that all functions save and restore the interrupt state, so
      no function returns with interrupts enabled.  I also haven't observed
      any problems in my test environment, which uses the scsi and lpfc
      drivers, after heavy I/O testing with occasional path down/up events.
      
      Added BUG_ON() to detect breakage in future.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log: use PTR_ERR value instead of ENOMEM · dbc883f1
      Dan Carpenter authored
      It's nicer to return the PTR_ERR() value instead of just returning
      -ENOMEM.  In the current code the PTR_ERR() value is always equal to
      -ENOMEM so this doesn't actually affect anything, but still...
      
      In addition, dm_dirty_log_create() doesn't check for a specific -ENOMEM
      return.  So this change is safe relative to potential for a non -ENOMEM
      return in the future.
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Acked-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: avoid storing private suspended state · b83b2f29
      Mike Snitzer authored
      Use dm_suspended() rather than having each snapshot target maintain a
      private 'suspended' flag in struct dm_snapshot.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: persistent make metadata_wq multithreaded · 239c8dd5
      Tejun Heo authored
      metadata_wq serves on-stack work items from chunk_io().  Even if
      multiple chunk_io() calls are in progress simultaneously, each is
      independent and queued only once, so a multithreaded workqueue can
      safely be used.
      
      Switch metadata_wq to a multithreaded workqueue and flush the work item
      instead of the workqueue in chunk_io().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: use non reentrant workqueues if equivalent · 9c4376de
      Tejun Heo authored
      kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
      serve only a single work item from the dm instance, so non-reentrant
      workqueues would provide the same ordering guarantees as ordered ones
      while allowing CPU affinity and use of the workqueues for other
      purposes.  Switch them to non-reentrant workqueues.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: convert workqueues to alloc_ordered · 4d4d66ab
      Tejun Heo authored
      Convert all create[_singlethread]_workqueue() users to the new
      alloc[_ordered]_workqueue().  This conversion is mechanical and
      doesn't introduce any behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm stripe: switch from local workqueue to system_wq · f521f074
      Tejun Heo authored
      kstriped only serves sc->kstriped_ws which runs dm_table_event().
      This doesn't need to be executed from an ordered workqueue w/ rescuer.
      Drop kstriped and use the system_wq instead.  While at it, rename
      kstriped_ws to trigger_event so that it's consistent with other dm
      modules.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: dont use flush_scheduled_work · d5ffa387
      Tejun Heo authored
      flush_scheduled_work() is being deprecated.  Flush the used work
      directly instead.  In all dm targets, the only work which uses
      system_wq is ->trigger_event.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: remove unused dm_snapshot queued_bios_work · fecec20e
      Tejun Heo authored
      dm_snapshot->queued_bios_work isn't used.  Remove ->queued_bios[_work]
      from dm_snapshot structure, the flush_queued_bios work function and
      ksnapd workqueue.
      
      The DM snapshot changes that were going to use the ksnapd workqueue were
      either superseded (fix for origin write races) or never completed
      (deallocation of invalid snapshot's memory via workqueue).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: suppress needless warning messages · 810b4923
      Milan Broz authored
      The device-mapper should not send warning messages to syslog
      if a device is not found. This can be done by userspace
      according to the returned dm-ioctl error code.
      
      So move these messages to debug level and use rate limiting
      to not flood syslog.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: add loop aes iv generator · 34745785
      Milan Broz authored
      This patch adds a compatible implementation of the block
      chaining mode used by the Loop-AES block device encryption
      system (http://loop-aes.sourceforge.net/) designed
      by Jari Ruusu.
      
      It operates on full 512-byte sectors and uses CBC
      with an IV derived from the sector number, the data and
      optionally an extra IV seed.
      
      This means that after CBC decryption the first block of each
      sector must be tweaked according to the decrypted data.
      
      Loop-AES can use three encryption schemes:
       version 1: is plain aes-cbc mode (already compatible)
       version 2: uses 64 multikey scheme with own IV generator
       version 3: the same as version 2 with additional IV seed
                  (it uses 65 keys, last key is used as IV seed)
      
      The IV generator is here named lmk (Loop-AES multikey)
      and for the cipher specification looks like: aes:64-cbc-lmk
      
      Versions 2 and 3 are distinguished by the length
      of the provided multi-key string (which is just the hex-encoded
      "raw key" used in the original Loop-AES ioctl).
      
      Configuration of the device and decoding of the key string will
      be done in userspace (cryptsetup).
      (Loop-AES stores keys in a gpg-encrypted file; the raw keys are
      the output of simple hashing of lines in this file.)
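
As a rough illustration of the naming scheme, the sketch below splits a cipher specification such as aes:64-cbc-lmk into its parts and maps a key count to the Loop-AES scheme listed above. Both helper names are hypothetical, and the `cipher[:keycount]-chainmode-ivmode` layout is inferred from the single example in this message rather than taken from the patch itself.

```python
def parse_cipher_spec(spec):
    """Split 'cipher[:keycount]-chainmode-ivmode' into its parts."""
    cipher_part, chainmode, ivmode = spec.split("-", 2)
    cipher, _, count = cipher_part.partition(":")
    keycount = int(count) if count else 1   # no ":<n>" means a single key
    return cipher, keycount, chainmode, ivmode


def loop_aes_version(num_keys):
    """Map the number of keys in the multi-key string to a Loop-AES scheme."""
    versions = {1: 1, 64: 2, 65: 3}   # 65 = 64 keys plus one IV seed
    if num_keys not in versions:
        raise ValueError("unexpected number of keys: %d" % num_keys)
    return versions[num_keys]


print(parse_cipher_spec("aes:64-cbc-lmk"))
print(loop_aes_version(65))
```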
      
      Based on an implementation by Max Vozeler:
        http://article.gmane.org/gmane.linux.kernel.cryptoapi/3752/
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      CC: Max Vozeler <max@hinterhof.net>
    • dm crypt: add multi key capability · d1f96423
      Milan Broz authored
      This patch adds generic multikey handling to be used
      in the following patch for Loop-AES mode compatibility.
      
      This patch extends the mapping table with an optional keycount
      and implements generic multi-key capability.
      
      With more keys defined, the <key> string is divided into
      <keycount> sections and these are used for the tfms.
      
      The tfm is selected according to the sector offset
      (sector 0->tfm[0], sector 1->tfm[1], sector N->tfm[N modulo keycount]);
      only power-of-two values are supported for keycount here.
      
      Because tfms are allocated per CPU, this mode can take
      a lot of memory on large SMP systems.
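
The key splitting and per-sector selection can be sketched as follows. The function names are hypothetical; the bit mask is one way to compute N modulo keycount when keycount is a power of two, and is not necessarily how the patch does it.

```python
def split_key(key_hex, keycount):
    """Divide a hex-encoded key string into keycount equal sections."""
    if keycount < 1 or keycount & (keycount - 1):
        raise ValueError("keycount must be a power of two")
    key = bytes.fromhex(key_hex)
    if len(key) % keycount:
        raise ValueError("key length is not a multiple of keycount")
    size = len(key) // keycount
    return [key[i * size:(i + 1) * size] for i in range(keycount)]


def key_index(sector, keycount):
    """Index of the key (tfm) used for a sector: sector modulo keycount."""
    # For a power-of-two keycount the modulo reduces to a bit mask.
    return sector & (keycount - 1)


keys = split_key("00112233", 4)
print([k.hex() for k in keys], key_index(5, 4))
```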
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Max Vozeler <max@hinterhof.net>
    • dm crypt: add post iv call to iv generator · 2dc5327d
      Milan Broz authored
      IV (initialisation vector) can in principle depend not only
      on sector but also on plaintext data (or other attributes).
      
      Change IV generator interface to work directly with dmreq
      structure to allow such dependence in generator.
      
      Also add post() function which is called after the crypto
      operation.
      
      This allows tricky modification of decrypted data or IV
      internals.
      
      In asynchronous mode post() can be called after the
      ctx->sector count has been increased, so a copy of iv_sector
      must be stored directly in the dmreq structure.
      (N.B. dmreq always includes only one sector in its scatterlists.)
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: use io thread for reads only if mempool exhausted · 20c82538
      Milan Broz authored
      If there is enough memory, the code can submit the bio directly
      instead of queuing this operation in a separate thread.
      
      Try to alloc bio clone with GFP_NOWAIT and only if it
      fails use separate queue (map function cannot block here).
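
The control flow is a "try without blocking, fall back to a worker" pattern. The sketch below models it in userspace terms only: `ClonePool`, `alloc_nowait` and `map_read` are hypothetical stand-ins, not the dm-crypt code.

```python
from collections import deque


class ClonePool:
    """Stand-in for a bio mempool with a fixed number of free clones."""

    def __init__(self, free):
        self.free = free

    def alloc_nowait(self):
        # Never blocks: returns None when the pool is exhausted.
        if self.free == 0:
            return None
        self.free -= 1
        return object()


def map_read(pool, io_thread_queue, submitted, bio):
    """Submit directly when a clone is available, else defer to the io thread."""
    clone = pool.alloc_nowait()
    if clone is None:
        io_thread_queue.append(bio)    # slow path: the io thread may block
        return "queued"
    submitted.append(bio)              # fast path: no thread hand-off
    return "submitted"


pool, queued, submitted = ClonePool(free=1), deque(), []
results = [map_read(pool, queued, submitted, bio) for bio in ("bio0", "bio1")]
print(results)
```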
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: scale to multiple cpus · c0297721
      Andi Kleen authored
      Currently dm-crypt does all the encryption work for a single dm-crypt
      mapping in a single workqueue. This does not scale well when multiple
      CPUs are submitting IO at a high rate. The single CPU running the single
      thread cannot keep up with the encryption and encrypted IO performance
      tanks.
      
      This patch changes the crypto workqueue to be per CPU. This means
      that as long as the IO submitters (or the interrupt target CPUs
      for reads) run on different CPUs, the encryption work will also
      run in parallel.
      
      To avoid a bottleneck on the IO worker I also changed those to be
      per-CPU threads.
      
      There is still some shared data, so I suspect some bouncing
      cache lines. But I haven't done a detailed study on that yet.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: simplify compatible table output · 7dbcd137
      Milan Broz authored
      Rename cc->cipher_mode to cc->cipher_string and store the whole of the cipher
      information so it can easily be printed when processing the DM_DEV_STATUS ioctl.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log userspace: add version number to comms · 86a54a48
      Jonathan Brassow authored
      This patch adds a 'version' field to the 'dm_ulog_request'
      structure.
      
      The 'version' field is taken from a portion of the unused
      'padding' field in the 'dm_ulog_request' structure.  This
      was done to avoid changing the size of the structure and
      possibly disrupting backwards compatibility.
      
      The version number will help notify user-space daemons
      when a change has been made to the kernel/userspace
      log API.
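
Carving a new field out of existing padding keeps the structure size, and therefore the wire format, unchanged. The sketch below demonstrates the idea with a deliberately simplified, hypothetical layout; the real 'dm_ulog_request' has more fields, and the widths used here are assumptions.

```python
import ctypes


class OldRequest(ctypes.Structure):
    # Hypothetical "before" layout: an id followed by unused padding.
    _fields_ = [("luid", ctypes.c_uint64),
                ("padding", ctypes.c_char * 8)]


class NewRequest(ctypes.Structure):
    # Hypothetical "after" layout: part of the padding becomes 'version'.
    _fields_ = [("luid", ctypes.c_uint64),
                ("version", ctypes.c_uint32),
                ("padding", ctypes.c_char * 4)]


old_size = ctypes.sizeof(OldRequest)
new_size = ctypes.sizeof(NewRequest)
print(old_size, new_size, NewRequest.version.offset)
```

Because the sizes match and 'version' sits inside the old padding range, a daemon built against the old layout still sees every other field at the same offset.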
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log userspace: group clear and mark requests · 085ae065
      Jonathan Brassow authored
      Allow the device-mapper log's 'mark' and 'clear' requests to be
      grouped and processed in a batch.  This can significantly reduce the
      amount of traffic going between the kernel and userspace (where the
      processing daemon resides).
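
The saving comes from sending one message per group instead of one per request. A minimal sketch, assuming requests are (kind, region) pairs and that a message can carry at most `max_per_message` regions (a hypothetical limit); sending marks before clears is also an assumption.

```python
def split_flush_list(flush_list):
    """Separate cached requests into 'mark' and 'clear' region lists."""
    mark_list = [region for kind, region in flush_list if kind == "mark"]
    clear_list = [region for kind, region in flush_list if kind == "clear"]
    return mark_list, clear_list


def batch(regions, max_per_message):
    """Group regions into chunks of at most max_per_message entries."""
    return [regions[i:i + max_per_message]
            for i in range(0, len(regions), max_per_message)]


flush_list = [("mark", 3), ("clear", 7), ("mark", 4), ("mark", 9), ("clear", 8)]
mark_list, clear_list = split_flush_list(flush_list)
messages = ([("mark", group) for group in batch(mark_list, 2)] +
            [("clear", group) for group in batch(clear_list, 2)])
print(len(flush_list), "requests ->", len(messages), "messages")
```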
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log userspace: split flush queue · 909cc4fb
      Jonathan Brassow authored
      Split the 'flush_list', which contained a mix of both 'mark' and 'clear'
      requests, into two distinct lists ('mark_list' and 'clear_list').
      
      The device mapper log implementations (used by various DM targets) are
      allowed to cache 'mark' and 'clear' requests until a 'flush' is
      received.  Until now, these cached requests were kept in the same list.
      They will now be put into distinct lists to facilitate group processing
      of these requests (in the next patch).
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm kcopyd: delay unplugging · 8d35d3e3
      Mikulas Patocka authored
      Make kcopyd merge more I/O requests by using device unplugging.
      
      Without this patch, each I/O request is dispatched separately to the device.
      If the device supports tagged queuing, there are many small requests sent
      to the device. To improve performance, this patch will batch as many requests
      as possible, allowing the queue to merge consecutive requests, and send them
      to the device at once.
      
      In my tests (15k SCSI disk), this patch improves sequential write throughput:
      
        Sequential write throughput (chunksize of 4k, 32k, 512k)
        unpatched: 15.2, 18.5, 17.5 MB/s
        patched:   14.4, 22.6, 23.0 MB/s
      
      In most common uses (snapshot or two-way mirror), kcopyd is only used for
      two devices, one for reading and the other for writing, thus this optimization
      is implemented only for two devices. The optimization may be extended to n-way
      mirrors with some code complexity increase.
      
      We keep track of two block devices to unplug (one for read and the
      other for write) and unplug them when exiting "do_work" thread.  If
      there are more devices used (in theory it could happen, in practice it
      is rare), we unplug immediately.
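
One possible reading of this bookkeeping is sketched below. `UnplugTracker` is a hypothetical model, not the kcopyd code: it remembers one device per direction, unplugs the remembered device at once when a different device shows up in the same direction, and unplugs whatever is still tracked when the work function finishes.

```python
class UnplugTracker:
    def __init__(self, unplug):
        self._unplug = unplug                        # callback: unplug(dev)
        self._tracked = {"read": None, "write": None}

    def note_io(self, direction, dev):
        """Record that I/O was dispatched to dev in the given direction."""
        tracked = self._tracked[direction]
        if tracked == dev:
            return                                   # common case: same device
        if tracked is not None:
            self._unplug(tracked)                    # extra device: unplug now
        self._tracked[direction] = dev

    def finish_work(self):
        """Unplug the tracked devices when the work function exits."""
        for direction, dev in self._tracked.items():
            if dev is not None:
                self._unplug(dev)
                self._tracked[direction] = None


unplugged = []
tracker = UnplugTracker(unplugged.append)
tracker.note_io("read", "origin")
tracker.note_io("write", "cow")
tracker.note_io("write", "cow")
tracker.note_io("write", "mirror2")   # second write device: "cow" is unplugged
tracker.finish_work()
print(unplugged)
```
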
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log userspace: trap all failed log construction errors · 4a038677
      Jonathan Brassow authored
      When constructing a mirror log, it is possible for the initial request
      to fail for other reasons besides -ESRCH.  These must be handled too.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: set key size early · 69a8cfcd
      Milan Broz authored
      Simplify key size verification (hexadecimal string) and
      set key size early in constructor.
      
      (Patch required by later changes.)
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: remove dm_mutex after bkl conversion · 4a1aeb98
      Milan Broz authored
      This patch replaces dm_mutex with _minor_lock in dm_blk_close()
      and then removes it.
      
      During the BKL conversion, commit 6e9624b8
      (block: push down BKL into .open and .release) pushed lock_kernel()
      down into dm_blk_open/close calls.
      Commit 2a48fc0a
      (block: autoconvert trivial BKL users to private mutex) converted it to a
      local mutex, but _minor_lock is sufficient.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: support discard · 5fc2ffea
      Mike Snitzer authored
      Enable discard support in the DM mirror target.
      Also change an existing use of 'bvec' to 'addr' in the union.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: allow rename to fill empty uuid · 84c89557
      Peter Jones authored
      Allow the uuid of a mapped device to be set after device creation.
      Previously the uuid (which is optional) could only be set by
      DM_DEV_CREATE.  If no uuid was supplied it could not be set later.
      
      Sometimes it's necessary to create the device before the uuid is known,
      and in such cases the uuid must be filled in after the creation.
      
      This patch extends DM_DEV_RENAME to accept a uuid accompanied by
      a new flag DM_UUID_FLAG.  This can only be done once and if no
      uuid was previously supplied.  It cannot be used to change an
      existing uuid.
      
      DM_VERSION_MINOR is also bumped to 19 to indicate this interface
      extension is available.
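
The set-once rule can be expressed compactly. The class below is a hypothetical model of the behaviour described above, with `uuid_flag` standing in for DM_UUID_FLAG; it is not the ioctl implementation.

```python
class MappedDevice:
    def __init__(self, name, uuid=""):
        self.name = name
        self.uuid = uuid          # empty string means "no uuid supplied"

    def rename(self, new_value, uuid_flag=False):
        """Rename the device, or fill in its uuid when uuid_flag is set."""
        if not uuid_flag:
            self.name = new_value
            return
        if self.uuid:
            raise ValueError("uuid already set; it cannot be changed")
        self.uuid = new_value


dev = MappedDevice("vol0")
dev.rename("example-uuid-1", uuid_flag=True)   # allowed: uuid was empty
try:
    dev.rename("example-uuid-2", uuid_flag=True)
except ValueError as err:
    print("rejected:", err)
```
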
      Signed-off-by: Peter Jones <pjones@redhat.com>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm io: remove BIO_RW_SYNCIO flag from kcopyd · d9bf0b50
      Mikulas Patocka authored
      Remove the REQ_SYNC flag to improve write throughput when writing
      to the origin with a snapshot on the same device (using the CFQ I/O
      scheduler).
      
      Sequential write throughput (chunksize of 4k, 32k, 512k)
        unpatched:  8.5,  8.6,  9.3 MB/s
        patched:   15.2, 18.5, 17.5 MB/s
      
      Snapshot exception reallocations are triggered by writes that are
      usually async, so mark the associated dm_io_request as async as well.
      This helps when using the CFQ I/O scheduler because it has separate
      queues for sync and async I/O.  Async is optimized for throughput; sync
      for latency.  With this change we're consciously favoring throughput over
      latency.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: disable blk_abort_queue · 09c9d4c9
      Mike Snitzer authored
      Revert commit 224cb3e9
        dm: Call blk_abort_queue on failed paths
      
      Multipath began to use blk_abort_queue() to allow for
      lower latency path deactivation.  This was found to
      cause list corruption:
      
         the cmd gets blk_abort_queued/timedout run on it and the scsi eh
         somehow is able to complete and run scsi_queue_insert while
         scsi_request_fn is still trying to process the request.
      
         https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: stable@kernel.org
    • dm: dont take i_mutex to change device size · c217649b
      Mike Snitzer authored
      No longer needlessly hold md->bdev->bd_inode->i_mutex when changing the
      size of a DM device.  This additional locking is unnecessary because
      i_size_write() is already protected by the existing critical section in
      dm_swap_table().  DM already has a reference on md->bdev so the
      associated bd_inode may be changed without lifetime concerns.
      
      A negative side-effect of having held md->bdev->bd_inode->i_mutex was
      that a concurrent DM device resize and flush (via fsync) would deadlock.
      Dropping md->bdev->bd_inode->i_mutex eliminates this potential for
      deadlock.  The following reproducer no longer deadlocks:
        https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: stable@kernel.org
    • Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 · 581548db
      Linus Torvalds authored
      * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
        [IA64] Fix format warning in arch/ia64/kernel/acpi.c
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 · 03a4491f
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
        firewire: ohci: fix compilation on arches without PAGE_KERNEL_RO
    • Merge branch 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block · 7b0cb1bd
      Linus Torvalds authored
      * 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block:
        cciss: reinstate proper FIFO order of command queue list
        floppy: replace NO_GEOM macro with a function
    • Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block · 275220f0
      Linus Torvalds authored
      * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
        block: ensure that completion error gets properly traced
        blktrace: add missing probe argument to block_bio_complete
        block cfq: don't use atomic_t for cfq_group
        block cfq: don't use atomic_t for cfq_queue
        block: trace event block fix unassigned field
        block: add internal hd part table references
        block: fix accounting bug on cross partition merges
        kref: add kref_test_and_get
        bio-integrity: mark kintegrityd_wq highpri and CPU intensive
        block: make kblockd_workqueue smarter
        Revert "sd: implement sd_check_events()"
        block: Clean up exit_io_context() source code.
        Fix compile warnings due to missing removal of a 'ret' variable
        fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
        block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
        cfq-iosched: don't check cfqg in choose_service_tree()
        fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
        cdrom: export cdrom_check_events()
        sd: implement sd_check_events()
        sr: implement sr_check_events()
        ...
    • Merge branch 'for-linus/i2c-2638' of git://git.fluff.org/bjdooks/linux · fe3c560b
      Linus Torvalds authored
      * 'for-linus/i2c-2638' of git://git.fluff.org/bjdooks/linux:
        i2c-bfin-twi: move setup to the earlier subsys initcall
        i2c-bfin-twi: handle faulty slave devices better
        i2c-mv64xxx: send repeated START between messages in xfer
        i2c-nomadik: fix regression on adapter name
        i2c-omap: Set latency requirements only once for several messages
        i2c-eg20t: add driver for Intel EG20T
        i2c-ocores: add some device tree documentation
        i2c-ocores: Use devres for resource allocation
        i2c-ocores: Adapt for device tree
        i2c-iop3xx: add iomem annotation
    • Merge branch 'rmobile-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · d2005603
      Linus Torvalds authored
      * 'rmobile-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
        ARM: mach-shmobile: Kill off unused !gpio_is_valid() case
        ARM: mach-shmobile: sh7372 Enable SDIO IRQs for Mackerel
        ARM: mach-shmobile: sh7377 Enable SDIO IRQs
        ARM: mach-shmobile: sh7367 Enable SDIO IRQs
        ARM: mach-shmobile: sh7372 Enable SDIO IRQs
        ARM: mach-shmobile: mackerel: Add touchscreen ST1232 support
        ARM: mach-shmobile: ap4eb: SCIF port for earlyprintk when using zboot
        ARM: mach-shmobile: mackerel: SCIF port for earlyprintk when using zboot
        ARM: mach-shmobile: mackerel: Add support get_cd in CN23
    • Merge branch 'sh-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · 86f6f9b6
      Linus Torvalds authored
      * 'sh-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (31 commits)
        sh: Add support for AP-SH4AD-0A board.
        sh: Add support for AP-SH4A-3A board.
        sh: Add a new mach type for alpha project boards.
        serial: sh-sci: build fixes.
        sh: sh7372 SH4AL-DSP probe support
        sh: sh7366 Enable SDIO IRQs
        sh: sh7343 Enable SDIO IRQs
        sh: mach-ecovec24: enable runtime PM for SDHI
        sh: sh7723 / ap325rxa enable SDIO IRQs
        sh: sh7722 Enable SDIO IRQs
        sh: sh7724 Enable SDIO IRQs
        sh: Fix up legacy PTEA space attribute mapping.
        sh: Stub out legacy PCC pgprot encoding for X2 TLBs.
        sh: constify prefetch pointers.
        sh: Add a machvec callback for early memblock reservations.
        sh: update sh7757lcr_defconfig
        sh: add PVR probing for SH7757 3rd cut
        sh: Use device_initcall() instead of __initcall()
        sh: intc - convert board specific landisk code
        sh: Move init_landisk_IRQ to header file
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6 · d33a6291
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6: (29 commits)
        video: move SH_MIPI_DSI/SH_LCD_MIPI_DSI to the top of menu
        fbdev: Implement simple blanking in pseudocolor modes for vt8500lcdfb
        video: imx: Update the manufacturer's name
        nuc900fb: don't treat NULL clk as an error
        s3c2410fb: don't treat NULL clk as an error
        video: tidy up modedb formatting.
        video: matroxfb: Correct video option in comments and kernel config help.
        fbdev: sh_mobile_hdmi: simplify pointer handling
        fbdev: sh_mobile_hdmi: framebuffer notifiers have to be registered
        fbdev: sh_mobile_hdmi: add command line option to use the preferred EDID mode
        OMAP: DSS2: Introduce omap_channel as an omap_dss_device parameter, add new overlay manager.
        OMAP: DSS2: Use dss_features to handle DISPC bits removed on OMAP4
        OMAP: DSS2: LCD2 Channel Changes for DISPC
        OMAP: DSS2: Change remaining DISPC functions for new omap_channel argument
        OMAP: DSS2: Introduce omap_channel argument to DISPC functions used by interface drivers
        OMAP: DSS2: Represent DISPC register defines with channel as parameter
        OMAP: DSS2: Add dss_features for omap4 and overlay manager related features
        OMAP: DSS2: Clean up DISPC color mode validation checks
        OMAP: DSS2: Add back authors of panel-generic.c based drivers
        OMAP: DSS2: remove generic DPI panel driver duplicated panel drivers
        ...