1. 08 Nov, 2013 40 commits
    • skd: fix unregister_blkdev() placement · a073ae95
      Bartlomiej Zolnierkiewicz authored
      register_blkdev() is called before pci_register_driver() in skd_init()
      so unregister_blkdev() should be called after pci_unregister_driver()
      in skd_exit(). Fix it.
      
      Cc: Akhil Bhansali <abhansali@stec-inc.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • skd: more removal of bio-based code · 38d4a1bb
      Mike Snitzer authored
      Remove skd_flush_cmd structure and skd_flush_slab.
      Remove skd_end_request wrapper around skd_end_request_blk.
      Remove skd_requeue_request, use blk_requeue_request directly.
      Cleanup some comments (remove "bio" info) and whitespace.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • skd: cleanup the skd_*() function block wrapping · 6a5ec65b
      Jens Axboe authored
      Just call the block functions directly, don't wrap them
      in skd helpers. With only one queueing model enabled, there's
      no point in doing that.
      
      Also kill the ->start_time and ->bio from the skd_request_context,
      we don't use those anymore.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • skd: rip out bio path · fcd37eb3
      Jens Axboe authored
      The skd driver has a selectable rq or bio based queueing model.
      For 3.14, we want to turn this into a single blk-mq interface
      instead. With the immutable biovecs being merged in 3.13, the
      bio model would need patches to even work. So rip it out, with
      a conversion pending for blk-mq in the next release.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • skd: fix error return code in skd_pci_probe() · 1762b57f
      Wei Yongjun authored
      Fix to return -ENOMEM in the skd construct error handling
      case instead of 0, as done elsewhere in this function.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • s390/dasd: hold request queue sysfs lock when calling elevator_init() · ef089941
      Heiko Carstens authored
      "elevator: Fix a race in elevator switching and md device initialization"
      changed the semantics of elevator_init(): to fix a race, callers must now
      hold the corresponding request queue's sysfs_lock when calling it.
      The patch did not convert the s390 dasd device driver which is the only
      device driver which also calls elevator_init(). So add the missing locking.
      
      Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • cciss: return 0 from driver probe function on success, not 1 · b88fac63
      Stephen M. Cameron authored
      A return value of 1 is interpreted as an error
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • skd: Replaced custom debug PRINTKs with pr_debug · 2e44b427
      rchinthekindi authored
      Replaced DPRINTK() and VPRINTK() with pr_debug().
      Signed-off-by: Ramprasad C <ramprasad.chinthekindi@hgst.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • skd: Fix checkpatch ERRORS and removed unused functions · f721bb0d
      Akhil Bhansali authored
      This patch fixes checkpatch.pl errors for assignment in if condition.
      It also removes unused readq / readl function calls.
      
      As Andrew had disabled the compilation of drivers for 32 bit,
      I have modified format specifiers in a few VPRINTKs to avoid warnings
      during 64 bit compilation.
      Signed-off-by: Akhil Bhansali <abhansali@stec-inc.com>
      Reviewed-by: Ramprasad Chinthekindi <rchinthekindi@stec-inc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rsxx: Fix possible kernel panic with invalid config. · 8c49a77c
      Philip J Kelleher authored
      This patch fixes a possible kernel panic on driver load if
      the configuration on the card is messed up or not yet set.
      The driver could possibly give a 32-bit unsigned all Fs to
      the kernel as the device's block size.
      
      Now we only write the block size to the kernel if the
      configuration from the card is valid.
      
      Also, driver version is being updated.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rsxx: Disallow discards from being unmapped. · e35f38bf
      Philip J Kelleher authored
      This patch fixes a bug in which discards were always
      calling pci_unmap_page. Discards should never call the
      pci_unmap_page function call because they are never mapped.
      
      This caused a race condition on PowerPC systems when issuing
      discards, writes, and reads all at the same time. The
      pci_map_page function would eventually map logical address
      0 for a read or write. Discards are always assigned a DMA
      address of 0 because they are never mapped. So if
      pci_map_page mapped address 0 for a DMA and a discard was
      "unmapped" then the address would be freed and would cause
      an EEH event to occur when Hardware accesses the address.
      
      This was injected/uncovered in commit:
      b347f9cf0bc8d42ee95ba1d3837fd93045ab336b
      
      The pci_dma_mapping_error function declares -1 a DMA_ERROR,
      not 0 as initially thought. So before, we would never unmap
      discards because they were considered NULL.
      
      This patch should fall on top of commit id:
      fc1967bb08a6184ed44ef990e1dd4389901b809c
      
      Also, the driver version is being updated.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drbd: avoid to shrink max_bio_size due to peer re-configuration · 35f47ef1
      Lars Ellenberg authored
      For a long time, the receiving side has spread "too large" incoming
      requests over multiple bios.  No need to shrink our max_bio_size
      (max_hw_sectors) if the peer is reconfigured to use a different storage.
      
      The problem manifests itself if we are not the top of the device stack
      (DRBD is used as an LVM PV).
      
      A hardware reconfiguration on the peer may cause the supported
      max_bio_size to shrink, and the connection handshake would now
      unnecessarily shrink the max_bio_size on the active node.
      
      There is no way to notify upper layers that they have to "re-stack"
      their limits. So they won't notice at all, and may keep submitting bios
      that are suddenly considered "too large for device".
      
      We already check for compatibility and ignore changes on the peer;
      the code was only masked out unless we have a fully established connection.
      We just need to allow it a bit earlier during the handshake.
      
      Also consider max_hw_sectors in our merge bvec function, just in case.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drbd: fix decoding of bitmap vli rle for device sizes > 64 TB · d2da5b0c
      Lars Ellenberg authored
      Symptoms: disconnect after bitmap exchange due to
      bitmap overflow (e:49731075554) while decoding bm RLE packet
      
      In the decoding step of the variable length integer run length encoding
      there was potentially an uncaught bitshift by wordsize (variable >> 64).
      
      The result of which is "undefined" :(
      (only "sometimes" the result is the desired 0)
      
      Fix: don't do any bit shift magic for shift == 64, just assign.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
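The undefined shift described in this commit can be illustrated with a minimal C sketch. The helper name `drop_low_bits` is hypothetical and is not taken from the drbd source; it only shows the general guard against shifting a 64-bit value by its full width, in the spirit of "don't shift for shift == 64, just assign".

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the drbd code): discard the lowest `bits`
 * bits of a 64-bit word. In C, shifting a 64-bit value by 64 or more
 * is undefined behavior, so that case is handled by plain assignment.
 */
static uint64_t drop_low_bits(uint64_t word, unsigned int bits)
{
    if (bits >= 64)
        return 0;           /* all bits consumed: just assign */
    return word >> bits;    /* well defined for 0..63 */
}
```

With this guard, a shift count of 64 always yields 0 instead of depending on what the hardware shift instruction happens to do.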
    • drbd: Fix adding of new minors with freshly created meta data · 57737adc
      Philipp Reisner authored
      Online adding of new minors with freshly created meta data
      to a resource with an established connection failed, with a
      wrong state transition on one side of the new minor.
      
      Freshly created meta-data has a la_size (last agreed size) of 0.
      When we online add such devices, the code wrongly got into
      the code path for resyncing new storage that was added while
      the disk was detached.
      
      Fixed that by making the GREW from ZERO a special case.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drbd: Fix an connection drop issue after enabling allow-two-primaries · b874d231
      Philipp Reisner authored
      Since drbd-8.4.0 it is possible to change the allow-two-primaries
      network option while the connection is established.
      
      The sequence code used to partially order packets from the
      data socket with packets from the meta-data socket still assumed
      that the allow-two-primaries option is constant while the
      connection is established.
      
      I.e.
      On a node that has the RESOLVE_CONFLICTS bits set, after enabling
      allow-two-primaries, when receiving the next data packet it timed out
      while waiting for the necessary packets on the data socket to arrive
      (wait_for_and_update_peer_seq() function).
      
      Fixed that by always tracking the sequence number, but only waiting
      for it if allow-two-primaries is set.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drbd: fix NULL pointer deref in module init error path · 69babf05
      Lars Ellenberg authored
      If we want to iterate over the (as of yet still empty) list in the
      cleanup path, we need to initialize the list before the first goto fail.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: disable cpqarray in Kconfig · 7badfb1c
      Jens Axboe authored
      Mike writes:
      
      "cpqarray hasn't been used in over 12 years. It's doubtful that anyone
       still uses the board. It's time the driver was removed from the mainline
       kernel.  The only updates these days are minor and mostly done by people
       outside of HP."
      
      If nobody yells, we'll remove it from the kernel tree completely
      for 3.15.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Add support for sTec's pci-e flash card Kronos · e67f86b3
      Akhil Bhansali authored
      Signed-off-by: Akhil Bhansali <abhansali@stec-inc.com>
      Signed-off-by: Ramprasad Chinthekindi <rchinthekindi@stec-inc.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      
      Folded patch, contributions to clean up this driver from:
      
      Jens Axboe
      Dan Carpenter
      Andrew Morton
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rsxx: Kernel Panic caused by mapping Discards · 0317cd6d
      Philip J Kelleher authored
      This fixes a kernel panic injected by commit id
      8d26750143341831bc312f61c5ed141eeb75b8d0 where discards
      are getting mapped through the pci_map_page function call.
      
      The driver will now start verifying that a dma is not a
      discard before issuing the pci_map_page function call.
      
      Also, we are updating the driver version.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mtip32xx: dynamically allocate buffer in debugfs functions · c8afd0dc
      David Milburn authored
      Dynamically allocate buf to prevent warnings:
      
      drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_device_status’:
      drivers/block/mtip32xx/mtip32xx.c:2823: warning: the frame size of 1056 bytes is larger than 1024 bytes
      drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_registers’:
      drivers/block/mtip32xx/mtip32xx.c:2894: warning: the frame size of 1056 bytes is larger than 1024 bytes
      drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_flags’:
      drivers/block/mtip32xx/mtip32xx.c:2917: warning: the frame size of 1056 bytes is larger than 1024 bytes
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Acked-by: Asai Thambi S P <asamymuthupa@micron.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mtip32xx: Add SRSI support · 8f8b8995
      Asai Thambi S P authored
      This patch adds support for SRSI (Surprise Removal Surprise Insertion).
      
      Approach:
      ---------
      Surprise Removal:
      -----------------
      On surprise removal of the device, gendisk, request queue, device index, sysfs
      entries, etc. are retained as long as the device is in use - mounted filesystem,
      device opened by an application, etc. The service thread breaks out of the main
      while loop, waits for pci remove to exit, and then waits for the device to become
      free. When there are no holders of the device, the service thread cleans up the
      block and device related stuff and returns.
      
      Surprise Insertion:
      -------------------
      No change, this scenario follows the normal pci probe() function flow.
      Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rsxx: Moving pci_map_page to prevent overflow. · 1b21f5b2
      Philip J Kelleher authored
      The pci_map_page function has been moved into our
      issued workqueue to prevent us from running out of
      mappable addresses on non-HWWD PCIe x8 slots. The
      maximum amount that can possibly be mapped at one
      time now is: 255 dmas X 4 dma channels X 4096 Bytes.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rsxx: Handling failed pci_map_page on PowerPC and double free. · e5feab22
      Philip J Kelleher authored
      The rsxx driver was not checking the correct value during a
      pci_map_page failure. Fixing this also uncovered a
      double free if the bio was returned before it was
      broken up into individual 4k dmas; that is also
      fixed here.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loop: fix crash when using unassigned loop device · ef7e7c82
      Mikulas Patocka authored
      When the loop module is loaded, it creates 8 loop devices /dev/loop[0-7].
      The devices have no request routine and thus, when they are used without
      being assigned, a crash happens.
      
      For example, these commands cause a crash (assuming there are no used loop
      devices):
      
      Kernel Fault: Code=26 regs=000000007f420980 (Addr=0000000000000010)
      CPU: 1 PID: 50 Comm: kworker/1:1 Not tainted 3.11.0 #1
      Workqueue: ksnaphd do_metadata [dm_snapshot]
      task: 000000007fcf4078 ti: 000000007f420000 task.ti: 000000007f420000
      [  116.319988]
           YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      PSW: 00001000000001001111111100001111 Not tainted
      r00-03  000000ff0804ff0f 00000000408bf5d0 00000000402d8204 000000007b7ff6c0
      r04-07  00000000408a95d0 000000007f420950 000000007b7ff6c0 000000007d06c930
      r08-11  000000007f4205c0 0000000000000001 000000007f4205c0 000000007f4204b8
      r12-15  0000000000000010 0000000000000000 0000000000000000 0000000000000000
      r16-19  000000001108dd48 000000004061cd7c 000000007d859800 000000000800000f
      r20-23  0000000000000000 0000000000000008 0000000000000000 0000000000000000
      r24-27  00000000ffffffff 000000007b7ff6c0 000000007d859800 00000000408a95d0
      r28-31  0000000000000000 000000007f420950 000000007f420980 000000007f4208e8
      sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000303000
      sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [  117.549988]
      IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d82fc 00000000402d8300
       IIR: 53820020    ISR: 0000000000000000  IOR: 0000000000000010
       CPU:        1   CR30: 000000007f420000 CR31: ffffffffffffffff
       ORIG_R28: 0000000000000001
       IAOQ[0]: generic_make_request+0x11c/0x1a0
       IAOQ[1]: generic_make_request+0x120/0x1a0
       RP(r2): generic_make_request+0x24/0x1a0
      Backtrace:
       [<00000000402d83f0>] submit_bio+0x70/0x140
       [<0000000011087c4c>] dispatch_io+0x234/0x478 [dm_mod]
       [<0000000011087f44>] sync_io+0xb4/0x190 [dm_mod]
       [<00000000110883bc>] dm_io+0x2c4/0x310 [dm_mod]
       [<00000000110bfcd0>] do_metadata+0x28/0xb0 [dm_snapshot]
       [<00000000401591d8>] process_one_work+0x160/0x460
       [<0000000040159bc0>] worker_thread+0x300/0x478
       [<0000000040161a70>] kthread+0x118/0x128
       [<0000000040104020>] end_fault_vector+0x20/0x28
       [<0000000040177220>] task_tick_fair+0x420/0x4d0
       [<00000000401aa048>] invoke_rcu_core+0x50/0x60
       [<00000000401ad5b8>] rcu_check_callbacks+0x210/0x8d8
       [<000000004014aaa0>] update_process_times+0xa8/0xc0
       [<00000000401ab86c>] rcu_process_callbacks+0x4b4/0x598
       [<0000000040142408>] __do_softirq+0x250/0x2c0
       [<00000000401789d0>] find_busiest_group+0x3c0/0xc70
      [  119.379988]
      Kernel panic - not syncing: Kernel Fault
      Rebooting in 1 seconds..
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • xen/blkback: fix reference counting · ea5ec76d
      Vegard Nossum authored
      If the permission check fails, we drop a reference to the blkif without
      having taken it in the first place. The bug was introduced in commit
      604c499c (xen/blkback: Check device
      permissions before allowing OP_DISCARD).
      
      Cc: stable@vger.kernel.org
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
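The reference imbalance described in this commit can be sketched in plain C. Everything here is hypothetical (`fake_blkif`, `fake_blkif_get`, `fake_blkif_put`, `fake_handle_discard`); it is not the xen-blkback code and only shows one way to keep every put on a shared error path paired with a get.

```c
#include <errno.h>

/* Hypothetical stand-in for the backend interface object. */
struct fake_blkif {
    int refcnt;
};

static void fake_blkif_get(struct fake_blkif *b) { b->refcnt++; }
static void fake_blkif_put(struct fake_blkif *b) { b->refcnt--; }

/*
 * Sketch of a discard handler: the reference is taken before any check
 * that can jump to the shared exit path, so the put on that path
 * always pairs with a get.
 */
static int fake_handle_discard(struct fake_blkif *b, int permitted)
{
    int err = 0;

    fake_blkif_get(b);
    if (!permitted) {
        err = -EACCES;
        goto out;
    }
    /* ... the discard itself would be issued here ... */
out:
    fake_blkif_put(b);
    return err;
}
```

Whether the check succeeds or fails, the counter returns to its starting value, which is the invariant the buggy ordering violated.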
    • xen-blkfront: improve aproximation of required grants per request · c47206e2
      Roger Pau Monne authored
      Improve the calculation of required grants to process a request by
      using nr_phys_segments instead of always assuming a request is going
      to use all possible segments.
      
      nr_phys_segments contains the number of scatter-gather DMA addr+len
      pairs, which is basically what we put at every granted page.
      for_each_sg iterates over the DMA addr+len pairs and uses a grant
      page for each of them.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • xen-blkfront: revoke foreign access for grants not mapped by the backend · fbe363c4
      Roger Pau Monne authored
      There's no need to keep the foreign access in a grant if it is not
      persistently mapped by the backend. This allows us to free grants that
      are not mapped by the backend, thus preventing blkfront from hoarding
      all grants.
      
      The main effect of this is that blkfront will only persistently map
      the same grants as the backend, and it will always try to use grants
      that are already mapped by the backend. Also the number of persistent
      grants in blkfront is the same as in blkback (and is controlled by the
      value in blkback).
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Matt Wilson <msw@amazon.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mg_disk: remove deprecated IRQF_DISABLED · 370d6686
      Michael Opdenacker authored
      This patch proposes to remove the use of the IRQF_DISABLED flag.
      
      It's a NOOP since 2.6.35 and it will be removed one day.
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO · c7d1ba41
      Duan Jiong authored
      This patch fixes coccinelle error regarding usage of IS_ERR and
      PTR_ERR instead of PTR_ERR_OR_ZERO.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO · 8616ebb1
      Duan Jiong authored
      This patch fixes coccinelle error regarding usage of IS_ERR and
      PTR_ERR instead of PTR_ERR_OR_ZERO.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
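The pattern these two commits replace can be shown with a small userspace sketch. The lowercase helpers below are imitations written for illustration, not the kernel macros themselves (the kernel's `PTR_ERR_OR_ZERO()` returns `int`, for instance), and the function names are assumptions.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long error)     { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The open-coded pattern that the coccinelle check flags. */
static long status_open_coded(const void *ptr)
{
    if (is_err(ptr))
        return ptr_err(ptr);
    return 0;
}

/* The one-call replacement with the same result. */
static long ptr_err_or_zero(const void *ptr)
{
    return is_err(ptr) ? ptr_err(ptr) : 0;
}
```

Both functions return 0 for a valid pointer and the negative errno for an error pointer; the second simply states the intent in one call.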
    • block: Do not call sector_div() with a 64-bit divisor · 97597dc0
      Geert Uytterhoeven authored
      do_div() (called by sector_div() if CONFIG_LBDAF=y) is meant for divisions
      of 64-bit numbers by 32-bit numbers.  Passing 64-bit divisor types caused
      issues in the past on 32-bit platforms, cf. commit
      ea077b1b ("m68k: Truncate base in
      do_div()").
      
      As queue_limits.max_discard_sectors and .discard_granularity are unsigned
      int, max_discard_sectors and granularity should be unsigned int.
      As bdev_discard_alignment() returns int, alignment should be int.
      Now 2 calls to sector_div() can be replaced by 32-bit arithmetic:
        - The 64-bit modulo operation can become a 32-bit modulo operation,
        - The 64-bit division and multiplication can be replaced by a 32-bit
          modulo operation and a subtraction.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
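The second replacement listed in this commit (division and multiplication becoming a modulo and a subtraction) rests on a simple identity, sketched below. The function names are hypothetical and this is not the block-layer code; it assumes the value fits in 32 bits and the granularity is nonzero.

```c
#include <stdint.h>

/* 64-bit form: divide, then multiply back (needs a 64-bit division). */
static uint64_t round_down_div_mul(uint64_t value, uint32_t granularity)
{
    return (value / granularity) * granularity;
}

/* 32-bit form: one modulo and one subtraction give the same result. */
static uint32_t round_down_mod_sub(uint32_t value, uint32_t granularity)
{
    return value - (value % granularity);
}
```

For a value of 1000 and a granularity of 64, both forms round down to 960, but the second never needs a 64-bit divide helper on a 32-bit platform.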
    • kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup() · f8c5e944
      Chen Gang authored
      do_blk_trace_setup() will fully initialize 'buts.name', so the related
      memcpy() can be removed. Also use BLKTRACE_BDEV_SIZE and ARRAY_SIZE
      instead of the hard-coded number '32'.
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Consolidate duplicated bio_trim() implementations · 6678d83f
      Kent Overstreet authored
      Someone cut and pasted md's md_trim_bio() into xen-blkfront.c. Come on,
      we should know better than this.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Use rw_copy_check_uvector() · e0ce0eac
      Kent Overstreet authored
      No need for silly open coding - and struct sg_iovec has exactly the same
      layout as struct iovec...
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Enable sysfs nomerge control for I/O requests in the plug list · 23779fbc
      Alireza Haghdoost authored
      This patch enables sysfs to control I/O request merge
      functionality in the plug list. While this control has been
      implemented for the request queue, it was omitted in the plug list.
      Therefore, the block layer merges requests together (or attempts to merge)
      even if the merge capability was disabled using sysfs nomerge parameter
      value 2.
      
      This limitation directly affects the functionality of the io_submit()
      system call. The system call enables a user to submit a bunch of IO
      requests from user space using the struct iocb **ios input argument.
      However, the unconditional merging functionality in the plug list
      potentially merges these requests together down the road. Therefore,
      there is no way to distinguish between an application sending a bunch of
      sequential IOs and an application sending one big IO. Ultimately, all
      requests generated by the former app merge together within the plug list
      and look similar to those of the second app.
      
      While the merging functionality is a desirable feature to improve the
      performance of the IO subsystem for some applications, it is not useful
      at all for other applications like ours.
      Signed-off-by: Alireza Haghdoost <alireza@cs.umn.edu>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      
      Coding style modified.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: properly stack underlying max_segment_size to DM device · d82ae52e
      Mike Snitzer authored
      Without this patch all DM devices will default to BLK_MAX_SEGMENT_SIZE
      (65536) even if the underlying device(s) have a larger value -- this is
      due to blk_stack_limits() using min_not_zero() when stacking the
      max_segment_size limit.
      
      1073741824
      
      before patch:
      65536
      
      after patch:
      1073741824
      Reported-by: Lukasz Flis <l.flis@cyfronet.pl>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.3+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
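The stacking effect described in this commit can be reproduced with a small userspace sketch. `min_not_zero_uint` and `stack_max_segment_size` are imitations written for illustration, not the kernel's `min_not_zero()` macro or `blk_stack_limits()`. The message does not spell out the change itself, so starting the stacked device from a permissive value such as `UINT_MAX` is an assumption here.

```c
#include <limits.h>

#define BLK_MAX_SEGMENT_SIZE 65536u

/* Userspace imitation of min_not_zero(): zero means "unset". */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/* Stack one underlying device's limit onto the top-level device's limit. */
static unsigned int stack_max_segment_size(unsigned int top, unsigned int bottom)
{
    return min_not_zero_uint(top, bottom);
}
```

If the top-level device starts at `BLK_MAX_SEGMENT_SIZE`, stacking an underlying limit of 1073741824 still yields 65536, because the minimum can only shrink. Starting from `UINT_MAX` lets the underlying value of 1073741824 come through.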
    • elevator: acquire q->sysfs_lock in elevator_change() · 7c8a3679
      Tomoki Sekiyama authored
      Add locking of q->sysfs_lock into elevator_change() (an exported function)
      to ensure it is held to protect q->elevator from elevator_init(), even if
      elevator_change() is called from non-sysfs paths.
      The sysfs path (elv_iosched_store) uses __elevator_change(), the non-locking
      version, as the lock is already taken by elv_iosched_store().
      Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • elevator: Fix a race in elevator switching and md device initialization · eb1c160b
      Tomoki Sekiyama authored
      The soft lockup below happens at the boot time of the system using dm
      multipath and the udev rules to switch scheduler.
      
      [  356.127001] BUG: soft lockup - CPU#3 stuck for 22s! [sh:483]
      [  356.127001] RIP: 0010:[<ffffffff81072a7d>]  [<ffffffff81072a7d>] lock_timer_base.isra.35+0x1d/0x50
      ...
      [  356.127001] Call Trace:
      [  356.127001]  [<ffffffff81073810>] try_to_del_timer_sync+0x20/0x70
      [  356.127001]  [<ffffffff8118b08a>] ? kmem_cache_alloc_node_trace+0x20a/0x230
      [  356.127001]  [<ffffffff810738b2>] del_timer_sync+0x52/0x60
      [  356.127001]  [<ffffffff812ece22>] cfq_exit_queue+0x32/0xf0
      [  356.127001]  [<ffffffff812c98df>] elevator_exit+0x2f/0x50
      [  356.127001]  [<ffffffff812c9f21>] elevator_change+0xf1/0x1c0
      [  356.127001]  [<ffffffff812caa50>] elv_iosched_store+0x20/0x50
      [  356.127001]  [<ffffffff812d1d09>] queue_attr_store+0x59/0xb0
      [  356.127001]  [<ffffffff812143f6>] sysfs_write_file+0xc6/0x140
      [  356.127001]  [<ffffffff811a326d>] vfs_write+0xbd/0x1e0
      [  356.127001]  [<ffffffff811a3ca9>] SyS_write+0x49/0xa0
      [  356.127001]  [<ffffffff8164e899>] system_call_fastpath+0x16/0x1b
      
      This is caused by a race between md device initialization by multipathd and
      shell script to switch the scheduler using sysfs.
      
       - multipathd:
         SyS_ioctl -> do_vfs_ioctl -> dm_ctl_ioctl -> ctl_ioctl -> table_load
         -> dm_setup_md_queue -> blk_init_allocated_queue -> elevator_init
          q->elevator = elevator_alloc(q, e); // not yet initialized
      
       - sh -c 'echo deadline > /sys/$DEVPATH/queue/scheduler':
         elevator_switch (in the call trace above)
          struct elevator_queue *old = q->elevator;
          q->elevator = elevator_alloc(q, new_e);
          elevator_exit(old);                 // lockup! (*)
      
       - multipathd: (cont.)
          err = e->ops.elevator_init_fn(q);   // init fails; q->elevator is modified
      
      (*) When del_timer_sync() is called, lock_timer_base() will loop infinitely
      while timer->base == NULL. In this case, as the timer will never be
      initialized, it results in a lockup.
      
      This patch acquires q->sysfs_lock around elevator_init() in
      blk_init_allocated_queue(), to provide mutual exclusion between
      initialization of q->elevator and switching of the scheduler.
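
      The locking idea can be illustrated with a minimal userspace sketch. This
      is not the kernel patch: struct queue, struct sched, queue_setup(),
      queue_init_sched() and queue_switch_sched() are hypothetical stand-ins,
      and a pthread mutex plays the role of q->sysfs_lock. It only shows that
      once initialization and switching take the same lock, the switch path can
      no longer observe a scheduler that is allocated but not yet initialized.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel objects; not kernel code. */
struct sched {
	int initialized;		/* models the timer set up by the init hook */
};

struct queue {
	pthread_mutex_t sysfs_lock;	/* plays the role of q->sysfs_lock */
	struct sched *elevator;		/* plays the role of q->elevator */
};

void queue_setup(struct queue *q)
{
	pthread_mutex_init(&q->sysfs_lock, NULL);
	q->elevator = NULL;
}

/* Models elevator_init(): allocation and the init hook both run under the lock. */
int queue_init_sched(struct queue *q)
{
	struct sched *s;
	int err = 0;

	pthread_mutex_lock(&q->sysfs_lock);
	s = calloc(1, sizeof(*s));
	if (!s) {
		err = -1;
	} else {
		q->elevator = s;	/* published, but not yet initialized */
		s->initialized = 1;	/* models e->ops.elevator_init_fn(q) */
	}
	pthread_mutex_unlock(&q->sysfs_lock);
	return err;
}

/*
 * Models the sysfs-triggered switch. Returns 0 on success, -1 on allocation
 * failure, and -2 if it observed a half-initialized scheduler (the case that
 * corresponds to the lockup in the trace).
 */
int queue_switch_sched(struct queue *q)
{
	struct sched *new_s, *old;
	int ret = 0;

	new_s = calloc(1, sizeof(*new_s));
	if (!new_s)
		return -1;
	new_s->initialized = 1;

	pthread_mutex_lock(&q->sysfs_lock);
	old = q->elevator;
	if (old && !old->initialized)
		ret = -2;
	q->elevator = new_s;
	pthread_mutex_unlock(&q->sysfs_lock);

	free(old);			/* models elevator_exit(old) */
	return ret;
}
```

      Because queue_init_sched() holds the lock from allocation until the init
      hook has run, queue_switch_sched() called from another thread either sees
      no scheduler or a fully initialized one, so it should never return -2.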
      
      This should fix this bugzilla:
      https://bugzilla.redhat.com/show_bug.cgi?id=902012

      Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eb1c160b
    • Christoph Lameter's avatar
      block: Replace __get_cpu_var uses · 170d800a
      Christoph Lameter authored
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer
      registers are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout, specialized macros can be defined in non-x86
      arches as well in order to optimize per cpu access, e.g. by using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	this_cpu_inc(y);

      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      170d800a
    • Mikulas Patocka's avatar
      bdi: test bdi_init failure · 8077c0d9
      Mikulas Patocka authored
      There were two places where the return value from bdi_init() was not tested.

      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8077c0d9