1. 13 Jan, 2016 2 commits
    • null_blk: use sector_div instead of do_div · e93d12ae
      Arnd Bergmann authored
      Dividing a sector_t number should be done using sector_div rather than do_div
      to optimize the 32-bit sector_t case, and with the latest do_div optimizations,
      we now get a compile-time warning for this:
      
      arch/arm/include/asm/div64.h:32:95: note: expected 'uint64_t * {aka long long unsigned int *}' but argument is of type 'sector_t * {aka long unsigned int *}'
      drivers/block/null_blk.c:521:81: warning: comparison of distinct pointer types lacks a cast
      
      This changes the newly added code to use sector_div. It is a simplified version
      of the original patch, as Linus Torvalds pointed out that we should not be using
      an expensive division function in the first place.
      
      This version was suggested by Matias Bjorling.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Matias Bjorling <m@bjorling.me>
      Fixes: b2b7e001 ("null_blk: register as a LightNVM device")
      Signed-off-by: Jens Axboe <axboe@fb.com>
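For the null_blk change above, a minimal userspace sketch of the contract that sector_div is generally understood to provide: divide in place and return the remainder. The type `sector_sketch_t` and the helper `sector_div_sketch` are hypothetical stand-ins for illustration, not the kernel macro, and the sector type is assumed to be 64-bit here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's sector_t; assumed to be 64-bit in this sketch. */
typedef uint64_t sector_sketch_t;

/*
 * Hypothetical userspace model of sector_div(): divides *n by base in
 * place and returns the remainder.
 */
static inline uint32_t sector_div_sketch(sector_sketch_t *n, uint32_t base)
{
    assert(base != 0);
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}
```

Passing a pointer whose type does not match what the division helper expects is what produces the "distinct pointer types" warning quoted in the message; using a helper typed for the sector type avoids that.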
    • Merge branch 'stable/for-jens-4.5' of... · 038a75af
      Jens Axboe authored
      Merge branch 'stable/for-jens-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-4.5/drivers
      
      Konrad writes:
      
      The pull is based on converting the backend driver into a multiqueue
      driver and exposing more than one queue to the frontend. As such we had
      to modify the frontend and also fix a bunch of bugs around this.
      
      The original work is based on Arianna Avanzini's work as an OPW intern.
      Bob took over the work and had been massaging it for quite some time.
      
      Also included are features for 64KB page support on ARM and various
      bug fixes.
  2. 08 Jan, 2016 1 commit
  3. 04 Jan, 2016 21 commits
    • xen/blkfront: Fix crash if backend doesn't follow the right states. · c31ecf6c
      Konrad Rzeszutek Wilk authored
      We have split the setting up of all the resources in two steps:
      1) talk_to_blkback  - which figures out the num_ring_pages (from
         the default value of zero), sets up shadow and so on
      2) blkfront_connect - does the real part of filling out the
         internal structures.
      
      The problem is if we bypass the 1) step and go straight to 2)
      and call blkfront_setup_indirect where we use the macro
      BLK_RING_SIZE - which returns a negative value (because
      sz is zero  - since num_ring_pages is zero - since it has never
      been set).
      
      We can fix this by making sure that we always have called
      talk_to_blkback before going to blkfront_connect.
      
      Or we could set in blkfront_probe info->nr_ring_pages = 1
      to have a default value. But that looks odd - as we haven't
      actually negotiated any ring size.
      
      This patch changes XenbusStateConnected state to detect if
      we haven't done the initial handshake - and if so continue
      on as if we were in XenbusStateInitWait state.
      
      We also roll the error recovery (freeing the structure) into
      talk_to_blkback error path - which is safe since that function
      is only called from blkback_changed.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
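For the blkfront state fix above, a small sketch of the guard idea: if the backend jumps straight to the connected state, run the handshake first so the ring size is never used while still zero. Every name here (`frontend_sketch`, `handshake_sketch`, `connect_sketch`, `backend_changed_sketch`) is a hypothetical stand-in, and the `handshake_done` flag is an assumption of this sketch, not necessarily how the driver detects the condition.

```c
#include <assert.h>
#include <stdbool.h>

enum backend_state_sketch { BACKEND_INIT_WAIT, BACKEND_CONNECTED };

struct frontend_sketch {
    bool handshake_done;        /* set once step 1 has run */
    unsigned int nr_ring_pages; /* 0 until negotiated */
    bool connected;
};

/* Step 1 stand-in: negotiate the ring size (hard-coded to one page here). */
static int handshake_sketch(struct frontend_sketch *f)
{
    f->nr_ring_pages = 1;
    f->handshake_done = true;
    return 0;
}

/* Step 2 stand-in: only valid once the ring size is known. */
static void connect_sketch(struct frontend_sketch *f)
{
    assert(f->nr_ring_pages > 0);
    f->connected = true;
}

static void backend_changed_sketch(struct frontend_sketch *f,
                                   enum backend_state_sketch s)
{
    switch (s) {
    case BACKEND_INIT_WAIT:
        handshake_sketch(f);
        break;
    case BACKEND_CONNECTED:
        /* Backend skipped ahead: run the handshake before connecting. */
        if (!f->handshake_done && handshake_sketch(f) != 0)
            break;
        connect_sketch(f);
        break;
    }
}
```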
    • xen/blkback: Fix two memory leaks. · 93bb277f
      Bob Liu authored
      This patch fixes two memory leaks:
        backtrace:
          [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
          [<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0
          [<ffffffff81534028>] xen_blkbk_probe+0x58/0x230
          [<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130
          [<ffffffff81511716>] driver_probe_device+0x166/0x2c0
          [<ffffffff815119bc>] __device_attach_driver+0xac/0xb0
          [<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90
          [<ffffffff81511ab7>] __device_attach+0xc7/0x120
          [<ffffffff81511b23>] device_initial_probe+0x13/0x20
          [<ffffffff8151059a>] bus_probe_device+0x9a/0xb0
          [<ffffffff8150f0a1>] device_add+0x3b1/0x5c0
          [<ffffffff8150f47e>] device_register+0x1e/0x30
          [<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170
          [<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0
          [<ffffffff8146b1bb>] backend_changed+0x1b/0x20
          [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
      unreferenced object 0xffff880007ba8ef8 (size 224):
      
        backtrace:
          [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
          [<ffffffff81205c73>] __kmalloc+0xd3/0x1e0
          [<ffffffff81534d87>] frontend_changed+0x2c7/0x580
          [<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0
          [<ffffffff8146b2c0>] frontend_changed+0x10/0x20
          [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
          [<ffffffff810d3e97>] kthread+0xd7/0xf0
          [<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      unreferenced object 0xffff8800048dcd38 (size 224):
      
      The first leak is caused by not calling put() on the be->blkif reference
      which we had gotten in xen_blkif_alloc(), while the second is
      caused by not freeing blkif->rings in the right place.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkback: make st_ statistics per ring · db6fbc10
      Bob Liu authored
      Make the st_* statistics per-ring; the VBD sysfs code now iterates over all
      the rings.
      
      Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings
      are torn down, so it's safe.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ---
      v2: Aligned the variables on the same column.
    • xen/blkfront: Handle non-indirect grant with 64KB pages · 6cc56833
      Julien Grall authored
      The minimal size of a request in the block framework is always PAGE_SIZE.
      This means that when a 64KB guest is supported, a request will be at least
      64KB.
      
      However, if the backend doesn't support indirect descriptors (such as QDISK
      in QEMU), a ring request can only accommodate 11 segments of 4KB
      (i.e. 44KB).
      
      The current frontend is assuming that an I/O request will always fit in
      a ring request. This is not true any more when using 64KB page
      granularity and will therefore crash during boot.
      
      On ARM64, the ABI is completely neutral to the page granularity used by
      the domU. The guest has the choice between different page granularity
      supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
      This can't be enforced by the hypervisor and therefore it's possible to
      run guests using different page granularity.
      
      So we can't mandate the block backend to support indirect descriptor
      when the frontend is using 64KB page granularity and have to fix it
      properly in the frontend.
      
      The solution below modifies the frontend directly rather than asking the
      block framework to support sizes smaller than PAGE_SIZE. This is because
      the changes in the block framework are not trivial, as everything seems
      to rely on a struct *page (see [1]). It may be possible for someone to do
      this in the future, in which case we would be able to use it.
      
      Given that a block request may not fit in a single ring request, a
      second request is introduced for the data that cannot fit in the first
      one. This means that the second ring request should never be used on
      Linux if the page size is smaller than 44KB.
      
      To achieve the support of the extra ring request, the block queue size
      is divided by two. Therefore, the ring will always contain enough space
      to accommodate 2 ring requests. While this will reduce the overall
      performance, it will make the implementation more contained. The way
      forward to get better performance is to implement in the backend either
      indirect descriptor or multiple grants ring.
      
      Note that the parameters of the blk_queue_max_* helpers haven't been updated.
      The block code will set the minimum size supported, and we may be able
      to directly support any change in the block framework that lowers
      the minimal size of a request.
      
      [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
      
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
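The arithmetic behind the 64KB-page change above can be sketched as follows, using only the figures from the message (11 segments of 4KB per ring request). The helper name `ring_reqs_needed` and the macro names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Figures from the commit message: 11 segments of 4KB per ring request. */
#define SEGS_PER_RING_REQ 11u
#define SEG_SIZE          4096u

/* Number of ring requests needed to carry one request of page_size bytes. */
static inline unsigned int ring_reqs_needed(size_t page_size)
{
    /* 11 * 4096 = 45056 bytes, i.e. 44KB per ring request. */
    const size_t per_req = (size_t)SEGS_PER_RING_REQ * SEG_SIZE;

    assert(page_size > 0);
    return (unsigned int)((page_size + per_req - 1) / per_req);
}
```

With 4KB or 16KB pages one ring request is enough; a 64KB page exceeds 44KB and therefore needs the second ring request the message describes.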
    • xen-blkfront: Introduce blkif_ring_get_request · 2e073969
      Julien Grall authored
      The code to get a request is always the same. Therefore we can factor
      it out into a single function.
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() · a6e7af12
      Jiri Kosina authored
      xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of
      every attempt to purge the LRU. This operation can't ever succeed though,
      as the kthread hasn't marked itself as freezable.
      
      Before (hopefully eventually) kthread freezing gets converted to filesystem
      freezing, we'd rather mark xen_blkif_schedule() freezable (as it can
      generate I/O during suspend).
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkback: Free resources if connect_ring failed. · 2d0382fa
      Konrad Rzeszutek Wilk authored
      With the multi-queue support we could fail at setting up
      some of the rings and fail the connection. That meant that
      all resources tied to rings[0..n-1] (where n is the ring
      that failed to be set up) stayed allocated until the frontend
      eventually switched states and we called xen_blkif_disconnect.
      
      However we do not want to be at the mercy of the frontend
      deciding when to change states. This patch lets us do the
      cleanup right away and free the resources.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blocks: Return -EXX instead of -1 · bde21f73
      Konrad Rzeszutek Wilk authored
      Let's return sensible values instead of -1.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkback: make pool of persistent grants and free pages per-queue · d4bf0065
      Bob Liu authored
      Make pool of persistent grants and free pages per-queue/ring instead of
      per-device to get better scalability.
      
      Test was done based on null_blk driver:
      dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
      domu: v4.2-rc8 16vcpus 10GB
      
      [test]
      rw=read
      direct=1
      ioengine=libaio
      bs=4k
      time_based
      runtime=30
      filename=/dev/xvdb
      numjobs=16
      iodepth=64
      iodepth_batch=64
      iodepth_batch_complete=64
      group_reporting
      
      Results:
      iops1: After patch "xen/blkfront: make persistent grants per-queue".
      iops2: After this patch.
      
      Queues:          1      4             8             16
      Iops orig(k):    810    1064          780           700
      Iops1(k):        810    1230(~20%)    1024(~20%)    850(~20%)
      Iops2(k):        810    1410(~35%)    1354(~75%)    1440(~100%)
      
      With 4 queues after this commit we can get ~75% increase in IOPS, and
      performance won't drop if increasing queue numbers.
      
      Please find the respective chart in this link:
      https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0
      
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkback: get the number of hardware queues/rings from blkfront · d62d8600
      Bob Liu authored
      The backend advertises "multi-queue-max-queues" to the frontend, and reads the
      negotiated number from "multi-queue-num-queues", which is written by blkfront.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkback: pseudo support for multi hardware queues/rings · 2fb1ef4f
      Konrad Rzeszutek Wilk authored
      Preparatory patch for multiple hardware queues (rings). The number of
      rings is unconditionally set to 1; a larger number will be enabled in
      "xen/blkback: get the number of hardware queues/rings from blkfront".
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ---
      v2: Align variables in the structures.
    • xen/blkback: separate ring information out of struct xen_blkif · 59795700
      Bob Liu authored
      Split per-ring information into a new structure "xen_blkif_ring", so that one vbd
      device can be associated with one or more rings/hardware queues.
      
      Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we
      may have multiple backend threads.
      
      This patch is a preparation for supporting multi hardware queues/rings.
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ---
      v2: Align the variables in the structure.
    • xen/blkfront: correct setting for xen_blkif_max_ring_order · 45fc8264
      Peng Fan authored
      According to this piece of code:
      "
           pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
                    xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER);
      "
      if xen_blkif_max_ring_order is bigger than XENBUS_MAX_RING_GRANT_ORDER,
      we need to set xen_blkif_max_ring_order to XENBUS_MAX_RING_GRANT_ORDER,
      not 0.
      Signed-off-by: Peng Fan <van.freenix@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
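The max_ring_order fix above amounts to clamping an out-of-range value to the maximum instead of resetting it to 0. A minimal sketch, where `MAX_RING_GRANT_ORDER_SKETCH` is an assumed illustrative value (the real constant is XENBUS_MAX_RING_GRANT_ORDER from the Xen headers) and both helper names are hypothetical:

```c
#include <assert.h>

/* Assumed value for illustration only; see the Xen headers for the real one. */
#define MAX_RING_GRANT_ORDER_SKETCH 4u

/* Clamp a requested ring order to the maximum rather than resetting it to 0. */
static inline unsigned int fixup_max_ring_order(unsigned int requested)
{
    if (requested > MAX_RING_GRANT_ORDER_SKETCH)
        return MAX_RING_GRANT_ORDER_SKETCH;
    return requested;
}

/* A ring of a given order spans 2^order pages. */
static inline unsigned int ring_pages_for_order(unsigned int order)
{
    assert(order <= MAX_RING_GRANT_ORDER_SKETCH);
    return 1u << order;
}
```

Clamping keeps the behaviour consistent with the log message, which tells the user that the default maximum will be used.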
    • xen/blkfront: make persistent grants pool per-queue · 73716df7
      Bob Liu authored
      Make persistent grants per-queue/ring instead of per-device, so that we can
      drop the 'dev_lock' and get better scalability.
      
      Test was done based on null_blk driver:
      dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
      domu: v4.2-rc8 16vcpus 10GB
      
      [test]
      rw=read
      direct=1
      ioengine=libaio
      bs=4k
      time_based
      runtime=30
      filename=/dev/xvdb
      numjobs=16
      iodepth=64
      iodepth_batch=64
      iodepth_batch_complete=64
      group_reporting
      
      Queues:             1      4             8             16
      Iops orig(k):       810    1064          780           700
      Iops patched(k):    810    1230(~20%)    1024(~20%)    850(~20%)
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: Remove duplicate setting of ->xbdev. · 75f070b3
      Bob Liu authored
      We do the same exact operations a bit earlier in the
      function.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • Konrad Rzeszutek Wilk's avatar
    • xen/blkfront: negotiate number of queues/rings to be used with backend · 28d949bc
      Bob Liu authored
      The max number of hardware queues for xen/blkfront is set by the parameter
      'max_queues' (default 4), and is also capped by the max value that
      xen/blkback exposes through the XenStore key 'multi-queue-max-queues'.
      
      The negotiated number is the smaller of the two and is written back to
      XenStore as "multi-queue-num-queues"; blkback needs to read this negotiated
      number.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
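The negotiation rule described above (take the smaller of the frontend's 'max_queues' and the backend's 'multi-queue-max-queues') can be sketched as a one-line helper. The function name `negotiate_nr_queues` is hypothetical, and reading or writing XenStore is deliberately left out.

```c
#include <assert.h>

/*
 * Pick the number of queues to use: the smaller of what the frontend
 * allows and what the backend advertises.
 */
static inline unsigned int negotiate_nr_queues(unsigned int frontend_max,
                                               unsigned int backend_max)
{
    assert(frontend_max >= 1 && backend_max >= 1);
    return frontend_max < backend_max ? frontend_max : backend_max;
}
```

With the default frontend limit of 4, a backend advertising 16 queues yields 4, while a backend advertising 2 yields 2.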
    • xen/blkfront: split per device io_lock · 11659569
      Bob Liu authored
      After patch "xen/blkfront: separate per ring information out of device
      info", per-ring data is protected by a per-device lock ('io_lock').
      
      This does not scale well, so introduce a
      per-ring lock ('ring_lock').
      
      The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and
      ->persistent_gnts_c which are shared by all rings.
      
      Note that in 'blkfront_probe' the 'blkfront_info' is set up via kzalloc,
      so setting ->persistent_gnts_c to zero is not needed.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: pseudo support for multi hardware queues/rings · 3df0e505
      Bob Liu authored
      Preparatory patch for multiple hardware queues (rings). The number of
      rings is unconditionally set to 1; a larger number will be enabled in
      patch "xen/blkfront: negotiate number of queues/rings to be used with backend"
      so as to make review easier.
      
      Note that blkfront_gather_backend_features does not call
      blkfront_setup_indirect anymore (as that needs to be done per ring).
      That means that in blkif_recover/blkif_connect we have to do it in a loop
      (bounded by nr_rings).
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: separate per ring information out of device info · 81f35161
      Bob Liu authored
      Split per ring information to a new structure "blkfront_ring_info".
      
      A ring is the representation of a hardware queue; every vbd device can be associated
      with one or more rings, depending on how many hardware queues/rings are to be used.
      
      This patch is a preparation for supporting real multi hardware queues/rings.
      
      We also add a backpointer to 'struct blkfront_info' (dev_info), which
      is not strictly needed (we could use container_of), but a later patch
      ("xen/blkfront: pseudo support for multi hardware queues/rings")
      will make allocation of 'blkfront_ring_info' dynamic.
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkif: document blkif multi-queue/ring extension · eb5df87f
      Bob Liu authored
      Document the multi-queue/ring feature in terms of XenStore keys to be written by
      the backend and by the frontend.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  4. 31 Dec, 2015 8 commits
    • bcache: Change refill_dirty() to always scan entire disk if necessary · 627ccd20
      Kent Overstreet authored
      Previously, it would only scan the entire disk if it was starting from
      the very start of the disk - i.e. if the previous scan got to the end.
      
      This was broken by refill_full_stripes(), which updates last_scanned so
      that refill_dirty was never triggering the searched_from_start path.
      
      But if we change refill_dirty() to always scan the entire disk if
      necessary, regardless of what last_scanned was, the code gets cleaner
      and we fix that bug too.
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
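The "always scan the entire disk if necessary" idea in the refill_dirty() change above can be illustrated with a wraparound scan over a plain array: start at the last scanned position, run to the end, then wrap and continue up to the starting point. This is a simplified model with hypothetical names (`scan_dirty_sketch`, `dirty`, `last_scanned`), not bcache's keybuf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan dirty[] starting at *last_scanned, wrapping around once so that
 * every slot is visited regardless of where the scan begins. Returns the
 * index of the first dirty slot found, or -1 if none exists. On a hit,
 * *last_scanned is advanced to the slot after it.
 */
static long scan_dirty_sketch(const bool *dirty, size_t n, size_t *last_scanned)
{
    assert(n > 0 && *last_scanned < n);
    size_t start = *last_scanned;

    for (size_t i = 0; i < n; i++) {
        size_t pos = (start + i) % n;
        if (dirty[pos]) {
            *last_scanned = (pos + 1) % n;
            return (long)pos;
        }
    }
    return -1;
}
```

Because the loop always covers all n slots, the result no longer depends on whether the previous scan happened to end at the start of the array.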
    • bcache: prevent crash on changing writeback_running · 8d16ce54
      Stefan Bader authored
      Added a safeguard in the shutdown case. At least while not being
      attached it is also possible to trigger a kernel bug by writing into
      writeback_running. This change  adds the same check before trying to
      wake up the thread for that case.
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bcache: allows use of register in udev to avoid "device_busy" error. · d7076f21
      Gabriel de Perthuis authored
      Allow the use of register, rather than register_quiet, in udev to avoid the "device_busy" error.
      The initial patch proposed at https://lkml.org/lkml/2013/8/26/549 by Gabriel de Perthuis
      <g2p.code@gmail.com> does not unlock the mutex and hangs the kernel.
      
      See http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594 for the discussion.
      
      Cc: Denis Bychkov <manover@gmail.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Gabriel de Perthuis <g2p.code@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bcache: unregister reboot notifier if bcache fails to unregister device · 2ecf0cdb
      Zheng Liu authored
      The bcache_init() function forgot to unregister the reboot notifier if
      bcache fails to unregister a block device.  This commit fixes this.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Tested-by: Joshua Schmid <jschmid@suse.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bcache: fix a leak in bch_cached_dev_run() · 4d4d8573
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: Joshua Schmid <jschmid@suse.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device · fecaee6f
      Zheng Liu authored
      This bug can be reproduced by the following script:
      
        #!/bin/bash
      
        bcache_sysfs="/sys/fs/bcache"
      
        function clear_cache()
        {
        	if [ ! -e $bcache_sysfs ]; then
        		echo "no bcache sysfs"
        		exit
        	fi
      
        	cset_uuid=$(ls -l $bcache_sysfs|head -n 2|tail -n 1|awk '{print $9}')
        	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/detach"
        	sleep 5
        	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/attach"
        }
      
        for ((i=0;i<10;i++)); do
        	clear_cache
        done
      
      The warning messages look like below:
      [  275.948611] ------------[ cut here ]------------
      [  275.963840] WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xb8/0xd0() (Tainted: P        W
      ---------------   )
      [  275.979253] Hardware name: Tecal RH2285
      [  275.994106] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:09.0/0000:08:00.0/host4/target4:2:1/4:2:1:0/block/sdb/sdb1/bcache/cache'
      [  276.024105] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
      bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
      i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
      pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
      [  276.072643] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
      [  276.089315] Call Trace:
      [  276.105801]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
      [  276.122650]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
      [  276.139361]  [<ffffffff81205c08>] ? sysfs_add_one+0xb8/0xd0
      [  276.156012]  [<ffffffff8120609b>] ? sysfs_do_create_link+0x12b/0x170
      [  276.172682]  [<ffffffff81206113>] ? sysfs_create_link+0x13/0x20
      [  276.189282]  [<ffffffffa03bda21>] ? bcache_device_link+0xc1/0x110 [bcache]
      [  276.205993]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
      [  276.222794]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
      [  276.239680]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
      [  276.256594]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
      [  276.273364]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
      [  276.290133]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
      [  276.306368]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
      [  276.322301] ---[ end trace 9f5d4fcdd0c3edfb ]---
      [  276.338241] ------------[ cut here ]------------
      [  276.354109] WARNING: at /home/wenqing.lz/bcache/bcache/super.c:720
      bcache_device_link+0xdf/0x110 [bcache]() (Tainted: P        W  ---------------   )
      [  276.386017] Hardware name: Tecal RH2285
      [  276.401430] Couldn't create device <-> cache set symlinks
      [  276.401759] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
      bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
      i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
      pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
      [  276.465477] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
      [  276.482169] Call Trace:
      [  276.498610]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
      [  276.515405]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
      [  276.532059]  [<ffffffffa03bda3f>] ? bcache_device_link+0xdf/0x110 [bcache]
      [  276.548808]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
      [  276.565569]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
      [  276.582418]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
      [  276.599341]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
      [  276.616142]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
      [  276.632607]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
      [  276.648671]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
      [  276.664756] ---[ end trace 9f5d4fcdd0c3edfc ]---
      
      We forget to clear the BCACHE_DEV_UNLINK_DONE flag in the bcache_device_attach()
      function when we attach a backing device for the first time.  After detaching this
      backing device, this flag will be true and sysfs_remove_link() isn't called in
      bcache_device_unlink().  Then when we attach this backing device again,
      sysfs_create_link() will return an EEXIST error in bcache_device_link().
      
      So the fix is trivial and we clear this flag in bcache_device_link().
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Tested-by: Joshua Schmid <jschmid@suse.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bcache: Add a cond_resched() call to gc · c5f1e5ad
      Kent Overstreet authored
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bcache: fix a livelock when we cause a huge number of cache misses · 2ef9ccbf
      Zheng Liu authored
      Subject :	[PATCH v2] bcache: fix a livelock in btree lock
      Date :	Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)
      
      This commit tries to fix a livelock in bcache.  This livelock might
      happen when we cause a huge number of cache misses simultaneously.
      
      When we get a cache miss, bcache will execute the following path.
      
      ->cached_dev_make_request()
        ->cached_dev_read()
          ->cached_lookup()
            ->bch->btree_map_keys()
              ->btree_root()  <------------------------
                ->bch_btree_map_keys_recurse()        |
                  ->cache_lookup_fn()                 |
                    ->cached_dev_cache_miss()         |
                      ->bch_btree_insert_check_key() -|
                        [If btree->seq is not equal to seq + 1, we should return
                         EINTR and traverse btree again.]
      
      In the bch_btree_insert_check_key() function we first need to check the upgrade
      flag (op->lock == -1), and when this flag is true we need to release the
      read btree->lock and try to take the write btree->lock.  While taking and
      releasing this write lock, btree->seq is monotonically increased in
      order to prevent other threads from modifying the btree during a cache miss
      (see btree.h:74).  But if many requests cause cache misses at once, we could
      hit a livelock because btree->seq is always being changed by others.  Thus no
      one can make progress.
      
      This commit will try to take the write btree->lock if it encounters a race
      when we traverse the btree.  Although this sacrifices scalability, we
      can ensure that only one thread can modify the btree.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Tested-by: Joshua Schmid <jschmid@suse.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Joshua Schmid <jschmid@suse.com>
      Cc: Zhu Yanhai <zhu.yanhai@gmail.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 23 Dec, 2015 1 commit
  6. 22 Dec, 2015 1 commit
    • block: sx8.c: Replace timeval with ktime_t · 8182503d
      Shraddha Barke authored
      32-bit systems using 'struct timeval' will break in the year 2038.
      To avoid that, replace the code with more appropriate types.
      This patch replaces timeval with the 64-bit ktime_t, which is y2038 safe.
      Since st->timestamp is only interested in seconds, time64_t is used
      directly here. The function ktime_get_seconds is used since it uses
      monotonic instead of real time and thus will not cause overflow.
      Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
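The y2038 concern in the sx8 change above comes down to whether a seconds count still fits in a signed 32-bit field. A small sketch with hypothetical helper names (`fits_in_32bit_seconds`, `narrow_to_32bit_seconds`), modelling the 32-bit seconds field as `int32_t` and the 64-bit replacement as `int64_t`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if secs can be stored in a signed 32-bit seconds field. */
static inline bool fits_in_32bit_seconds(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Narrow to 32 bits, refusing values that would wrap. */
static inline int32_t narrow_to_32bit_seconds(int64_t secs)
{
    assert(fits_in_32bit_seconds(secs));
    return (int32_t)secs;
}
```

The largest representable value, 2147483647 seconds after the Unix epoch, falls on 19 January 2038; one second later no longer fits, which is why a 64-bit seconds type avoids the problem.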
  7. 25 Nov, 2015 6 commits