1. 19 Sep, 2016 5 commits
    • Gabriel Krisman Bertazi's avatar
      UBUNTU: SAUCE: nvme: Don't suspend admin queue that wasn't created · d2b59eef
      Gabriel Krisman Bertazi authored
      Pending 4.8-rc merge.
      BugLink: http://bugs.launchpad.net/bugs/1602724
      
      This fixes a regression in my previous commit c21377f8 ("nvme:
      Suspend all queues before deletion"), which provoked an Oops in the
      removal path when removing a device that became IO incapable very early
      at probe (i.e. after a failed EEH recovery).
      
      Turns out, if the error occurred very early at the probe path, before
      even configuring the admin queue, we might try to suspend the
      uninitialized admin queue, accessing bad memory.
      
      Fixes: c21377f8 ("nvme: Suspend all queues before deletion")
      Signed-off-by: default avatarGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Reviewed-by: default avatarJay Freyensee <james_p_freyensee@linux.intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Acked-by: default avatarBrad Figg <brad.figg@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d2b59eef
    • Jens Axboe's avatar
      blk-mq: don't overwrite rq->mq_ctx · f463371e
      Jens Axboe authored
      BugLink: http://bugs.launchpad.net/bugs/1620317
      
      We do this in a few places, if the CPU is offline. This isn't allowed,
      though, since on multi queue hardware, we can't just move a request
      from one software queue to another, if they map to different hardware
      queues. The request and tag isn't valid on another hardware queue.
      
      This can happen if plugging races with CPU offlining. But it does
      no harm, since it can only happen in the window where we are
      currently busy freezing the queue and flushing IO, in preparation
      for redoing the software <-> hardware queue mappings.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      (cherry picked from commit e57690fe)
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Acked-by: default avatarBrad Figg <brad.figg@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f463371e
    • Jens Axboe's avatar
      blk-mq: improve warning for running a queue on the wrong CPU · 8b6f6135
      Jens Axboe authored
      BugLink: http://bugs.launchpad.net/bugs/1620317
      
      __blk_mq_run_hw_queue() currently warns if we are running the queue on a
      CPU that isn't set in its mask. However, this can happen if a CPU is
      being offlined, and the workqueue handling will place the work on CPU0
      instead. Improve the warning so that it only triggers if the batch cpu
      in the hardware queue is currently online.  If it triggers for that
      case, then it's indicative of a flow problem in blk-mq, so we want to
      retain it for that case.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      (cherry picked from commit 0e87e58b)
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Acked-by: default avatarBrad Figg <brad.figg@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8b6f6135
    • Gabriel Krisman Bertazi's avatar
      blk-mq: Allow timeouts to run while queue is freezing · 702b11d1
      Gabriel Krisman Bertazi authored
      BugLink: http://bugs.launchpad.net/bugs/1620317
      
      In case a submitted request gets stuck for some reason, the block layer
      can prevent the request starvation by starting the scheduled timeout work.
      If this stuck request occurs at the same time another thread has started
      a queue freeze, the blk_mq_timeout_work will not be able to acquire the
      queue reference and will return silently, thus not issuing the timeout.
      But since the request is already holding a q_usage_counter reference and
      is unable to complete, it will never release its reference, preventing
      the queue from completing the freeze started by first thread.  This puts
      the request_queue in a hung state, forever waiting for the freeze
      completion.
      
      This was observed while running IO to a NVMe device at the same time we
      toggled the CPU hotplug code. Eventually, once a request got stuck
      requiring a timeout during a queue freeze, we saw the CPU Hotplug
      notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
      the trace below.
      
      [c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
      [c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
      [c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
      [c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
      [c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
      [c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
      [c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
      [c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
      [c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
      [c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
      [c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
      [c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
      [c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
      [c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
      [c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
      [c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
      [c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
      [c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
      [c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
      [c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4
      
      The fix is to allow the timeout work to execute in the window between
      dropping the initial refcount reference and the release of the last
      reference, which actually marks the freeze completion.  This can be
      achieved with percpu_refcount_tryget, which does not require the counter
      to be alive.  This way the timeout work can do it's job and terminate a
      stuck request even during a freeze, returning its reference and avoiding
      the deadlock.
      
      Allowing the timeout to run is just a part of the fix, since for some
      devices, we might get stuck again inside the device driver's timeout
      handler, should it attempt to allocate a new request in that path -
      which is a quite common action for Abort commands, which need to be sent
      after a timeout.  In NVMe, for instance, we call blk_mq_alloc_request
      from inside the timeout handler, which will fail during a freeze, since
      it also tries to acquire a queue reference.
      
      I considered a similar change to blk_mq_alloc_request as a generic
      solution for further device driver hangs, but we can't do that, since it
      would allow new requests to disturb the freeze process.  I thought about
      creating a new function in the block layer to support unfreezable
      requests for these occasions, but after working on it for a while, I
      feel like this should be handled in a per-driver basis.  I'm now
      experimenting with changes to the NVMe timeout path, but I'm open to
      suggestions of ways to make this generic.
      Signed-off-by: default avatarGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-block@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      (cherry picked from commit 71f79fb3)
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Acked-by: default avatarBrad Figg <brad.figg@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      702b11d1
    • Luke Dashjr's avatar
      btrfs: bugfix: handle FS_IOC32_{GETFLAGS, SETFLAGS, GETVERSION} in btrfs_ioctl · 50603667
      Luke Dashjr authored
      BugLink: http://bugs.launchpad.net/bugs/1619918
      
      32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
      be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
      fail.
      Signed-off-by: default avatarLuke Dashjr <luke-jr+git@utopios.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      (back ported from commit 4c63c245)
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      
       Conflicts:
      	fs/btrfs/ctree.h
      Acked-by: default avatarBrad Figg <brad.figg@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      50603667
  2. 16 Sep, 2016 35 commits