1. 27 Oct, 2017 6 commits
  2. 23 Oct, 2017 2 commits
  3. 20 Oct, 2017 5 commits
    • nvme-fc: correct io timeout behavior · 134aedc9
      James Smart authored
      The transport io timeout behavior wasn't quite correct. It ignored
      that the io error handler is supposed to be synchronous, so it could
      allow the blk request to be restarted while the associated io was
      still aborting. Timeouts on reserved commands, those used for
      association create, never fired, so those commands hung forever.
      
      To correct:
      If an io times out while a remoteport is not connected, just
      restart the io timer. The lack of connectivity will simultaneously
      be resetting the controller, so the reset path will abort and terminate
      the io.
      
      If an io times out while it is marked for transport abort, just
      reset the io timer. The abort process is underway and will complete
      the io.
      
      Otherwise, if an io times out, abort the io. If the abort was
      unsuccessful (unlikely) give up and return not handled.
      
      If the abort was successful, as the abort process is underway it will
      terminate the io, so rather than synchronously waiting, just restart
      the io timer.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
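The timeout rules in the commit message above can be read as a small decision function. The following is only a sketch under stated assumptions: the type and function names (`io_state`, `timeout_action`, `io_timed_out`, the `stub_abort_*` helpers) are hypothetical and are not the driver's own identifiers, and the abort itself is represented by a caller-supplied callback.

```c
#include <stdbool.h>

/* Hypothetical outcome codes; the names are illustrative only. */
enum timeout_action {
    TIMEOUT_RESTART_TIMER, /* re-arm the io timer; another path finishes the io */
    TIMEOUT_NOT_HANDLED    /* give up: the abort could not be issued */
};

/* Hypothetical snapshot of the state the timeout handler looks at. */
struct io_state {
    bool rport_connected;  /* is the remoteport currently connected? */
    bool marked_for_abort; /* is the io already marked for transport abort? */
};

/* Stand-ins for the transport abort; they return true on success. */
static bool stub_abort_ok(void)    { return true; }
static bool stub_abort_fails(void) { return false; }

/* Decide what to do when an io times out, following the commit message. */
static enum timeout_action
io_timed_out(const struct io_state *io, bool (*try_abort)(void))
{
    /* No connectivity: the controller reset path will terminate the io. */
    if (!io->rport_connected)
        return TIMEOUT_RESTART_TIMER;

    /* An abort is already underway and will complete the io. */
    if (io->marked_for_abort)
        return TIMEOUT_RESTART_TIMER;

    /* Otherwise abort the io; an unsuccessful abort is not handled. */
    if (!try_abort())
        return TIMEOUT_NOT_HANDLED;

    /* Abort issued: it will terminate the io, so do not wait synchronously. */
    return TIMEOUT_RESTART_TIMER;
}
```

Note that three of the four paths end by restarting the timer; only a failed abort is reported as not handled.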
    • nvme-fc: correct io termination handling · 0a02e39f
      James Smart authored
      The io completion handling for ios that are failing due to
      a transport error or association termination had issues, causing
      io failures (DNR set so retries didn't kick in) or long stalls.
      
      Change the io completion handler for the following items:
      
      When an io has been completed due to a transport abort (based on an
      exchange error) or when marked as aborted as part of an association
      termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
      NVME_SC_ABORTED. By default, do not set DNR on the status so that a
      retry can be attempted after association recreate.
      
      In cases where an io is failed (non-successful nvme status including
      aborted), if the controller is being deleted (blk_queue_dying) or
      the io was part of the ios used for association creation (ctrl state
      is NEW or RECONNECTING), then additionally set the DNR bit so the io
      will not be retried. If the failed io was part of association creation,
      the failure will tear down the partially completed association and
      typically start a new reconnect attempt (another create association
      later).
      
      Rearranged code flow to remove a largely unneeded local variable.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
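The two completion rules described above (report an abort without DNR by default, then add DNR only when a retry cannot succeed) might be sketched as follows. This is an illustration, not the driver code: the `SKETCH_*` constants are placeholder values chosen for the example, and `completion_ctx` and `final_status` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration; not taken from the kernel headers. */
#define SKETCH_SC_SUCCESS 0x0000u
#define SKETCH_SC_ABORTED 0x0007u
#define SKETCH_SC_DNR     0x4000u /* "do not retry" bit */

/* Hypothetical view of what the completion handler knows about the io. */
struct completion_ctx {
    bool transport_aborted; /* completed due to a transport abort */
    bool termio;            /* marked aborted during association termination */
    bool queue_dying;       /* controller is being deleted */
    bool ctrl_connecting;   /* controller state is NEW or RECONNECTING */
};

/* Compute the status to report for an io that completed with `status`. */
static uint16_t final_status(uint16_t status, const struct completion_ctx *c)
{
    /* Aborted ios report "aborted" without DNR, so a retry is possible. */
    if (c->transport_aborted || c->termio)
        status = SKETCH_SC_ABORTED;

    /* Failed ios must not be retried if no retry can ever succeed. */
    if (status != SKETCH_SC_SUCCESS &&
        (c->queue_dying || c->ctrl_connecting))
        status |= SKETCH_SC_DNR;

    return status;
}
```

A successful io is never modified; DNR is only ever added on top of a non-successful status.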
    • nvme-pci: add SGL support · a7a7cbe3
      Chaitanya Kulkarni authored
      This adds SGL support for the NVMe PCIe driver, based on an earlier patch
      from Rajiv Shanmugam Madeswaran <smrajiv15 at gmail.com>. This patch
      refactors the original code and adds a new module parameter, sgl_threshold,
      to determine whether to use SGL or PRP for IOs.
      
      The usage of SGLs is controlled by the sgl_threshold module parameter:
      SGLs are used only if the average request segment size (avg_seg_size)
      is greater than or equal to sgl_threshold. In the original patch,
      the decision to use SGLs depended only on the IO size; the new
      approach considers not only the IO size but also the number of
      physical segments present in the IO.
      
      We calculate avg_seg_size based on request payload bytes and number
      of physical segments present in the request.
      
      For example:

      1. blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 8K:
         avg_seg_size = 4K; use SGL if avg_seg_size >= sgl_threshold.

      2. blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 64K:
         avg_seg_size = 32K; use SGL if avg_seg_size >= sgl_threshold.

      3. blk_rq_nr_phys_segments = 16, blk_rq_payload_bytes = 64K:
         avg_seg_size = 4K; use SGL if avg_seg_size >= sgl_threshold.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
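The three worked examples above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the average is rounded up to a whole byte and that a request with no segments never uses SGLs; both choices, and the function names `avg_seg_size` and `use_sgl`, are assumptions made for the illustration rather than details stated in the commit message.

```c
#include <stdbool.h>

/* Average bytes per physical segment, rounded up (rounding is an assumption). */
static unsigned int avg_seg_size(unsigned int payload_bytes,
                                 unsigned int nr_phys_segments)
{
    if (nr_phys_segments == 0)
        return 0; /* avoid dividing by zero for an empty request */
    return payload_bytes / nr_phys_segments +
           (payload_bytes % nr_phys_segments != 0);
}

/* SGL is chosen when the average segment size reaches the threshold. */
static bool use_sgl(unsigned int payload_bytes,
                    unsigned int nr_phys_segments,
                    unsigned int sgl_threshold)
{
    if (nr_phys_segments == 0)
        return false;
    return avg_seg_size(payload_bytes, nr_phys_segments) >= sgl_threshold;
}
```

With a threshold of 32K, for instance, only the second example (two segments, 64K payload) would select SGL; the 8K request and the 16-segment 64K request both average 4K per segment and would stay on PRP.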
    • nvme: use ida_simple_{get,remove} for the controller instance · 9843f685
      Christoph Hellwig authored
      Switch to the ida_simple_* helpers instead of open-coding them.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    • nvmet: Change max_nsid in subsystem due to ns_disable if needed · ba2dec35
      Roy Shterman authored
      When we disable a namespace whose nsid equals the subsystem's
      max_nsid, we need to search for the next largest nsid in this
      subsystem. If the subsystem has no more namespaces we set max_nsid
      to 0; otherwise we take the nsid of the last namespace in the
      namespaces list, because the list is kept sorted on insertion.
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      [hch: slight refactor]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
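The max_nsid update described above relies on the namespace list being sorted, so the new maximum is simply the last remaining entry. A rough sketch, assuming the remaining namespace ids are available as an ascending array (the real code walks a linked list; the function name `max_nsid_after_disable` and the array representation are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the subsystem's max_nsid after `disabled_nsid` has been removed.
 * `remaining` holds the nsids still enabled, sorted in ascending order.
 */
static uint32_t max_nsid_after_disable(uint32_t cur_max,
                                       uint32_t disabled_nsid,
                                       const uint32_t *remaining,
                                       size_t count)
{
    /* Disabling any other namespace leaves the maximum untouched. */
    if (disabled_nsid != cur_max)
        return cur_max;

    /* No namespaces left: max_nsid drops to 0. */
    if (count == 0)
        return 0;

    /* The list is sorted, so the last entry is the largest nsid. */
    return remaining[count - 1];
}
```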
  4. 19 Oct, 2017 4 commits
  5. 18 Oct, 2017 11 commits
  6. 16 Oct, 2017 1 commit
  7. 05 Oct, 2017 1 commit
  8. 04 Oct, 2017 6 commits
  9. 03 Oct, 2017 4 commits
    • block: move __elv_next_request to blk-core.c · 9c988374
      Christoph Hellwig authored
      No need to have this helper inline in a header.  Also drop the __ prefix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: decrease burst size when queues in burst exit · 7cb04004
      Paolo Valente authored
      If many queues belonging to the same group happen to be created
      shortly after each other, then the concurrent processes associated
      with these queues typically have a common goal, and they get it done
      as soon as possible if not hampered by device idling.  Examples are
      processes spawned by git grep, or by systemd during boot. As for
      device idling, this mechanism is currently necessary for weight
      raising to succeed in its goal: privileging I/O.  In view of these
      facts, BFQ does not provide the above queues with either weight
      raising or device idling.
      
      On the other hand, a burst of queue creations may be caused also by
      the start-up of a complex application. In this case, these queues
      usually need to be served one after the other, and as quickly as possible,
      to maximise responsiveness. Therefore, in this case the best strategy
      is to weight-raise all the queues created during the burst, i.e., the
      exact opposite of the strategy for the above case.
      
      To distinguish between the two cases, BFQ uses an empirical burst-size
      threshold, found through extensive tests and monitoring of daily
      usage. Only large bursts, i.e., bursts with a size above this
      threshold, are considered as generated by a high number of parallel
      processes. In this respect, upstart-based boot proved to be rather
      hard to detect as generating a large burst of queue creations, because
      with upstart most of the queues created in a burst exit *before* the
      next queues in the same burst are created. To address this issue, I
      changed the burst-detection mechanism so as to not decrease the size
      of the current burst even if one of the queues in the burst is
      eliminated.
      
      Unfortunately, this missing decrease causes false positives on very
      fast systems: on the start-up of a complex application, such as
      libreoffice writer, so many queues are created, served and exited
      shortly after each other, that a large burst of queue creations is
      wrongly detected as occurring. These false positives just disappear if
      the size of a burst is decreased when one of the queues in the burst
      exits. This commit restores the missing burst-size decrease, relying
      on the fact that upstart is apparently unlikely to be used on systems
      running this and future versions of the kernel.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
      Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
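The effect of restoring the burst-size decrease can be seen with a simple counter. The sketch below is a deliberate simplification and not the BFQ implementation: the `burst` structure, its field names, and the helper functions are hypothetical, the threshold is a free parameter, and "large" is recomputed from the current size on every query.

```c
#include <stdbool.h>

/* Hypothetical burst tracker: counts queues currently in the burst. */
struct burst {
    unsigned int size;         /* queues created in the burst and still alive */
    unsigned int large_thresh; /* empirical threshold for a "large" burst */
};

/* A queue was created shortly after the previous one: the burst grows. */
static void burst_queue_created(struct burst *b)
{
    b->size++;
}

/* A queue in the burst exited: with this commit, the burst shrinks again. */
static void burst_queue_exited(struct burst *b)
{
    if (b->size > 0)
        b->size--;
}

/* Only bursts with a size above the threshold count as large. */
static bool burst_is_large(const struct burst *b)
{
    return b->size > b->large_thresh;
}
```

If queues are created and exit in quick succession, as in the application start-up case from the commit message, the size stays small and the burst is not classified as large; without the decrement, the same sequence would keep growing the counter.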
    • block, bfq: let early-merged queues be weight-raised on split too · 894df937
      Paolo Valente authored
      A just-created bfq_queue, say Q, may happen to be merged with another
      bfq_queue on the very first invocation of the function
      __bfq_insert_request. In such a case, even if Q would clearly deserve
      interactive weight raising (as it has just been created), the function
      bfq_add_request is never invoked for Q, and thus weight raising is
      never activated for Q. As a consequence, when the state of Q
      is saved for a possible future restore, after a split of Q from the
      other bfq_queue(s), such a state happens to be (unjustly)
      non-weight-raised. Then the bfq_queue will not enjoy any weight
      raising on the split, even if it should still be in an interactive
      weight-raising period when the split occurs.
      
      This commit solves this problem as follows, for a just-created
      bfq_queue that is being early-merged: it stores directly, in the saved
      state of the bfq_queue, the weight-raising state that would have been
      assigned to the bfq_queue if not early-merged.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
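The fix described above amounts to choosing what gets written into the saved state at merge time. A minimal sketch, assuming a weight-raising coefficient of 1 means "not weight-raised"; the structures, the `just_created` flag, and `save_state_on_merge` are hypothetical names for illustration and do not mirror the BFQ data structures:

```c
#include <stdbool.h>

/* Hypothetical per-queue state. */
struct queue_sketch {
    bool just_created;     /* no request went through the normal add path yet */
    unsigned int wr_coeff; /* current weight-raising coefficient, 1 = none */
};

/* Hypothetical state saved at merge time and restored on a split. */
struct saved_state {
    unsigned int wr_coeff;
};

/*
 * Save the queue's weight-raising state before a merge. For an early-merged
 * (just-created) queue, store the state it would have received had it not
 * been merged, i.e. interactive weight raising.
 */
static void save_state_on_merge(const struct queue_sketch *q,
                                struct saved_state *saved,
                                unsigned int interactive_coeff)
{
    if (q->just_created)
        saved->wr_coeff = interactive_coeff;
    else
        saved->wr_coeff = q->wr_coeff;
}
```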
    • block, bfq: check and switch back to interactive wr also on queue split · 3e2bdd6d
      Paolo Valente authored
      As already explained in the message of commit "block, bfq: fix
      wrong init of saved start time for weight raising", if a soft
      real-time weight-raising period happens to be nested in a larger
      interactive weight-raising period, then BFQ restores the interactive
      weight raising at the end of the soft real-time weight raising. In
      particular, BFQ checks whether the latter has ended only on request
      dispatches.
      
      Unfortunately, the above scheme fails to restore interactive weight
      raising in the following corner case: if a bfq_queue, say Q,
      1) Is merged with another bfq_queue while it is in a nested soft
      real-time weight-raising period. The weight-raising state of Q is
      then saved, and not considered any longer until a split occurs.
      2) Is split from the other bfq_queue(s) at a time instant when its
      soft real-time weight raising is already finished.
      On the split, while resuming the previous, soft real-time
      weight-raised state of the bfq_queue Q, BFQ checks whether the
      current soft real-time weight-raising period is actually over. If so,
      BFQ switches weight raising off for Q, *without* checking whether the
      soft real-time period was actually nested in a not-yet-finished
      interactive weight-raising period.
      
      This commit addresses this issue by adding the above missing check in
      bfq_queue splits, and restoring interactive weight raising if needed.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
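The missing check on queue split can be illustrated with plain timestamps. This sketch makes several assumptions: times are simple monotonically increasing tick counts with no wraparound handling, an `interactive_end` of 0 means there was no enclosing interactive period, and saved states other than soft real-time are returned unchanged. All names (`wr_state`, `saved_wr`, `wr_state_on_split`) are hypothetical.

```c
enum wr_state { WR_NONE, WR_INTERACTIVE, WR_SOFT_RT };

/* Hypothetical weight-raising state saved when the queue was merged. */
struct saved_wr {
    enum wr_state state;
    unsigned long srt_end;         /* end of the soft real-time period */
    unsigned long interactive_end; /* end of the enclosing interactive period */
};

/* Decide which weight-raising state the queue resumes with on a split. */
static enum wr_state wr_state_on_split(const struct saved_wr *s,
                                       unsigned long now)
{
    if (s->state != WR_SOFT_RT)
        return s->state;

    /* The soft real-time period is still running: resume it as saved. */
    if (now < s->srt_end)
        return WR_SOFT_RT;

    /* The added check: fall back to the unfinished interactive period. */
    if (now < s->interactive_end)
        return WR_INTERACTIVE;

    /* Both periods are over: no weight raising. */
    return WR_NONE;
}
```

Before the fix, the behavior corresponded to skipping the middle check, so a queue split after its soft real-time period ended lost weight raising even though the interactive period was still open.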