1. 06 Sep, 2019 5 commits
    • block: Delay default elevator initialization · 737eb78e
      Damien Le Moal authored
      When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
      the only information known about the device is the number of hardware
      queues, because for most drivers the device driver has not yet completed
      its scan of the block device. The device type and the required elevator
      features are not set yet, which prevents the selection of the default
      elevator most suitable for the device.
      
      This currently affects all multi-queue zoned block devices which default
      to the "none" elevator instead of the required "mq-deadline" elevator.
      These drives currently include host-managed SMR disks connected to a
      smartpqi HBA and null_blk block devices with zoned mode enabled.
      Upcoming NVMe Zoned Namespace devices will also be affected.
      
      Fix this by adding the boolean elevator_init argument to
      blk_mq_init_allocated_queue() to control the execution of
      elevator_init_mq(). Two cases exist:
      1) elevator_init = false is used for calls to
         blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
         case, a call to elevator_init_mq() is added to __device_add_disk(),
         resulting in the delayed initialization of the queue elevator
         after the device driver has finished probing the device information. This
         effectively allows elevator_init_mq() access to more information
         about the device.
      2) elevator_init = true preserves the current behavior of initializing
         the elevator directly from blk_mq_init_allocated_queue(). This case
         is used for the special request-based DM devices where the device
         gendisk is created before the queue initialization and device
         information (e.g. queue limits) is already known when the queue
         initialization is executed.
      
      Additionally, to make sure that the elevator initialization is never
      done while requests are in-flight (there should be none when the device
      driver calls device_add_disk()), freeze and quiesce the device request
      queue before calling blk_mq_init_sched() in elevator_init_mq().
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
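The effect of the elevator_init argument described in the commit above can be illustrated with a small user-space model. This is only a sketch: every toy_ name is hypothetical, the zoned field stands in for all device information set during probing, and the rule "zoned means mq-deadline, otherwise none" is a simplification of the real default selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct request_queue; all "toy_" names are hypothetical. */
struct toy_queue {
    bool zoned;            /* filled in by the driver while probing */
    const char *elevator;  /* NULL until a default has been chosen */
};

/* Picks a default from whatever is known about the device right now. */
static void toy_elevator_init_mq(struct toy_queue *q)
{
    assert(q != NULL);
    if (q->elevator)       /* already initialized: nothing to do */
        return;
    q->elevator = q->zoned ? "mq-deadline" : "none";
}

/* elevator_init selects immediate (true) or delayed (false) initialization. */
static void toy_init_allocated_queue(struct toy_queue *q, bool elevator_init)
{
    q->zoned = false;      /* device type is not known yet */
    q->elevator = NULL;
    if (elevator_init)
        toy_elevator_init_mq(q);
}

/* Called once the driver has finished probing the device. */
static void toy_device_add_disk(struct toy_queue *q)
{
    toy_elevator_init_mq(q);
}

/* Convenience check for inspecting the result. */
static bool toy_elevator_is(const struct toy_queue *q, const char *name)
{
    return q->elevator && strcmp(q->elevator, name) == 0;
}
```

In this model, a queue initialized with elevator_init = true and marked zoned only afterwards ends up with "none", while a queue initialized with elevator_init = false picks "mq-deadline" once toy_device_add_disk() runs.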
    • block: Improve default elevator selection · a0958ba7
      Damien Le Moal authored
      For block devices that do not specify required features, preserve the
      current default elevator selection (mq-deadline for single queue
      devices, none for multi-queue devices). However, for devices specifying
      required features (e.g. the ELEVATOR_F_ZBD_SEQ_WRITE feature for zoned
      block devices), select the first available elevator providing the
      required features.
      
      In all cases, default to "none" if no elevator is available or if the
      initialization of the default elevator fails.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
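The selection rule stated in the commit above can be sketched as a standalone function. This is a simplified model, not the kernel code: toy_elevator_type and toy_default_elevator() are hypothetical names, the flag's bit value is illustrative, and the list of available elevators is passed in explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)  /* bit value is illustrative */

/* Hypothetical, reduced description of a registered elevator. */
struct toy_elevator_type {
    const char *name;
    unsigned int features;  /* features this elevator provides */
};

/* Returns the name of the default elevator, or "none" when nothing fits. */
static const char *toy_default_elevator(const struct toy_elevator_type *avail,
                                        size_t n, unsigned int nr_hw_queues,
                                        unsigned int required)
{
    size_t i;

    assert(avail != NULL || n == 0);

    if (!required) {
        /* No required features: mq-deadline for single queue devices only. */
        if (nr_hw_queues != 1)
            return "none";
        for (i = 0; i < n; i++)
            if (strcmp(avail[i].name, "mq-deadline") == 0)
                return avail[i].name;
        return "none";
    }

    /* Required features: first available elevator providing all of them. */
    for (i = 0; i < n; i++)
        if ((avail[i].features & required) == required)
            return avail[i].name;
    return "none";
}
```

The model covers only the choice of a name; falling back to "none" when the chosen elevator fails to initialize is a separate step.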
    • block: Introduce elevator features · 68c43f13
      Damien Le Moal authored
      Introduce the definition of elevator features through the
      elevator_features flags in the elevator_type structure. Each flag can
      represent a feature supported by an elevator. The first feature defined
      by this patch is support for zoned block device sequential write
      constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
      by the mq-deadline elevator using zone write locking.
      
      Other possible features are IO priorities, write hints, latency targets
      or single-LUN dual-actuator disks (for which the elevator could maintain
      one LBA ordered list per actuator).
      
      The required_elevator_features field is also added to the request_queue
      structure to allow a device driver to specify elevator feature flags
      that an elevator must support for the correct operation of the device
      (e.g. device drivers for zoned block devices can have the
      ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
      The helper function blk_queue_required_elevator_features() is
      defined for setting this new field.
      
      With these two new fields in place, the elevator functions
      elevator_match() and elevator_find() are modified to allow a user to set
      only an elevator with a set of features that satisfies the device
      required features. Elevators not matching the device requirements are
      not shown in the device sysfs queue/scheduler file to prevent their use.
      
      The "none" elevator can always be selected as before.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
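The feature check described in the commit above amounts to a bitmask comparison between what an elevator provides and what the device requires. The following user-space sketch assumes reduced toy_ structures and an illustrative bit value for ELEVATOR_F_ZBD_SEQ_WRITE; the real elevator_match() and elevator_find() operate on kernel structures not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)  /* bit value is illustrative */

/* Reduced stand-ins for the kernel structures (hypothetical names). */
struct toy_elevator_type {
    const char *elevator_name;
    unsigned int elevator_features;           /* features provided */
};

struct toy_request_queue {
    unsigned int required_elevator_features;  /* features the device needs */
};

/* Models the helper a driver would call to declare its requirements. */
static void toy_queue_required_elevator_features(struct toy_request_queue *q,
                                                 unsigned int features)
{
    q->required_elevator_features = features;
}

/* An elevator matches only if it provides every required feature. */
static bool toy_elevator_match(const struct toy_elevator_type *e,
                               const char *name, unsigned int required)
{
    if ((e->elevator_features & required) != required)
        return false;
    return strcmp(e->elevator_name, name) == 0;
}

/* Looks up an elevator by name, hiding those the device cannot use. */
static const struct toy_elevator_type *
toy_elevator_find(const struct toy_elevator_type *list, size_t n,
                  const char *name, const struct toy_request_queue *q)
{
    size_t i;

    assert(list != NULL && q != NULL);
    for (i = 0; i < n; i++)
        if (toy_elevator_match(&list[i], name, q->required_elevator_features))
            return &list[i];
    return NULL;
}
```

With a queue requiring ELEVATOR_F_ZBD_SEQ_WRITE, a lookup of an elevator that lacks the flag returns NULL in this model, which corresponds to that elevator being unavailable for the device.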
    • block: Change elevator_init_mq() to always succeed · 954b4a5c
      Damien Le Moal authored
      If the default elevator chosen is mq-deadline, elevator_init_mq() may
      return an error if mq-deadline initialization fails, leading to
      blk_mq_init_allocated_queue() returning an error, which in turn causes
      the block device initialization to fail and the device not to be
      exposed.
      
      Instead of taking such an extreme measure, handle mq-deadline
      initialization failures in the same manner as when mq-deadline is not
      available (no module to load), that is, default to the "none" scheduler.
      With this change, the return type of elevator_init_mq() can be changed
      to void.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
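The change of elevator_init_mq() to a void function that falls back to "none" can be modeled as follows. This is a sketch under stated assumptions: toy_init_sched() is a hypothetical stand-in for scheduler setup, and the two boolean parameters simulate module availability and an initialization failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy queue: only tracks the name of the active elevator. */
struct toy_queue {
    const char *elevator;
};

/* Hypothetical scheduler setup; "fail" simulates an allocation error. */
static int toy_init_sched(struct toy_queue *q, const char *name, bool fail)
{
    if (fail)
        return -ENOMEM;
    q->elevator = name;
    return 0;
}

/*
 * Returns void: a missing or failing mq-deadline is not an error for the
 * caller, the queue simply ends up with the "none" scheduler.
 */
static void toy_elevator_init_mq(struct toy_queue *q, bool deadline_available,
                                 bool init_fails)
{
    assert(q != NULL);
    if (!deadline_available ||
        toy_init_sched(q, "mq-deadline", init_fails) != 0)
        q->elevator = "none";
}
```

Because the function has no return value, a caller modeled on blk_mq_init_allocated_queue() has no elevator error to propagate, so the device is still exposed in every case.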
    • block: Cleanup elevator_init_mq() use · 61db437d
      Damien Le Moal authored
      Instead of checking the BLK_MQ_F_NO_SCHED flag of a queue's tag_set before
      calling elevator_init_mq() to make sure that the queue supports IO
      scheduling, use the elevator.c function elv_support_iosched() in
      elevator_init_mq(). This does not introduce any functional change but
      ensures that elevator_init_mq() does the right thing based on the queue
      settings.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
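Moving the scheduling-support check from the caller into elevator_init_mq() can be pictured with the sketch below. The structures are reduced toy_ stand-ins, TOY_BLK_MQ_F_NO_SCHED is a placeholder for BLK_MQ_F_NO_SCHED with an illustrative bit value, and a NULL elevator represents a queue without an IO scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLK_MQ_F_NO_SCHED (1U << 0)  /* placeholder for BLK_MQ_F_NO_SCHED */

/* Reduced stand-ins for the tag set and the request queue (hypothetical). */
struct toy_tag_set {
    unsigned int flags;
};

struct toy_queue {
    const struct toy_tag_set *tag_set;
    const char *elevator;  /* NULL means no IO scheduler */
};

/* Models elv_support_iosched(): false if the tag set forbids scheduling. */
static bool toy_elv_support_iosched(const struct toy_queue *q)
{
    return !(q->tag_set && (q->tag_set->flags & TOY_BLK_MQ_F_NO_SCHED));
}

/* The check lives here, so callers no longer need to test the flag first. */
static void toy_elevator_init_mq(struct toy_queue *q)
{
    assert(q != NULL);
    if (!toy_elv_support_iosched(q))
        return;            /* leave the queue without a scheduler */
    q->elevator = "mq-deadline";
}
```

Callers in this model invoke toy_elevator_init_mq() unconditionally; the outcome for a queue whose tag set has the flag set is the same as when the caller tested the flag itself.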
  2. 05 Sep, 2019 2 commits
  3. 04 Sep, 2019 4 commits
  4. 03 Sep, 2019 10 commits
  5. 31 Aug, 2019 1 commit
  6. 30 Aug, 2019 3 commits
    • Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-5.4/block · 8f5914bc
      Jens Axboe authored
      Pull NVMe changes from Sagi:
      
      "The nvme updates include:
       - ana log parse fix from Anton
       - nvme quirks support for Apple devices from Ben
       - fix missing bio completion tracing for multipath stack devices from
         Hannes and Mikhail
       - IP TOS settings for nvme rdma and tcp transports from Israel
       - rq_dma_dir cleanups from Israel
       - tracing for Get LBA Status command from Minwoo
       - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
       - Some consolidation between the fabrics transports for handling the CAP
         register
       - reset race with ns scanning fix for fabrics (move fabrics commands to
         a dedicated request queue with a different lifetime from the admin
         request queue)."
      
      * 'nvme-5.4' of git://git.infradead.org/nvme: (30 commits)
        nvme-rdma: Use rq_dma_dir macro
        nvme-fc: Use rq_dma_dir macro
        nvme-pci: Tidy up nvme_unmap_data
        nvme: make fabrics command run on a separate request queue
        nvme-pci: Support shared tags across queues for Apple 2018 controllers
        nvme-pci: Add support for Apple 2018+ models
        nvme-pci: Add support for variable IO SQ element size
        nvme-pci: Pass the queue to SQ_SIZE/CQ_SIZE macros
        nvme: trace bio completion
        nvme-multipath: fix ana log nsid lookup when nsid is not found
        nvmet-tcp: Add TOS for tcp transport
        nvme-tcp: Add TOS for tcp transport
        nvme-tcp: Use struct nvme_ctrl directly
        nvme-rdma: Add TOS for rdma transport
        nvme-fabrics: Add type of service (TOS) configuration
        nvmet-tcp: fix possible memory leak
        nvmet-tcp: fix possible NULL deref
        nvmet: trace: parse Get LBA Status command in detail
        nvme: trace: parse Get LBA Status command in detail
        nvme: trace: support for Get LBA Status opcode parsed
        ...
    • writeback: add tracepoints for cgroup foreign writebacks · 3a8e9ac8
      Tejun Heo authored
      cgroup foreign inode handling has quite a bit of heuristics and
      internal states which sometimes makes it difficult to understand
      what's going on.  Add tracepoints to improve visibility.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: add missing NULL check in ioc_cpd_alloc() · e916ad29
      Tejun Heo authored
      ioc_cpd_alloc() forgot to check NULL return from kzalloc().  Add it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 29 Aug, 2019 15 commits