- 10 Jul, 2017 6 commits
-
-
git://git.infradead.org/nvme
Jens Axboe authored
Pull followup NVMe (mostly) changes from Sagi: I added the quiesce/unquiesce patches in here as it's easy for me to apply changes on top. It has accumulated reviews and includes mostly nvme anyway; please tell me if you don't want to take them with this. This includes:
- quiesce/unquiesce fixes in nvme and others from me
- nvme-fc Create Association padding spec updates from James
- some more quirking from MKP
- nvmet nit cleanup from Max
- a fix for racy RDMA completion signalling in nvme-rdma from Marta
- some centralization patches from me
- tagset nr_hw_queues updates on controller resets in the nvme drivers from me
- an nvme-rdma fix for resource recycling when doing error recovery from me
- minor cleanups in nvme-fc from me
-
Max Gurtovoy authored
Actually use the cookie returned from the last submit_bio call. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
weiping zhang authored
Allow the io queue depth to be adjusted more easily, and make sure the io queue depth is >= 2. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
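As a sketch, a module parameter guarded by a custom set hook can enforce the >= 2 minimum; the names below (io_queue_depth, io_queue_depth_set) are illustrative assumptions, not necessarily the patch's exact code:

    #include <linux/kernel.h>
    #include <linux/moduleparam.h>

    /* illustrative: reject any io queue depth below 2 at parameter-set time */
    static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
    {
        int n, ret;

        ret = kstrtoint(val, 10, &n);
        if (ret != 0 || n < 2)
            return -EINVAL;

        return param_set_int(val, kp);
    }

    static const struct kernel_param_ops io_queue_depth_ops = {
        .set = io_queue_depth_set,
        .get = param_get_int,
    };

    static int io_queue_depth = 1024;
    module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
    MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");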
-
Dan Carpenter authored
"i" should be signed or it could cause a forever loop on the cleanup path. "size" can be used uninitialized. Fixes: 87ad72a5 ("nvme-pci: implement host memory buffer support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
James Smart authored
Target validation of the Create Association LS is revised to accept any LS as long as all non-pad data has been received. This allows a (newer) target to accept the LS from older initiators with varying pad lengths. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
James Smart authored
Revise the Create Association LS for the amount of pad expected by the 1.16 spec, and add defines for the minimum lengths that a target can accept (e.g. variable pad lengths). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 07 Jul, 2017 2 commits
-
-
Javier González authored
Remove unnecessary checks when freeing dma memory in the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Javier González authored
When removing a pblk instance, control the write I/O flow to the controller as we do in the fast path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 06 Jul, 2017 14 commits
-
-
Christoph Hellwig authored
The changes in "block: Make most scsi_req_init() calls implicit" mean that every driver that supports the generic scsi ioctls needs to call scsi_req_init on newly allocated requests, but that commit didn't add the call to the cciss driver. Fix that to avoid crashes when udev issues SG_IO commands. Fixes: ca18d6f7 ("block: Make most scsi_req_init() calls implicit") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Max Gurtovoy authored
In case we use the shared tags feature, blk_mq_alloc_tag_set might fail during module initialization. In that case, fail the load with a suitable error code. Also move the tagset initialization process to after the number of submission queues is determined. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
The BIO issuing loop in __blkdev_issue_zeroout() is allocating BIOs with a maximum number of bvecs (pages) equal to min(nr_sects, (sector_t)BIO_MAX_PAGES). This works since the requested number of bvecs will always be limited to the absolute maximum number supported (BIO_MAX_PAGES), but it is inefficient, as too many bvec entries may be requested due to the different units being used in the min() operation (number of sectors vs number of pages). To fix this, introduce the helper __blkdev_sectors_to_bio_pages() to correctly calculate the number of bvecs for zeroout BIOs as the issuing loop progresses. The calculation is done using consistent units and makes sure that the number of pages returned is at least 1 (for cases where the number of sectors is less than the number of sectors in a page). Also remove a trailing space after the bit shift in the internal loop min() call. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
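A sketch of what such a helper can look like, computing in consistent units (512-byte sectors to pages) and clamping to BIO_MAX_PAGES; treat the body as an assumption based on the description above:

    static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects)
    {
        /* pages covered by nr_sects 512-byte sectors; >= 1 for nr_sects >= 1 */
        sector_t pages = DIV_ROUND_UP_SECTOR_T(nr_sects, PAGE_SIZE / 512);

        return min(pages, (sector_t)BIO_MAX_PAGES);
    }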
-
Sagi Grimberg authored
When our RDMA queue-pair is torn down under a high load of I/O traffic, we have no way of knowing if the memory region was actually registered by the reg_mr work request, as its completion flushes with error (the hw might have done it or not). So in order to not deal with all this uncertainty, we simply recycle the MR in reinit_request. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
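The recycle then plausibly amounts to an unconditional dereg/realloc in reinit_request; a fragment sketch with assumed names (req->mr, dev->pd, max_page_list_len):

    /* drop whatever state the MR may be in and start from a fresh one */
    if (req->mr) {
        ib_dereg_mr(req->mr);
        req->mr = NULL;
    }

    req->mr = ib_alloc_mr(dev->pd, IB_MR_TYPE_MEM_REG, max_page_list_len);
    if (IS_ERR(req->mr)) {
        ret = PTR_ERR(req->mr);
        req->mr = NULL;
    }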
-
Sagi Grimberg authored
Usually, before we tear down the controller we want to: 1. complete/cancel any inflight ctrl works, 2. remove ctrl namespaces (only for removal though; resets shouldn't remove any namespaces). But we do not want to destroy the controller device, as we might use it for logging during the teardown stage. This patch adds nvme_start_ctrl(), which queues inflight controller works (aen, ns scan, queue start and keep-alive if kato is set), and nvme_stop_ctrl(), which cancels the works; namespace removal is left to the callers to handle. Move nvme_uninit_ctrl to after we are done with the controller device. Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
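A rough sketch of the resulting pair, with helper and field names assumed from the description (kato, the aen/scan works, and the existing keep-alive/queue helpers):

    void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
    {
        /* cancel/complete anything the controller may still be running */
        nvme_stop_keep_alive(ctrl);
        flush_work(&ctrl->async_event_work);
        flush_work(&ctrl->scan_work);
    }

    void nvme_start_ctrl(struct nvme_ctrl *ctrl)
    {
        if (ctrl->kato)
            nvme_start_keep_alive(ctrl);

        if (ctrl->queue_count > 1) {
            nvme_queue_scan(ctrl);
            nvme_queue_async_events(ctrl);
            nvme_start_queues(ctrl);
        }
    }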
-
Sagi Grimberg authored
Without it, it's not guaranteed that no .queue_rq is in flight. Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: virtio-dev@lists.oasis-open.org Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
Unlike blk_mq_stop_hw_queues, blk_mq_quiesce_queue respects the submission path RCU grace period. Quiesce the queue before iterating on live tags or performing device I/O quiescing. While we're at it, verify that the request has started in the mtip_abort_cmd and mtip_queue_cmd tag iteration calls. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
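The conversion pattern shared by these quiesce patches, sketched for a generic driver (q, set, cancel_fn and data are placeholders):

    /* wait out the submission-path RCU grace period: after this returns,
     * no .queue_rq invocation is in flight */
    blk_mq_quiesce_queue(q);

    /* now it is safe to walk the live tags or quiesce the device */
    blk_mq_tagset_busy_iter(set, cancel_fn, data);

    blk_mq_unquiesce_queue(q);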
-
Sagi Grimberg authored
Unlike blk_mq_stop_hw_queues, blk_mq_quiesce_queue respects the submission path RCU grace period. Quiesce the queue before iterating on live tags. Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
When we requeue a request, we can always insert the request back into the scheduler instead of doing it when restarting the queues and kicking the requeue work, so get rid of the requeue kick in nvme (core and drivers). Also, there is now no need to start hw queues in nvme_kill_queues; we don't stop the hw queues anymore, so no need to start them. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues, quiescing/unquiescing respects the submission path RCU grace period. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues, quiescing/unquiescing respects the submission path RCU grace period. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues, quiescing/unquiescing respects the submission path RCU grace period. Also, make sure to unquiesce before cleaning up the admin queue. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-By: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues, quiescing/unquiescing respects the submission path RCU grace period. Also make sure to kick the requeue list when appropriate. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Marta Rybczynska authored
This patch improves the way the RDMA IB signalling is done by using atomic operations for the signalling variable. This avoids race conditions on sig_count. The signalling interval changes slightly and is now the largest power of two not larger than queue depth / 2. ilog() usage idea by Bart Van Assche. Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org
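A sketch of the signalling decision under the new rule; sig_count is the per-queue atomic named above, while queue_sig_limit and queue_size are illustrative:

    #include <linux/atomic.h>
    #include <linux/log2.h>

    static bool queue_sig_limit(atomic_t *sig_count, int queue_size)
    {
        /* largest power of two not larger than queue_size / 2, at least 1 */
        int limit = 1 << ilog2(max(queue_size / 2, 1));

        /* atomic_inc_return removes the racy read-modify-write on sig_count */
        return (atomic_inc_return(sig_count) & (limit - 1)) == 0;
    }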
-
- 05 Jul, 2017 1 commit
-
-
Ming Lei authored
It is observed that reading the register from HW takes a bit long; for example, on my box the following difference in 'perf report --no-children fio ...' can be seen when running I/O:

1) v4.12 without patch:
+ 9.28% fio [mtip32xx] [k] mtip_irq_handler
+ 8.48% fio [mtip32xx] [k] mtip_init_cmd_header

2) v4.12 with the following patch:
+ 9.14% fio [mtip32xx] [k] mtip_irq_handler
......
+ 1.14% fio [mtip32xx] [k] mtip_init_cmd_header

IOPS can be increased by ~5% with this patch too. Fixes: a4e84aae ("mtip32xx: use runtime tag to initialize command header") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Jul, 2017 6 commits
-
-
kbuild test robot authored
block/bio-integrity.c:318:10-11: WARNING: return of 0/1 in function 'bio_integrity_prep' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: scripts/coccinelle/misc/boolreturn.cocci Fixes: e23947bd ("bio-integrity: fold bio_integrity_enabled to bio_integrity_prep") CC: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
Cc: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
We might have more or fewer queues once we reconnect/reset, for example due to CPUs going online/offline or controller constraints. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
We might have more or fewer queues once we reconnect/reset, for example due to CPUs going online/offline. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Sagi Grimberg authored
We might have more or fewer queues once we reconnect/reset, for example due to CPUs going online/offline or controller constraints. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
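In block-layer terms, these updates presumably come down to resizing the tagset once the new queue count is known; ctrl->tagset and nr_io_queues are illustrative names:

    /* propagate the queue count negotiated on this reconnect/reset */
    blk_mq_update_nr_hw_queues(ctrl->tagset, nr_io_queues);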
-
Sagi Grimberg authored
It's what the user passed, so it's probably a better idea to keep it intact. Also, limit the number of I/O queues to the number of online CPUs and the lport's maximum hw queues. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 03 Jul, 2017 11 commits
-
-
Christoph Hellwig authored
And instead call directly into the integrity code from bio_end_io. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dmitry Monakhov authored
Currently ->verify_fn does not work at all because, at the moment it is called, bio->bi_iter.bi_size == 0, so we do not iterate integrity bvecs at all. In order to perform verification we need to know the original data vector; with the new bvec rewind API this is trivial. testcase: https://github.com/dmonakhov/xfstests/commit/3c6509eaa83b9c17cd0bc95d73fcdd76e1c54a85 Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> [hch: adopted for new status values] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dmitry Monakhov authored
Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers) need to know the original data vector, but after a bio traverses the I/O stack it may be advanced, split and relocated many times, so it is hard to recover the original iterator. Let's add a 'bi_done' counter which accounts for the number of bytes the iterator was advanced during its evolution. Later, an end_io handler may easily restore the original iterator by rewinding it by iter->bi_done. Note: this change makes sizeof(struct bvec_iter) a multiple of 8. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> [hch: switched to true/false return] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
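Conceptually, an end_io handler can then recover the original view like this; bio_rewind_iter is assumed to be the rewind entry point this series adds, so treat the exact signature as an assumption:

    /* copy the advanced iterator, then rewind by the bytes it has consumed */
    struct bvec_iter iter = bio->bi_iter;

    bio_rewind_iter(bio, &iter, iter.bi_done);
    /* 'iter' now describes the bio's original data vector */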
-
Dmitry Monakhov authored
Currently, if someone tries to advance a bvec beyond its size we simply dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries. This means that we end up dereferencing/corrupting a random memory region. The sane reaction would be to propagate the error back to the calling context, but bvec_iter_advance's calling context is not always good for error handling. For safety reasons, let's truncate the iterator size to zero, which breaks the external iteration loop and prevents unpredictable memory range corruption. And even if the caller ignores the error, it will corrupt its own bvecs, not others'. This patch does: - return the error back to the caller, with the hope that it will react to it - truncate the iterator size The code was added long ago in 4550dd6c; luckily no one has hit it in real life. :) Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> [hch: switch to true/false returns instead of errno values] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
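A sketch of the resulting advance, with the true/false return and the truncation described above (the loop body follows the long-standing bvec helpers):

    static inline bool bvec_iter_advance(const struct bio_vec *bv,
                                         struct bvec_iter *iter, unsigned bytes)
    {
        if (WARN_ONCE(bytes > iter->bi_size,
                      "Attempted to advance past end of bvec iter\n")) {
            /* truncate so any external iteration loop terminates */
            iter->bi_size = 0;
            return false;
        }

        while (bytes) {
            unsigned len = min(bytes, bvec_iter_len(bv, *iter));

            bytes -= len;
            iter->bi_size -= len;
            iter->bi_bvec_done += len;

            if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
                iter->bi_bvec_done = 0;
                iter->bi_idx++;
            }
        }
        return true;
    }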
-
Dmitry Monakhov authored
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dmitry Monakhov authored
Currently all integrity prep hooks are open-coded, and if prepare fails we ignore its code and fail the bio with EIO. Let's return the real error to the upper layer, so the caller may react accordingly. In fact, no one wants to use bio_integrity_prep() without bio_integrity_enabled, so it is reasonable to fold them into one function. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> [hch: merged with the latest block tree, return bool from bio_integrity_prep] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dmitry Monakhov authored
bio_integrity_trim inherited its interface from bio_trim and accepts an offset and size, but this API is error prone because the data offset must always be in sync with the bio's data offset. That is why we have the integrity update hook in bio_advance(). So the only meaningful values are offset == 0 and sectors == bio_sectors(bio); let's just remove these arguments completely. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
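With the arguments gone, the trimmed-down function plausibly reduces to a one-liner; a sketch under that assumption:

    void bio_integrity_trim(struct bio *bio)
    {
        struct bio_integrity_payload *bip = bio_integrity(bio);
        struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);

        /* offset is always 0; keep exactly bio_sectors(bio) worth of PI */
        bip->bip_iter.bi_size = bio_integrity_bytes(bi, bio_sectors(bio));
    }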
-
Dmitry Monakhov authored
SCSI drivers do care about bip_seed so we must update it accordingly. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dmitry Monakhov authored
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
When mq-deadline is used, the IOPS of sequential reads and sequential writes is observed to drop by more than 20% on SATA (scsi-mq) devices, compared with using the 'none' scheduler. The reason is that the default nr_requests for the scheduler is too big for small-queue-depth devices, and latency is increased too much. Since the rationale for taking 256 requests for the mq scheduler is based on a queue depth of 128, this patch changes the default to double the size of min(hw queue_depth, 128). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
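The new default then presumably reads along these lines (BLKDEV_MAX_RQ is 128 in this era):

    /* scheduler depth: twice min(hw queue depth, 128) */
    q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
                               BLKDEV_MAX_RQ);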
-
Paolo Valente authored
On each deactivation or re-scheduling (after being served) of a bfq_queue, BFQ invokes the function __bfq_entity_update_weight_prio() to perform pending updates of ioprio, weight and ioprio class for the bfq_queue. BFQ also invokes this function on I/O-request dispatches, to raise or lower weights more quickly when needed, thereby improving latency. However, the entity representing the bfq_queue may be on the active (sub)tree of a service tree when this happens, and, although with a very low probability, the bfq_queue may happen to also have a pending change of its ioprio class. If both conditions hold when __bfq_entity_update_weight_prio() is invoked, then the entity moves to a sort of hybrid state: the new service tree for the entity, as returned by bfq_entity_service_tree(), differs from the service tree on which the entity still sits. The functions that handle activations and deactivations of entities do not cope with such a hybrid state (and would need to become more complex to cope). This commit addresses the issue by making __bfq_entity_update_weight_prio() not also perform a possible pending change of the ioprio class when invoked on an I/O-request dispatch for a bfq_queue. Such a change is thus postponed to when __bfq_entity_update_weight_prio() is invoked on deactivation or re-scheduling of the bfq_queue. Reported-by: Marco Piazza <mpiazza@gmail.com> Reported-by: Laurentiu Nicola <lnicola@dend.ro> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Tested-by: Marco Piazza <mpiazza@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-