- 19 Sep, 2022 12 commits
-
-
Keith Busch authored
It's only true or false, so make this a bool to reflect that and save some space in nvme_iod. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
We can get the nvme_queue from the req just as easily, so remove the duplicate path to the same structure to save some space. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Daniel Wagner authored
It's perfectly fine to use the same traddr and trsvcid more than once as long as we use a different host interface. This is used in setups where the host has more than one interface but the target exposes only one traddr/trsvcid combination. Use the same acceptance rules for host_iface as we have for host_traddr. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Daniel Wagner authored
On reconnect, the number of queues might have changed. If more queues are available than before, we try to access queues that are not initialized yet. If fewer queues are available than before, the connection attempt fails because the target doesn't support the old number of queues, and we end up in a reconnect loop. Thus, only start the queues that are currently present in the tagset, limited by the number of available queues. Then we update the tagset and can start any new queues. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Daniel Wagner authored
On reconnect, the number of queues might have changed. If more queues are available than before, we try to access queues that are not initialized yet. If fewer queues are available than before, the connection attempt fails because the target doesn't support the old number of queues, and we end up in a reconnect loop. Thus, only start the queues that are currently present in the tagset, limited by the number of available queues. Then we update the tagset and can start any new queues. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Daniel Wagner authored
Allow setting the maximum number of queues the target supports. This is useful for testing the host's reconnect attempts when the number of supported queues changes. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Guixin Liu authored
If the block device backend has no volatile write cache, sending a flush bio is unnecessary, so avoid doing that. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Genjian Zhang authored
The parameter is not used in this function, so remove it. Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Jackie Liu authored
Jump directly to done_kfree to release d, which is consistent with the style of the code that follows. Reported-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Jackie Liu authored
req->cqe->result.u16 has already been assigned in the previous line, no need to do it again. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Wolfram Sang authored
Follow the advice of the link below and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Linjun Bao authored
The current "fake" NQN prefix is "nqn.2014.08.org.nvmexpress:"; for historical reasons it does not match the canonical form. Signed-off-by: Linjun Bao <meljbao@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 12 Sep, 2022 9 commits
-
-
Gaosheng Cui authored
w_start_resync has been removed since commit ac0acb9e ("drbd: use drbd_device_post_work() in more places"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
tg_update_disptime() only needs to adjust the position of 'tg' in 'parent_sq'; there is no need to call throtl_enqueue/dequeue_tg(), since they set/clear the THROTL_TG_PENDING flag and increase/decrease nr_pending, which is useless here. Also, clearing the flag and decreasing nr_pending while there are still throttled bios is not good for debugging. There are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220827101637.1775111-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
It's a little weird to call throtl_dequeue_tg() unconditionally in throtl_select_dispatch(), since it will be called again in tg_update_disptime() if some bio is still throttled. Thus call it later, and only if there are no throttled bios. There are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220827101637.1775111-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Make the code easier to read, like everywhere else. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220827101637.1775111-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
If a new configuration is submitted while a bio is throttled, the waiting time is recalculated without regard to the time the bio has already waited:

    tg_conf_updated
     throtl_start_new_slice
     tg_update_disptime
     throtl_schedule_next_dispatch

An I/O hang can then be triggered by always submitting a new configuration before the throttled bio is dispatched. Fix the problem by respecting the time that the throttled bio has already waited. To do that, add new fields to record how many bytes/ios have already been waited for, and use them to calculate the wait time for the throttled bio under the new configuration.

Some simple tests:

1)
    cd /sys/fs/cgroup/blkio/
    echo $$ > cgroup.procs
    echo "8:0 2048" > blkio.throttle.write_bps_device
    {
        sleep 2
        echo "8:0 1024" > blkio.throttle.write_bps_device
    } &
    dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

2)
    cd /sys/fs/cgroup/blkio/
    echo $$ > cgroup.procs
    echo "8:0 1024" > blkio.throttle.write_bps_device
    {
        sleep 4
        echo "8:0 2048" > blkio.throttle.write_bps_device
    } &
    dd if=/dev/zero of=/dev/sda bs=8k count=1 oflag=direct

Test results (io finish time):

          before this patch    with this patch
    1)    10s                  6s
    2)    8s                   6s

Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
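The finish times reported in the two test cases can be reproduced with simple arithmetic. The following is a minimal sketch in plain C, not the blk-throttle code: the function names are hypothetical, time is modelled in whole seconds rather than jiffies, and slice rounding is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model only (hypothetical names, not kernel code).
 * One bio of bio_bytes is throttled at old_bps; after switch_s seconds
 * the limit is changed to new_bps. All times are in whole seconds.
 */

/* Behaviour before the patch: the wait restarts from zero at the new limit. */
static uint64_t finish_time_restart(uint64_t bio_bytes, uint64_t switch_s,
                                    uint64_t new_bps)
{
    assert(new_bps > 0);
    return switch_s + bio_bytes / new_bps;
}

/* Behaviour with the patch: bytes already waited for are credited. */
static uint64_t finish_time_credit(uint64_t bio_bytes, uint64_t old_bps,
                                   uint64_t switch_s, uint64_t new_bps)
{
    uint64_t waited;

    assert(old_bps > 0 && new_bps > 0);
    waited = old_bps * switch_s;        /* budget accumulated so far */
    if (waited >= bio_bytes)
        return bio_bytes / old_bps;     /* dispatched before the switch */
    return switch_s + (bio_bytes - waited) / new_bps;
}
```

For test 1 (8192 bytes, 2048 B/s for 2 s, then 1024 B/s) the model gives 10 s without the credit and 6 s with it; for test 2 (1024 B/s for 4 s, then 2048 B/s) it gives 8 s and 6 s, matching the reported results.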
-
Yu Kuai authored
No functional changes; the new APIs will be used in later patches to calculate the wait time for throttled bios when a new configuration is submitted. Note that this patch also renames tg_with_in_iops/bps_limit() to tg_within_iops/bps_limit(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Code review found a problem in tg_with_in_bps_limit(): 'bps_limit * jiffy_elapsed_rnd' might overflow. Fix the problem by calling mul_u64_u64_div_u64() instead. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
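To illustrate why the plain 64-bit product is a problem, here is a small userspace sketch. It is not the kernel helper: the function names and the `hz` divisor are assumptions made for the example, and `unsigned __int128` is a GCC/Clang extension used here to stand in for a widened multiply-then-divide.

```c
#include <assert.h>
#include <stdint.h>

/* Naive form: the 64-bit product can wrap before the division happens. */
static uint64_t bytes_allowed_naive(uint64_t bps_limit, uint64_t elapsed,
                                    uint64_t hz)
{
    assert(hz > 0);
    return bps_limit * elapsed / hz;
}

/*
 * Widened form: keep the intermediate product in 128 bits, then divide.
 * The caller must ensure the final quotient fits in 64 bits.
 */
static uint64_t bytes_allowed_wide(uint64_t bps_limit, uint64_t elapsed,
                                   uint64_t hz)
{
    assert(hz > 0);
    return (uint64_t)(((unsigned __int128)bps_limit * elapsed) / hz);
}
```

With a limit of 2^62, 8 elapsed ticks and a divisor of 4, the naive version wraps to 0 while the widened version returns 2^63; for small inputs both agree.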
-
Yu Kuai authored
Test script:

    cd /sys/fs/cgroup/blkio/
    echo "8:0 1024" > blkio.throttle.write_bps_device
    echo $$ > cgroup.procs
    dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
    dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &

Test result:

    10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
    10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s

The problem is that the second bio finishes after 10s instead of 20s.

Root cause:

1) The second bio will be flagged:

    __blk_throtl_bio
     while (true) {
      ...
      if (sq->nr_queued[rw]) -> some bio is throttled already
       break
     };
     bio_set_flag(bio, BIO_THROTTLED); -> flag the bio

2) The flagged bio will be dispatched without waiting:

    throtl_dispatch_tg
     tg_may_dispatch
      tg_with_in_bps_limit
       if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
        *wait = 0; -> wait time is zero
        return true;

Commit 9f5ede3c ("block: throttle split bio in case of iops limit") added support for counting split bios against the iops limit, so it added the flagged-bio check in tg_with_in_bps_limit() so that split bios are only counted once against the bps limit. However, it introduced a new problem: io throttling doesn't work if multiple bios are throttled.

To fix the problem, handle the iops and bps limits in different ways:

1) For the iops limit, there is no flag to record whether the bio is throttled, and the iops limit is always applied.
2) For the bps limit, the original bio is flagged with BIO_BPS_THROTTLED, and io throttling ignores bios with the flag.

Note that this patch also removes the code that sets the flag in __bio_clone(). It was introduced in commit 111be883 ("block-throttle: avoid double charge"), whose author thought a split bio could be resubmitted and throttled again, which is wrong because a split bio continues to be dispatched from the caller.

Fixes: 9f5ede3c ("block: throttle split bio in case of iops limit") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220829022240.3348319-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Sep, 2022 3 commits
-
-
Uros Bizjak authored
Use atomic_long_try_cmpxchg instead of atomic_long_cmpxchg (*ptr, old, new) == old in __sbitmap_queue_get_batch. The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after cmpxchg (and the related move instruction in front of cmpxchg). Also, atomic_long_try_cmpxchg implicitly assigns the old *ptr value to "old" when cmpxchg fails, enabling further code simplifications, e.g. an extra memory read can be avoided in the loop. No functional change intended. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20220908151200.9993-1-ubizjak@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
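The difference between the two loop shapes can be sketched in userspace with C11 atomics, whose atomic_compare_exchange_strong() likewise writes the current value back into the expected-value argument on failure. This is an analogy under that assumption, not the sbitmap code; cmpxchg_long() and the two set_bits_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Emulates a value-returning cmpxchg: returns what *ptr held beforehand. */
static long cmpxchg_long(atomic_long *ptr, long old, long new_val)
{
    atomic_compare_exchange_strong(ptr, &old, new_val);
    return old;
}

/* cmpxchg style: compare the result with 'old', re-read on every retry. */
static long set_bits_cmpxchg(atomic_long *word, long mask)
{
    long old, val;

    do {
        old = atomic_load(word);    /* extra memory read per iteration */
        val = old | mask;
    } while (cmpxchg_long(word, old, val) != old);
    return old;
}

/* try_cmpxchg style: a failed exchange refreshes 'old', so read only once. */
static long set_bits_try_cmpxchg(atomic_long *word, long mask)
{
    long old = atomic_load(word);
    long val;

    do {
        val = old | mask;
    } while (!atomic_compare_exchange_strong(word, &old, val));
    return old;
}
```

Both functions return the value the word held before the bits were set; the second form needs no explicit comparison and no reload inside the loop.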
-
Shigeru Yoshida authored
syzbot reported a hung task [1]. The following program is a simplified version of the reproducer:

    int main(void)
    {
        int sv[2], fd;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;
        if ((fd = open("/dev/nbd0", 0)) < 0)
            return 1;
        if (ioctl(fd, NBD_SET_SIZE_BLOCKS, 0x81) < 0)
            return 1;
        if (ioctl(fd, NBD_SET_SOCK, sv[0]) < 0)
            return 1;
        if (ioctl(fd, NBD_DO_IT) < 0)
            return 1;
        return 0;
    }

When a signal interrupts nbd_start_device_ioctl() while it waits for the condition atomic_read(&config->recv_threads) == 0, the task can hang because it waits for the completion of the inflight IOs. This patch fixes the issue by clearing the queue, not just shutting down, when a signal interrupts nbd_start_device_ioctl().

Link: https://syzkaller.appspot.com/bug?id=7d89a3ffacd2b83fdd39549bc4d8e0a89ef21239 [1] Reported-by: syzbot+38e6c55d4969a14c1534@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220907163502.577561-1-syoshida@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
When __sbq_wake_up() decrements wait_cnt to 0 but races with someone else waking the waiter on the waitqueue (so the waitqueue becomes empty), it exits without resetting wait_cnt to the wake_batch number. Once wait_cnt is 0, nobody will ever reset wait_cnt or wake the new waiters, resulting in possible deadlocks or busy loops. Fix the problem by making sure we reset wait_cnt even if we didn't wake up anybody in the end. Fixes: 040b83fc ("sbitmap: fix possible io hung due to lost wakeup") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220908130937.2795-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Sep, 2022 6 commits
-
-
Guoqing Jiang authored
It is not necessary since it is set later, just before the function returns success. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20220902100055.25724-4-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Guoqing Jiang authored
Change the return type to void given it always returns 0. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20220902100055.25724-3-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Guoqing Jiang authored
Let's add some explanations here given the err handling is not obvious. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20220902100055.25724-2-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Miaohe Lin authored
bio_check_ro() always returns false now. Remove this unneeded return value and clean up the sole caller. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220905102754.1942-1-linmiaohe@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Miaohe Lin authored
If code reaches here, needs_restart must be true. Remove this unneeded needs_restart check. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220905101950.4606-1-linmiaohe@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jiapeng Chong authored
The variable 'added' is not effectively used in the function, so delete it. block/blk-map.c:273:16: warning: variable 'added' set but not used. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2049 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220905063253.120082-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Sep, 2022 2 commits
-
-
Yu Kuai authored
While doing code coverage testing with CONFIG_BLK_DEV_THROTTLING_LOW disabled, we found that a lot of code can never be reached. This patch moves such code inside "#ifdef CONFIG_BLK_DEV_THROTTLING_LOW". Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220903062826.1099085-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit 16ede669. This is causing issues with CPU stalls on my test box, revert it for now until we understand what is going on. It looks like infinite looping off sbitmap_queue_wake_up(), but hard to tell with a lot of CPUs hitting this issue and the console scrolling infinitely. Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Sep, 2022 1 commit
-
-
Jens Axboe authored
This is useful for polled IO on a file, or for polled IO with the io_uring passthrough mechanism. If bio allocations are done with REQ_POLLED for those cases, then initializing the bio set with BIOSET_PERCPU_CACHE enables the local per-cpu cache which eliminates allocations (and frees) of bio structs when possible. Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Sep, 2022 1 commit
-
-
Keith Busch authored
Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20220825145312.1217900-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 26 Aug, 2022 1 commit
-
-
Liu Song authored
If "nr + nr_tags <= map_depth", then the value of nr_tags will not be greater than map_depth, so no additional comparison is required. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/1661483653-27326-1-git-send-email-liusong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 Aug, 2022 1 commit
-
-
ye xingchen authored
Return the value from rtrs_clt_rdma_cq_direct() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220824075213.221397-1-ye.xingchen@zte.com.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 Aug, 2022 1 commit
-
-
Yu Kuai authored
There are two problems that can lead to a lost wakeup:

1) Invalid wakeup on the wrong waitqueue:

For example, 2 * wake_batch tags are put, while only wake_batch threads are woken:

    __sbq_wake_up
     atomic_cmpxchg -> reset wait_cnt
                            __sbq_wake_up -> decrease wait_cnt
                            ...
                            __sbq_wake_up -> wait_cnt is decreased to 0 again
                             atomic_cmpxchg
                             sbq_index_atomic_inc -> increase wake_index
                             wake_up_nr -> wake up and waitqueue might be empty
     sbq_index_atomic_inc -> increase again, one waitqueue is skipped
     wake_up_nr -> invalid wake up because old waitqueue might be empty

To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.

2) 'wait_cnt' can be decreased while the waitqueue is empty:

As pointed out by Jan Kara, the following race is possible:

    CPU1                                CPU2
    __sbq_wake_up                       __sbq_wake_up
     sbq_wake_ptr()                      sbq_wake_ptr() -> the same
     wait_cnt = atomic_dec_return()
     /* decreased to 0 */
     sbq_index_atomic_inc()
     /* move to next waitqueue */
     atomic_set()
     /* reset wait_cnt */
     wake_up_nr()
     /* wake up on the old waitqueue */
                                         wait_cnt = atomic_dec_return()
                                         /*
                                          * decrease wait_cnt in the old
                                          * waitqueue, while it can be
                                          * empty.
                                          */

Fix the problem by waking up before updating 'wake_index' and 'wait_cnt'.

Note that with this patch 'wait_cnt' is still decreased in the old empty waitqueue; however, the wakeup is redirected to an active waitqueue, and the extra decrement on the old empty waitqueue is not handled.

Fixes: 88459642 ("blk-mq: abstract tag allocation out into sbitmap library") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Aug, 2022 3 commits
-
-
Jens Axboe authored
Avoid a kmalloc+kfree for each page array, if we only have a few pages that are mapped. An alloc+free for each IO is quite expensive, and it's pretty pointless if we're only dealing with 1 or a few vecs. Use UIO_FASTIOV like we do in other spots to set a sane limit for how big of an IO we want to avoid allocations for. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
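The general pattern described here, a small on-stack array with a heap fallback for larger requests, can be sketched in userspace C as follows. This is an illustration only: FAST_ENTRIES stands in for UIO_FASTIOV (its value of 8 is chosen for the example), malloc()/free() stand in for kmalloc()/kfree(), and fill_and_sum() is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FAST_ENTRIES 8  /* stand-in for UIO_FASTIOV; value is illustrative */

/*
 * Copies n ints into a scratch array and sums them. Small requests use a
 * stack array; only larger ones pay for an allocation and a free.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int fill_and_sum(const int *src, size_t n, long *sum, int *used_heap)
{
    int stack_buf[FAST_ENTRIES];
    int *buf = stack_buf;
    size_t i;

    assert(src && sum && used_heap);
    *sum = 0;
    *used_heap = 0;

    if (n > FAST_ENTRIES) {
        buf = malloc(n * sizeof(*buf));
        if (!buf)
            return -1;
        *used_heap = 1;
    }

    for (i = 0; i < n; i++)
        buf[i] = src[i];        /* stand-in for filling the page array */
    for (i = 0; i < n; i++)
        *sum += buf[i];

    if (buf != stack_buf)       /* free only what was allocated */
        free(buf);
    return 0;
}
```

A request of 3 entries stays entirely on the stack, while a request of 10 entries falls back to the heap; the caller sees the same result either way.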
-
Jens Axboe authored
bdev based polled O_DIRECT is currently quite a bit faster than passthru on the same device, and one of the reasons is that we're not able to use the bio caching for passthru IO. If REQ_POLLED is set on the request, use the fs bio set for grabbing a bio from the caches, if available. This saves 5-6% of CPU overhead for polled passthru IO. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We don't need full ints for several of these members. Change the page_order and nr_entries to unsigned shorts, and the true/false from_user and null_mapped to booleans. This shrinks the struct from 32 to 24 bytes on 64-bit archs. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
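The 32-to-24-byte saving can be checked with a pair of stand-in structs. This sketch assumes a typical 64-bit (LP64) ABI; the struct names are hypothetical, and the `pages` and `offset` members are assumptions added to make the sizes work out, since the commit message only names page_order, nr_entries, from_user and null_mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Before: every small member is a full int (8+4+4+8+4+4 = 32 on LP64). */
struct map_data_before {
    void **pages;               /* assumed member */
    int page_order;
    int nr_entries;
    unsigned long offset;       /* assumed member */
    int null_mapped;
    int from_user;
};

/* After: shorts and bools (8+8+2+2+1+1 = 22, padded to 24 on LP64). */
struct map_data_after {
    void **pages;               /* assumed member */
    unsigned long offset;       /* assumed member */
    unsigned short page_order;
    unsigned short nr_entries;
    bool null_mapped;
    bool from_user;
};

/* Narrowing members can never make the struct larger. */
static_assert(sizeof(struct map_data_after) <= sizeof(struct map_data_before),
              "narrower members must not grow the struct");

/* Bytes saved per instance on the current ABI. */
static size_t bytes_saved(void)
{
    return sizeof(struct map_data_before) - sizeof(struct map_data_after);
}
```

On x86-64 Linux the two sizes come out as 32 and 24 bytes, so bytes_saved() reports 8; other ABIs may pad differently.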
-