- 27 Jul, 2023 18 commits
-
-
Yu Kuai authored
'io_acct_set' is only used for raid0 and raid456, prepare to use it for raid1 and raid10, so that io accounting from different levels can be consistent. By the way, follow up patches will also use this io clone mechanism to make sure 'active_io' represents in flight io, not io that is dispatching, so that mddev_suspend will wait for io to be done as designed. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-2-yukuai1@huaweicloud.com
-
Christoph Hellwig authored
The support for bitmaps on files is a very bad idea abusing various kernel APIs, and fundamentally requires the file to not be on the actual array without a way to check that this is actually the case. Add a deprecation warning to see if we might be able to eventually drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-12-hch@lst.de
-
Christoph Hellwig authored
The support for write intent bitmaps in files on an external files in md is a hot mess that abuses ->bmap to map file offsets into physical device objects, and also abuses buffer_heads in a creative way. Make this code optional so that MD can be built into future kernels without buffer_head support, and so that we can eventually deprecate it. Note this does not affect the internal bitmap support, which has none of the problems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-11-hch@lst.de
-
Christoph Hellwig authored
The md driver allocates pages for storing the bitmap file data, which are not page cache pages, and then stores the page granularity file offset in page->index, which is a field that isn't really valid except for page cache pages. Use a separate index for the superblock, and use the scheme used at read size to recalculate the index for the bitmap pages instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-10-hch@lst.de
-
Christoph Hellwig authored
Diretly apply mddev->bitmap_info.offset to the sector number to read instead of doing that in both callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-9-hch@lst.de
-
Christoph Hellwig authored
Convert read_sb_page to the normal kernel coding style, calculate the target sector only once, and add a local iosize variable to make the call to sync_page_io more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-8-hch@lst.de
-
Christoph Hellwig authored
Split the confusing loop in md_bitmap_init_from_disk that iterates over all chunks but also needs to read and map the pages into three separate loops: one that iterates over the pages to read them, a second optional one to iterate over the pages to mark them invalid if the bitmaps are out of date, and a final one that actually iterates over the chunks. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306160552.smw0qbmb-lkp@intel.com/Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-7-hch@lst.de
-
Christoph Hellwig authored
Make the difference to read_sb_page clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-6-hch@lst.de
-
Christoph Hellwig authored
Split the file write code out of write_page into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-5-hch@lst.de
-
Christoph Hellwig authored
Don't bother allocating an extra buffer in the I/O failure handler and instead use the printk built-in format to print the last 4 path name components. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-4-hch@lst.de
-
Christoph Hellwig authored
Just a small tidyup to prepare for bigger changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-3-hch@lst.de
-
Christoph Hellwig authored
Set BITMAP_WRITE_ERROR directly in write_sb_page instead of propagating the error to the caller and setting it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-2-hch@lst.de
-
Yu Kuai authored
For md_check_recovery(): 1) if 'MD_RECOVERY_RUNING' is not set, register new sync_thread. 2) if 'MD_RECOVERY_RUNING' is set: a) if 'MD_RECOVERY_DONE' is not set, don't do anything, wait for md_do_sync() to be done. b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. Current code expects that sync_thread is not NULL, otherwise new sync_thread will be registered, which will corrupt the array. Make sure md_check_recovery() won't register new sync_thread if 'MD_RECOVERY_RUNING' is still set, and a new WARN_ON_ONCE() is added for the above corruption, Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529132037.2124527-7-yukuai1@huaweicloud.com
-
Yu Kuai authored
md_reap_sync_thread() is just replaced with wait_event(resync_wait, ...) from action_store(), just make sure action_store() will still wait for everything to be done in md_reap_sync_thread(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewd-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529132037.2124527-6-yukuai1@huaweicloud.com
-
Yu Kuai authored
Our test found a following deadlock in raid10: 1) Issue a normal write, and such write failed: raid10_end_write_request set_bit(R10BIO_WriteError, &r10_bio->state) one_write_done reschedule_retry // later from md thread raid10d handle_write_completed list_add(&r10_bio->retry_list, &conf->bio_end_io_list) // later from md thread raid10d if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) list_move(conf->bio_end_io_list.prev, &tmp) r10_bio = list_first_entry(&tmp, struct r10bio, retry_list) raid_end_bio_io(r10_bio) Dependency chain 1: normal io is waiting for updating superblock 2) Trigger a recovery: raid10_sync_request raise_barrier Dependency chain 2: sync thread is waiting for normal io 3) echo idle/frozen to sync_action: action_store mddev_lock md_unregister_thread kthread_stop Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread 4) md thread can't update superblock: raid10d md_check_recovery if (mddev_trylock(mddev)) md_update_sb Dependency chain 4: update superblock is waiting for 'reconfig_mutex' Hence cyclic dependency exist, in order to fix the problem, we must break one of them. Dependency 1 and 2 can't be broken because they are foundation design. Dependency 4 may be possible if it can be guaranteed that no io can be inflight, however, this requires a new mechanism which seems complex. Dependency 3 is a good choice, because idle/frozen only requires sync thread to finish, which can be done asynchronously that is already implemented, and 'reconfig_mutex' is not needed anymore. This patch switch 'idle' and 'frozen' to wait sync thread to be done asynchronously, and this patch also add a sequence counter to record how many times sync thread is done, so that 'idle' won't keep waiting on new started sync thread. Noted that raid456 has similiar deadlock([1]), and it's verified[2] this deadlock can be fixed by this patch as well. [1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t [2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
-
Yu Kuai authored
Currently, for idle and frozen, action_store will hold 'reconfig_mutex' and call md_reap_sync_thread() to stop sync thread, however, this will cause deadlock (explained in the next patch). In order to fix the problem, following patch will release 'reconfig_mutex' and wait on 'resync_wait', like md_set_readonly() and do_md_stop() does. Consider that action_store() will set/clear 'MD_RECOVERY_FROZEN' unconditionally, which might cause unexpected problems, for example, frozen just set 'MD_RECOVERY_FROZEN' and is still in progress, while 'idle' clear 'MD_RECOVERY_FROZEN' and new sync thread is started, which might starve in progress frozen. A mutex is added to synchronize idle and frozen from action_store(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529132037.2124527-4-yukuai1@huaweicloud.com
-
Yu Kuai authored
Prepare to handle 'idle' and 'frozen' differently to fix a deadlock, there are no functional changes except that MD_RECOVERY_RUNNING is checked again after 'reconfig_mutex' is held. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529132037.2124527-3-yukuai1@huaweicloud.com
-
Yu Kuai authored
This reverts commit 9dfbdafd. Because it will introduce a defect that sync_thread can be running while MD_RECOVERY_RUNNING is cleared, which will cause some unexpected problems, for example: list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0). Call trace: __list_add_valid+0xfc/0x140 insert_work+0x78/0x1a0 __queue_work+0x500/0xcf4 queue_work_on+0xe8/0x12c md_check_recovery+0xa34/0xf30 raid10d+0xb8/0x900 [raid10] md_thread+0x16c/0x2cc kthread+0x1a4/0x1ec ret_from_fork+0x10/0x18 This is because work is requeued while it's still inside workqueue: t1: t2: action_store mddev_lock if (mddev->sync_thread) mddev_unlock md_unregister_thread // first sync_thread is done md_check_recovery mddev_try_lock /* * once MD_RECOVERY_DONE is set, new sync_thread * can start. */ set_bit(MD_RECOVERY_RUNNING, &mddev->recovery) INIT_WORK(&mddev->del_work, md_start_sync) queue_work(md_misc_wq, &mddev->del_work) test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...) // set pending bit insert_work list_add_tail mddev_unlock mddev_lock_nointr md_reap_sync_thread // MD_RECOVERY_RUNNING is cleared mddev_unlock t3: // before queued work started from t2 md_check_recovery // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started INIT_WORK(&mddev->del_work, md_start_sync) work->data = 0 // work pending bit is cleared queue_work(md_misc_wq, &mddev->del_work) insert_work list_add_tail // list is corrupted The above commit is reverted to fix the problem, the deadlock this commit tries to fix will be fixed in following patches. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529132037.2124527-2-yukuai1@huaweicloud.com
-
- 26 Jul, 2023 1 commit
-
-
Jinyoung Choi authored
If a problem occurs in the process of creating an integrity payload, the status of bio is always BLK_STS_RESOURCE. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230725051839epcms2p8e4d20ad6c51326ad032e8406f59d0aaa@epcms2p8Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 25 Jul, 2023 11 commits
-
-
Bart Van Assche authored
blk_mq_run_queue() runs the queue asynchronously if BLK_MQ_F_BLOCKING has been set. This is suboptimal since running the queue asynchronously is slower than running the queue synchronously. This patch modifies blk_mq_run_queue() as follows if BLK_MQ_F_BLOCKING has been set: - Run the queue synchronously if it is allowed to sleep. - Run the queue asynchronously if it is not allowed to sleep. Additionally, blk_mq_run_hw_queue(hctx, false) calls are modified into blk_mq_run_hw_queue(hctx, hctx->flags & BLK_MQ_F_BLOCKING) if the caller may be invoked from atomic context. The following caller chains have been reviewed: blk_mq_run_hw_queue(hctx, false) blk_mq_get_tag() /* may sleep, hence the functions it calls may also sleep */ blk_execute_rq() /* may sleep */ blk_mq_run_hw_queues(q, async=false) blk_freeze_queue_start() /* may sleep */ blk_mq_requeue_work() /* may sleep */ scsi_kick_queue() scsi_requeue_run_queue() /* may sleep */ scsi_run_host_queues() scsi_ioctl_reset() /* may sleep */ blk_mq_insert_requests(hctx, ctx, list, run_queue_async=false) blk_mq_dispatch_plug_list(plug, from_sched=false) blk_mq_flush_plug_list(plug, from_schedule=false) __blk_flush_plug(plug, from_schedule=false) blk_add_rq_to_plug() blk_mq_submit_bio() /* may sleep if REQ_NOWAIT has not been set */ blk_mq_plug_issue_direct() blk_mq_flush_plug_list() /* see above */ blk_mq_dispatch_plug_list(plug, from_sched=false) blk_mq_flush_plug_list() /* see above */ blk_mq_try_issue_directly() blk_mq_submit_bio() /* may sleep if REQ_NOWAIT has not been set */ blk_mq_try_issue_list_directly(hctx, list) blk_mq_insert_requests() /* see above */ Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230721172731.955724-4-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
blk_mq_kick_requeue_list() calls blk_mq_run_hw_queues() asynchronously. Leave out the direct blk_mq_run_hw_queues() call. This patch causes scsi_run_queue() to call blk_mq_run_hw_queues() asynchronously instead of synchronously. Since scsi_run_queue() is not called from the hot I/O submission path, this patch does not affect the hot path. This patch prepares for allowing blk_mq_run_hw_queue() to sleep if BLK_MQ_F_BLOCKING has been set. scsi_run_queue() may be called from atomic context and must not sleep. Hence the removal of the blk_mq_run_hw_queues(q, false) call. See also scsi_unblock_requests(). Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230721172731.955724-3-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Inline scsi_kick_queue() to prepare for modifying the second argument passed to blk_mq_run_hw_queues(). Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230721172731.955724-2-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
There is no good reason to pass the bio to bio_try_merge_hw_seg. Just pass the current bvec and rename the function to bvec_try_merge_hw_page. This will allow reusing this function for supporting multi-page integrity payload bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Link: https://lore.kernel.org/r/20230724165433.117645-9-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The update of bi_size is the only thing in __bio_try_merge_page that needs a bio. Move it to the callers, and merge __bio_try_merge_page and page_is_mergeable into a single bvec_try_merge_page that only takes the current bvec instead of a full bio. This will allow reusing this function for supporting multi-page integrity payload bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Link: https://lore.kernel.org/r/20230724165433.117645-8-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_add_page already checks that there is space in bi_size a little earlier. So after we failed to add to an existing segment, just check that there is another one available instead of duplicating the bi_size check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Link: https://lore.kernel.org/r/20230724165433.117645-7-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Checking for availability in bi_size in a function that attempts to merge into an existing segment is a bit odd, as the limit also applies when adding a new segment. This code works fine as we always call __bio_try_merge_page, but contributes to sub-optimal calling conventions and doesn't lead to clear code. Move it to two of the callers instead, the third one already has a more strict check that includes max_hw_segments anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230724165433.117645-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move the bi_vcnt out of __bio_try_merge_page and into the two callers that don't already have it in preparation for additional changes to __bio_try_merge_page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230724165433.117645-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
__bio_try_merge_page is a way too low-level helper to assert that the bio is not cloned. Move the check into bio_add_page and bio_iov_iter_get_pages instead, which are the high level entry points that should enforce this variant. bio_add_hw_page already this check, coverig the third (indirect) caller of __bio_try_merge_page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230724165433.117645-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use the SECTOR_SHIFT magic constant instead of the magic number. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230724165433.117645-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_add_hw_page already checks if the number of bytes trying to be added even fit into max_hw_sectors limit of the queue. Remove the call to bio_full and just do a check for the smaller of the number of segments in the bio and the queue max segments limit, and do this cheap check before the more expensive gap to previous check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jinyoung Choi <j-young.choi@samsung.com> Link: https://lore.kernel.org/r/20230724165433.117645-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Jul, 2023 1 commit
-
-
Nitesh Shetty authored
Reduce some code by making use of bio_integrity_bytes(). Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230719121608.32105-1-nj.shetty@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Jul, 2023 9 commits
-
-
Chengming Zhou authored
Since we don't need to maintain inflight flush_data requests list anymore, we can reuse rq->queuelist for flush pending list. Note in mq_flush_data_end_io(), we need to re-initialize rq->queuelist before reusing it in the state machine when end, since the rq->rq_next also reuse it, may have corrupted rq->queuelist by the driver. This patch decrease the size of struct request by 16 bytes. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230717040058.3993930-5-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengming Zhou authored
The flush state machine use a double list to link all inflight flush_data requests, to avoid issuing separate post-flushes for these flush_data requests which shared PREFLUSH. So we can't reuse rq->queuelist, this is why we need rq->flush.list In preparation of the next patch that reuse rq->queuelist for flush state machine, we change the double linked list to unsigned long counter, which count all inflight flush_data requests. This is ok since we only need to know if there is any inflight flush_data request, so unsigned long counter is good. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230717040058.3993930-4-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengming Zhou authored
If the policy == (REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH), it means that the data sequence and post-flush sequence need to be done for this request. The rq->flush.seq should record what sequences have been done (or don't need to be done). So in this case, pre-flush doesn't need to be done, we should init rq->flush.seq to REQ_FSEQ_PREFLUSH not REQ_FSEQ_POSTFLUSH. Fixes: 615939a2 ("blk-mq: defer to the normal submission path for post-flush requests") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230717040058.3993930-3-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengming Zhou authored
If request need to be completed remotely, we insert it into percpu llist, and smp_call_function_single_async() if llist is empty previously. We don't need to use per-rq csd, percpu csd is enough. And the size of struct request is decreased by 24 bytes. This way is cleaner, and looks correct, given block softirq is guaranteed to be scheduled to consume the list if one new request is added to this percpu list, either smp_call_function_single_async() returns -EBUSY or 0. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230717040058.3993930-2-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Currently the write_cache attribute allows enabling the QUEUE_FLAG_WC flag on devices that never claimed the capability. Fix that by adding a QUEUE_FLAG_HW_WC flag that is set by blk_queue_write_cache and guards re-enabling the cache through sysfs. Note that any rescan that calls blk_queue_write_cache will still re-enable the write cache as in the current code. Fixes: 93e9d8e8 ("block: add ability to flag write back caching on a device") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230707094239.107968-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Get rid of the local queue_wc_store variable and handling setting and clearing the QUEUE_FLAG_WC flag diretly instead the if / else if. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230707094239.107968-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Thomas Weißschuh authored
Add a module alias to nbd.ko that allows the generic netlink core to automatically load the module when netlink messages for nbd are received. This frees the user from manually having to load the module before using nbd functionality via netlink. If the system policy allows it this can even be used to load the nbd module from containers which would otherwise not have access to the necessary module files to do a normal "modprobe nbd". For example this avoids the following error when using nbd-client: $ nbd-client localhost 10809 /dev/nbd0 ... Error: Couldn't resolve the nbd netlink family, make sure the nbd module is loaded and your nbd driver supports the netlink interface. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Josef Bacik <josef@toxicpadna.com> Link: https://lore.kernel.org/r/20230713-b4-nbd-genl-v3-1-226cbddba04b@weissschuh.netSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Azeem Shaikh authored
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230703172159.3668349-3-azeemshaikh38@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Azeem Shaikh authored
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230703172159.3668349-2-azeemshaikh38@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-