- 27 Aug, 2021 2 commits
-
-
Jens Axboe authored
Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.15/drivers Pull MD changes from Song. * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: raid1: ensure write behind bio has less than BIO_MAX_VECS sectors md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard
-
Guoqing Jiang authored
We can't split write behind bio with more than BIO_MAX_VECS sectors, otherwise the below call trace was triggered because we could allocate oversized write behind bio later. [ 8.097936] bvec_alloc+0x90/0xc0 [ 8.098934] bio_alloc_bioset+0x1b3/0x260 [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1] [ 8.100988] ? __bio_clone_fast+0xa8/0xe0 [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod] [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod] [ 8.104084] submit_bio_noacct+0x139/0x530 [ 8.105127] submit_bio+0x78/0x1d0 [ 8.106163] ext4_io_submit+0x48/0x60 [ext4] [ 8.107242] ext4_writepages+0x652/0x1170 [ext4] [ 8.108300] ? do_writepages+0x41/0x100 [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4] [ 8.110406] do_writepages+0x41/0x100 [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100 [ 8.112513] file_write_and_wait_range+0x61/0xb0 [ 8.113564] ext4_sync_file+0x73/0x370 [ext4] [ 8.114607] __x64_sys_fsync+0x33/0x60 [ 8.115635] do_syscall_64+0x33/0x40 [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae Thanks for the comment from Christoph. [1]. https://bugs.archlinux.org/task/70992 Cc: stable@vger.kernel.org # v5.12+ Reported-by: Jens Stutte <jens@chianterastutte.eu> Tested-by: Jens Stutte <jens@chianterastutte.eu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
-
- 26 Aug, 2021 1 commit
-
-
Xiao Ni authored
We are seeing the following warning in raid10_handle_discard. [ 695.110751] ============================= [ 695.131439] WARNING: suspicious RCU usage [ 695.151389] 4.18.0-319.el8.x86_64+debug #1 Not tainted [ 695.174413] ----------------------------- [ 695.192603] drivers/md/raid10.c:1776 suspicious rcu_dereference_check() usage! [ 695.225107] other info that might help us debug this: [ 695.260940] rcu_scheduler_active = 2, debug_locks = 1 [ 695.290157] no locks held by mkfs.xfs/10186. In the first loop of function raid10_handle_discard. It already determines which disk need to handle discard request and add the rdev reference count rdev->nr_pending. So the conf->mirrors will not change until all bios come back from underlayer disks. It doesn't need to use rcu_dereference to get rdev. Cc: stable@vger.kernel.org Fixes: d30588b2 ('md/raid10: improve raid10 discard request') Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com>
-
- 25 Aug, 2021 6 commits
-
-
Christoph Hellwig authored
The nbd->destroy_complete pointer is not really needed. For creating a device without a specific index we now simplify skip devices marked NBD_DESTROY_ON_DISCONNECT as there is not much point to reuse them. For device creation with a specific index there is no real need to treat the case of a requested but not finished disconnect different than any other device that is being shutdown, i.e. we can just return an error, as a slightly different race window would anyway. Fixes: 6e4df4c6 ("nbd: reduce the nbd_index_mutex scope") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+2c98885bcd769f56b6d6@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-7-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Device marked as NBD_DESTROY_ON_DISCONNECT can and should be skipped given that they won't survive the disconnect. So skip them and try to grab a reference directly and just continue if the the devices is being torn down or created and thus has a zero refcount. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-6-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tetsuo Handa authored
Set nbd->index before releasing nbd_index_mutex, as populate_nbd_status() might access nbd->index as soon as nbd_index_mutex is released. Fixes: 6e4df4c6 ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-5-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tetsuo Handa authored
Previously nbd_index_mutex was held during whole add/remove/lookup operations in order to guarantee that partially initialized devices are not reachable via idr_find() or idr_for_each(). But now that partially initialized devices become reachable as soon as idr_alloc() succeeds, we need to skip partially initialized devices. Since it seems that all functions use refcount_inc_not_zero(&nbd->refs) in order to skip destroying devices, update nbd->refs from zero to non-zero as the last step of device initialization in order to also skip partially initialized devices. Fixes: 6e4df4c6 ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch, added comments] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-4-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
When nbd_genl_connect restarts to wait for a disconnecting device, nbd needs to be reset to NULL. Do that by facoring out a helper to find an unused device. Fixes: 6177b56c ("nbd: refactor device search and allocation in nbd_genl_connect") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Reported-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-3-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tetsuo Handa authored
idr_remove needs external synchronization. Fixes: 6e4df4c6 ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-2-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 Aug, 2021 1 commit
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe updates from Christoph: "nvme updates for Linux 5.15. - suspend improvements for devices with an HMB (Keith Busch) - handle double completions more gacefull (Sagi Grimberg) - cleanup the selects for the nvme core code a bit (Sagi Grimberg) - don't update queue count when failing to set io queues (Ruozhu Li) - various nvmet connect fixes (Amit Engel) - cleanup lightnvm leftovers (Keith Busch, me) - small cleanups (Colin Ian King, Hou Pu) - add tracing for the Set Features command (Hou Pu) - CMB sysfs cleanups (Keith Busch) - add a mutex_destroy call (Keith Busch)" * tag 'nvme-5.15-2021-08-18' of git://git.infradead.org/nvme: (21 commits) nvme: remove the unused NVME_NS_* enum nvme: remove nvm_ndev from ns nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers nvmet: check that host sqsize does not exceed ctrl MQES nvmet: avoid duplicate qid in connect cmd nvmet: pass back cntlid on successful completion nvme-rdma: don't update queue count when failing to set io queues nvme-tcp: don't update queue count when failing to set io queues nvme-tcp: pair send_mutex init with destroy nvme: allow user toggling hmb usage nvme-pci: disable hmb on idle suspend nvmet: remove redundant assignments of variable status nvmet: add set feature tracing support nvme: add set feature tracing support nvme-fabrics: remove superfluous nvmf_host_put in nvmf_parse_options nvme-pci: cmb sysfs: one file, one value nvme-pci: use attribute group for cmb sysfs nvme: code command_id with a genctr for use-after-free validation nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data nvme-pci: limit maximum queue depth to 4095 ...
-
- 17 Aug, 2021 1 commit
-
-
Christoph Hellwig authored
These values are unused now that the lightnvm support is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
-
- 16 Aug, 2021 21 commits
-
-
Keith Busch authored
Now that the lightnvm driver is removed, we don't need a pointer to it's now non-existent struct. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
Transport drivers need both core and fabrics modules, instead of selecting both, have the selection transitive such that NVME_FABRICS selects NVME_CORE and transport drivers select NVME_FABRICS. Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Pavel Skripkin authored
Syzbot hit WARNING in internal_create_group(). The problem was in too big disk->first_minor. disk->first_minor is initialized by value, which comes from userspace and there wasn't any sanity checks about value correctness. It can cause duplicate creation of sysfs files/links, because disk->first_minor will be passed to MKDEV() which causes truncation to byte. Since maximum minor value is 0xff, let's check if first_minor is correct minor number. NOTE: the root case of the reported warning was in wrong error handling in register_disk(), but we can avoid passing knowingly wrong values to sysfs API, because sysfs error messages can confuse users. For example: user passed 1048576 as index, but sysfs complains about duplicate creation of /dev/block/43:0. It's not obvious how 1048576 becomes 0. Log and reproducer for above example can be found on syzkaller bug report page. Link: https://syzkaller.appspot.com/bug?id=03c2ae9146416edf811958d5fd7acfab75b143d1 Fixes: b0d9111a ("nbd: use an idr to keep track of nbd devices") Reported-by: syzbot+9937dc42271cd87d4b98@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Amit Engel authored
Check that host sqsize is not greater-than Maximum Queue Entries Supported (MQES) value supported by the controller. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Amit Engel authored
According to the NVMe specification, if the host sends a Connect command specifying a queue id which has already been created, a status value of NVME_SC_CMD_SEQ_ERROR is returned. Signed-off-by: Amit Engel <amit.engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Amit Engel authored
According to the NVMe specification, the response dword 0 value of the Connect command is based on status code: return cntlid for successful compeltion return IPO and IATTR for connect invalid parameters. Fix a missing error information for a zero sized queue, and return the cntlid also for I/O queue Connect commands. Signed-off-by: Amit Engel <amit.engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Ruozhu Li authored
We update ctrl->queue_count and schedule another reconnect when io queue count is zero.But we will never try to create any io queue in next reco- nnection, because ctrl->queue_count already set to zero.We will end up having an admin-only session in Live state, which is exactly what we try to avoid in the original patch. Update ctrl->queue_count after queue_count zero checking to fix it. Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Ruozhu Li authored
We update ctrl->queue_count and schedule another reconnect when io queue count is zero.But we will never try to create any io queue in next reco- nnection, because ctrl->queue_count already set to zero.We will end up having an admin-only session in Live state, which is exactly what we try to avoid in the original patch. Update ctrl->queue_count after queue_count zero checking to fix it. Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
Each mutex_init() should have a corresponding mutex_destroy(). Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
The NVMe host memory buffer may consume a non-negligable amount of memory. Controllers are required to function without the host memory buffer enabled, but with possibly degraded performance. Export a sysfs property to toggle this feature on a per-device granularity so users may choose to reclaim memory at the expense of storage performance. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
An idle suspend may or may not disable host memory access from devices placed in low power mode. Either way, it should always be safe to disable the host memory buffer prior to entering the low power mode, and this should also always be faster than a full device shutdown. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Colin Ian King authored
There are two occurrances where variable status is being assigned a value that is never read and it is being re-assigned a new value almost immediately afterwards on an error exit path. The assignments are redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hou Pu authored
A nvme connect command produces following trace from the target side. Before: kworker/0:1H-56 [000] .... 9012.155139: nvmet_req_init: nvmet1: qid=0, cmdid=16, nsid=0, flags=0x40, meta=0x0, cmd=(nvme_admin_set_features, cdw10=07 00 00 00 07 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00) kworker/0:1H-56 [000] .... 9012.872272: nvmet_req_init: nvmet1: qid=0, cmdid=13, nsid=0, flags=0x40, meta=0x0, cmd=(nvme_admin_set_features, cdw10=0b 00 00 00 00 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00) cmdline:/sys/kernel/debug/tracing# cat trace | grep feature kworker/0:1H-56 [000] .... 203.493914: nvmet_req_init: nvmet1: qid=0, cmdid=29, nsid=0, flags=0x40, meta=0x0, cmd=(nvme_admin_set_features, fid=0x7, sv=0x0, cdw11=0x70007) kworker/0:1H-56 [000] .... 204.197079: nvmet_req_init: nvmet1: qid=0, cmdid=29, nsid=0, flags=0x40, meta=0x0, cmd=(nvme_admin_set_features, fid=0xb, sv=0x0, cdw11=0x900) Using ',' to separate different field like others in nvmet_trace_admin_get_features. Signed-off-by: Hou Pu <houpu.main@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hou Pu authored
A nvme connect command produces following trace. Before: /sys/kernel/debug/tracing# cat trace | grep feature kworker/5:1H-98 [005] .... 3221.294844: nvme_setup_cmd: nvme0: qid=0, cmdid=25, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_set_features cdw10=07 00 00 00 07 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00) kworker/4:1H-124 [004] .... 3222.009186: nvme_setup_cmd: nvme0: qid=0, cmdid=17, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_set_features cdw10=0b 00 00 00 00 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00) After: /sys/kernel/debug/tracing# cat trace | grep feature kworker/0:1H-253 [000] .... 196.060509: nvme_setup_cmd: nvme0: qid=0, cmdid=29, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_set_features fid=0x7, sv=0x0, cdw11=0x70007) kworker/0:1H-253 [000] .... 196.763947: nvme_setup_cmd: nvme0: qid=0, cmdid=29, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_set_features fid=0xb, sv=0x0, cdw11=0x900) Using ',' to separate different field like others in nvmet_trace_admin_get_features. Signed-off-by: Hou Pu <houpu.main@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hou Pu authored
Opts->host is NULL there. It is checked just before. So remove nvmf_host_put. It is introduced by commit 59a2f3f0 ("nvme: fix potential memory leak in option parsing"). Signed-off-by: Hou Pu <houpu.main@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
An attribute should only be exporting one value as recommended in Documentation/filesystems/sysfs.rst. Implement CMB attributes this way. The old attribute will remain for backward compatibility. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
Appending sysfs files to the controller kobject is a bit clunky and becomes a maintenance problem as more attributes are added. The attribute group infrastructure handles this better, so use that. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
We cannot detect a (perhaps buggy) controller that is sending us a completion for a request that was already completed (for example sending a completion twice), this phenomenon was seen in the wild a few times. So to protect against this, we use the upper 4 msbits of the nvme sqe command_id to use as a 4-bit generation counter and verify it matches the existing request generation that is incrementing on every execution. The 16-bit command_id structure now is constructed by: | xxxx | xxxxxxxxxxxx | gen request tag This means that we are giving up some possible queue depth as 12 bits allow for a maximum queue depth of 4095 instead of 65536, however we never create such long queues anyways so no real harm done. Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
We already validate it when receiving the c2hdata pdu header and this is not changing so this is a redundant check. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
We are going to use the upper 4-bits of the command_id for a generation counter, so enforce the new queue depth upper limit. As we enforce both min and max queue depth, use param_set_uint_minmax istead of open coding it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
It is a useful helper hence move it to common code so others can enjoy it. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 14 Aug, 2021 1 commit
-
-
Christoph Hellwig authored
Lightnvm supports the OCSSD 1.x and 2.0 specs which were early attempts to produce Open Channel SSDs and never made it into the NVMe spec proper. They have since been superceeded by NVMe enhancements such as ZNS support. Remove the support per the deprecation schedule. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210812132308.38486-1-hch@lst.deReviewed-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Aug, 2021 7 commits
-
-
Christoph Hellwig authored
nbd_index_mutex is currently held over add_disk and inside ->open, which leads to lock order reversals. Refactor the device creation code path so that nbd_dev_add is called without nbd_index_mutex lock held and only takes it for the IDR insertation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-7-hch@lst.de [axboe: fix whitespace] Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Use idr_for_each_entry instead of the awkward callback to find an existing device for the index == -1 case, and de-duplicate the device allocation if no existing device was found. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-6-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Return the device we just allocated instead of doing an extra search for it in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-5-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Fold nbd_del_disk and remove the pointless NULL check on ->disk given that it is always set for a successfully allocated nbd_device structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-4-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Share common code for the synchronous and workqueue based device removal, and remove the pointless use of refcount_dec_and_mutex_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-3-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hou Tao authored
Now open_mutex is used to synchronize partition operations (e.g, blk_drop_partitions() and blkdev_reread_part()), however it makes nbd driver broken, because nbd may call del_gendisk() in nbd_release() or nbd_genl_disconnect() if NBD_CFLAG_DESTROY_ON_DISCONNECT is enabled, and deadlock occurs, as shown below: // AB-BA dead-lock nbd_genl_disconnect blkdev_open nbd_disconnect_and_put lock bd_mutex // last ref nbd_put lock nbd_index_mutex del_gendisk nbd_open try lock nbd_index_mutex try lock bd_mutex or // AA dead-lock nbd_release lock bd_mutex nbd_put try lock bd_mutex Instead of fixing block layer (e.g, introduce another lock), fixing the nbd driver to call del_gendisk() in a kworker when NBD_DESTROY_ON_DISCONNECT is enabled. When NBD_DESTROY_ON_DISCONNECT is disabled, nbd device will always be destroy through module removal, and there is no risky of deadlock. To ensure the reuse of nbd index succeeds, moving the calling of idr_remove() after del_gendisk(), so if the reused index is not found in nbd_index_idr, the old disk must have been deleted. And reusing the existing destroy_complete mechanism to ensure nbd_genl_connect() will wait for the completion of del_gendisk(). Also adding a new workqueue for nbd removal, so nbd_cleanup() can ensure all removals complete before exits. Reported-by: syzbot+0fe7752e52337864d29b@syzkaller.appspotmail.com Fixes: c76f48eb ("block: take bd_mutex around delete_partitions in del_gendisk") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-2-hch@lst.deReviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Baokun Li authored
If user specify a large enough value of NBD blocks option, it may trigger signed integer overflow which may lead to nbd->config->bytesize becomes a large or small value, zero in particular. UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31 signed integer overflow: 1024 * 4611686155866341414 cannot be represented in type 'long long int' [...] Call trace: [...] handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 nbd_size_set drivers/block/nbd.c:325 [inline] __nbd_ioctl drivers/block/nbd.c:1342 [inline] nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395 __blkdev_driver_ioctl block/ioctl.c:311 [inline] [...] Although it is not a big deal, still silence the UBSAN by limit the input value. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com [axboe: dropped unlikely()] Signed-off-by: Jens Axboe <axboe@kernel.dk>
-