- 27 Jan, 2019 1 commit
-
-
Jens Axboe authored
We can't touch a bio after ->make_request_fn(), for all we know it could already have been completed by the time this function returns. This reverts commit 698cef17. Reported-by: syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 Jan, 2019 9 commits
-
-
Damien Le Moal authored
Fix typo in REQ_OP_ZONE_RESET description. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
The description of the BLKGETNRZONES zoned block device ioctl was not added as a comment together with this ioctl definition in commit 65e4e3ee ("block: Introduce BLKGETNRZONES ioctl"). Add its description here. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
This patch avoids that sparse reports the following warnings: CHECK block/blk-wbt.c block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static? block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static? CC block/blk-wbt.o block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes] void wbt_issue(struct rq_qos *rqos, struct request *rq) ^~~~~~~~~ block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes] void wbt_requeue(struct rq_qos *rqos, struct request *rq) ^~~~~~~~~~~ Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
Swap REQ_NOWAIT and REQ_NOUNMAP and add REQ_HIPRI. Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hannes Reinecke authored
Bit 6 in the ANACAP field is used to indicate that the ANA group ID doesn't change while the namespace is attached to the controller. There is an optimisation in the code to only allocate space for the ANA group header, as the namespace list won't change and hence would not need to be refreshed. However, this optimisation was never carried over to the actual workflow, which always assumes that the buffer is large enough to hold the ANA header _and_ the namespace list. So drop this optimisation and always allocate enough space. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Raju Rangoju authored
Under heavy load if we don't have any pre-allocated rsps left, we dynamically allocate a rsp, but we are not actually allocating memory for nvme_completion (rsp->req.rsp). In such a case, accessing pointer fields (req->rsp->status) in nvmet_req_init() will result in crash. To fix this, allocate the memory for nvme_completion by calling nvmet_rdma_alloc_rsp() Fixes: 8407879c("nvmet-rdma:fix possible bogus dereference under heavy load") Cc: <stable@vger.kernel.org> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
If the device supports less queues than provided (if the device has less completion vectors), we might hit a bug due to the fact that we ignore that in nvme_rdma_map_queues (we override the maps nr_queues with user opts). Instead, keep track of how many default/read/poll queues we actually allocated (rather than asked by the user) and use that to assign our queue mappings. Fixes: b65bb777 (" nvme-rdma: support separate queue maps for read and write") Reported-by: Saleem, Shiraz <shiraz.saleem@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
Currently, we have several problems with the timeout handler: 1. If we timeout on the controller establishment flow, we will hang because we don't execute the error recovery (and we shouldn't because the create_ctrl flow needs to fail and cleanup on its own) 2. We might also hang if we get a disconnet on a queue while the controller is already deleting. This racy flow can cause the controller disable/shutdown admin command to hang. We cannot complete a timed out request from the timeout handler without mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work). So we serialize it in the timeout handler and teardown io and admin queues to guarantee that no one races with us from completing the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
Currently, we have several problems with the timeout handler: 1. If we timeout on the controller establishment flow, we will hang because we don't execute the error recovery (and we shouldn't because the create_ctrl flow needs to fail and cleanup on its own) 2. We might also hang if we get a disconnet on a queue while the controller is already deleting. This racy flow can cause the controller disable/shutdown admin command to hang. We cannot complete a timed out request from the timeout handler without mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work). So we serialize it in the timeout handler and teardown io and admin queues to guarantee that no one races with us from completing the request. Reported-by: Jaesoo Lee <jalee@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Jan, 2019 3 commits
-
-
Tejun Heo authored
sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jiufei Xue <xuejiufei@gmail.com> Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.comAcked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Except for blk_queue_split(), bio_split() is used for splitting bio too, then the remained bio is often resubmit to queue via generic_make_request(). So the same queue enter recursion exits in this case too. Unfortunatley commit cd4a4ae4 doesn't help this case. This patch covers the above case by setting BIO_QUEUE_ENTERED before calling q->make_request_fn. In theory the per-bio flag is used to simulate one stack variable, it is just fine to clear it after q->make_request_fn is returned. Especially the same bio can't be submitted from another context. Fixes: cd4a4ae4 ("block: don't use blocking queue entered for recursive bio submits") Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: NeilBrown <neilb@suse.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ernesto A. Fernández authored
On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently not allowed to create blocks for an empty inode. This confusion comes from trying to bit shift a negative number, so check the size of the inode first. The problem is most visible for hfsplus, because the fallback to buffered I/O doesn't happen and the write fails with EIO. This is in part the fault of the module, because it gives a wrong return value on ->get_block(); that will be fixed in a separate patch. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 Jan, 2019 1 commit
-
-
Thomas Gleixner authored
Remove the imprecise and sloppy: "This files is licensed under the GPL." license notice in the top level comment. 1) The file already contains a SPDX license identifier which clearly states that the license of the file is GPL V2 only 2) The notice resolves to GPL v1 or later for scanners which is just contrary to the intent of SPDX identifiers to provide clear and non ambiguous license information. Aside of that the value add of this notice is below zero, Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Fixes: 6a5ac984 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Jan, 2019 3 commits
-
-
Ming Lei authored
When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(), we still try to allocate multiple irq vectors again, so irq queues covers the admin queue actually. But we don't consider that, then number of the allocated irq vector may be same with sum of io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ], this way is obviously wrong, and finally breaks nvme_pci_map_queues(), and warning from pci_irq_get_affinity() is triggered. IRQ queues should cover admin queues, this patch makes this point explicitely in nvme_calc_io_queues(). We got severl boot failure internal report on aarch64, so please consider to fix it in v4.20. Fixes: 6451fe73 ("nvme: fix irq vs io_queue calculations") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Tested-by: fin4478 <fin4478@hotmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
If we end up in nvmet_tcp_try_recv_one with a bogus state queue receive state we will access result which is uninitialized. Initialize restult to 0 which will be considered as if no data was received by the tcp socket. Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
We need to pass bio->bi_opf after bio intergrity preparing, otherwise the flag of REQ_INTEGRITY may not be set on the allocated request, then breaks block integrity. Fixes: f9afca4d ("blk-mq: pass in request/bio flags to queue mapping") Cc: Hannes Reinecke <hare@suse.com> Cc: Keith Busch <keith.busch@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 Jan, 2019 2 commits
-
-
Jan Kara authored
bd_set_size() updates also block device's block size. This is somewhat unexpected from its name and at this point, only blkdev_open() uses this functionality. Furthermore, this can result in changing block size under a filesystem mounted on a loop device which leads to livelocks inside __getblk_gfp() like: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106 ... Call Trace: init_page_buffers+0x3e2/0x530 fs/buffer.c:904 grow_dev_page fs/buffer.c:947 [inline] grow_buffers fs/buffer.c:1009 [inline] __getblk_slow fs/buffer.c:1036 [inline] __getblk_gfp+0x906/0xb10 fs/buffer.c:1313 __bread_gfp+0x2d/0x310 fs/buffer.c:1347 sb_bread include/linux/buffer_head.h:307 [inline] fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75 fat_ent_read_block fs/fat/fatent.c:441 [inline] fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101 __fat_get_block fs/fat/inode.c:148 [inline] ... Trivial reproducer for the problem looks like: truncate -s 1G /tmp/image losetup /dev/loop0 /tmp/image mkfs.ext4 -b 1024 /dev/loop0 mount -t ext4 /dev/loop0 /mnt losetup -c /dev/loop0 l /mnt Fix the problem by moving initialization of a block device block size into a separate function and call it when needed. Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with debugging the problem. Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
NBD can update block device block size implicitely through bd_set_size(). Make it explicitely set blocksize with set_blocksize() as this behavior of bd_set_size() is going away. CC: Josef Bacik <jbacik@fb.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Jan, 2019 2 commits
-
-
Marcos Paulo de Souza authored
bio_alloc_bioset returns a bio pointer or NULL, so we can avoid storing the returned data into a new variable. Acked-by: Guoqing Jiang <gqjiang@suse.com> Acked-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
Comments on function __bfq_deactivate_entity contains two imprecise or wrong statements: 1) The function performs the deactivation of the entity. 2) The function must be invoked only if the entity is on a service tree. This commits replaces both statements with the correct ones: 1) The functions updates sched_data and service trees for the entity, so as to represent entity as inactive (which is only part of the steps needed for the deactivation of the entity). 2) The function must be invoked on every entity being deactivated. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 Jan, 2019 5 commits
-
-
Miquel Raynal authored
A feature has been added in the libahci driver: the possibility to set a new flag in hpriv->flags to let the core handle PHY suspend/resume automatically. Make use of this feature to make suspend to RAM work with SATA drives on A3700. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Miquel Raynal authored
A3700 comphy initialization is done in the firmware (TF-A). Looking at the SATA PHY initialization routine, there is a comment about "vendor specific" registers. Two registers are mentioned. They are not initialized there in the firmware because they are AHCI related, while the firmware at this location does only PHY configuration. The solution to avoid doing such initialization is relying on U-Boot. While this work at boot time, U-Boot is definitely not going to run during a resume after suspending to RAM. Two possible solutions were considered: * Fixing the firmware. * Fixing the kernel driver. The first solution would take ages to propagate, while the second solution is easy to implement as the driver as been a little bit reworked to prepare for such platform configuration. Hence, this patch adds an Armada 3700 configuration function to set these two registers both at boot time (in the probe) and after a suspend (in the resume path). Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Miquel Raynal authored
At the beginning, only Armada 38x SoCs where supported by the ahci_mvebu.c driver. Commit 15d3ce7b ("ata: ahci_mvebu: add support for Armada 3700 variant") introduced Armada 3700 support. As opposed to Armada 38x SoCs, the 3700 variants do not have to configure mbus and the regret option. This patch took care of avoiding such configuration when not needed in the probe function, but failed to do the same in the resume path. While doing so looks harmless by experience, let's clean the driver logic and avoid doing this useless configuration with Armada 3700 SoCs. Because the logic is very similar between these two places, it has been decided to factorize this code and put it in a "Armada 38x configuration function". This function is part of a new (per-compatible) platform data structure, so that the addition of such configuration function for Armada 3700 will be eased. Fixes: 15d3ce7b ("ata: ahci_mvebu: add support for Armada 3700 variant") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Miquel Raynal authored
For Armada-38x (32-bit) SoCs, PM platform support has been added since: commit 32f9494c ("ARM: mvebu: prepare pm-board.c for the introduction of Armada 38x support") commit 3cbd6a6c ("ARM: mvebu: Add standby support") For Armada 64-bit SoCs, like the A3700 also using this AHCI driver, PM platform support has always existed. There are even suspend/resume hooks in this driver since: commit d6ecf158 ("ata: ahci_mvebu: add suspend/resume support") Remove the stale comment at the end of this driver stating that all the above does not exist yet. Fixes: d6ecf158 ("ata: ahci_mvebu: add suspend/resume support") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Miquel Raynal authored
Current implementation of the libahci does not take into account the new PHY framework. Correct the situation by adding a call to phy_set_mode() before phy_power_on(). PHYs should also be handled at suspend/resume time. For this, call ahci_platform_enable/disable_phys() at suspend/resume_host() time. These calls are guarded by a HFLAG (AHCI_HFLAG_SUSPEND_PHYS) that the user of the libahci driver must set manually in hpriv->flags at probe time. This is to avoid breaking users that have not been tested with this change. Reviewed-by: Hans de Goede <hdegoede@redhat.com> Suggested-by: Grzegorz Jaszczyk <jaz@semihalf.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 Jan, 2019 2 commits
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Christoph. * 'nvme-5.0' of git://git.infradead.org/nvme: nvme: don't initlialize ctrl->cntlid twice nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN nvme: pad fake subsys NQN vid and ssvid with zeros nvme-multipath: zero out ANA log buffer nvme-fabrics: unset write/poll queues for discovery controllers nvme-tcp: don't ask if controller is fabrics nvme-tcp: remove dead code nvme-pci: fix out of bounds access in nvme_cqe_pending nvme-pci: rerun irq setup on IO queue init errors nvme-pci: use the same attributes when freeing host_mem_desc_bufs. nvme-pci: fix the wrong setting of nr_maps
-
Jaegeuk Kim authored
If we don't drop caches used in old offset or block_size, we can get old data from new offset/block_size, which gives unexpected data to user. For example, Martijn found a loopback bug in the below scenario. 1) LOOP_SET_FD loads first two pages on loop file 2) LOOP_SET_STATUS64 changes the offset on the loop file 3) mount is failed due to the cached pages having wrong superblock Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Reported-by: Martijn Coenen <maco@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 Jan, 2019 12 commits
-
-
Jonathan Corbet authored
Commit 5f0ed774 ("block: sum requests in the plug structure") removed the request_count parameter from block_attempt_plug_merge(), but did not remove the associated kerneldoc comment, introducing this warning to the docs build: ./block/blk-core.c:685: warning: Excess function parameter 'request_count' description in 'blk_attempt_plug_merge' Remove the obsolete description and make things a little quieter. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Andrey Smirnov authored
ctrl->cntlid will already be initialized from id->cntlid for non-NVME_F_FABRICS controllers few lines below. For NVME_F_FABRICS controllers this field should already be initialized, otherwise the check if (ctrl->cntlid != le16_to_cpu(id->cntlid)) below will always be a no-op. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
James Dingwall authored
If a device provides an NQN it is expected to be globally unique. Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did not satisfy this requirement. In these circumstances if a system has >1 affected device then only one device is enabled. If this quirk is enabled then the device supplied subnqn is ignored and we fallback to generating one as if the field was empty. In this case we also suppress the version check so we don't print a warning when the quirk is enabled. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
We need to preserve the leading zeros in the vid and ssvid when generating a unique NQN. Truncating these may lead to naming collisions. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hannes Reinecke authored
When nvme_init_identify() fails the ANA log buffer is deallocated but _not_ set to NULL. This can cause double free oops when this controller is deleted without ever being reconnected. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
Even if user-space sent it to us, it got it wrong so lets help by disallowing it. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
For sure we are a fabric driver. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
We should never touch the opal device from the transport driver. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hongbo Yao authored
There is an out of bounds array access in nvme_cqe_peding(). When enable irq_thread for nvme interrupt, there is racing between the nvmeq->cq_head updating and reading. nvmeq->cq_head is updated in nvme_update_cq_head(), if nvmeq->cq_head equals nvmeq->q_depth and before its value set to zero, nvme_cqe_pending() uses its value as an array index, the index will be out of bounds. Signed-off-by: Hongbo Yao <yaohongbo@huawei.com> [hch: slight coding style update] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
If the driver is unable to create a subset of IO queues for any reason, the read/write and polled queue sets will not match the actual allocated hardware contexts. This leaves gaps in the CPU affinity mappings and causes the following kernel panic after blk_mq_map_queue_type() returns a NULL hctx. BUG: unable to handle kernel NULL pointer dereference at 0000000000000198 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 64 PID: 1171 Comm: kworker/u259:1 Not tainted 4.20.0+ #241 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 Workqueue: nvme-wq nvme_scan_work [nvme_core] RIP: 0010:blk_mq_init_allocated_queue+0x2d9/0x440 RSP: 0018:ffffb1bf0abc3cd0 EFLAGS: 00010286 RAX: 000000000000001f RBX: ffff8ea744cf0718 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 000000000000007c RDI: ffffffff9109a820 RBP: ffff8ea7565f7008 R08: 000000000000001f R09: 000000000000003f R10: ffffb1bf0abc3c00 R11: 0000000000000000 R12: 000000000001d008 R13: ffff8ea7565f7008 R14: 000000000000003f R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8ea757200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000198 CR3: 0000000013058000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: blk_mq_init_queue+0x35/0x60 nvme_validate_ns+0xc6/0x7c0 [nvme_core] ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core] nvme_scan_work+0xc8/0x340 [nvme_core] ? __wake_up_common+0x6d/0x120 ? try_to_wake_up+0x55/0x410 process_one_work+0x1e9/0x3d0 worker_thread+0x2d/0x3d0 ? process_one_work+0x3d0/0x3d0 kthread+0x111/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 Modules linked in: nvme nvme_core serio_raw CR2: 0000000000000198 Fix by re-running the interrupt vector setup from scratch using a reduced count that may be successful until the created queues matches the irq affinity plus polling queue sets. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Liviu Dudau authored
When using HMB the PCIe host driver allocates host_mem_desc_bufs using dma_alloc_attrs() but frees them using dma_free_coherent(). Use the correct dma_free_attrs() function to free the buffers. Signed-off-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Jianchao Wang authored
We only set the nr_maps to 3 if poll queues are supported. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-