Commits · 15dfc662ef31a20b59097d59b0792b06770255fa · Kirill Smelkov / linux

29 Oct, 2021 4 commits

null_blk: Fix handling of submit_queues and poll_queues attributes · 15dfc662

Shin'ichiro Kawasaki authored Oct 29, 2021

Commit 0a593fbb ("null_blk: poll queue support") introduced the poll
queue feature to null_blk. After this change, null_blk device has both
submit queues and poll queues, and null_map_queues() callback maps the
both queues for corresponding hardware contexts. The commit also added
the device configuration attribute 'poll_queues' in same manner as the
existing attribute 'submit_queues'. These attributes allow to modify the
numbers of queues. However, when the new values are stored to these
attributes, the values are just handled only for the corresponding
queue. When number of submit_queue is updated, number of poll_queue is
not counted, or vice versa. This caused inconsistent number of queues
and queue mapping and resulted in null-ptr-dereference. This failure was
observed in blktests block/029 and block/030.

To avoid the inconsistency, fix the attribute updates to care both
submit_queues and poll_queues. Introduce the helper function
nullb_update_nr_hw_queues() to handle stores to the both two attributes.
Add poll_queues field to the struct nullb_device to track the number in
same manner as submit_queues. Add two more fields prev_submit_queues and
prev_poll_queues to keep the previous values before change. In case the
block layer failed to update the nr_hw_queues, refer the previous values
in null_map_queues() to map queues in same manner as before change.

Also add poll_queues value checks in nullb_update_nr_hw_queues() and
null_validate_conf(). They ensure the poll_queues value of each device
is within the range from 1 to module parameter value of poll_queues.

Fixes: 0a593fbb ("null_blk: poll queue support")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20211029103926.845635-1-shinichiro.kawasaki@wdc.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

15dfc662

block: ataflop: Fix warning comparing pointer to 0 · df75db1f

Jiapeng Chong authored Oct 29, 2021

Fix the following coccicheck warning:

./drivers/block/ataflop.c:1464:20-21: WARNING comparing pointer to 0.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1635501029-81391-1-git-send-email-jiapeng.chong@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

df75db1f

bcache: replace snprintf in show functions with sysfs_emit · 1b86db5f

Qing Wang authored Oct 29, 2021

coccicheck complains about the use of snprintf() in sysfs show functions.

Fix the following coccicheck warning:
drivers/md/bcache/sysfs.h:54:12-20: WARNING: use scnprintf or sprintf.

Implement sysfs_print() by sysfs_emit() and remove snprint() since no one
uses it any more.
Suggested-by: Coly Li <colyli@suse.de>
Signed-off-by: Qing Wang <wangqing@vivo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211029060930.119923-3-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

1b86db5f

bcache: move uapi header bcache.h to bcache code directory · cf2197ca

Coly Li authored Oct 29, 2021

The header file include/uapi/linux/bcache.h is not really a user space
API heaer. This file defines the ondisk format of bcache internal meta
data but no one includes it from user space, bcache-tools has its own
copy of this header with minor modification.

Therefore, this patch moves include/uapi/linux/bcache.h to bcache code
directory as drivers/md/bcache/bcache_ondisk.h.
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

cf2197ca

28 Oct, 2021 1 commit

Merge tag 'nvme-5.16-2021-10-28' of git://git.infradead.org/nvme into for-5.16/drivers · ca778797

Jens Axboe authored Oct 28, 2021

Pull NVMe updates from Christoph:

"nvme updates for Linux 5.16

 - support the current discovery subsystem entry (Hannes Reinecke)
 - use flex_array_size and struct_size (Len Baker)"

* tag 'nvme-5.16-2021-10-28' of git://git.infradead.org/nvme:
  nvmet: use flex_array_size and struct_size
  nvmet: register discovery subsystem as 'current'
  nvmet: switch check for subsystem type
  nvme: add new discovery log page entry definitions

ca778797

27 Oct, 2021 4 commits

nvmet: use flex_array_size and struct_size · d156cfca

Len Baker authored Oct 24, 2021

In an effort to avoid open-coded arithmetic in the kernel [1], use the
flex_array_size() and struct_size() helpers instead of an open-coded
calculation.

[1] https://github.com/KSPP/linux/issues/160Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d156cfca

nvmet: register discovery subsystem as 'current' · 2953b30b

Hannes Reinecke authored Oct 18, 2021

Register the discovery subsystem as the 'current' discovery subsystem,
and add a new discovery log page entry for it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2953b30b

nvmet: switch check for subsystem type · 598e7593

Hannes Reinecke authored Oct 18, 2021

Invert the check for discovery subsystem type to allow for additional
discovery subsystem types.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

598e7593

nvme: add new discovery log page entry definitions · 785d584c

Hannes Reinecke authored Oct 18, 2021

TP8014 adds a new SUBTYPE value and a new field EFLAGS for the
discovery log page entry.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

785d584c

25 Oct, 2021 1 commit

block: ataflop: more blk-mq refactoring fixes · d28e4dff

Michael Schmitz authored Oct 24, 2021

As it turns out, my earlier patch in commit 86d46fda (block:
ataflop: fix breakage introduced at blk-mq refactoring) was
incomplete. This patch fixes any remaining issues found during
more testing and code review.

Requests exceeding 4 k are handled in 4k segments but
__blk_mq_end_request() is never called on these (still
sectors outstanding on the request). With redo_fd_request()
removed, there is no provision to kick off processing of the
next segment, causing requests exceeding 4k to hang. (By
setting /sys/block/fd0/queue/max_sectors_k <= 4 as workaround,
this behaviour can be avoided).

Instead of reintroducing redo_fd_request(), requeue the remainder
of the request by calling blk_mq_requeue_request() on incomplete
requests (i.e. when blk_update_request() still returns true), and
rely on the block layer to queue the residual as new request.

Both error handling and formatting needs to release the
ST-DMA lock, so call finish_fdc() on these (this was previously
handled by redo_fd_request()). finish_fdc() may be called
legitimately without the ST-DMA lock held - make sure we only
release the lock if we actually held it. In a similar way,
early exit due to errors in ataflop_queue_rq() must release
the lock.

After minor errors, fd_error sets up to recalibrate the drive
but never re-runs the current operation (another task handled by
redo_fd_request() before). Call do_fd_action() to get the next
steps (seek, retry read/write) underway.
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Fixes: 6ec3938c (ataflop: convert to blk-mq)
CC: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d28e4dff

22 Oct, 2021 1 commit

block: remove support for cryptoloop and the xor transfer · 47e96246

Christoph Hellwig authored Oct 19, 2021

Support for cyrptoloop has been officially marked broken and deprecated
in favor of dm-crypt (which supports the same broken algorithms if
needed) in Linux 2.6.4 (released in March 2004), and support for it has
been entirely removed from losetup in util-linux 2.23 (released in April
2013). The XOR transfer has never been more than a toy to demonstrate
the transfer in the bad old times of crypto export restrictions.
Remove them as they have some nasty interactions with loop device life
times due to the iteration over all loop devices in
loop_unregister_transfer.
Suggested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019075639.2333969-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

47e96246

21 Oct, 2021 9 commits

mtd: add add_disk() error handling · 83b863f4

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-10-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

83b863f4

rnbd: add error handling support for add_disk() · 2e9e31be

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-9-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2e9e31be

um/drivers/ubd_kern: add error handling support for add_disk() · 66638f16

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

ubd_disk_register() never returned an error, so just fix
that now and let the caller handle the error condition.
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-8-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

66638f16

m68k/emu/nfblock: add error handling support for add_disk() · 21fd880d

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-7-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

21fd880d

xen-blkfront: add error handling support for add_disk() · 293a7c52

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on device_add_disk() as this function
returned void. Now that this is fixed, use the shiny new error
handling. The function xlvbd_alloc_gendisk() typically does the
unwinding on error on allocating the disk and creating the tag,
but since all that error handling was stuffed inside
xlvbd_alloc_gendisk() we must repeat the tag free'ing as well.

We set the info->rq to NULL to ensure blkif_free() doesn't crash
on blk_mq_stop_hw_queues() on device_add_disk() error as the queue
will be long gone by then.
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-6-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

293a7c52

bcache: add error handling support for add_disk() · 2961c3bb

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

This driver doesn't do any unwinding with blk_cleanup_disk()
even on errors after add_disk() and so we follow that
tradition.
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-5-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2961c3bb

dm: add add_disk() error handling · e7089f65

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

There are two calls to dm_setup_md_queue() which can fail then,
one on dm_early_create() and we can easily see that the error path
there calls dm_destroy in the error path. The other use case is on
the ioctl table_load case. If that fails userspace needs to call
the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other
failure.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

e7089f65

block: aoe: fixup coccinelle warnings · ff06ed7e

Ye Guojin authored Oct 21, 2021

coccicheck complains about the use of snprintf() in sysfs show
functions:
WARNING  use scnprintf or sprintf

Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211021064931.1047687-1-ye.guojin@zte.com.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

ff06ed7e

Merge tag 'nvme-5.16-2021-10-21' of git://git.infradead.org/nvme into for-5.16/drivers · cbab6ae0

Jens Axboe authored Oct 21, 2021

Pull NVMe updates from Christoph:

"nvme updates for Linux 5.16

 - fix a multipath partition scanning deadlock (Hannes Reinecke)
 - generate uevent once a multipath namespace is operational again
   (Hannes Reinecke)
 - support unique discovery controller NQNs (Hannes Reinecke)
 - fix use-after-free when a port is removed (Israel Rukshin)
 - clear shadow doorbell memory on resets (Keith Busch)
 - use struct_size (Len Baker)
 - add error handling support for add_disk (Luis Chamberlain)
 - limit the maximal queue size for RDMA controllers (Max Gurtovoy)
 - use a few more symbolic names (Max Gurtovoy)
 - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy)
 - add support for ->map_queues on FC (Saurav Kashyap)"

* tag 'nvme-5.16-2021-10-21' of git://git.infradead.org/nvme: (23 commits)
  nvmet: use struct_size over open coded arithmetic
  nvme: drop scan_lock and always kick requeue list when removing namespaces
  nvme-pci: clear shadow doorbell memory on resets
  nvme-rdma: fix error code in nvme_rdma_setup_ctrl
  nvme-multipath: add error handling support for add_disk()
  nvmet: use macro definitions for setting cmic value
  nvmet: use macro definition for setting nmic value
  nvme: display correct subsystem NQN
  nvme: Add connect option 'discovery'
  nvme: expose subsystem type in sysfs attribute 'subsystype'
  nvmet: set 'CNTRLTYPE' in the identify controller data
  nvmet: add nvmet_is_disc_subsys() helper
  nvme: add CNTRLTYPE definitions for 'identify controller'
  nvmet: make discovery NQN configurable
  nvmet-rdma: implement get_max_queue_size controller op
  nvmet: add get_max_queue_size op for controllers
  nvme-rdma: limit the maximal queue size for RDMA controllers
  nvmet-tcp: fix use-after-free when a port is removed
  nvmet-rdma: fix use-after-free when a port is removed
  nvmet: fix use-after-free when a port is removed
  ...

cbab6ae0

20 Oct, 2021 20 commits

nvmet: use struct_size over open coded arithmetic · 117d5b6d

Len Baker authored Oct 17, 2021

As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

In this case this is not actually dynamic size: all the operands
involved in the calculation are constant values. However it is better to
refactor this anyway, just to keep the open-coded math idiom out of
code.

So, use the struct_size() helper to do the arithmetic instead of the
argument "size + count * size" in the kmalloc() function.

This code was detected with the help of Coccinelle and audited and fixed
manually.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-argumentsSigned-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

117d5b6d

nvme: drop scan_lock and always kick requeue list when removing namespaces · 2b81a5f0

Hannes Reinecke authored Oct 20, 2021

When reading the partition table on initial scan hits an I/O error the
I/O will hang with the scan_mutex held:

[<0>] do_read_cache_page+0x49b/0x790
[<0>] read_part_sector+0x39/0xe0
[<0>] read_lba+0xf9/0x1d0
[<0>] efi_partition+0xf1/0x7f0
[<0>] bdev_disk_changed+0x1ee/0x550
[<0>] blkdev_get_whole+0x81/0x90
[<0>] blkdev_get_by_dev+0x128/0x2e0
[<0>] device_add_disk+0x377/0x3c0
[<0>] nvme_mpath_set_live+0x130/0x1b0 [nvme_core]
[<0>] nvme_mpath_add_disk+0x150/0x160 [nvme_core]
[<0>] nvme_alloc_ns+0x417/0x950 [nvme_core]
[<0>] nvme_validate_or_alloc_ns+0xe9/0x1e0 [nvme_core]
[<0>] nvme_scan_work+0x168/0x310 [nvme_core]
[<0>] process_one_work+0x231/0x420

and trying to delete the controller will deadlock as it tries to grab
the scan mutex:

[<0>] nvme_mpath_clear_ctrl_paths+0x25/0x80 [nvme_core]
[<0>] nvme_remove_namespaces+0x31/0xf0 [nvme_core]
[<0>] nvme_do_delete_ctrl+0x4b/0x80 [nvme_core]

As we're now properly ordering the namespace list there is no need to
hold the scan_mutex in nvme_mpath_clear_ctrl_paths() anymore.
And we always need to kick the requeue list as the path will be marked
as unusable and I/O will be requeued _without_ a current path.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2b81a5f0

nvme-pci: clear shadow doorbell memory on resets · 58847f12

Keith Busch authored Oct 14, 2021

The host memory doorbell and event buffers need to be initialized on
each reset so the driver doesn't observe stale values from the previous
instantiation.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Tested-by: John Levon <john.levon@nutanix.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

58847f12

nvme-rdma: fix error code in nvme_rdma_setup_ctrl · 09748122

Max Gurtovoy authored Oct 17, 2021

In case that icdoff is not zero or mandatory keyed sgls are not
supported by the NVMe/RDMA target, we'll go to error flow but we'll
return 0 to the caller. Fix it by returning an appropriate error code.

Fixes: c66e2998 ("nvme-rdma: centralize controller setup sequence")
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

09748122

nvme-multipath: add error handling support for add_disk() · 11384580

Luis Chamberlain authored Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Since we now can tell for sure when a disk was added, move
setting the bit NVME_NSHEAD_DISK_LIVE only when we did
add the disk successfully.

Nothing to do here as the cleanup is done elsewhere. We take
care and use test_and_set_bit() because it is protects against
two nvme paths simultaneously calling device_add_disk() on the
same namespace head.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

11384580

nvmet: use macro definitions for setting cmic value · d56ae18f

Max Gurtovoy authored Sep 23, 2021

This makes the code more readable.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d56ae18f

nvmet: use macro definition for setting nmic value · 571b5444

Max Gurtovoy authored Sep 23, 2021

This makes the code more readable.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

571b5444

nvme: display correct subsystem NQN · e5ea42fa

Hannes Reinecke authored Sep 22, 2021

With discovery controllers supporting unique subsystem NQNs the
actual subsystem NQN might be different from that one passed in
via the connect args. So add a helper to display the resulting
subsystem NQN.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e5ea42fa

nvme: Add connect option 'discovery' · 20e8b689

Hannes Reinecke authored Sep 22, 2021

Add a connect option 'discovery' to specify that the connection
should be made to a discovery controller, not a normal I/O controller.
With discovery controllers supporting unique subsystem NQNs we
cannot easily distinguish by the subsystem NQN if this should be
a discovery connection, but we need this information to blank out
options not supported by discovery controllers.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

20e8b689

nvme: expose subsystem type in sysfs attribute 'subsystype' · 954ae166

Hannes Reinecke authored Sep 22, 2021

With unique discovery controller NQNs we cannot distinguish the
subsystem type by the NQN alone, but need to check the subsystem
type, too.
So expose the subsystem type in a new sysfs attribute 'subsystype'.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

954ae166

nvmet: set 'CNTRLTYPE' in the identify controller data · d3aef701

Hannes Reinecke authored Sep 22, 2021

Set the correct 'CNTRLTYPE' field in the identify controller data.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d3aef701

nvmet: add nvmet_is_disc_subsys() helper · a294711e

Hannes Reinecke authored Sep 22, 2021

Add a helper function to determine if a given subsystem is a discovery
subsystem.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a294711e

nvme: add CNTRLTYPE definitions for 'identify controller' · e15a8a97

Hannes Reinecke authored Sep 22, 2021

Update the 'identify controller' structure to define the newly added
CNTRLTYPE field.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e15a8a97

nvmet: make discovery NQN configurable · 626851e9

Hannes Reinecke authored Sep 22, 2021

TPAR8013 allows for unique discovery NQNs, so make the discovery
controller NQN configurable by exposing a subsys attribute
'discovery_nqn'.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

626851e9

nvmet-rdma: implement get_max_queue_size controller op · c7d792f9

Max Gurtovoy authored Sep 23, 2021

Limit the maximal queue size for RDMA controllers. Today, the target
reports a limit of 1024 and this limit isn't valid for some of the RDMA
based controllers. For now, limit RDMA transport to 128 entries (the
max queue depth configured for Linux NVMe/RDMA host).

Future general solution should use RDMA/core API to calculate this size
according to device capabilities and number of WRs needed per NVMe IO
request.
Reported-by: Mark Ruijter <mruijter@primelogic.nl>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c7d792f9

nvmet: add get_max_queue_size op for controllers · 6d1555cc

Max Gurtovoy authored Sep 23, 2021

Some transports, such as RDMA, would like to set the queue size
according to device/port/ctrl characteristics. Add a new nvmet transport
op that is called during ctrl initialization. This will not effect
transports that don't implement this option.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

6d1555cc

nvme-rdma: limit the maximal queue size for RDMA controllers · 44c3c625

Max Gurtovoy authored Sep 23, 2021

Corrent limit of 1024 isn't valid for some of the RDMA based ctrls. In
case the target expose a cap of larger amount of entries (e.g. 1024),
the initiator may fail to create a QP with this size. Thus limit to a
value that works for all RDMA adapters.

Future general solution should use RDMA/core API to calculate this size
according to device capabilities and number of WRs needed per NVMe IO
request.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

44c3c625

nvmet-tcp: fix use-after-free when a port is removed · 2351ead9

Israel Rukshin authored Oct 06, 2021

When removing a port, all its controllers are being removed, but there
are queues on the port that doesn't belong to any controller (during
connection time). This causes a use-after-free bug for any command
that dereferences req->port (like in nvmet_alloc_ctrl). Those queues
should be destroyed before freeing the port via configfs. Destroy
the remaining queues after the accept_work was cancelled guarantees
that no new queue will be created.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

2351ead9

nvmet-rdma: fix use-after-free when a port is removed · fcf73a80

Israel Rukshin authored Oct 06, 2021

When removing a port, all its controllers are being removed, but there
are queues on the port that doesn't belong to any controller (during
connection time). This causes a use-after-free bug for any command
that dereferences req->port (like in nvmet_alloc_ctrl). Those queues
should be destroyed before freeing the port via configfs. Destroy the
remaining queues after the RDMA-CM was destroyed guarantees that no
new queue will be created.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

fcf73a80

nvmet: fix use-after-free when a port is removed · e3e19dcc

Israel Rukshin authored Oct 06, 2021

When a port is removed through configfs, any connected controllers
are starting teardown flow asynchronously and can still send commands.
This causes a use-after-free bug for any command that dereferences
req->port (like in nvmet_parse_io_cmd).

To fix this, wait for all the teardown scheduled works to complete
(like release_work at rdma/tcp drivers). This ensures there are no
active controllers when the port is eventually removed.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e3e19dcc