- 06 Dec, 2016 9 commits
-
-
Max Gurtovoy authored
Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Bart Van Assche authored
Adjust indentation such that arguments are aligned. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Bart Van Assche authored
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Solganik Alexander authored
When removing a namespace we delete it from the subsystem namespaces list with list_del_init which allows us to know if it is enabled or not. The problem is that list_del_init initialize the list next and does not respect the RCU list-traversal we do on the IO path for locating a namespace. Instead we need to use list_del_rcu which is allowed to run concurrently with the _rcu list-traversal primitives (keeps list next intact) and guarantees concurrent nvmet_find_naespace forward progress. By changing that, we cannot rely on ns->dev_link for knowing if the namspace is enabled, so add enabled indicator entry to nvmet_ns for that. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com> Cc: <stable@vger.kernel.org> # v4.8+
-
Bart Van Assche authored
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Bart Van Assche authored
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Samuel Jones authored
Queue size needs to respect the Maximum Queue Entries Supported advertised by the controller in its Capability register. Signed-off-by: Samuel Jones <sjones@kalray.eu> Reviewed-by: Christoph Hellwig <hch@lst.de> [sagig: fixed queue_size adjustment according to Daniel Verkamp <daniel.verkamp@intel.com> comment] Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Bart Van Assche authored
nvmet_sq_init() returns a value <= 0. nvmet_rdma_cm_reject() expects a second argument that is a NVME_RDMA_CM_* constant. Hence this patch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimbeg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 03 Dec, 2016 3 commits
-
-
Jens Axboe authored
We have this: ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined! ERROR: "__divdi3" [drivers/block/nbd.ko] undefined! nbd.c:(.text+0x247c72): undefined reference to `__divdi3' due to a recent commit, that did 64-bit division. Use the proper divider function so that 32-bit compiles don't break. Fixes: ef77b515 ("nbd: use loff_t for blocksize and nbd_set_size args") Signed-off-by: Jens Axboe <axboe@fb.com>
-
Josef Bacik authored
If we have large devices (say like the 40t drive I was trying to test with) we will end up overflowing the int arguments to nbd_set_size and not get the right size for our device. Fix this by using loff_t everywhere so I don't have to think about this again. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Shaohua Li authored
Signed-off-by: Shaohua Li <shli@fb.com> Fixes: cf43e6be ("block: add scalable completion tracking of requests") Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 01 Dec, 2016 9 commits
-
-
Ritesh Harjani authored
Factor out common code for setting REQ_NOMERGE flag which is being used out at certain places and make it a helper instead, req_set_nomerge(). Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org> Get rid of the inline. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Rabin Vincent authored
If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Cc: stable@vger.kernel.org Fixes: 5c0d6b60 ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Pan Bian authored
Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function mtip_block_initialize(), variable rv takes the return value, and its value should be negative on errors. rv is initialized as 0 and is not reset when the call to ida_pre_get() fails. So 0 may be returned. The return value 0 indicates that there is no error, which may be inconsistent with the execution status. This patch fixes the bug by explicitly assigning -ENOMEM to rv on the branch that ida_pre_get() fails. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Chaitanya Kulkarni authored
Add support for handling write zeroes command on target. Call into __blkdev_issue_zeroout, which the block layer expands into the best suitable variant of zeroing the LBAs. Allow write zeroes operation to deallocate the LBAs when calling __blkdev_issue_zeroout. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Chaitanya Kulkarni authored
Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block device, if the device supports optional command bit set for write zeroes. Add support to setup write zeroes command. Set maximum possible write zeroes sectors in one write zeroes command according to nvme write zeroes command definition. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Chaitanya Kulkarni authored
Add the command structure, optional command set support (ONCS) bit and a new error code for the Write Zeroes command. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Chaitanya Kulkarni authored
This adds a new block layer operation to zero out a range of LBAs. This allows to implement zeroing for devices that don't use either discard with a predictable zero pattern or WRITE SAME of zeroes. The prominent example of that is NVMe with the Write Zeroes command, but in the future, this should also help with improving the way zeroing discards work. For this operation, suitable entry is exported in sysfs which indicate the number of maximum bytes allowed in one write zeroes operation by the device. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Chaitanya Kulkarni authored
Similar to __blkdev_issue_discard this variant allows submitting the final bio asynchronously and chaining multiple ranges into a single completion. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Damien Le Moal authored
Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of a zoned block device. However, the first and last zones reported for a partition make sense only if the partition start sector and size are aligned on the device zone size. The same applies for zone reset. Resetting the first or the last zone of a partition straddling zones may impact neighboring partitions. Finally, if a partition start sector is not at the beginning of a sequential zone, it will be impossible to write to the first sectors of the partition on a host-managed device. Avoid all these problems and incoherencies by ignoring partitions that are not zone aligned. Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the correct disk zoning type (host-aware, host-managed or none) but bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone size is unknown). So test this as a way to ensure that a zoned block device is being handled as such. As a result, for a host-aware devices, unaligned zone partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the disk will be treated as a regular block device (as it should). If zoned block device support is enabled, only aligned partitions will be accepted. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 29 Nov, 2016 19 commits
-
-
Javier González authored
Since targets are given a virtual target device, it is necessary to translate all communication between targets and the backend device. Implement the translation layer for get/set bad block table. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
On target-specific operations pass on nvm_tgt_dev instead of the generic nvm device. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Target devices do not have access to the device driver operations. Introduce a helper function that exposes the max. number of physical sectors supported by the underlying device. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Avoid calling media manager and device-specific operations directly from rrpc. Create helper functions on lightnvm's core instead. Signed-off-by: Javier González <javier@cnexlabs.com> Made it work with null_blk as well. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
In order to naturally support multi-target instances on an Open-Channel SSD, targets should own the LUNs they get blocks from and manage provisioning internally. This is done in several steps. Since targets own the LUNs the are instantiated on top of and manage the free block list internally, there is no need for a LUN abstraction in the media manager. LUNs are intrinsically managed as in the physical layout (ch:0,lun:0, ..., ch:0,lun:n, ch:1,lun:0, ch:1,lun:n, ..., ch:m,lun:0, ch:m,lun:n) and given to the targets based on the target creation ioctl. This simplifies LUN management and clears the path for a partition manager to sit directly underneath LightNVM targets. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
In order to naturally support multi-target instances on an Open-Channel SSD, targets should own the LUNs they get blocks from and manage provisioning internally. This is done in several steps. A part of this transformation is that targets manage their blocks internally. This patch eliminates the nvm_block abstraction and moves block management to the target logic. The rrpc target is transformed. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Since LUNs are managed internally on targets, the media manager has no access to the free LUN lists. Thus, debug functions that show LUN information on the device should not be implemented on the media manager, but rather on the target in itself. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Since LUNs are managed internally on the target, there is no need for the media manager to implement a get_lun operation. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
In order to naturally support multi-target instances on an Open-Channel SSD, targets should own the LUNs they get blocks from and manage provisioning internally. This is done in several steps. This patch moves the block provisioning inside of the target and removes the get/put block interface from the media manager. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
LUNs are exclusively owned by targets implementing a block device FTL. Doing this reservation requires at the moment a 2-way callback gennvm <-> target. The reason behind this is that LUNs were not assumed to always be exclusively owned by targets. However, this design decision goes against I/O determinism QoS (two targets would mix I/O on the same parallel unit in the device). This patch makes LUN reservation as part of the target creation on the media manager. This makes that LUNs are always exclusively owned by the target instantiated on top of them. LUN stripping and/or sharing should be implemented on the target itself or the layers on top. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
The gen_lun abstraction in the generic media manager was conceived on the assumption that a single target would instantiated on top of it. This has complicated target design to implement multi-instances. Remove this abstraction and move its logic to nvm_lun, which manages physical lun geometry and operations. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
There is a constant to refer to free blocks. Use it when marking bad blocks instead of using a constant value Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Before vectored I/Os were supported on rrpc, the physical address was stored as part of the nvm_rqd request. This variable become obsolete when the ppa_list was introduced. Cleanup this variable. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Targets are assumed to used the same generic ppa format, where the address is partitioned on ch:lun:block:pg:pl:sec. Thus, make the function in charge of transforming the ppa address from a linear format to the generic one available to all targets. This function will be needed by the media manager in order to do target mapping translations when targets are divided on different physical partitions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Cleanup definition leftovers from old gennvm interface Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
LightNVM used to be managed and configured through sysfs. Since the introduction of management ioctls this interface is redundant and outdated. Get rid of it. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
rrpc cannot handle bios of size > 256kb due to NVMe using a 64 bit bitmap to signal I/O completion. If a larger bio comes, split it explicitly. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Add ECC error codes to enable the appropriate handling in the target. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Javier González authored
Bad blocks should be managed by block owners. This would be either targets for data blocks or sysblk for system blocks. In order to support this, export two functions: One to mark a block as an specific type (e.g., bad block) and another to update the bad block table on the device. Move bad block management to rrpc. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-