- 23 Feb, 2017 4 commits
-
-
Omar Sandoval authored
Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue. One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list(). However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits: 1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do. 2. It benefits from the existing rolling wakeup code. 3. It avoids punting to another thread just to have it block. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Scott Bauer authored
During an error on a comannd, ex: user provides wrong pw to unlock range, we will gracefully terminate the opal session. We want to propagate the original error to userland instead of the result of the session termination, which is almost always a success. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Scott Bauer authored
A device may change capabilities after each reset, e.g. due to a firmware upgrade. We should thus check for Security Send/Receive and OPAL support after each reset. Based on patches from Christoph and Keith. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Scott Bauer authored
Before we free the opal structure we need to clean up any saved locking ranges that the user had told us to unlock from a suspend. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 22 Feb, 2017 27 commits
-
-
Daniel Roschka authored
Adds support for detection of the NVMe controller found in the following recent MacBooks: - Retina MacBook 2016 (MacBook9,1) - 13" MacBook Pro 2016 without Touch Bar (MacBook13,1) - 13" MacBook Pro 2016 with Touch Bar (MacBook13,2) Signed-off-by: Daniel Roschka <danielroschka@phoenitydawn.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Max Gurtovoy authored
This will enable the user to control the specific interface for connection establishment in case the host has more than 1 interface under the same subnet. E.g: Host interfaces configured as: - ib0 1.1.1.1/16 - ib1 1.1.1.2/16 Target interfaces configured as: - ib0 1.1.1.3/16 (listener interface) - ib1 1.1.1.4/16 the following connect command will go through host iface ib0 (default): nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023 but the following command will go through host iface ib1: nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023 -w 1.1.1.2 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christophe JAILLET authored
According to the preceeding goto, it is likely that 'out_destroy_sq' was expected here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Max Gurtovoy authored
Also remove redundant debug prints. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Max Gurtovoy authored
This will enable the usage for nvme rdma target. Also move from a lookup array to a switch statement. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
James Smart authored
Discovery controllers don't set the values. They are in reserved areas of the Identify Controller data structure. Given the cmd completed, the minimal capsule sizes are supported, so no need to check nqn to detect discovery controllers and special case validations. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
This driver previously required we have a special check for IO submitted to nvme IO queues that are temporarily suspended. That is no longer necessary since blk-mq provides a quiesce, so any IO that actually gets submitted to such a queue must be ended since the queue isn't going to start back up. This is fixing a condition where we have fewer IO queues after a controller reset. This may happen if the number of CPU's has changed, or controller firmware update changed the queue count, for example. While it may be possible to complete the IO on a different queue, the block layer does not provide a way to resubmit a request on a different hardware context once the request has entered the queue. We don't want these requests to be stuck indefinitely either, so ending them in error is our only option at the moment. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
If a namespace has already been marked dead, we don't want to kick the request_queue again since we may have just freed it from another thread. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
If the device is not present, the driver should disable the queues immediately. Prior to this, the driver was relying on the watchdog timer to kill the queues if requests were outstanding to the device, and that just delays removal up to one second. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Andy Lutomirski authored
NVMe devices can advertise multiple power states. These states can be either "operational" (the device is fully functional but possibly slow) or "non-operational" (the device is asleep until woken up). Some devices can automatically enter a non-operational state when idle for a specified amount of time and then automatically wake back up when needed. The hardware configuration is a table. For each state, an entry in the table indicates the next deeper non-operational state, if any, to autonomously transition to and the idle time required before transitioning. This patch teaches the driver to program APST so that each successive non-operational state will be entered after an idle time equal to 100% of the total latency (entry plus exit) associated with that state. The maximum acceptable latency is controlled using dev_pm_qos (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational states with total latency greater than this value will not be used. As a special case, setting the latency tolerance to 0 will disable APST entirely. On hardware without APST support, the sysfs file will not be exposed. The latency tolerance for newly-probed devices is set by the module parameter nvme_core.default_ps_max_latency_us. In theory, the device can expose "default" APST table, but this doesn't seem to function correctly on my device (Samsung 950), nor does it seem particularly useful. There is also an optional mechanism by which a configuration can be "saved" so it will be automatically loaded on reset. This can be configured from userspace, but it doesn't seem useful to support in the driver. On my laptop, enabling APST seems to save nearly 1W. The hardware tables can be decoded in userspace with nvme-cli. 'nvme id-ctrl /dev/nvmeN' will show the power state table and 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST configuration. This feature is quirked off on a known-buggy Samsung device. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Andy Lutomirski authored
Currently, all NVMe quirks are based on PCI IDs. Add a mechanism to define quirks based on identify_ctrl's vendor id, model number, and/or firmware revision. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Johannes Thumshirn authored
nvmf_create_ctrl() relys on the presence of a create_crtl callback in the registered nvmf_transport_ops, so make nvmf_register_transport require one. Update the available call-sites as well to reflect these changes. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Parav Pandit authored
This patch defines CNS field as 8-bit field and avoids cpu_to/from_le conversions. Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS for readability (don't rely on the fact that NVME_ID_CNS_NS = 0). Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Max Gurtovoy authored
Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Max Gurtovoy authored
No need to dereference req twice to get the cmd when we already have it stored in a local variable. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Sagi Grimberg authored
Easier for debugging and testing state machine transitions. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Sagi Grimberg authored
We usually log the cntlid which is confusing in case we have multiple subsystems each with it's own cntlid ida. Instead make cntlid ida globally unique and log the initial association. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
James Smart authored
Cleanup of abort flag processing in fcp_op_done. References were unnecessary Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Colin Ian King authored
trivial fix to spelling mistake in pr_err message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
The wording in the entries were poor and not understandable by even deities. Kill the selection for default block scheduler, and impose a policy with sane defaults. Architected-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
By embedding the function data with the function sequence, we can eliminate the external function data and state variable code. It also made obvious some other small cleanups. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Tobin C. Harding authored
Coccinelle emits a warning about casting the return value of kmalloc(). Coccinelle suggests removing the cast as do kerneljanitors. Remove cast from kmalloc() call. Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Tobin C. Harding authored
Checkpatch emits ERROR:OPEN_BRACE: that open brace { should be on the previous line. Move open brace to new line. Also add space after if/switch statement since we introduce more checkpatch errors if not fixed at the same time. Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Tobin C. Harding authored
Checkpatch emits 85 trailing whitespace warnings. Remove trailing whitespace. Signed-off-by: Tobin C. Harding <me@tobin.cc> Acked-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
Add a buffer size check against discovery and response header lengths before we loop over their buffers. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
Add helper which verifies the response token is valid and matches the expected value. Merges token_type and response_get_token. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jon Derrick authored
The short atom parser can return an errno from decoding but does not currently return the error as a signed value. Convert all of the parsers to ssize_t. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 21 Feb, 2017 9 commits
-
-
Omar Sandoval authored
The end_device and sas_host devices support BSG ioctls, but the request_queue allocated for them isn't set up to allocate the struct scsi_request payload. This leads to memory corruption in the call to scsi_req_init() in bsg_map_hdr(), since it will memset past the end of the allocated request. Fix it by setting ->cmd_size on the allocated request_queue. Fixes: 82ed4db4 ("block: split scsi_request out of struct request") Signed-off-by: Omar Sandoval <osandov@fb.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
When a device gets removed, block device inode unhashed so that it is not used anymore (bdget() will not find it anymore). Later when a new device gets created with the same device number, we create new block device inode. However there may be file system device inodes whose i_bdev still points to the original block device inode and thus we get two active block device inodes for the same device. They will share the same gendisk so the only visible differences will be that page caches will not be coherent and BDIs will be different (the old block device inode still points to unregistered BDI). Fix the problem by checking in bd_acquire() whether i_bdev still points to active block device inode and re-lookup the block device if not. That way any open of a block device happening after the old device has been removed will get correct block device inode. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
Iteration over partitions in del_gendisk() omits part0. Add bdev_unhash_inode() call for the whole device. Otherwise if the device number gets reused, bdev inode will be still associated with the old (stale) bdi. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
Move bdev_unhash_inode() after invalidate_partition() as invalidate_partition() looks up bdev and it cannot find the right bdev inode after bdev_unhash_inode() is called. Thus invalidate_partition() would not invalidate page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number. Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Josef Bacik authored
If we fail to register the blockdev we need to make sure to destroy the recv workqueue. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Josef Bacik authored
We noticed when trying to do O_DIRECT to an export on the server side that we were getting requests smaller than the 4k sectorsize of the device. This is because the client isn't setting the logical and physical blocksizes properly for the underlying device. Fix this up by setting the queue blocksizes and then calling bd_set_size. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Josef Bacik authored
Break the ioctl handling out into helper functions, some of these things are getting pretty big and unwieldy. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Without this drivers that don't clear the state themselves can see off effects. For example Hyper-V VMs using the storvsc driver will often hang during boot due to uncleared Test Unit Ready failures. Fixes: e9c787e6 ("scsi: allocate scsi_cmnd structures as part of struct request") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block layer updates from Jens Axboe: - blk-mq scheduling framework from me and Omar, with a port of the deadline scheduler for this framework. A port of BFQ from Paolo is in the works, and should be ready for 4.12. - Various fixups and improvements to the above scheduling framework from Omar, Paolo, Bart, me, others. - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar. This allows us to export more information that helps debug hangs or performance issues, without cluttering or abusing the sysfs API. - Fixes for the sbitmap code, the scalable bitmap code that was migrated from blk-mq, from Omar. - Removal of the BLOCK_PC support in struct request, and refactoring of carrying SCSI payloads in the block layer. This cleans up the code nicely, and enables us to kill the SCSI specific parts of struct request, shrinking it down nicely. From Christoph mainly, with help from Hannes. - Support for ranged discard requests and discard merging, also from Christoph. - Support for OPAL in the block layer, and for NVMe as well. Mainly from Scott Bauer, with fixes/updates from various others folks. - Error code fixup for gdrom from Christophe. - cciss pci irq allocation cleanup from Christoph. - Making the cdrom device operations read only, from Kees Cook. - Fixes for duplicate bdi registrations and bdi/queue life time problems from Jan and Dan. - Set of fixes and updates for lightnvm, from Matias and Javier. - A few fixes for nbd from Josef, using idr to name devices and a workqueue deadlock fix on receive. Also marks Josef as the current maintainer of nbd. - Fix from Josef, overwriting queue settings when the number of hardware queues is updated for a blk-mq device. - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO aborted, if we didn't end up aborting it. - SG gap merging fix from Ming Lei for block. - Loop fix also from Ming, fixing a race and crash between setting loop status and IO. - Two block race fixes from Tahsin, fixing request list iteration and fixing a race between device registration and udev device add notifiations. - Double free fix from cgroup writeback, from Tejun. - Another double free fix in blkcg, from Hou Tao. - Partition overflow fix for EFI from Alden Tondettar. * tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits) nvme: Check for Security send/recv support before issuing commands. block/sed-opal: allocate struct opal_dev dynamically block/sed-opal: tone down not supported warnings block: don't defer flushes on blk-mq + scheduling blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers blk-mq: don't special case flush inserts for blk-mq-sched blk-mq-sched: don't add flushes to the head of requeue queue blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not block: do not allow updates through sysfs until registration completes lightnvm: set default lun range when no luns are specified lightnvm: fix off-by-one error on target initialization Maintainers: Modify SED list from nvme to block Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN uapi: sed-opal fix IOW for activate lsp to use correct struct cdrom: Make device operations read-only elevator: fix loading wrong elevator type for blk-mq devices cciss: switch to pci_irq_alloc_vectors block/loop: fix race between I/O and set_status blk-mq-sched: don't hold queue_lock when calling exit_icq block: set make_request_fn manually in blk_mq_update_nr_hw_queues ...
-