Commits · 5ab189cf3abbc9994bae3be524c5b88589ed56e2 · Kirill Smelkov / linux

28 Jul, 2021 1 commit

blk-iocost: fix operation ordering in iocg_wake_fn() · 5ab189cf

Tejun Heo authored Jul 27, 2021

iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it
wants the wq_entry to be always removed whether it ended up waking the
task or not. finish_wait() tests whether wq_entry needs removal without
grabbing the wait_queue lock and expects the waker to use
list_del_init_careful() after all waking operations are complete, which
iocg_wake_fn() didn't do. The operation order was wrong and the regular
list_del_init() was used.

The result is that if a waiter wakes up racing the waker, it can free pop
the wq_entry off stack before the waker is still looking at it, which can
lead to a backtrace like the following.

  [7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP
  ...
  [7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0
  ...
  [7312084.858314] Call Trace:
  [7312084.863548]  _raw_spin_lock_irqsave+0x22/0x30
  [7312084.872605]  try_to_wake_up+0x4c/0x4f0
  [7312084.880444]  iocg_wake_fn+0x71/0x80
  [7312084.887763]  __wake_up_common+0x71/0x140
  [7312084.895951]  iocg_kick_waitq+0xe8/0x2b0
  [7312084.903964]  ioc_rqos_throttle+0x275/0x650
  [7312084.922423]  __rq_qos_throttle+0x20/0x30
  [7312084.930608]  blk_mq_make_request+0x120/0x650
  [7312084.939490]  generic_make_request+0xca/0x310
  [7312084.957600]  submit_bio+0x173/0x200
  [7312084.981806]  swap_readpage+0x15c/0x240
  [7312084.989646]  read_swap_cache_async+0x58/0x60
  [7312084.998527]  swap_cluster_readahead+0x201/0x320
  [7312085.023432]  swapin_readahead+0x2df/0x450
  [7312085.040672]  do_swap_page+0x52f/0x820
  [7312085.058259]  handle_mm_fault+0xa16/0x1420
  [7312085.066620]  do_page_fault+0x2c6/0x5c0
  [7312085.074459]  page_fault+0x2f/0x40

Fix it by switching to list_del_init_careful() and putting it at the end.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rik van Riel <riel@surriel.com>
Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5ab189cf

27 Jul, 2021 1 commit

blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling · b93af305

John Garry authored Jul 27, 2021

If the blk_mq_sched_alloc_tags() -> blk_mq_alloc_rqs() call fails, then we
call blk_mq_sched_free_tags() -> blk_mq_free_rqs().

It is incorrect to do so, as any rqs would have already been freed in the
blk_mq_alloc_rqs() call.

Fix by calling blk_mq_free_rq_map() only directly.

Fixes: 6917ff0b ("blk-mq-sched: refactor scheduler initialization")
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1627378373-148090-1-git-send-email-john.garry@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

b93af305

23 Jul, 2021 1 commit

loop: reintroduce global lock for safe loop_validate_file() traversal · 3ce6e1f6

Tetsuo Handa authored Jul 06, 2021

Commit 6cc8e743 ("loop: scale loop device by introducing per
device lock") re-opened a race window for NULL pointer dereference at
loop_validate_file() where commit 310ca162 ("block/loop: Use
global lock for ioctl() operation.") has closed.

Although we need to guarantee that other loop devices will not change
during traversal, we can't take remote "struct loop_device"->lo_mutex
inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore,
introduce a global lock dedicated for loop_validate_file() which is
conditionally taken before local "struct loop_device"->lo_mutex is taken.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cc8e743 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3ce6e1f6

22 Jul, 2021 1 commit

Merge tag 'nvme-5.14-2021-07-22' of git://git.infradead.org/nvme into block-5.14 · 7054133d

Jens Axboe authored Jul 22, 2021

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.14:

 - tracing fix (Keith Busch)
 - fix multipath head refcounting (Hannes Reinecke)
 - Write Zeroes vs PI fix (me)
 - drop a bogus WARN_ON (Zhihao Cheng)"

* tag 'nvme-5.14-2021-07-22' of git://git.infradead.org/nvme:
  nvme: set the PRACT bit when using Write Zeroes with T10 PI
  nvme: fix nvme_setup_command metadata trace event
  nvme: fix refcounting imbalance when all paths are down
  nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING

7054133d

21 Jul, 2021 4 commits

nvme: set the PRACT bit when using Write Zeroes with T10 PI · aaeb7bb0

Christoph Hellwig authored Jul 21, 2021

When using Write Zeroes on a namespace that has protection
information enabled they behavior without the PRACT bit
counter-intuitive and will generally lead to validation failures
when reading the written blocks.  Fix this by always setting the
PRACT bit that generates matching PI data on the fly.

Fixes: 6e02318e ("nvme: add support for the Write Zeroes command")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

aaeb7bb0

nvme: fix nvme_setup_command metadata trace event · 234211b8

Keith Busch authored Jul 19, 2021

The metadata address is set after the trace event, so the trace is not
capturing anything useful. Rather than logging the memory address, it's
useful to know if the command carries a metadata payload, so change the
trace event to log that true/false state instead.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

234211b8

nvme: fix refcounting imbalance when all paths are down · 5396fdac

Hannes Reinecke authored Jul 16, 2021

When the last path to a ns_head drops the current code
removes the ns_head from the subsystem list, but will only
delete the disk itself if the last reference to the ns_head
drops. This is causing an refcounting imbalance eg when
applications have a reference to the disk, as then they'll
never get notified that the disk is in fact dead.
This patch moves the call 'del_gendisk' into nvme_mpath_check_last_path(),
ensuring that the disk can be properly removed and applications get the
appropriate notifications.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

5396fdac

nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING · 7764656b

Zhihao Cheng authored Jul 05, 2021

Followling process:
nvme_probe
  nvme_reset_ctrl
    nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING)
    queue_work(nvme_reset_wq, &ctrl->reset_work)

-------------->	nvme_remove
		  nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING)
worker_thread
  process_one_work
    nvme_reset_work
    WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)

, which will trigger WARN_ON in nvme_reset_work():
[  127.534298] WARNING: CPU: 0 PID: 139 at drivers/nvme/host/pci.c:2594
[  127.536161] CPU: 0 PID: 139 Comm: kworker/u8:7 Not tainted 5.13.0
[  127.552518] Call Trace:
[  127.552840]  ? kvm_sched_clock_read+0x25/0x40
[  127.553936]  ? native_send_call_func_single_ipi+0x1c/0x30
[  127.555117]  ? send_call_function_single_ipi+0x9b/0x130
[  127.556263]  ? __smp_call_single_queue+0x48/0x60
[  127.557278]  ? ttwu_queue_wakelist+0xfa/0x1c0
[  127.558231]  ? try_to_wake_up+0x265/0x9d0
[  127.559120]  ? ext4_end_io_rsv_work+0x160/0x290
[  127.560118]  process_one_work+0x28c/0x640
[  127.561002]  worker_thread+0x39a/0x700
[  127.561833]  ? rescuer_thread+0x580/0x580
[  127.562714]  kthread+0x18c/0x1e0
[  127.563444]  ? set_kthread_struct+0x70/0x70
[  127.564347]  ret_from_fork+0x1f/0x30

The preceding problem can be easily reproduced by executing following
script (based on blktests suite):
test() {
  pdev="$(_get_pci_dev_from_blkdev)"
  sysfs="/sys/bus/pci/devices/${pdev}"
  for ((i = 0; i < 10; i++)); do
    echo 1 > "$sysfs/remove"
    echo 1 > /sys/bus/pci/rescan
  done
}

Since the device ctrl could be updated as an non-RESETTING state by
repeating probe/remove in userspace (which is a normal situation), we
can replace stack dumping WARN_ON with a warnning message.

Fixes: 82b057ca ("nvme-pci: fix multiple ctrl removal schedulin")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>

7764656b

17 Jul, 2021 1 commit

block: increase BLKCG_MAX_POLS · ec645dc9

Oleksandr Natalenko authored Jul 17, 2021

After mq-deadline learned to deal with cgroups, the BLKCG_MAX_POLS value
became too small for all the elevators to be registered properly. The
following issue is seen:

```
calling  bfq_init+0x0/0x8b @ 1
blkcg_policy_register: BLKCG_MAX_POLS too small
initcall bfq_init+0x0/0x8b returned -28 after 507 usecs
```

which renders BFQ non-functional.

Increase BLKCG_MAX_POLS to allow enough space for everyone.

Fixes: 08a9ad8b ("block/mq-deadline: Add cgroup support")
Link: https://lore.kernel.org/lkml/8988303.mDXGIdCtx8@natalenko.name/Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Link: https://lore.kernel.org/r/20210717123328.945810-1-oleksandr@natalenko.nameSigned-off-by: Jens Axboe <axboe@kernel.dk>

ec645dc9

15 Jul, 2021 4 commits

xen-blkfront: sanitize the removal state machine · 05d69d95

Christoph Hellwig authored Jul 15, 2021

xen-blkfront has a weird protocol where close message from the remote
side can be delayed, and where hot removals are treated somewhat
differently from regular removals, all leading to potential NULL
pointer removals, and a del_gendisk from the block device release
method, which will deadlock. Fix this by just performing normal hot
removals even when the device is opened like all other Linux block
drivers.

Fixes: c76f48eb ("block: take bd_mutex around delete_partitions in del_gendisk")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20210715141711.1257293-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

05d69d95

Merge tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme into block-5.14 · a347c153

Jens Axboe authored Jul 15, 2021

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.14

 - fix various races in nvme-pci when shutting down just after probing
   (Casey Chen)
 - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha)"

* tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme:
  nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
  nvme-pci: fix multiple races in nvme_setup_io_queues
  nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE

a347c153

nbd: fix order of cleaning up the queue and freeing the tagset · 16ad3db3

Wang Qing authored Jul 06, 2021

We must release the queue before freeing the tagset.

Fixes: 4af5f2e0 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk")
Reported-and-tested-by: syzbot+9ca43ff47167c0ee3466@syzkaller.appspotmail.com
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210706040016.1360412-1-guoqing.jiang@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>

16ad3db3

pd: fix order of cleaning up the queue and freeing the tagset · 58b63e0f

Guoqing Jiang authored Jul 06, 2021

We must release the queue before freeing the tagset.

Fixes: 262d431f ("pd: use blk_mq_alloc_disk and blk_cleanup_disk")
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210706010734.1356066-1-guoqing.jiang@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>

58b63e0f

13 Jul, 2021 3 commits

nvme-pci: do not call nvme_dev_remove_admin from nvme_remove · 251ef6f7

Casey Chen authored Jul 07, 2021

nvme_dev_remove_admin could free dev->admin_q and the admin_tagset
while they are being accessed by nvme_dev_disable(), which can be called
by nvme_reset_work via nvme_remove_dead_ctrl.

Commit cb4bfda6 ("nvme-pci: fix hot removal during error handling")
intended to avoid requests being stuck on a removed controller by killing
the admin queue. But the later fix c8e9e9b7 ("nvme-pci: unquiesce
admin queue on shutdown"), together with nvme_dev_disable(dev, true)
right before nvme_dev_remove_admin() could help dispatch requests and
fail them early, so we don't need nvme_dev_remove_admin() any more.

Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

251ef6f7

nvme-pci: fix multiple races in nvme_setup_io_queues · e4b9852a

Casey Chen authored Jul 07, 2021

Below two paths could overlap each other if we power off a drive quickly
after powering it on. There are multiple races in nvme_setup_io_queues()
because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit.

nvme_reset_work()                                nvme_remove()
  nvme_setup_io_queues()                           nvme_dev_disable()
  ...                                              ...
A1  clear NVMEQ_ENABLED bit for admin queue          lock
    retry:                                       B1  nvme_suspend_io_queues()
A2    pci_free_irq() admin queue                 B2  nvme_suspend_queue() admin queue
A3    pci_free_irq_vectors()                         nvme_pci_disable()
A4    nvme_setup_irqs();                         B3    pci_free_irq_vectors()
      ...                                            unlock
A5    queue_request_irq() for admin queue
      set NVMEQ_ENABLED bit
      ...
      nvme_create_io_queues()
A6      result = queue_request_irq();
        set NVMEQ_ENABLED bit
      ...
      fail to allocate enough IO queues:
A7      nvme_suspend_io_queues()
        goto retry

If B3 runs in between A1 and A2, it will crash if irqaction haven't
been freed by A2. B2 is supposed to free admin queue IRQ but it simply
can't fulfill the job as A1 has cleared NVMEQ_ENABLED bit.

Fix: combine A1 A2 so IRQ get freed as soon as the NVMEQ_ENABLED bit
gets cleared.

After solved #1, A2 could race with B3 if A2 is freeing IRQ while B3
is checking irqaction. A3 also could race with B2 if B2 is freeing
IRQ while A3 is checking irqaction.

Fix: A2 and A3 take lock for mutual exclusion.

A3 could race with B3 since they could run free_msi_irqs() in parallel.

Fix: A3 takes lock for mutual exclusion.

A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
interrupted by B3.

Fix: A4 takes lock for mutual exclusion.

If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL.
They are just allocated by A5/A6.

Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit.

A7 could get chance to pci_free_irq() for certain IO queue while B3 is
checking irqaction.

Fix: A7 takes lock.

nvme_dev->online_queues need to be protected by shutdown_lock. Since it
is not atomic, both paths could modify it using its own copy.
Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e4b9852a

nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE · 8b43ced6

Prabhakar Kushwaha authored Jul 13, 2021

dev_get_by_name() finds network device by name but it also increases the
reference count.

If a nvme-tcp queue is present and the network device driver is removed
before nvme_tcp, we will face the following continuous log:

  "kernel:unregister_netdevice: waiting for <eth> to become free. Usage count = 2"

And rmmod further halts. Similar case arises during reboot/shutdown
with nvme-tcp queue present and both never completes.

To fix this, use __dev_get_by_name() which finds network device by
name without increasing any reference counter.

Fixes: 3ede8f72 ("nvme-tcp: allow selecting the network interface for connections")
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
[hch: remove the ->ndev member entirely]
Signed-off-by: Christoph Hellwig <hch@lst.de>

8b43ced6

07 Jul, 2021 3 commits

blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs · a731763f

Yu Kuai authored Jul 07, 2021

We run a test that create millions of cgroups and blkgs, and then trigger
blkg_destroy_all(). blkg_destroy_all() will hold spin lock for a long
time in such situation. Thus release the lock when a batch of blkgs are
destroyed.

blkcg_activate_policy() and blkcg_deactivate_policy() might have the
same problem, however, as they are basically only called from module
init/exit paths, let's leave them alone for now.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210707015649.1929797-1-yukuai3@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

a731763f

block: fix the problem of io_ticks becoming smaller · d80c228d

Chunguang Xu authored Jul 06, 2021

On the IO submission path, blk_account_io_start() may interrupt
the system interruption. When the interruption returns, the value
of part->stamp may have been updated by other cores, so the time
value collected before the interruption may be less than part->
stamp. So when this happens, we should do nothing to make io_ticks
more accurate? For kernels less than 5.0, this may cause io_ticks
to become smaller, which in turn may cause abnormal ioutil values.
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d80c228d

Merge branch 'nvme-5.14' of git://git.infradead.org/nvme into block-5.14 · c6af8db9

Jens Axboe authored Jul 07, 2021

Pull single NVMe fix from Christoph.

* 'nvme-5.14' of git://git.infradead.org/nvme:
  nvme-tcp: can't set sk_user_data without write_lock

c6af8db9

05 Jul, 2021 1 commit

nvme-tcp: can't set sk_user_data without write_lock · 0755d3be

Maurizio Lombardi authored Jul 02, 2021

The sk_user_data pointer is supposed to be modified only while
holding the write_lock "sk_callback_lock", otherwise
we could race with other threads and crash the kernel.

we can't take the write_lock in nvmet_tcp_state_change()
because it would cause a deadlock, but the release_work queue
will set the pointer to NULL later so we can simply remove
the assignment.

Fixes: b5332a9f ("nvmet-tcp: fix incorrect locking in state_change sk callback")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

0755d3be

02 Jul, 2021 1 commit

loop: remove unused variable in loop_set_status() · 585af8ed

Tetsuo Handa authored Jul 03, 2021

Commit 0384264e ("block: pass a gendisk to bdev_disk_changed")
changed to pass lo->lo_disk instead of lo->lo_device.

Fixes: 0384264e ("block: pass a gendisk to bdev_disk_changed")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20210702152714.7978-1-penguin-kernel@I-love.SAKURA.ne.jpSigned-off-by: Jens Axboe <axboe@kernel.dk>

585af8ed

01 Jul, 2021 5 commits

block: remove the bdgrab in blk_drop_partitions · 63c38d85

Christoph Hellwig authored Jul 01, 2021

There is no need to hold a bdev reference when removing the partition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210701081638.246552-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

63c38d85

block: grab a device refcount in disk_uevent · 498dcc13

Christoph Hellwig authored Jul 01, 2021

Sending uevents requires the struct device to be alive.  To
ensure that grab the device refcount instead of just an inode
reference.

Fixes: bc359d03 ("block: add a disk_uevent helper")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210701081638.246552-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

498dcc13

s390/dasd: Avoid field over-reading memcpy() · 2b7a8dc0

Kees Cook authored Jul 01, 2021

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field array bounds checking for memcpy(), memmove(), and memset(),
avoid intentionally reading across neighboring array fields.

Add a wrapping structure to serve as the memcpy() source, so the compiler
can do appropriate bounds checking, avoiding this future warning:

In function '__fortify_memcpy',
    inlined from 'create_uid' at drivers/s390/block/dasd_eckd.c:749:2:
./include/linux/fortify-string.h:246:4: error: call to '__read_overflow2_field' declared with attribute error: detected read beyond size of field (2nd parameter)
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20210701142221.3408680-3-sth@linux.ibm.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

2b7a8dc0

dasd: unexport dasd_set_target_state · 299f2b5f

Christoph Hellwig authored Jul 01, 2021

dasd_set_target_state is only used inside of dasd_mod.ko, so don't
export it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20210701142221.3408680-2-sth@linux.ibm.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

299f2b5f

block: check disk exist before trying to add partition · b5cfbd35

Yufen Yu authored Jun 10, 2021

If disk have been deleted, we should return fail for ioctl
BLKPG_DEL_PARTITION. Otherwise, the directory /sys/class/block
may remain invalid symlinks file. The race as following:

blkdev_open
				del_gendisk
				    disk->flags &= ~GENHD_FL_UP;
				    blk_drop_partitions
blkpg_ioctl
    bdev_add_partition
    add_partition
        device_add
	    device_add_class_symlinks

ioctl may add_partition after del_gendisk() have tried to delete
partitions. Then, symlinks file will be created.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Link: https://lore.kernel.org/r/20210610023241.3646241-1-yuyufen@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

b5cfbd35

30 Jun, 2021 14 commits

ubd: remove dead code in ubd_setup_common · efee99e6

Christoph Hellwig authored Jun 28, 2021

Remove some leftovers of the fake major number parsing that cause
complains from some compilers.

Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210628093937.1325608-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

efee99e6

nvme: use return value from blk_execute_rq() · ae5e6886

Keith Busch authored Jun 10, 2021

We don't have an nvme status to report if the driver's .queue_rq()
returns an error without dispatching the requested nvme command. Check
the return value from blk_execute_rq() for all passthrough commands so
the caller may know their command was not successful.

If the command is from the target passthrough interface and fails to
dispatch, synthesize the response back to the host as a internal target
error.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-5-kbusch@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

ae5e6886

block: return errors from blk_execute_rq() · fb9b16e1

Keith Busch authored Jun 10, 2021

The synchronous blk_execute_rq() had not provided a way for its callers
to know if its request was successful or not. Return the blk_status_t
result of the request.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

fb9b16e1

nvme: use blk_execute_rq() for passthrough commands · be42a33b

Keith Busch authored Jun 10, 2021

The generic blk_execute_rq() knows how to handle polled completions. Use
that instead of implementing an nvme specific handler.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

be42a33b

block: support polling through blk_execute_rq · c01b5a81

Keith Busch authored Jun 10, 2021

Poll for completions if the request's hctx is a polling type.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-2-kbusch@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

c01b5a81

block: remove REQ_OP_SCSI_{IN,OUT} · da6269da

Christoph Hellwig authored Jun 24, 2021

With the legacy IDE driver gone drivers now use either REQ_OP_DRV_*
or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests
into a single one.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

da6269da

block: mark blk_mq_init_queue_data static · 5ec780a6

Christoph Hellwig authored Jun 24, 2021

All driver uses are gone now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

5ec780a6

loop: rewrite loop_exit using idr_for_each_entry · 8e60947d

Christoph Hellwig authored Jun 23, 2021

Use idr_for_each_entry to simplify removing all devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210623145908.92973-10-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

8e60947d

loop: split loop_lookup · b9848081

Christoph Hellwig authored Jun 23, 2021

loop_lookup has two callers - one wants to do the a find by index and the
other wants any unbound loop device. Open code the respective
functionality in each caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210623145908.92973-9-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

b9848081

loop: don't allow deleting an unspecified loop device · e5d66a10

Christoph Hellwig authored Jun 23, 2021

Passing a negative index to loop_lookup while return any unbound device.
Doing that for a delete does not make much sense, so add check to
explicitly reject that case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210623145908.92973-8-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

e5d66a10

loop: move loop_ctl_mutex locking into loop_add · 18d1f200

Christoph Hellwig authored Jun 23, 2021

Move acquiring and releasing loop_ctl_mutex from the callers into
loop_add.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210623145908.92973-7-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

18d1f200

loop: split loop_control_ioctl · f9d10764

Christoph Hellwig authored Jun 23, 2021

Split loop_control_ioctl into a helper for each command. This keeps the
code nicely separated for the upcoming locking changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210623145908.92973-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

f9d10764

loop: don't call loop_lookup before adding a loop device · 4157fe0b

Christoph Hellwig authored Jun 23, 2021

loop_add returns the right error if the slot wasn't available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210623145908.92973-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

4157fe0b

loop: remove the l argument to loop_add · d6da83d0

Christoph Hellwig authored Jun 23, 2021

None of the callers cares about the allocated struct loop_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210623145908.92973-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

d6da83d0