Commits · 98d34c087249d39838874b83e17671e7d5eb1ca7 · Kirill Smelkov / linux

02 Jul, 2024 4 commits

xen-blkfront: fix sector_size propagation to the block layer · 98d34c08

Christoph Hellwig authored Jun 25, 2024

Ensure that info->sector_size and info->physical_sector_size are set
before the call to blkif_set_queue_limits by doing away with the
local variables and arguments that propagate them.

Thanks to Marek Marczykowski-Górecki and Jürgen Groß for root causing
the issue.

Fixes: ba3f67c1 ("xen-blkfront: atomically update queue limits")
Reported-by: Rusty Bird <rustybird@net-c.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20240625055238.7934-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

98d34c08

null_blk: Fix description of the fua parameter · 1c0b3fca

Damien Le Moal authored Jul 02, 2024

The description of the fua module parameter is defined using
MODULE_PARM_DESC() with the first argument passed being "zoned". That is
the wrong name, obviously. Fix that by using the correct "fua" parameter
name so that "modinfo null_blk" displays correct information.

Fixes: f4f84586 ("null_blk: Introduce fua attribute")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240702073234.206458-1-dlemoal@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

1c0b3fca

block/mq-deadline: Fix the tag reservation code · 39823b47

Bart Van Assche authored May 09, 2024

The current tag reservation code is based on a misunderstanding of the
meaning of data->shallow_depth. Fix the tag reservation code as follows:
* By default, do not reserve any tags for synchronous requests because
  for certain use cases reserving tags reduces performance. See also
  Harshit Mogalapalli, [bug-report] Performance regression with fio
  sequential-write on a multipath setup, 2024-03-07
  (https://lore.kernel.org/linux-block/5ce2ae5d-61e2-4ede-ad55-551112602401@oracle.com/)
* Reduce min_shallow_depth to one because min_shallow_depth must be less
  than or equal any shallow_depth value.
* Scale dd->async_depth from the range [1, nr_requests] to [1,
  bits_per_sbitmap_word].

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Zhiguo Niu <zhiguo.niu@unisoc.com>
Fixes: 07757588 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240509170149.7639-3-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

39823b47

block: Call .limit_depth() after .hctx has been set · 6259151c

Bart Van Assche authored May 09, 2024

Call .limit_depth() after data->hctx has been set such that data->hctx can
be used in .limit_depth() implementations.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Zhiguo Niu <zhiguo.niu@unisoc.com>
Fixes: 07757588 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240509170149.7639-2-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

6259151c

01 Jul, 2024 4 commits

nvme: don't set io_opt if NOWS is zero · f3bf25d5

Christoph Hellwig authored Jul 01, 2024

NOWS is one of the annoying "0's based values" in NVMe, where 0 means one
and we thus can't detect if it isn't set.  Thus a NOWS value of 0 means
that the Namespace Optimal Write Size is a single LBA, which is clearly
bogus.  Ignore the value in that case and don't propagate an io_opt
value to the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240701051800.1245240-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

f3bf25d5

block: don't reduce max_sectors based on io_opt · 37105615

Christoph Hellwig authored Jul 01, 2024

Don't reduce the max_sectors value below the normal cap when the driver
advertsizes a very low io_opt. This restores the behavior we had before
the recent changes to the max_sectors calculation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240701051800.1245240-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

37105615

block: remove a duplicate io_min check in blk_validate_limits · f62e8edc

Christoph Hellwig authored Jul 01, 2024

If io_min is larger than the cap, it must by definition be non-zero.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240701051800.1245240-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

f62e8edc

blk-wbt: don't throttle swap writes in direct reclaim · 4e63aeb5

Baokun Li authored Jun 04, 2024

Now we avoid throttling swap writes by determining whether the current
process is kswapd (aka current_is_kswapd()), but swap writes can come
from either kswapd or direct reclaim, so the swap writes from direct
reclaim will still be throttled.

When a process holds a lock to allocate a free page, and enters direct
reclaim because there is no free memory, then it might trigger a hung
due to the wbt throttling that causes other processes to fail to get
the lock.

Both kswapd and direct reclaim set the REQ_SWAP flag, so use REQ_SWAP
instead of current_is_kswapd() to avoid throttling swap writes. Also
renamed WBT_KSWAPD to WBT_SWAP and WBT_RWQ_KSWAPD to WBT_RWQ_SWAP.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240604030522.3686177-1-libaokun@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

4e63aeb5

28 Jun, 2024 19 commits

block: pass a gendisk to the queue_sysfs_entry methods · 62e35f94

Christoph Hellwig authored Jun 27, 2024

The kobject for the queue entries is embedded into a struct gendisk.
Pass it to the sysfs methods instead of the request_queue derived from
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240627111407.476276-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

62e35f94

block: add helper macros to de-duplicate the queue sysfs attributes · 319e8cfd

Christoph Hellwig authored Jun 27, 2024

A lof the code to implement the queue sysfs attributes is repetitive.
Add a few macros to generate the common cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240627111407.476276-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

319e8cfd

block: simplify queue_logical_block_size · 5476394a

Christoph Hellwig authored Jun 27, 2024

queue_logical_block_size is never called with a 0 queue, and the
logical_block_size field in queue_limits is always initialized for
a live queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240627111407.476276-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

5476394a

blk-throttle: fix lower control under super low iops limit · 1beabab8

Yu Kuai authored Jun 18, 2024

User will configure allowed iops limit in 1s, and calculate_io_allowed()
will calculate allowed iops in the slice by:

limit * HZ / throtl_slice

However, if limit is quite low, the result can be 0, then
allowed IO in the slice is 0, this will cause missing dispatch and
control will be lower than limit.

For example, set iops_limit to 5 with HD disk, and test will found that
iops will be 3.

This is usually not a big deal, because user will unlikely to configure
such low iops limit, however, this is still a problem in the extreme
scene.

Fix the problem by making sure the wait time calculated by
tg_within_iops_limit() should allow at least one IO to be dispatched.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240618062108.3680835-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

1beabab8

block: set bip_vcnt correctly · 3991657a

Anuj Gupta authored Jun 26, 2024

Set the bip_vcnt correctly in bio_integrity_init_user and
bio_integrity_copy_user. If the bio gets split at a later point,
this value is required to set the right bip_vcnt in the cloned bio.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240626100700.3629-3-anuj20.g@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

3991657a

rust: block: fix generated bindings after refactoring of features · 5b026e34

Andreas Hindborg authored Jun 28, 2024

Block device features and flags were refactored from `enum` to `#define`.
This broke Rust binding generation. This patch fixes the binding
generation.

Fixes: fcf865e3 ("block: convert features and flags to __bitwise types")
Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20240628091152.2185241-1-nmi@metaspace.dkSigned-off-by: Jens Axboe <axboe@kernel.dk>

5b026e34

rnbd-cnt: don't set QUEUE_FLAG_SAME_FORCE · caffa7cd

Christoph Hellwig authored Jun 27, 2024

QUEUE_FLAG_SAME_FORCE has been set by rnbd-cnt since the initial
merge.  There is no good reason for a driver to force exact core
delivery, which is tunable for very specific workloads and not a
driver setting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240627124926.512662-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

caffa7cd

rnbd: don't set QUEUE_FLAG_SAME_COMP · 40988f15

Christoph Hellwig authored Jun 27, 2024

QUEUE_FLAG_SAME_COMP is already set by default.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240627124926.512662-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

40988f15

mpt3sas_scsih: don't set QUEUE_FLAG_NOMERGES · 8b77f23f

Christoph Hellwig authored Jun 27, 2024

Setting QUEUE_FLAG_NOMERGES was added in commit d1b01d14 ("scsi:
mpt3sas: Set NVMe device queue depth as 128") without any explanation.
Drivers should second guess the block layer merge decisions, so remove
the flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240627124926.512662-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

8b77f23f

megaraid_sas: don't set QUEUE_FLAG_NOMERGES · aa57abe6

Christoph Hellwig authored Jun 27, 2024

Setting QUEUE_FLAG_NOMERGES was added in commit 15dd0381
("scsi: megaraid_sas: NVME Interface detection and prop settings")
without any explanation.  Drivers should second guess the block
layer merge decisions, so remove the flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240627124926.512662-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

aa57abe6

loop: don't set QUEUE_FLAG_NOMERGES · 667ea363

Christoph Hellwig authored Jun 27, 2024

QUEUE_FLAG_NOMERGES isn't really a driver interface, but a user tunable.
There also isn't any good reason to set it in the loop driver.

The original commit adding it (5b5e20f4 "block: loop: set
QUEUE_FLAG_NOMERGES for request queue of loop") claims that "It doesn't
make sense to enable merge because the I/O submitted to backing file is
handled page by page."  which of course isn't true for multi-page bvec
now, and it never has been for direct I/O, for which commit 40326d8a
("block/loop: allow request merge for directio mode") alredy disabled
the nomerges flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240627124926.512662-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

667ea363

block: check bio alignment in blk_mq_submit_bio · 0676c434

Ming Lei authored Jun 20, 2024

IO logical block size is one fundamental queue limit, and every IO has
to be aligned with logical block size because our bio split can't deal
with unaligned bio.

The check has to be done with queue usage counter grabbed because device
reconfiguration may change logical block size, and we can prevent the
reconfiguration from happening by holding queue usage counter.

logical_block_size stays in the 1st cache line of queue_limits, and this
cache line is always fetched in fast path via bio_may_exceed_limits(),
so IO perf won't be affected by this check.

Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ye Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240620030631.3114026-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

0676c434

block: Add ioprio to block_rq tracepoint · aa6ff4eb

Dongliang Cui authored Jun 14, 2024

Sometimes we need to track the processing order of requests with
ioprio set. So the ioprio of request can be useful information.

Example：

block_rq_insert: 8,0 RA 16384 () 6500840 + 32 be,0,6 [binder:815_3]
block_rq_issue: 8,0 RA 16384 () 6500840 + 32 be,0,6 [binder:815_3]
block_rq_complete: 8,0 RA () 6500840 + 32 be,0,6 [0]
Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240614074936.113659-1-dongliang.cui@unisoc.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

aa6ff4eb

block: remove bio_integrity_process · d19b4634

Christoph Hellwig authored Jun 26, 2024

Move the bvec interation into the generate/verify helpers to avoid a bit
of argument passing churn.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240626045950.189758-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

d19b4634

block: switch on bio operation in bio_integrity_prep · df3c485e

Christoph Hellwig authored Jun 26, 2024

Use a single switch to perform read and write specific checks and exit
early for other operations instead of having two checks using different
predicates.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240626045950.189758-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

df3c485e

block: remove allocation failure warnings in bio_integrity_prep · dac18fab

Christoph Hellwig authored Jun 26, 2024

Allocation failures already leave a stack trace, so don't spew another
warning.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240626045950.189758-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

dac18fab

block: simplify adding the payload in bio_integrity_prep · c096df90

Christoph Hellwig authored Jun 26, 2024

bio_integrity_add_page can add physically contiguous regions of any size,
so don't bother chunking up the kmalloced buffer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240626045950.189758-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

c096df90

block: only zero non-PI metadata tuples in bio_integrity_prep · c546d6f4

Christoph Hellwig authored Jun 26, 2024

The PI generation helpers already zero the app tag, so relax the zeroing
to non-PI metadata.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240626045950.189758-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

c546d6f4

bcache: work around a __bitwise to bool conversion sparse warning · f1e46758

Christoph Hellwig authored Jun 28, 2024

Sparse is a bit dumb about bitwise operation on __bitwise types used
in boolean contexts.  Add a !! to explicitly propagate to boolean
without a warning.

Fixes: fcf865e3 ("block: convert features and flags to __bitwise types")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240628131657.667797-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

f1e46758

27 Jun, 2024 4 commits

loop: Fix a race between loop detach and loop open · 18048c1a

Gulam Mohamed authored Jun 18, 2024

1. Userspace sends the command "losetup -d" which uses the open() call
   to open the device
2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
   function loop_clr_fd()
3. If LOOP_CLR_FD is the first command received at the time, then the
   AUTOCLEAR flag is not set and deletion of the
   loop device proceeds ahead and scans the partitions (drop/add
   partitions)

        if (disk_openers(lo->lo_disk) > 1) {
                lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
                loop_global_unlock(lo, true);
                return 0;
        }

 4. Before scanning partitions, it will check to see if any partition of
    the loop device is currently opened
 5. If any partition is opened, then it will return EBUSY:

    if (disk->open_partitions)
                return -EBUSY;
 6. So, after receiving the "LOOP_CLR_FD" command and just before the above
    check for open_partitions, if any other command
    (like blkid) opens any partition of the loop device, then the partition
    scan will not proceed and EBUSY is returned as shown in above code
 7. But in "__loop_clr_fd()", this EBUSY error is not propagated
 8. We have noticed that this is causing the partitions of the loop to
    remain stale even after the loop device is detached resulting in the
    IO errors on the partitions

Fix:

Defer the detach of loop device to release function, which is called when
the last close happens, by setting the lo_flags to LO_FLAGS_AUTOCLEAR at
the time of detach i.e in loop_clr_fd() function.

Test case involves the following two scripts:

script1.sh:

while [ 1 ];
do
        losetup -P -f /home/opt/looptest/test10.img
        blkid /dev/loop0p1
done

script2.sh:

while [ 1 ];
do
        losetup -d /dev/loop0
done

Without fix, the following IO errors have been observed:

kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
        phys_seg 1 prio class 0
kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
        phys_seg 1 prio class 0
kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
        read
Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240618164042.343777-1-gulam.mohamed@oracle.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

18048c1a

block: Delete blk_queue_flag_test_and_set() · 63db4a1f

John Garry authored Jun 27, 2024

Since commit 70200574 ("block: remove QUEUE_FLAG_DISCARD"),
blk_queue_flag_test_and_set() has not been used, so delete it.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240627160735.842189-1-john.g.garry@oracle.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

63db4a1f

block: clean up the check in blkdev_iomap_begin() · e269537e

Li Nan authored Jun 25, 2024

It is odd to check the offset amidst a series of assignments. Moving this
check to the beginning of the function makes the code look better.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240625115517.1472120-1-linan666@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e269537e

block: use the right type for stub rq_integrity_vec() · 69b65176

Jens Axboe authored Jun 26, 2024

For !CONFIG_BLK_DEV_INTEGRITY, rq_integrity_vec() wasn't updated
properly. Fix it up.

Fixes: cf546dd2 ("block: change rq_integrity_vec to respect the iterator")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

69b65176

26 Jun, 2024 9 commits

block: move dma_pad_mask into queue_limits · e94b45d0

Christoph Hellwig authored Jun 26, 2024

dma_pad_mask is a queue_limits by all ways of looking at it, so move it
there and set it through the atomic queue limits APIs.

Add a little helper that takes the alignment and pad into account to
simplify the code that is touched a bit.

Note that there never was any need for the > check in
blk_queue_update_dma_pad, this probably was just copy and paste from
dma_update_dma_alignment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-9-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

e94b45d0

block: remove the fallback case in queue_dma_alignment · abfc9d81

Christoph Hellwig authored Jun 26, 2024

Now that all updates go through blk_validate_limits the default of 511
is set at initialization time.  Also remove the unused NULL check as
calling this helper on a NULL queue can't happen (and doesn't make
much sense to start with).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-8-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

abfc9d81

block: remove disk_update_readahead · 73781b3b

Christoph Hellwig authored Jun 26, 2024

Mark blk_apply_bdi_limits non-static and open code disk_update_readahead
in the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-7-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

73781b3b

block: conding style fixup for blk_queue_max_guaranteed_bio · 3302f6f0

Christoph Hellwig authored Jun 26, 2024

"static" never goes on a line of its own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

3302f6f0

block: convert features and flags to __bitwise types · fcf865e3

Christoph Hellwig authored Jun 26, 2024

... and let sparse help us catch mismatches or abuses.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

fcf865e3

block: rename BLK_FEAT_MISALIGNED · ec9b1cf0

Christoph Hellwig authored Jun 26, 2024

This is a flag for ->flags and not a feature for ->features. And fix the
one place that actually incorrectly cleared it from ->features.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

ec9b1cf0

block: correctly report cache type · 78887d00

Christoph Hellwig authored Jun 26, 2024

Check the features flag and the override flag using the
blk_queue_write_cache, helper otherwise we're going to always
report "write through".

Fixes: 1122c0c1 ("block: move cache control settings out of queue->flags")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240626142637.300624-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

78887d00

md: set md-specific flags for all queue limits · 573d5abf

Christoph Hellwig authored Jun 26, 2024

The md driver wants to enforce a number of flags for all devices, even
when not inheriting them from the underlying devices. To make sure these
flags survive the queue_limits_set calls that md uses to update the
queue limits without deriving them form the previous limits add a new
md_init_stacking_limits helper that calls blk_set_stacking_limits and sets
these flags.

Fixes: 1122c0c1 ("block: move cache control settings out of queue->flags")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240626142637.300624-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

573d5abf

block: change rq_integrity_vec to respect the iterator · cf546dd2

Mikulas Patocka authored May 27, 2024

If we allocate a bio that is larger than NVMe maximum request size,
attach integrity metadata to it and send it to the NVMe subsystem, the
integrity metadata will be corrupted.

Splitting the bio works correctly. The function bio_split will clone the
bio, trim the iterator of the first bio and advance the iterator of the
second bio.

However, the function rq_integrity_vec has a bug - it returns the first
vector of the bio's metadata and completely disregards the metadata
iterator that was advanced when the bio was split. Thus, the second bio
uses the same metadata as the first bio and this leads to metadata
corruption.

This commit changes rq_integrity_vec, so that it calls mp_bvec_iter_bvec
instead of returning the first vector. mp_bvec_iter_bvec reads the
iterator and uses it to build a bvec for the current position in the
iterator.

The "queue_max_integrity_segments(rq->q) > 1" check was removed, because
the updated rq_integrity_vec function works correctly with multiple
segments.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/49d1afaa-f934-6ed2-a678-e0d428c63a65@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

cf546dd2