An error occurred fetching the project authors.
- 27 Jul, 2018 1 commit
-
-
Greg Edwards authored
After the bio has been updated to represent the remaining sectors, reset bi_done so bio_rewind_iter() does not rewind further than it should. This resolves a bio_integrity_process() failure on reads where the original request was split. Fixes: 63573e35 ("bio-integrity: Restore original iterator on verify stage") Signed-off-by:
Greg Edwards <gedwards@ddn.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 26 Jul, 2018 2 commits
-
-
Martin Wilck authored
bio_iov_iter_get_pages() currently only adds pages for the next non-zero segment from the iov_iter to the bio. That's suboptimal for callers, which typically try to pin as many pages as fit into the bio. This patch converts the current bio_iov_iter_get_pages() into a static helper, and introduces a new helper that allocates as many pages as 1) fit into the bio, 2) are present in the iov_iter, 3) and can be pinned by MM. Error is returned only if zero pages could be pinned. Because of 3), a zero return value doesn't necessarily mean all pages have been pinned. Callers that have to pin every page in the iov_iter must still call this function in a loop (this is currently the case). This change matters most for __blkdev_direct_IO_simple(), which calls bio_iov_iter_get_pages() only once. If it obtains less pages than requested, it returns a "short write" or "short read", and __generic_file_write_iter() falls back to buffered writes, which may lead to data corruption. Fixes: 72ecad22 ("block: support a full bio worth of IO for simplified bdev direct-io") Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Martin Wilck authored
If the last page of the bio is not "full", the length of the last vector slot needs to be corrected. This slot has the index (bio->bi_vcnt - 1), but only in bio->bi_io_vec. In the "bv" helper array, which is shifted by the value of bio->bi_vcnt at function invocation, the correct index is (nr_pages - 1). v2: improved readability following suggestions from Ming Lei. v3: followed a formatting suggestion from Christoph Hellwig. Fixes: 2cefe4db ("block: add bio_iov_iter_get_pages()") Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 19 Jun, 2018 1 commit
-
-
Bart Van Assche authored
Commit 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit 0ba99ca4 clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit 0ba99ca4. This patch avoids that KASAN reports the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq): ================================================================== BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0 Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 bio_advance+0x11b/0x1d0 blk_update_request+0xa7/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d run_ksoftirqd+0x3f/0x60 smpboot_thread_fn+0x352/0x460 kthread+0x1c1/0x1e0 ret_from_fork+0x24/0x30 Allocated by task 1918: save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0xfe/0x350 mempool_alloc_slab+0x15/0x20 mempool_alloc+0xfb/0x270 bio_alloc_bioset+0x244/0x350 submit_bh_wbc+0x9c/0x2f0 __block_write_full_page+0x299/0x5a0 block_write_full_page+0x16b/0x180 blkdev_writepage+0x18/0x20 __writepage+0x42/0x80 write_cache_pages+0x376/0x8a0 generic_writepages+0xbe/0x110 blkdev_writepages+0xe/0x10 do_writepages+0x9b/0x180 __filemap_fdatawrite_range+0x178/0x1c0 file_write_and_wait_range+0x59/0xc0 blkdev_fsync+0x46/0x80 vfs_fsync_range+0x66/0x100 do_fsync+0x3d/0x70 __x64_sys_fsync+0x21/0x30 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x43/0xd0 __kasan_slab_free+0x137/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xd3/0x380 mempool_free_slab+0x17/0x20 mempool_free+0x63/0x160 bio_free+0x81/0xa0 bio_put+0x59/0x60 end_bio_bh_io_sync+0x5d/0x70 bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 end_clone_bio+0xa3/0xd0 [dm_mod] bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d The buggy address belongs to the object at ffff8801300e0640 which belongs to the cache bio-0 of size 200 The buggy address is located 144 bytes inside of 200-byte region [ffff8801300e0640, ffff8801300e0708) The buggy address belongs to the page: page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0 flags: 0x8000000000008100(slab|head) raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00 raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Cc: Kent Overstreet <kent.overstreet@gmail.com> Fixes: 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") Acked-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 12 Jun, 2018 1 commit
-
-
Kees Cook authored
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by:
Kees Cook <keescook@chromium.org>
-
- 08 Jun, 2018 1 commit
-
-
Jens Axboe authored
Add a helper that allows a caller to initialize a new bio_set, using the settings from an existing bio_set. Reported-by:
Venkat R.B <vrbagal1@linux.vnet.ibm.com> Tested-by:
Venkat R.B <vrbagal1@linux.vnet.ibm.com> Tested-by:
Li Wang <liwang@redhat.com> Reviewed-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 02 Jun, 2018 1 commit
-
-
Christoph Hellwig authored
For the upcoming removal of buffer heads in XFS we need to keep track of the number of outstanding writeback requests per page. For this we need to know if bio_add_page merged a region with the previous bvec or not. Instead of adding additional arguments this refactors bio_add_page to be implemented using three lower level helpers which users like XFS can use directly if they care about the merge decisions. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
- 30 May, 2018 1 commit
-
-
Kent Overstreet authored
All users have been converted to bioset_init(), kill off the old API. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 14 May, 2018 8 commits
-
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Recently found a bug where a driver left bi_next not NULL and then called bio_endio(), and then the submitter of the bio used bio_copy_data() which was treating src and dst as lists of bios. Fixed that bug by splitting out bio_list_copy_data(), but in case other things are depending on bi_next in weird ways, add a warning to help avoid more bugs like that in the future. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Since a bio can point to userspace pages (e.g. direct IO), this is generally necessary. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Found a bug (with ASAN) where we were passing a bio to bio_copy_data() with bi_next not NULL, when it should have been - a driver had left bi_next set to something after calling bio_endio(). Since the normal case is only copying single bios, split out bio_list_copy_data() to avoid more bugs like this in the future. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Add versions that take bvec_iter args instead of using bio->bi_iter - to be used by bcachefs. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Minor optimization - remove a pointer indirection when using fs_bio_set. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Similarly to mempool_init()/mempool_exit(), take a pointer indirection out of allocation/freeing by allowing biosets to be embedded in other structs. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Kent Overstreet authored
Minor performance improvement by getting rid of pointer indirections from allocation/freeing fastpaths. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 22 Mar, 2018 1 commit
-
-
Mikulas Patocka authored
I'm getting a slab named "biovec-(1<<(21-12))". It is caused by unintended expansion of the macro BIO_MAX_PAGES. This patch renames it to biovec-max. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 23 Jan, 2018 1 commit
-
-
Goldwyn Rodrigues authored
We inadvertently set it again on the source bio, but we need to set it on the new split bio instead. Fixes: fbbaf700 ("block: trace completion of all bios.") Signed-off-by:
Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 06 Jan, 2018 1 commit
-
-
Ming Lei authored
bcache is the only user of bio_alloc_pages(), so move this function into bcache, and avoid it being misused in the future. Also rename it to bch_bio_allo_pages() since it is bcache only. Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 20 Dec, 2017 1 commit
-
-
Shaohua Li authored
If a bio is throttled and split after throttling, the bio could be resubmited and enters the throttling again. This will cause part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm the performance. The bio split becomes quite common after arbitrary bio size change. To fix this, we always set the BIO_THROTTLED flag if a bio is throttled. If the bio is cloned/split, we copy the flag to new bio too to avoid a double charge. However, cloned bio could be directed to a new disk, keeping the flag be a problem. The observation is we always set new disk for the bio in this case, so we can clear the flag in bio_set_dev(). This issue exists for a long time, arbitrary bio size change just makes it worse, so this should go into stable at least since v4.2. V1-> V2: Not add extra field in bio based on discussion with Tejun Cc: Vivek Goyal <vgoyal@redhat.com> Cc: stable@vger.kernel.org Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 22 Nov, 2017 1 commit
-
-
Mikulas Patocka authored
Remove useless assignment to the variable "split" because the variable is unconditionally assigned later. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 17 Nov, 2017 1 commit
-
-
Michael Lyle authored
A new field was introduced in 74d46992, bi_partno, instead of using bdev->bd_contains and encoding the partition information in the bi_bdev field. __bio_clone_fast was changed to copy the disk information, but not the partition information. At minimum, this regressed bcache and caused data corruption. Signed-off-by:
Michael Lyle <mlyle@lyle.org> Fixes: 74d46992 ("block: replace bi_bdev with a gendisk pointer and partitions index") Reported-by:
Pavel Goran <via-bcache@pvgoran.name> Reported-by:
Campbell Steven <casteven@gmail.com> Reviewed-by:
Coly Li <colyli@suse.de> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 26 Oct, 2017 1 commit
-
-
Byungchul Park authored
Darrick posted the following warning and Dave Chinner analyzed it: > ====================================================== > WARNING: possible circular locking dependency detected > 4.14.0-rc1-fixes #1 Tainted: G W > ------------------------------------------------------ > loop0/31693 is trying to acquire lock: > (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs] > > but now in release context of a crosslock acquired at the following: > ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #2 ((complete)&ret.event){+.+.}: > lock_acquire+0xab/0x200 > wait_for_completion_io+0x4e/0x1a0 > submit_bio_wait+0x7f/0xb0 > blkdev_issue_zeroout+0x71/0xa0 > xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs] > xfs_bmapi_write+0x374/0x11f0 [xfs] > xfs_iomap_write_direct+0x2ac/0x430 [xfs] > xfs_file_iomap_begin+0x20d/0xd50 [xfs] > iomap_apply+0x43/0xe0 > dax_iomap_rw+0x89/0xf0 > xfs_file_dax_write+0xcc/0x220 [xfs] > xfs_file_write_iter+0xf0/0x130 [xfs] > __vfs_write+0xd9/0x150 > vfs_write+0xc8/0x1c0 > SyS_write+0x45/0xa0 > entry_SYSCALL_64_fastpath+0x1f/0xbe > > -> #1 (&xfs_nondir_ilock_class){++++}: > lock_acquire+0xab/0x200 > down_write_nested+0x4a/0xb0 > xfs_ilock+0x263/0x330 [xfs] > xfs_setattr_size+0x152/0x370 [xfs] > xfs_vn_setattr+0x6b/0x90 [xfs] > notify_change+0x27d/0x3f0 > do_truncate+0x5b/0x90 > path_openat+0x237/0xa90 > do_filp_open+0x8a/0xf0 > do_sys_open+0x11c/0x1f0 > entry_SYSCALL_64_fastpath+0x1f/0xbe > > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}: > up_write+0x1c/0x40 > xfs_iunlock+0x1d0/0x310 [xfs] > xfs_file_fallocate+0x8a/0x310 [xfs] > loop_queue_work+0xb7/0x8d0 > kthread_worker_fn+0xb9/0x1f0 > > Chain exists of: > &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event > > Possible unsafe locking scenario by crosslock: > > CPU0 CPU1 > ---- ---- > lock(&xfs_nondir_ilock_class); > lock((complete)&ret.event); > lock(&(&ip->i_mmaplock)->mr_lock); > unlock((complete)&ret.event); > > *** DEADLOCK *** The warning is a false positive, caused by the fact that all wait_for_completion()s in submit_bio_wait() are waiting with the same lock class. However, some bios have nothing to do with others, for example in the case of loop devices, there's no direct connection between the bios of an upper device and the bios of a lower device(=loop device). The safest way to assign different lock classes to different devices is to do it for each gendisk. In other words, this patch assigns a lockdep_map per gendisk and uses it when initializing completion in submit_bio_wait(). Analyzed-by:
Dave Chinner <david@fromorbit.com> Reported-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Byungchul Park <byungchul.park@lge.com> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johan@kernel.org Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-10-git-send-email-byungchul.park@lge.comSigned-off-by:
Ingo Molnar <mingo@kernel.org>
-
- 25 Oct, 2017 1 commit
-
-
Christoph Hellwig authored
Simplify the code by getting rid of the submit_bio_ret structure. (This also helps address a lockdep false positive.) Signed-off-by:
Christoph Hellwig <hch@lst.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: darrick.wong@oracle.com Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johan@kernel.org Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-2-git-send-email-byungchul.park@lge.comSigned-off-by:
Ingo Molnar <mingo@kernel.org>
-
- 16 Oct, 2017 1 commit
-
-
Randy Dunlap authored
Sphinx treats symbols that end with '_' as a kind of special documentation indicator, so fix that by adding an ending '*' to it. ../block/bio.c:404: ERROR: Unknown target name: "gfp". Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 11 Oct, 2017 14 commits
-
-
Al Viro authored
just need to copy it iter instead of iter->nr_segs Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
it's a bounce buffer; we don't *care* how badly is the real source/destination fragmented, all that matters is the total size. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
we do want *iter advanced Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
we want the one passed to it advanced, anyway Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... into bio_{map,copy}_user_iov() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Use iov_iter_npages() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... they might actually succeed in some cases (when we are at the queue-imposed segments limit, the next page is not mergable with the last one we'd got in, but the first page covered by the next iovec *is* mergable). Make sure that once it's failed, we are done with that bio. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... and to hell with iov_for_each() nonsense Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Since "block: support large requests in blk_rq_map_user_iov" we started to call it with partially drained iter; that works fine on the write side, but reads create a copy of iter for completion time. And that needs to take the possibility of ->iov_iter != 0 into account... Cc: stable@vger.kernel.org #v4.5+ Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
we need to take care of failure exit as well - pages already in bio should be dropped by analogue of bio_unmap_pages(), since their refcounts had been bumped only once per reference in bio. Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Vitaly Mayatskikh authored
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if IO vector has small consecutive buffers belonging to the same page. bio_add_pc_page merges them into one, but the page reference is never dropped. Cc: stable@vger.kernel.org Signed-off-by:
Vitaly Mayatskikh <v.mayatskih@gmail.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 06 Oct, 2017 1 commit
-
-
Tim Hansen authored
This patch removes redundant checks for null values on bio_pool and bvec_pool. Found using make coccicheck M=block/ on linux-net tree on the next-20170929 tag. Signed-off-by:
Tim Hansen <devtimhansen@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-