- 19 Aug, 2015 1 commit
-
-
Keith Busch authored
The SG_GAPS queue flag caused checks for bio vector alignment against PAGE_SIZE, but the device may have different constraints. This patch adds a queue limits so a driver with such constraints can set to allow requests that would have been unnecessarily split. The new gaps check takes the request_queue as a parameter to simplify the logic around invoking this function. This new limit makes the queue flag redundant, so removing it and all usage. Device-mappers will inherit the correct settings through blk_stack_limits(). Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 Aug, 2015 2 commits
-
-
Jeff Moyer authored
A value of 2560 (1280k) will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress across a number of different storage devices, a value of 1280 does not show a big performance difference from 512, but will hopefully help software RAID setups using SATA disks, as reported by Christoph. NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to BLK_DEF_MAX_SECTORS. So, this patch essentially changes aeoblk to Use a larger maximum sector size, and I did not test this. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jeff Moyer authored
This reverts commit 34b48db6. That commit caused performance regressions for streaming I/O workloads on a number of different storage devices, from SATA disks to external RAID arrays. It also managed to trip up some buggy firmware in at least one drive, causing data corruption. The next patch will bump the default max_sectors_kb value to 1280, which will accommodate a 10-data-disk stripe write with chunk size 128k. In the testing I've done using iozone, fio, and aio-stress, a value of 1280 does not show a big performance difference from 512. This will hopefully still help the software RAID setup that Christoph saw the original performance gains with while still not regressing other storage configurations. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 15 Aug, 2015 2 commits
-
-
Ming Lei authored
Inside timeout handler, blk_mq_tag_to_rq() is called to retrieve the request from one tag. This way is obviously wrong because the request can be freed any time and some fiedds of the request can't be trusted, then kernel oops might be triggered[1]. Currently wrt. blk_mq_tag_to_rq(), the only special case is that the flush request can share same tag with the request cloned from, and the two requests can't be active at the same time, so this patch fixes the above issue by updating tags->rqs[tag] with the active request(either flush rq or the request cloned from) of the tag. Also blk_mq_tag_to_rq() gets much simplified with this patch. Given blk_mq_tag_to_rq() is mainly for drivers and the caller must make sure the request can't be freed, so in bt_for_each() this helper is replaced with tags->rqs[tag]. [1] kernel oops log [ 439.696220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158^M [ 439.697162] IP: [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M [ 439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0 ^M [ 439.700653] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M [ 439.700653] Dumping ftrace buffer:^M [ 439.700653] (ftrace buffer empty)^M [ 439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M [ 439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265^M [ 439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M [ 439.730500] task: ffff880605308000 ti: ffff88060530c000 task.ti: ffff88060530c000^M [ 439.730500] RIP: 0010:[<ffffffff812d89ba>] [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M [ 439.730500] RSP: 0018:ffff880819203da0 EFLAGS: 00010283^M [ 439.730500] RAX: ffff880811b0e000 RBX: ffff8800bb465f00 RCX: 0000000000000002^M [ 439.730500] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000^M [ 439.730500] RBP: ffff880819203db0 R08: 0000000000000002 R09: 0000000000000000^M [ 439.730500] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000202^M [ 439.730500] R13: ffff880814104800 R14: 0000000000000002 R15: ffff880811a2ea00^M [ 439.730500] FS: 00007f165b3f5740(0000) GS:ffff880819200000(0000) knlGS:0000000000000000^M [ 439.730500] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M [ 439.730500] CR2: 0000000000000158 CR3: 00000007ef766000 CR4: 00000000000006e0^M [ 439.730500] Stack:^M [ 439.730500] 0000000000000008 ffff8808114eed90 ffff880819203e00 ffffffff812dc104^M [ 439.755663] ffff880819203e40 ffffffff812d9f5e 0000020000000000 ffff8808114eed80^M [ 439.755663] Call Trace:^M [ 439.755663] <IRQ> ^M [ 439.755663] [<ffffffff812dc104>] bt_for_each+0x6e/0xc8^M [ 439.755663] [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a^M [ 439.755663] [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a^M [ 439.755663] [<ffffffff812dc1b3>] blk_mq_tag_busy_iter+0x55/0x5e^M [ 439.755663] [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38^M [ 439.755663] [<ffffffff812d8911>] blk_mq_rq_timer+0x5d/0xd4^M [ 439.755663] [<ffffffff810a3e10>] call_timer_fn+0xf7/0x284^M [ 439.755663] [<ffffffff810a3d1e>] ? call_timer_fn+0x5/0x284^M [ 439.755663] [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38^M [ 439.755663] [<ffffffff810a46d6>] run_timer_softirq+0x1ce/0x1f8^M [ 439.755663] [<ffffffff8104c367>] __do_softirq+0x181/0x3a4^M [ 439.755663] [<ffffffff8104c76e>] irq_exit+0x40/0x94^M [ 439.755663] [<ffffffff81031482>] smp_apic_timer_interrupt+0x33/0x3e^M [ 439.755663] [<ffffffff815559a4>] apic_timer_interrupt+0x84/0x90^M [ 439.755663] <EOI> ^M [ 439.755663] [<ffffffff81554350>] ? _raw_spin_unlock_irq+0x32/0x4a^M [ 439.755663] [<ffffffff8106a98b>] finish_task_switch+0xe0/0x163^M [ 439.755663] [<ffffffff8106a94d>] ? finish_task_switch+0xa2/0x163^M [ 439.755663] [<ffffffff81550066>] __schedule+0x469/0x6cd^M [ 439.755663] [<ffffffff8155039b>] schedule+0x82/0x9a^M [ 439.789267] [<ffffffff8119b28b>] signalfd_read+0x186/0x49a^M [ 439.790911] [<ffffffff8106d86a>] ? wake_up_q+0x47/0x47^M [ 439.790911] [<ffffffff811618c2>] __vfs_read+0x28/0x9f^M [ 439.790911] [<ffffffff8117a289>] ? __fget_light+0x4d/0x74^M [ 439.790911] [<ffffffff811620a7>] vfs_read+0x7a/0xc6^M [ 439.790911] [<ffffffff8116292b>] SyS_read+0x49/0x7f^M [ 439.790911] [<ffffffff81554c17>] entry_SYSCALL_64_fastpath+0x12/0x6f^M [ 439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89 f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b 53 38 <48> 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10 ^M [ 439.790911] RIP [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M [ 439.790911] RSP <ffff880819203da0>^M [ 439.790911] CR2: 0000000000000158^M [ 439.790911] ---[ end trace d40af58949325661 ]---^M Cc: <stable@vger.kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
There may be lots of pending requests so that the buffer of PAGE_SIZE can't hold them at all. One typical example is scsi-mq, the queue depth(.can_queue) of scsi_host and blk-mq is quite big but scsi_device's queue_depth is a bit small(.cmd_per_lun), then it is quite easy to have lots of pending requests in hw queue. This patch fixes the following warning and the related memory destruction. [ 359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M [ 359.055595] irq event stamp: 15537^M [ 359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M [ 359.055614] Dumping ftrace buffer:^M [ 359.055660] (ftrace buffer empty)^M [ 359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M [ 359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M [ 359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M [ 359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M [ 359.055693] RIP: 0010:[<ffffffff811541c5>] [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M Cc: <stable@vger.kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 13 Aug, 2015 12 commits
-
-
Dongsu Park authored
Update block/biovecs.txt so that it includes a note on what kind of effects arbitrarily sized bios would bring to the block layer. Also fix a trivial typo, bio_iter_iovec. Cc: Christoph Hellwig <hch@infradead.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
We can always fill up the bio now, no need to estimate the possible size based on queue parameters. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [hch: rebased and wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
Call pre-defined helper bio_add_page() instead of open coding for iterating through bi_io_vec[]. Doing that, it's possible to make some parts in filesystems and mm/page_io.c simpler than before. Acked-by: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now performed by blk_queue_split() in md_make_request(). Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lin authored
If a read request fits entirely in a chunk, it will be passed directly to the underlying device (providing it hasn't failed of course). If it doesn't fit, the slightly less efficient path that uses the stripe_cache is used. Requests that get to the stripe cache are always completely split up as necessary. So with RAID5, ripping out the merge_bvec_fn doesn't cause it to stop work, but could cause it to take the less efficient path more often. All that is needed to manage this is for 'chunk_aligned_read' do some bio splitting, much like the RAID0 code does. Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lin authored
The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we set max discard sectors to (1<<31)>>9 to ensure it doesn't overflow bi_size and hopefully it is of the proper granularity as long as the granularity is a power of two. Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
Btrfs has been doing bio splitting from btrfs_map_bio(), by checking device limits as well as calling ->merge_bvec_fn() etc. That is not necessary any more, because generic_make_request() is now able to handle arbitrarily sized bios. So clean up unnecessary code paths. Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Chris Mason <clm@fb.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
The bcache driver has always accepted arbitrarily large bios and split them internally. Now that every driver must accept arbitrarily large bios this code isn't nessecary anymore. Cc: linux-bcache@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
Since generic_make_request() can now handle arbitrary size bios, all we have to do is make sure the bvec array doesn't overflow. __bio_add_page() doesn't need to call ->merge_bvec_fn(), where we can get rid of unnecessary code paths. Removing the call to ->merge_bvec_fn() is also fine, as no driver that implements support for BLOCK_PC commands even has a ->merge_bvec_fn() method. Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: rebase and resolve merge conflicts, change a couple of comments, make bio_add_page() warn once upon a cloned bio.] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Kent Overstreet authored
The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Viresh Kumar authored
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 11 Aug, 2015 1 commit
-
-
Sasha Levin authored
Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few dereferences of 'bio' after a call to bio_put(). This causes use-after-frees such as: [521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714 [521120.720638] Read of size 4 by task mount.ocfs2/9644 [521120.721212] ============================================================================= [521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected [521120.722968] ----------------------------------------------------------------------------- [521120.722968] [521120.723915] Disabling lock debugging due to kernel taint [521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080 [521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800 [521120.726037] [521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............ [521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................ [521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................ [521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................ [521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6.... [521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..<.... [521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 ................ [521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G B 4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420 [521120.744274] ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0 [521120.745025] ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00 [521120.745908] ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854 [521120.747063] Call Trace: [521120.747520] dump_stack (lib/dump_stack.c:52) [521120.748053] print_trailer (mm/slub.c:653) [521120.748582] object_err (mm/slub.c:660) [521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194) [521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250) [521120.753580] dio_bio_complete (fs/direct-io.c:478) [521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291) [521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322) [521120.761658] blkdev_direct_IO (fs/block_dev.c:162) [521120.762993] generic_file_read_iter (mm/filemap.c:1738) [521120.767405] blkdev_read_iter (fs/block_dev.c:1649) [521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434) [521120.772126] vfs_read (fs/read_write.c:454) [521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594) [521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186) [521120.777375] Memory state around the buggy address: [521120.778118] ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.779211] ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.781465] ^ [521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [521120.784818] ================================================================== This patch fixes a few of those places that I caught while auditing the patch, but the original patch should be audited further for more occurences of this issue since I'm not too familiar with the code. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 29 Jul, 2015 3 commits
-
-
Jens Axboe authored
Commit bcf2843b3f8f added ->bi_error to cleanup the error passing for struct bio, but that ended up adding 4 bytes and a 4 byte hole to the size of struct bio. For a clean config, that bumped it from 128 bytes, to 136 bytes, on x86-64. The ->bi_flags member is currently an unsigned long, but it fits easily within an int. Change it to an unsigned int, adjust the the pool offset code, and move ->bi_error into the new hole. Then we end up with a 128 byte bio again. Change the bio flag set/clear to use cmpxchg to ensure we don't lose any flags when manipulating them. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
Some places use helpers now, others don't. We only have the 'is set' helper, add helpers for setting and clearing flags too. It was a bit of a mess of atomic vs non-atomic access. With BIO_UPTODATE gone, we don't have any risk of concurrent access to the flags. So relax the restriction and don't make any of them atomic. The flags that do have serialization issues (reffed and chained), we already handle those separately. Signed-off-by: Jens Axboe <axboe@fb.com>
-
Christoph Hellwig authored
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 17 Jul, 2015 7 commits
-
-
Jens Axboe authored
Lots of devices support huge discard sizes these days. Depending on how the device handles them internally, huge discards can introduce massive latencies (hundreds of msec) on the device side. We have a sysfs file, discard_max_bytes, that advertises the max hardware supported discard size. Make this writeable, and split the settings into a soft and hard limit. This can be set from 'discard_granularity' and up to the hardware limit. Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw set limit. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
Some drivers use it now, others just set the limits field manually. But in preparation for splitting this into a hard and soft limit, ensure that they all call the proper function for setting the hw limit for discards. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
Percpu refcount is the perfect match for partition's case, and the conversion is quite straight. With the convertion, one pair of atomic inc/dec can be saved for accounting block I/O, which is run in hot path of block I/O. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
So the helper can be used in both generic partition case and part0 case. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management and ACPI fixes from Rafael Wysocki: "These fix two bugs in the cpufreq core (including one recent regression), fix a 4.0 PCI regression related to the ACPI resources management and quieten an RCU-related lockdep complaint about a tracepoint in the suspend-to-idle code. Specifics: - Fix a recently introduced issue in the cpufreq policy object reinitialization that leads to CPU offline/online breakage (Viresh Kumar) - Make it possible to access frequency tables of offline CPUs which is needed by thermal management code among other things (Viresh Kumar) - Fix an ACPI resource management regression introduced during the 4.0 cycle that may cause incorrect resource validation results to appear in 32-bit x86 kernels due to silent truncation of 64-bit values to 32-bit (Jiang Liu) - Fix up an RCU-related lockdep complaint about suspicious RCU usage in idle caused by using a suspend tracepoint in the core suspend- to-idle code (Rafael J Wysocki)" * tag 'pm+acpi-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel cpufreq: Allow freq_table to be obtained for offline CPUs cpufreq: Initialize the governor again while restoring policy suspend-to-idle: Prevent RCU from complaining about tick_freeze()
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Fix SMBIOS call handling and hwswitch state coherency in the dell-laptop driver. Cleanups for intel_*_ipc drivers. Details: dell-laptop: - Do not cache hwswitch state - Check return value of each SMBIOS call - Clear buffer before each SMBIOS call intel_scu_ipc: - Move local memory initialization out of a mutex intel_pmc_ipc: - Update kerneldoc formatting - Fix compiler casting warnings" * tag 'platform-drivers-x86-v4.2-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_scu_ipc: move local memory initialization out of a mutex intel_pmc_ipc: Update kerneldoc formatting dell-laptop: Do not cache hwswitch state dell-laptop: Check return value of each SMBIOS call dell-laptop: Clear buffer before each SMBIOS call intel_pmc_ipc: Fix compiler casting warnings
-
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommuLinus Torvalds authored
Pull m68knommu/coldfire fixes from Greg Ungerer: "Contains build fixes and updates for the ColdFire defconfigs. Specifically there is a couple of fixes that address problems building allnoconfig. Also fix for enabling PCI bus on the M54xx family of ColdFire" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: enable PCI support for m5475evb defconfig m68k: fix io functions for ColdFire/MMU/PCI case m68knommu: update defconfig for ColdFire m5475evb m68knommu: update defconfig for ColdFire m5407c3 m68knommu: update defconfig for ColdFire m5307c3 m68knommu: update defconfig for ColdFire m5275evb m68knommu: update defconfig for ColdFire m5272c3 m68knommu: update defconfig for ColdFire m5249evb m68knommu: update defconfig for m5208evb m68knommu: make ColdFire SoC selection a choice m68knommu: improve the clock configuration defaults m68knommu: force setting of CONFIG_CLOCK_FREQ for ColdFire
-
- 16 Jul, 2015 6 commits
-
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "A collection of fixes from the last few weeks that should go into the current series. This contains: - Various fixes for the per-blkcg policy data, fixing regressions since 4.1. From Arianna and Tejun - Code cleanup for bcache closure macros from me. Really just flushing this out, it's been sitting in another branch for months - FIELD_SIZEOF cleanup from Maninder Singh - bio integrity oops fix from Mike - Timeout regression fix for blk-mq from Ming Lei" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: set default timeout as 30 seconds NVMe: Reread partitions on metadata formats bcache: don't embed 'return' statements in closure macros blkcg: fix blkcg_policy_data allocation bug blkcg: implement all_blkcgs list blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods block/blk-cgroup.c: free per-blkcg data when freeing the blkcg block: use FIELD_SIZEOF to calculate size of a field bio integrity: do not assume bio_integrity_pool exists if bioset exists
-
git://github.com/kleikamp/linux-shaggyLinus Torvalds authored
Pull jfs fixes from David Kleikamp: "A couple trivial fixes and an error path fix" * tag 'jfs-4.2' of git://github.com/kleikamp/linux-shaggy: jfs: clean up jfs_rename and fix out of order unlock jfs: fix indentation on if statement jfs: removed a prohibited space after opening parenthesis
-
Rafael J. Wysocki authored
* pm-cpuidle: suspend-to-idle: Prevent RCU from complaining about tick_freeze() * pm-cpufreq: cpufreq: Allow freq_table to be obtained for offline CPUs cpufreq: Initialize the governor again while restoring policy * acpi-resources: ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
-
Ming Lei authored
It is reasonable to set default timeout of request as 30 seconds instead of 30000 ticks, which may be 300 seconds if HZ is 100, for example, some arm64 based systems may choose 100 HZ. Signed-off-by: Ming Lei <ming.lei@canonical.com> Fixes: c76cbbcf ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()" Signed-off-by: Jens Axboe <axboe@fb.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds authored
Pull TPM bugfixes from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: tpm, tpm_crb: fail when TPM2 ACPI table contents look corrupted tpm: Fix initialization of the cdev
-
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds authored
Pull rdma fixes from Doug Ledford: "Mainly fix-ups for the various 4.2 items" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits) IB/core: Destroy ocrdma_dev_id IDR on module exit IB/core: Destroy multcast_idr on module exit IB/mlx4: Optimize do_slave_init IB/mlx4: Fix memory leak in do_slave_init IB/mlx4: Optimize freeing of items on error unwind IB/mlx4: Fix use of flow-counters for process_mad IB/ipath: Convert use of __constant_<foo> to <foo> IB/ipoib: Set MTU to max allowed by mode when mode changes IB/ipoib: Scatter-Gather support in connected mode IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush IB/ucma: Fix lockdep warning in ucma_lock_files rds: rds_ib_device.refcount overflow RDMA/nes: Fix for incorrect recording of the MAC address RDMA/nes: Fix for resolving the neigh RDMA/core: Fixes for port mapper client registration IB/IPoIB: Fix bad error flow in ipoib_add_port() IB/mlx4: Do not attemp to report HCA clock offset on VFs IB/cm: Do not queue work to a device that's going away IB/srp: Avoid using uninitialized variable ...
-
- 15 Jul, 2015 6 commits
-
-
Keith Busch authored
This patch has the driver automatically reread partitions if a namespace has a separate metadata format. Previously revalidating a disk was sufficient to get the correct capacity set on such formatted drives, but partitions that may exist would not have been surfaced. Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Tested-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
git://git.samba.org/jlayton/linuxLinus Torvalds authored
Pull file locking updates from Jeff Layton: "I had thought that I was going to get away without a pull request this cycle. There was a NFSv4 file locking problem that cropped up that I tried to fix in the NFSv4 code alone, but that fix has turned out to be problematic. These patches fix this in the correct way. Note that this touches some NFSv4 code as well. Ordinarily I'd wait for Trond to ACK this, but he's on holiday right now and the bug is rather nasty. So I suggest we merge this and if he raises issues with it we can sort it out when he gets back" Acked-by: Bruce Fields <bfields@fieldses.org> Acked-by: Dan Williams <dan.j.williams@intel.com> [ +1 to this series fixing a 100% reproducible slab corruption + general protection fault in my nfs-root test environment. - Dan ] Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux: locks: inline posix_lock_file_wait and flock_lock_file_wait nfs4: have do_vfs_lock take an inode pointer locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait locks: have flock_lock_file take an inode pointer instead of a filp Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM fixes from Paolo Bonzini: - Fix FPU refactoring ("kvm: x86: fix load xsave feature warning") - Fix eager FPU mode (Cc stable) - AMD bits of MTRR virtualization * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: fix load xsave feature warning KVM: x86: apply guest MTRR virtualization on host reserved pages KVM: SVM: Sync g_pat with guest-written PAT value KVM: SVM: use NPT page attributes KVM: count number of assigned devices KVM: VMX: fix vmwrite to invalid VMCS KVM: x86: reintroduce kvm_is_mmio_pfn x86: hyperv: add CPUID bit for crash handlers
-
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arcLinus Torvalds authored
Pull ARC fixes from Vineet Gupta: - Makefile changes (top-level+ARC) reinstates -O3 builds (regression since 3.16) - IDU intc related fixes, IRQ affinity - patch to make bitops safer for ARC - perf fix from Alexey to remove signed PC braino - Futex backend gets llock/scond support * tag 'arc-v4.2-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARCv2: support HS38 releases ARC: make sure instruction_pointer() returns unsigned value ARC: slightly refactor macros for boot logging ARC: Add llock/scond to futex backend arc:irqchip: prepare for drivers/irqchip/irqchip.h removal ARC: Make ARC bitops "safer" (add anti-optimization) ARCv2: [axs103] bump CPU frequency from 75 to 90 MHZ ARCv2: intc: IDU: Fix potential race in installing a chained IRQ handler ARCv2: intc: IDU: support irq affinity ARC: fix unused var wanring ARC: Don't memzero twice in dma_alloc_coherent for __GFP_ZERO ARC: Override toplevel default -O2 with -O3 kbuild: Allow arch Makefiles to override {cpp,ld,c}flags ARCv2: guard SLC DMA ops with spinlock ARC: Kconfig: better way to disable ARC_HAS_LLSC for ARC_CPU_750D
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Martin Schwidefsky: "One improvement for the zcrypt driver, the quality attribute for the hwrng device has been missing. Without it the kernel entropy seeding will not happen automatically. And six bug fixes, the most important one is the fix for the vector register corruption due to machine checks" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/nmi: fix vector register corruption s390/process: fix sfpc inline assembly s390/dasd: fix kernel panic when alias is set offline s390/sclp: clear upper register halves in _sclp_print_early s390/oprofile: fix compile error s390/sclp: fix compile error s390/zcrypt: enable s390 hwrng to seed kernel entropy
-
Dave Kleikamp authored
The end of jfs_rename(), which is also used by the error paths, included a call to IWRITE_UNLOCK(new_ip) after labels out1, out2 and out3. If we come in through these labels, IWRITE_LOCK() has not been called yet. In moving that call to the correct spot, I also moved some exceptional truncate code earlier as well, since the early error paths don't need to deal with it, and I renamed out4: to out_tx: so a future patch by Jan Kara doesn't need to deal with renumbering or confusing out-of-order labels. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
-