- 22 Jan, 2018 40 commits
-
-
Liu Bo authored
We've observed that btrfs_get_extent() and merge_extent_mapping() could return -EEXIST in several cases, and they are caused by some racy condition, e.g dio read vs dio write, which makes the problem very tricky to reproduce. This adds extent map selftests in order to simulate those racy situations. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> [ minor string adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
These helpers are extent map specific, move them to extent_map.c. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
This is a prepare work for the following extent map selftest, which runs tests against em merge logic. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
This fixes a corner case that is caused by a race of dio write vs dio read/write. Here is how the race could happen. Suppose that no extent map has been loaded into memory yet. There is a file extent [0, 32K), two jobs are running concurrently against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio read from [0, 4K) or [4K, 8K). t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K). ------------------------------------------------------ t1 t2 btrfs_get_blocks_direct() btrfs_get_blocks_direct() -> btrfs_get_extent() -> btrfs_get_extent() -> lookup_extent_mapping() -> add_extent_mapping() -> lookup_extent_mapping() # load [0, 32K) -> btrfs_new_extent_direct() -> btrfs_drop_extent_cache() # split [0, 32K) and # drop [8K, 32K) -> add_extent_mapping() # add [8K, 32K) -> add_extent_mapping() # handle -EEXIST when adding # [0, 32K) ------------------------------------------------------ About how t2(dio read/write) runs into -EEXIST: a) add_extent_mapping() gets -EEXIST for adding em [0, 32k), b) search_extent_mapping() then returns [0, 8k) as the existing em, even though start == existing->start, em is [0, 32k) so that extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k, c) then it goes thru merge_extent_mapping() which tries to add a [8k, 8k) (with a length 0) and returns -EEXIST as [8k, 32k) is already in tree, d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write, which is confusing applications. Here I conclude all the possible situations, 1) start < existing->start +-----------+em+-----------+ +--prev---+ | +-------------+ | | | | | | | +---------+ + +---+existing++ ++ + | + start 2) start == existing->start +------------em------------+ | +-------------+ | | | | | + +----existing-+ + | | + start 3) start > existing->start && start < (existing->start + existing->len) +------------em------------+ | +-------------+ | | | | | + +----existing-+ + | | + start 4) start >= (existing->start + existing->len) +-----------+em+-----------+ | +-------------+ | +--next---+ | | | | | | + +---+existing++ + +---------+ + | + start As we can see, it turns out that if start is within existing em (front inclusive), then the existing em should be returned as is, otherwise, we try our best to merge candidate em with sibling ems to form a larger em (in order to reduce the total number of em). Reported-by: David Vallender <david.vallender@landmark.co.uk> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
%block_len could be checked on deciding if two em are mergeable. merge_extent_mapping() has only added the front pad if the front part of em gets truncated, but it's possible that the end part gets truncated. For both compressed extent and inline extent, em->block_len is not adjusted accordingly, and for regular extent, em->block_len always equals to em->len, hence this sets em->block_len with em->len. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Matthew Wilcox authored
The reada_lock in struct btrfs_device was only initialised, and not actually used. That's good because there's another lock also called reada_lock in the btrfs_fs_info that was quite heavily used. Remove this one. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
Before rbio_orig_end_io() goes to free rbio, rbio may get merged with more bios from other rbios and rbio->bio_list becomes non-empty, in that case, these newly merged bios don't end properly. Once unlock_stripe() is done, rbio->bio_list will not be updated any more and we can call bio_endio() on all queued bios. It should only happen in error-out cases, the normal path of recover and full stripe write have already set RBIO_RMW_LOCKED_BIT to disable merge before doing IO, so rbio_orig_end_io() called by them doesn't have the above issue. Reported-by: Jérôme Carretero <cJ-ko@zougloub.eu> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
Since raid6 recover tries all possible combinations of failed stripes, - when raid6 rebuild algorithm is used, i.e. raid6_datap_recov() and raid6_2data_recov(), it may change the in-memory content of failed stripes, if such a raid bio is cached, a later raid write rmw or recover can steal @stripe_pages from it instead of reading from disks, such that it carries the wrong content to do write rmw or recovery and ends up with corruption or recovery failures. - when raid5 rebuild algorithm is used, i.e. xor, raid bio can be cached because the only failed stripe which contains @rbio->bio_pages gets modified, others remain the same so that their in-memory content is consistent with their on-disk content. This adds a check to skip caching rbio if using raid6 recover. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
Bio iterated by set_bio_pages_uptodate() is raid56 internal one, so it will never be a BIO_CLONED bio, and since this is called by end_io functions, bio->bi_iter.bi_size is zero, we mustn't use bio_for_each_segment() as that is a no-op if bi_size is zero. Fixes: 6592e58c ("Btrfs: fix write corruption due to bio cloning on raid5/6") Cc: <stable@vger.kernel.org> # v4.12-rc6+ Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Su Yue authored
There is no function named btrfs_get_inode_index_count. Explanation for magic number index_cnt=2 in btrfs_new_inode() is actually located in btrfs_set_inode_index_count(). So replace 'btrfs_get_inode_index_count' in the comment by 'btrfs_set_inode_index_count'. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's not used outside of extent-tree so there is no reason to not be static. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
We call btrfs_free_stale_device() only when we alloc a new struct btrfs_device (ret=1), so move it closer to where we alloc the new device. Also drop the comments. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
I've noticed that the updated item checker stack consumption increased dramatically in 542f5385e20cf97447 ("btrfs: tree-checker: Add checker for dir item") tree-checker.c:check_leaf +552 (176 -> 728) The array is 255 bytes long, dynamic allocation would slow down the sanity checks so it's more reasonable to keep it on-stack. Moving the variable to the scope of use reduces the stack usage again tree-checker.c:check_leaf -264 (728 -> 464) Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Xiongfeng Wang authored
gcc-8 reports: fs/btrfs/ioctl.c: In function 'btrfs_ioctl': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 1024 equals destination size [-Wstringop-truncation] We need one less byte or call strlcpy() to make it a nul-terminated string. This is done on the next line anyway, but we want to avoid the warning. Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
It appears from the original commit [1] that there isn't any design specific reason not to fail the mount instead of just warning. This patch will change it to fail. [1] commit 319e4d06 btrfs: Enhance super validation check Fixes: 319e4d06 ("btrfs: Enhance super validation check") Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
The UUID change by btrfstune sets SUPER_FLAG_CHANGING_FSID and resets it only when changing fsid is complete. Its not a good idea to mount the device anything in between, reading metadata blocks would fail with UUID mismatch. This patch doesn't add SUPER_FLAG_CHANGING_FSID into BTRFS_SUPER_FLAG_SUPP list, so mount will fail (along with the fix in the next patch) when SUPER_FLAG_CHANGING_FSID is set. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
btrfs-progs uses super flag bit BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34). So just define that in kernel so that we know its been used. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
We've avoided data losing raid profile when doing balance, but it turns out that deleting a device could also result in the same problem. Say we have 3 disks, and they're created with '-d raid1' profile. - We have chunk P (the only data chunk on the empty btrfs). - Suppose that chunk P's two raid1 copies reside in disk A and disk B. - Now, 'btrfs device remove disk B' btrfs_rm_device() -> btrfs_shrink_device() -> btrfs_relocate_chunk() #relocate any chunk on disk B to other places. - Chunk P will be removed and a new chunk will be created to hold those data, but as chunk P is the only one holding raid1 profile, after it goes away, the new chunk will be created as single profile which is our default profile. This fixes the problem by creating an empty data chunk before relocating the data chunk. Metadata/System chunk are supposed to have non-zero bytes all the time so their raid profile is preserved. Reported-by: James Alandt <James.Alandt@wdc.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
If we do a buffered write after a zero range operation that has an unaligned (with the filesystem's sector size) end which also falls within an unwritten (prealloc) extent that is currently beyond the inode's i_size, and the zero range operation has the flag FALLOC_FL_KEEP_SIZE, we end up leaking data and metadata space. This happens because when zeroing a range we call btrfs_truncate_block(), which does delalloc (loads the page and partially zeroes its content), and in the buffered write path we only clear existing delalloc space reservation for the range we are writing into if that range starts at an offset smaller then the inode's i_size, which makes sense since we can not have delalloc extents beyond the i_size, only unwritten extents are allowed. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "falloc -k 428K 4K" /mnt/foobar $ xfs_io -c "fzero -k 0 430K" /mnt/foobar $ xfs_io -c "pwrite -S 0xaa 428K 4K" /mnt/foobar $ umount /mnt After the unmount we get the metadata and data space leaks reported in dmesg/syslog: [95794.602253] ------------[ cut here ]------------ [95794.603322] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9561 btrfs_destroy_inode+0x4e/0x206 [btrfs] [95794.605167] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.613000] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.614448] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.615972] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.617114] RIP: 0010:btrfs_destroy_inode+0x4e/0x206 [btrfs] [95794.618001] RSP: 0018:ffffc90001737d00 EFLAGS: 00010202 [95794.618721] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c [95794.619645] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418 [95794.620711] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000 [95794.621932] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000 [95794.623124] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0 [95794.624188] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.625578] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.626522] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95794.627647] Call Trace: [95794.628128] destroy_inode+0x3d/0x55 [95794.628573] evict+0x177/0x17e [95794.629010] dispose_list+0x50/0x71 [95794.629478] evict_inodes+0x132/0x141 [95794.630289] generic_shutdown_super+0x3f/0x10b [95794.630864] kill_anon_super+0x12/0x1c [95794.631383] btrfs_kill_super+0x16/0x21 [btrfs] [95794.631930] deactivate_locked_super+0x30/0x68 [95794.632539] deactivate_super+0x36/0x39 [95794.633200] cleanup_mnt+0x49/0x67 [95794.633818] __cleanup_mnt+0x12/0x14 [95794.634416] task_work_run+0x82/0xa6 [95794.634902] prepare_exit_to_usermode+0xe1/0x10c [95794.635525] syscall_return_slowpath+0x18c/0x1af [95794.636122] entry_SYSCALL_64_fastpath+0xab/0xad [95794.636834] RIP: 0033:0x7fa678cb99a7 [95794.637370] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95794.638672] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95794.639596] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95794.640703] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95794.641773] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95794.643150] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95794.644249] Code: ff 4c 8b a8 80 06 00 00 48 8b 87 c0 01 00 00 48 85 c0 74 02 0f ff 48 83 bb e0 02 00 00 00 74 02 0f ff 83 bb 3c ff ff ff 00 74 02 <0f> ff 83 bb 40 ff ff ff 00 74 02 0f ff 48 83 bb f8 fe ff ff 00 [95794.646929] ---[ end trace e95877675c6ec007 ]--- [95794.647751] ------------[ cut here ]------------ [95794.648509] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9562 btrfs_destroy_inode+0x59/0x206 [btrfs] [95794.649842] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.654659] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.655894] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.657546] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.658433] RIP: 0010:btrfs_destroy_inode+0x59/0x206 [btrfs] [95794.659279] RSP: 0018:ffffc90001737d00 EFLAGS: 00010202 [95794.660054] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c [95794.660753] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418 [95794.661513] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000 [95794.662289] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000 [95794.663393] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0 [95794.664342] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.665673] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.666593] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95794.667629] Call Trace: [95794.668065] destroy_inode+0x3d/0x55 [95794.668637] evict+0x177/0x17e [95794.669179] dispose_list+0x50/0x71 [95794.669830] evict_inodes+0x132/0x141 [95794.670416] generic_shutdown_super+0x3f/0x10b [95794.671103] kill_anon_super+0x12/0x1c [95794.671786] btrfs_kill_super+0x16/0x21 [btrfs] [95794.672552] deactivate_locked_super+0x30/0x68 [95794.673393] deactivate_super+0x36/0x39 [95794.674107] cleanup_mnt+0x49/0x67 [95794.674706] __cleanup_mnt+0x12/0x14 [95794.675279] task_work_run+0x82/0xa6 [95794.675795] prepare_exit_to_usermode+0xe1/0x10c [95794.676507] syscall_return_slowpath+0x18c/0x1af [95794.677275] entry_SYSCALL_64_fastpath+0xab/0xad [95794.678006] RIP: 0033:0x7fa678cb99a7 [95794.678600] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95794.679739] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95794.680779] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95794.681837] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95794.682867] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95794.683891] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95794.684843] Code: c0 01 00 00 48 85 c0 74 02 0f ff 48 83 bb e0 02 00 00 00 74 02 0f ff 83 bb 3c ff ff ff 00 74 02 0f ff 83 bb 40 ff ff ff 00 74 02 <0f> ff 48 83 bb f8 fe ff ff 00 74 02 0f ff 48 83 bb 00 ff ff ff [95794.687156] ---[ end trace e95877675c6ec008 ]--- [95794.687876] ------------[ cut here ]------------ [95794.688579] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9565 btrfs_destroy_inode+0x7d/0x206 [btrfs] [95794.689735] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.695015] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.696396] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.697956] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.698925] RIP: 0010:btrfs_destroy_inode+0x7d/0x206 [btrfs] [95794.699763] RSP: 0018:ffffc90001737d00 EFLAGS: 00010206 [95794.700434] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c [95794.701445] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418 [95794.702448] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000 [95794.703557] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000 [95794.704441] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0 [95794.705270] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.706341] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.707001] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95794.708030] Call Trace: [95794.708466] destroy_inode+0x3d/0x55 [95794.709071] evict+0x177/0x17e [95794.709497] dispose_list+0x50/0x71 [95794.709973] evict_inodes+0x132/0x141 [95794.710564] generic_shutdown_super+0x3f/0x10b [95794.711200] kill_anon_super+0x12/0x1c [95794.711633] btrfs_kill_super+0x16/0x21 [btrfs] [95794.712139] deactivate_locked_super+0x30/0x68 [95794.712608] deactivate_super+0x36/0x39 [95794.713093] cleanup_mnt+0x49/0x67 [95794.713514] __cleanup_mnt+0x12/0x14 [95794.713933] task_work_run+0x82/0xa6 [95794.714543] prepare_exit_to_usermode+0xe1/0x10c [95794.715247] syscall_return_slowpath+0x18c/0x1af [95794.715952] entry_SYSCALL_64_fastpath+0xab/0xad [95794.716653] RIP: 0033:0x7fa678cb99a7 [95794.721100] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95794.722052] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95794.722856] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95794.723698] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95794.724736] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95794.725928] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95794.726728] Code: 40 ff ff ff 00 74 02 0f ff 48 83 bb f8 fe ff ff 00 74 02 0f ff 48 83 bb 00 ff ff ff 00 74 02 0f ff 48 83 bb 30 ff ff ff 00 74 02 <0f> ff 48 83 bb 08 ff ff ff 00 74 02 0f ff 4d 85 e4 0f 84 52 01 [95794.729203] ---[ end trace e95877675c6ec009 ]--- [95794.841054] ------------[ cut here ]------------ [95794.841829] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:5831 btrfs_free_block_groups+0x235/0x36a [btrfs] [95794.843425] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.850658] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.852590] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.854752] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.855812] RIP: 0010:btrfs_free_block_groups+0x235/0x36a [btrfs] [95794.856811] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206 [95794.857805] RAX: 0000000080000000 RBX: ffff88006145c000 RCX: 0000000000000001 [95794.859014] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff [95794.860270] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9 [95794.861525] R10: ffffc90001737c80 R11: 00000000000337fd R12: 0000000000000000 [95794.862700] R13: ffff88006145c0c0 R14: ffff88021b61a800 R15: ffff88006145c100 [95794.863810] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.865149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.866099] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95794.867198] Call Trace: [95794.867626] close_ctree+0x1db/0x2b8 [btrfs] [95794.868188] ? evict_inodes+0x132/0x141 [95794.869037] btrfs_put_super+0x15/0x17 [btrfs] [95794.870400] generic_shutdown_super+0x6a/0x10b [95794.871262] kill_anon_super+0x12/0x1c [95794.872046] btrfs_kill_super+0x16/0x21 [btrfs] [95794.872746] deactivate_locked_super+0x30/0x68 [95794.873687] deactivate_super+0x36/0x39 [95794.874639] cleanup_mnt+0x49/0x67 [95794.875504] __cleanup_mnt+0x12/0x14 [95794.876126] task_work_run+0x82/0xa6 [95794.876788] prepare_exit_to_usermode+0xe1/0x10c [95794.877777] syscall_return_slowpath+0x18c/0x1af [95794.878381] entry_SYSCALL_64_fastpath+0xab/0xad [95794.878888] RIP: 0033:0x7fa678cb99a7 [95794.879307] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95794.880204] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95794.881640] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95794.882690] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95794.883538] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95794.884562] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95794.885664] Code: 89 ef e8 07 ec 32 e1 e8 9d c0 ea e0 48 8d b3 28 02 00 00 48 83 c9 ff 31 d2 48 89 df e8 29 c5 ff ff 48 83 bb 80 02 00 00 00 74 02 <0f> ff 48 83 bb 88 02 00 00 00 74 02 0f ff 48 83 bb d8 02 00 00 [95794.887980] ---[ end trace e95877675c6ec00a ]--- [95794.888739] ------------[ cut here ]------------ [95794.889405] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:5832 btrfs_free_block_groups+0x241/0x36a [btrfs] [95794.891020] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.897551] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.898509] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.899685] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.900592] RIP: 0010:btrfs_free_block_groups+0x241/0x36a [btrfs] [95794.901387] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206 [95794.902300] RAX: 0000000080000000 RBX: ffff88006145c000 RCX: 0000000000000001 [95794.903260] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff [95794.904332] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9 [95794.905300] R10: ffffc90001737c80 R11: 00000000000337fd R12: 0000000000000000 [95794.906439] R13: ffff88006145c0c0 R14: ffff88021b61a800 R15: ffff88006145c100 [95794.907459] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.908625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.909511] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95794.910630] Call Trace: [95794.911153] close_ctree+0x1db/0x2b8 [btrfs] [95794.911837] ? evict_inodes+0x132/0x141 [95794.912344] btrfs_put_super+0x15/0x17 [btrfs] [95794.912975] generic_shutdown_super+0x6a/0x10b [95794.913788] kill_anon_super+0x12/0x1c [95794.914424] btrfs_kill_super+0x16/0x21 [btrfs] [95794.915142] deactivate_locked_super+0x30/0x68 [95794.915831] deactivate_super+0x36/0x39 [95794.916433] cleanup_mnt+0x49/0x67 [95794.917045] __cleanup_mnt+0x12/0x14 [95794.917665] task_work_run+0x82/0xa6 [95794.918309] prepare_exit_to_usermode+0xe1/0x10c [95794.919021] syscall_return_slowpath+0x18c/0x1af [95794.919722] entry_SYSCALL_64_fastpath+0xab/0xad [95794.920426] RIP: 0033:0x7fa678cb99a7 [95794.921039] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95794.922303] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95794.923335] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95794.924364] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95794.925435] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95794.926533] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95794.927557] Code: 48 8d b3 28 02 00 00 48 83 c9 ff 31 d2 48 89 df e8 29 c5 ff ff 48 83 bb 80 02 00 00 00 74 02 0f ff 48 83 bb 88 02 00 00 00 74 02 <0f> ff 48 83 bb d8 02 00 00 00 74 02 0f ff 48 83 bb e0 02 00 00 [95794.930166] ---[ end trace e95877675c6ec00b ]--- [95794.930961] ------------[ cut here ]------------ [95794.931727] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:9953 btrfs_free_block_groups+0x2bc/0x36a [btrfs] [95794.932729] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.938394] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.939842] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.941455] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.942336] RIP: 0010:btrfs_free_block_groups+0x2bc/0x36a [btrfs] [95794.943268] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206 [95794.944127] RAX: ffff8802004fd0e8 RBX: ffff88006145c000 RCX: 0000000000000001 [95794.945211] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff [95794.946316] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9 [95794.947271] R10: ffffc90001737c80 R11: 00000000000337fd R12: ffff8802004fd0e8 [95794.948219] R13: ffff88006145c0c0 R14: ffff88006145e598 R15: ffff88006145c100 [95794.949193] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.950495] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.951338] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95794.952361] Call Trace: [95794.952811] close_ctree+0x1db/0x2b8 [btrfs] [95794.953522] ? evict_inodes+0x132/0x141 [95794.954543] btrfs_put_super+0x15/0x17 [btrfs] [95794.955231] generic_shutdown_super+0x6a/0x10b [95794.955916] kill_anon_super+0x12/0x1c [95794.956414] btrfs_kill_super+0x16/0x21 [btrfs] [95794.956953] deactivate_locked_super+0x30/0x68 [95794.957635] deactivate_super+0x36/0x39 [95794.958256] cleanup_mnt+0x49/0x67 [95794.958701] __cleanup_mnt+0x12/0x14 [95794.959181] task_work_run+0x82/0xa6 [95794.959635] prepare_exit_to_usermode+0xe1/0x10c [95794.960182] syscall_return_slowpath+0x18c/0x1af [95794.960731] entry_SYSCALL_64_fastpath+0xab/0xad [95794.961438] RIP: 0033:0x7fa678cb99a7 [95794.961990] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95794.963111] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95794.963975] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95794.964680] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95794.965763] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95794.966868] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95794.967800] Code: 00 00 00 4c 8b a3 98 25 00 00 49 83 bc 24 60 ff ff ff 00 75 16 49 83 bc 24 68 ff ff ff 00 75 0b 49 83 bc 24 70 ff ff ff 00 74 16 <0f> ff 49 8d b4 24 18 ff ff ff 31 c9 31 d2 48 89 df e8 93 7a ff [95794.970629] ---[ end trace e95877675c6ec00c ]--- [95794.971451] BTRFS info (device sdi): space_info 1 has 7680000 free, is not full [95794.972351] BTRFS info (device sdi): space_info total=8388608, used=704512, pinned=0, reserved=0, may_use=4096, readonly=0 [95794.973595] ------------[ cut here ]------------ [95794.974353] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:9953 btrfs_free_block_groups+0x2bc/0x36a [btrfs] [95794.980163] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs] [95794.986461] CPU: 0 PID: 31496 Comm: umount Tainted: G W 4.14.0-rc6-btrfs-next-54+ #1 [95794.987591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [95794.988929] task: ffff880075aa0240 task.stack: ffffc90001734000 [95794.989922] RIP: 0010:btrfs_free_block_groups+0x2bc/0x36a [btrfs] [95794.990715] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206 [95794.991431] RAX: ffff88020f6e70e8 RBX: ffff88006145c000 RCX: ffffffff8115a906 [95794.992455] RDX: ffffffff8115a902 RSI: ffff880075aa0b40 RDI: ffff880075aa0b40 [95794.993535] RBP: ffffc90001737d98 R08: 0000000000000020 R09: fffffffffffffff7 [95794.994573] R10: 00000000ffffffc4 R11: ffff8800633b1bc0 R12: ffff88020f6e70e8 [95794.996250] R13: 0000000000000038 R14: ffff88006145e598 R15: 0000000000000000 [95794.997233] FS: 00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [95794.998592] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95794.999484] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0 [95795.000542] Call Trace: [95795.001138] close_ctree+0x1db/0x2b8 [btrfs] [95795.001885] ? evict_inodes+0x132/0x141 [95795.002407] btrfs_put_super+0x15/0x17 [btrfs] [95795.003093] generic_shutdown_super+0x6a/0x10b [95795.003720] kill_anon_super+0x12/0x1c [95795.004353] btrfs_kill_super+0x16/0x21 [btrfs] [95795.005095] deactivate_locked_super+0x30/0x68 [95795.005716] deactivate_super+0x36/0x39 [95795.006388] cleanup_mnt+0x49/0x67 [95795.006939] __cleanup_mnt+0x12/0x14 [95795.007512] task_work_run+0x82/0xa6 [95795.008124] prepare_exit_to_usermode+0xe1/0x10c [95795.008994] syscall_return_slowpath+0x18c/0x1af [95795.009831] entry_SYSCALL_64_fastpath+0xab/0xad [95795.010610] RIP: 0033:0x7fa678cb99a7 [95795.011193] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [95795.012327] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7 [95795.013432] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90 [95795.014558] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015 [95795.015577] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64 [95795.016569] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160 [95795.017662] Code: 00 00 00 4c 8b a3 98 25 00 00 49 83 bc 24 60 ff ff ff 00 75 16 49 83 bc 24 68 ff ff ff 00 75 0b 49 83 bc 24 70 ff ff ff 00 74 16 <0f> ff 49 8d b4 24 18 ff ff ff 31 c9 31 d2 48 89 df e8 93 7a ff [95795.020538] ---[ end trace e95877675c6ec00d ]--- [95795.021259] BTRFS info (device sdi): space_info 4 has 1072775168 free, is not full [95795.022390] BTRFS info (device sdi): space_info total=1073741824, used=114688, pinned=0, reserved=0, may_use=786432, readonly=65536 Fix this by ensuring the zero range operation does not call btrfs_truncate_block() if the corresponding extent is an unwritten one (it's pointless anyway, since reading from an unwritten extent yields zeroes). Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
For a fallocate's zero range operation that targets a range with an end that is not aligned to the sector size, we can end up not updating the inode's i_size. This happens when the last page of the range maps to an unwritten (prealloc) extent and before that last page we have either a hole or a written extent. This is because in this scenario we relied on a call to btrfs_prealloc_file_range() to update the inode's i_size, however it can only update the i_size to the "down aligned" end of the range. Example: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ xfs_io -f -c "pwrite -S 0xff 0 428K" /mnt/foobar $ xfs_io -c "falloc -k 428K 4K" /mnt/foobar $ xfs_io -c "fzero 0 430K" /mnt/foobar $ du --bytes /mnt/foobar 438272 /mnt/foobar The inode's i_size was left as 428Kb (438272 bytes) when it should have been updated to 430Kb (440320 bytes). Fix this by always updating the inode's i_size explicitly after zeroing the range. Fixes: ba6d5887946ff86d93dc ("Btrfs: add support for fallocate's zero range operation") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
During a buffered IO write, we can have an extent state that we got when we locked the range (if the range starts at an offset lower than eof), so always pass it to btrfs_dirty_pages() so that setting the delalloc bit in the range does not need to do a full search in the inode's io tree, saving time and reducing the amount of time we hold the io tree's lock. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
This implements support the zero range operation of fallocate. For now at least it's as simple as possible while reusing most of the existing fallocate and hole punching infrastructure. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
Since fail stripe index in rbio would be used to decide which algorithm reconstruction would be run, we cannot merge rbios if their's fail striped indexes are different, otherwise, one of the two reconstructions would fail. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
Given the above ' if (last->operation != cur->operation) return 0; ', it's guaranteed that two operations are same. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Assign ret = -EINVAL where it is actually required. Remove { } around single line if else code. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
No functional change rearrange the mutex_unlock. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ edit subject ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
btrfs_device::scrub_device is not a device which is being scrubbed, but it holds the scrub context, so rename to reflect the same. No functional changes here. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
There is no other consumer for btrfs_handle_error() other than __btrfs_handle_fs_error(), further this function quite small. Merge it into its parent. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ reformat comment ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
__btrfs_handle_fs_error() sets BTRFS_FS_STATE_ERROR, and calls btrfs_handle_error() so no need to check if the BTRFS_FS_STATE_ERROR is set in btrfs_handle_error(). And there is no other user of btrfs_handle_error() as well. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
There is a scenario that can end up with rebuild process failing to return good content, i.e. suppose that all disks can be read without problems and if the content that was read out doesn't match its checksum, currently for raid6 btrfs at most retries twice, - the 1st retry is to rebuild with all other stripes, it'll eventually be a raid5 xor rebuild, - if the 1st fails, the 2nd retry will deliberately fail parity p so that it will do raid6 style rebuild, however, the chances are that another non-parity stripe content also has something corrupted, so that the above retries are not able to return correct content, and users will think of this as data loss. More seriouly, if the loss happens on some important internal btree roots, it could refuse to mount. This extends btrfs to do more retries and each retry fails only one stripe. Since raid6 can tolerate 2 disk failures, if there is one more failure besides the failure on which we're recovering, this can always work. The worst case is to retry as many times as the number of raid6 disks, but given the fact that such a scenario is really rare in practice, it's still acceptable. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
The raid6 corruption is that, suppose that all disks can be read without problems and if the content that was read out doesn't match its checksum, currently for raid6 btrfs at most retries twice, - the 1st retry is to rebuild with all other stripes, it'll eventually be a raid5 xor rebuild, - if the 1st fails, the 2nd retry will deliberately fail parity p so that it will do raid6 style rebuild, however, the chances are that another non-parity stripe content also has something corrupted, so that the above retries are not able to return correct content. We've fixed normal reads to rebuild raid6 correctly with more retries in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix scrub to do the exactly same rebuild process. [1]: https://patchwork.kernel.org/patch/10091755/Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
Update btrfs_check_rw_degradable() to check against the given device if its lost. We can use this function to know if the volume is going to be in degraded mode OR failed state, when the given device fails. Which is needed when we are handling the device failed state. A preparatory patch does not affect the flow as such. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ enhance comment ] Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
All callers pass either GFP_NOFS or GFP_KERNEL now, so we can sink the parameter to the function, though we lose some of the slightly better semantics of GFP_KERNEL in some places, it's worth cleaning up the callchains. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
There's only one instance where we pass different gfp mask to unlock_extent_cached. Add a separate helper for that and then we can drop the gfp parameter from unlock_extent_cached. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Recent patches reworking the mount path left some unused parameters. We pass a vfsmount to mount_subvol, the flags and data (ie. mount options) have been already applied and we will not need them. Signed-off-by: David Sterba <dsterba@suse.com>
-
Misono, Tomohiro authored
Long ago, commit edf24abe ("btrfs: sanity mount option parsing and early mount code") split the btrfs_parse_options() into two parts (btrfs_parse_early_options() and btrfs_parse_options()). As a result, btrfs_parse_optins no longer gets called twice and is the last one to parse mount option string. Therefore there is no need to dup it. Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Liu Bo authored
In fact nobody is waiting on @wait's waitqueue, it can be safely removed. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The bio is not referenced after it has been submitted and the endio is going to consume the sole reference on successful submission. On error, the callers of __btrfs_submit_dio_bio do invoke bio_put so we don't leak it either. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The bio is never referenced after it has been submitted so there is no point in getting an extra reference. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The bio that is passsed is the newly created repair bio which already has a reference count of 1, which is going to be consumed by the endio routine on successful submission. On error the handler also calls bio_put. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-