- 09 Sep, 2019 40 commits
-
-
David Sterba authored
The maximum and default levels do not change and can be defined directly. The set_level callback was a temporary solution and will be removed. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
At ctree.c:get_old_root(), we are accessing a root's header owner field after we have freed the respective extent buffer. This results in an use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC is set, results in a stack trace like the following: [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1 [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs] (...) [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286 [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000 [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246 [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000 [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000 [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8 [ 3876.799620] FS: 00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000 [ 3876.799643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0 [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3876.799723] Call Trace: [ 3876.799735] ? do_raw_spin_unlock+0x49/0xc0 [ 3876.799749] ? _raw_spin_unlock+0x24/0x30 [ 3876.799779] resolve_indirect_refs+0x1eb/0xc80 [btrfs] [ 3876.799810] find_parent_nodes+0x38d/0x1180 [btrfs] [ 3876.799841] btrfs_check_shared+0x11a/0x1d0 [btrfs] [ 3876.799870] ? extent_fiemap+0x598/0x6e0 [btrfs] [ 3876.799895] extent_fiemap+0x598/0x6e0 [btrfs] [ 3876.799913] do_vfs_ioctl+0x45a/0x700 [ 3876.799926] ksys_ioctl+0x70/0x80 [ 3876.799938] ? trace_hardirqs_off_thunk+0x1a/0x20 [ 3876.799953] __x64_sys_ioctl+0x16/0x20 [ 3876.799965] do_syscall_64+0x62/0x220 [ 3876.799977] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 3876.799993] RIP: 0033:0x7f23e0013dd7 (...) [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7 [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003 [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80 [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80 (...) [ 3876.805107] ---[ end trace e53161e179ef04f9 ]--- Fix that by saving the root's header owner field into a local variable before freeing the root's extent buffer, and then use that local variable when needed. Fixes: 30b0463a ("Btrfs: fix accessing the root pointer in tree mod log functions") CC: stable@vger.kernel.org # 3.10+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
The BTRFS_DEV_REPLACE_ITEM_STATE_x defines, as shown in [1], are unused in both kernel and btrfs-progs (except for one instance of BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED in kernel). [1] btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED 2 btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED 3 btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED 4 Further these define-values are different form its counterpart BTRFS_IOCTL_DEV_REPLACE_STATE_x series as shown in [2]. [2] btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_SUSPENDED 2 btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_FINISHED 3 btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_CANCELED 4 So this patch deletes the BTRFS_DEV_REPLACE_ITEM_STATE_x altogether, and one instance of BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED is replaced with BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED in the kernel. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We have this weird space flushing loop inside inode.c for evict where we'll do the normal LIMIT flush, and then commit the transaction and hope we get our space. This is super janky, and in fact there's really nothing stopping us from using FLUSH_ALL except that we run delayed iputs, which means we could deadlock. So introduce a new flush state for eviction that does the normal priority flushing with all of the states that are safe for eviction. The nice side-effect of this is that we'll try harder for evictions. Previously if (for example generic/269) you had a bunch of other operations happening on the fs you could race with those reservations when committing the transaction, and eventually miss getting a reservation for the evict. With this code we'll have our ticket in place through the transaction commit, so any pinned bytes will go to our pending evictions first. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
With the eviction flushing stuff we'll want to allow for different states, but still work basically the same way that priority_reclaim_metadata_space works currently. Refactor this to take the flushing states and size as an argument so we can use the same logic for limit flushing and eviction flushing. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We're going to make this logic a little more complicated for evict, so factor the ticket flushing/waiting code out of __reserve_metadata_bytes. This has no functional change. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
Currently we handle the cleanup of errored out tickets in both the priority flush path and the normal flushing path. This is the same code in both places, so just refactor so we don't duplicate the cleanup work. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
Delayed iputs could very well free up enough space without needing to commit the transaction, so make this step it's own step. This will allow us to skip the step for evictions in a later patch. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
These were renamed and exported to facilitate logical migration of different code chunks into block-group.c. Now that all the users are in one file go ahead and rename them back, move the code around, and make them static. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This can now be easily migrated as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh on top of sysfs cleanups ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
These feel more at home in block-group.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh, adjust btrfs_get_alloc_profile exports ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This feels more at home in block-group.c than in extent-tree.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com>i [ refresh ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We can now easily migrate this code as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
Want to move these functions into block-group.c, so export them. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This can be easily migrated over now. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This can easily be moved now. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This gets used by a few different logical chunks of the block group code, export it while we move things around. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
All of the prep work has been done so we can now cleanly move this chunk over. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh, add btrfs_get_alloc_profile export, comment updates ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This is the removal code and the unused bgs code. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ refresh, move clear_incompat_bg_bits ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This is used in a few logical parts of the block group code, temporarily export it so we can move things in pieces. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We can now just copy it over to block-group.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
None of the macros is used outside of sysfs.c. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The kobject should be pulled in via sysfs.h and that needs to include it because it needs various definitions like kobj_attribute or kobject. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Wrap the fsid renaming code and move it to sysfs.c. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The helpers to create block group and space info directories already live in sysfs.c, move the deletion part there too. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The device uevent belongs to the sysfs API. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
In order to unexport the feature type array, add a helper for the enum-to-string conversion. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The last non-sysfs usage of space_info_ktype has been moved to a private helper in previous patch so the variable can be made static. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Move creation of data/metadata/system space info directories to sysfs.c. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The last non-sysfs usage of btrfs_raid_ktype has been moved to a private helper in previous patch so the variable can be made static. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The part of link_block_group that just creates the sysfs object is independent and can be factored out to a helper. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
As the header for sysfs code already exists, use it to clean up ctree.h. Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
__btrfs_reset_dev_stats() is a small helper function to reset devices stat values, and is used only once, instead just open code it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
btrfs_dev_stat_reset() is an overdo in terms of wrapping. So this patch open codes btrfs_dev_stat_reset(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
When we try to delete qgroups, we're pretty cautious, we make sure both qgroups exist and there is a relationship between them, then try to delete the relation. This behavior is OK, but the problem is we need to two relation items, and if we failed the first item deletion, we error out, leaving the other relation item in qgroup tree. Sometimes the error from del_qgroup_relation_item() could just be -ENOENT, thus we can ignore that error and continue without any problem. Further more, such cautious behavior makes qgroup relation deletion impossible for orphan relation items. This patch will enhance __del_qgroup_relation(): - If both qgroups and their relation items exist Go the regular deletion routine and update their accounting if needed. - If any qgroup or relation item doesn't exist Then we still try to delete the orphan items anyway, but don't trigger the accounting update. By this, we try our best to remove relation items, and can handle orphan relation items properly, while still keep the existing behavior for good qgroup tree. Reported-by: Andrei Borzenkov <arvidjaar@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Hans van Kranenburg authored
In commit c11d2c23 ("Btrfs: add ioctl to get and reset the device stats") the get_dev_stats ioctl was added. Shortly thereafter, in commit b27f7c0c ("btrfs: join DEV_STATS ioctls to one") , the flags field was added. However, the calculation for unused padding space was not updated, which also invalidated the comment. Clarify what happened to reduce confusion and wasted time for anyone implementing this. Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
If any call to find_first_clear_extent_bit() returns an unexpected result, the test should fail and not just print an error message, otherwise it makes detection of regressions much harder to notice. Fixes: 1eaebb34 ("btrfs: Don't trim returned range based on input value in find_first_clear_extent_bit") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The test creates an extent io tree and sets several ranges with the CHUNK_ALLOCATED and CHUNK_TRIMMED bits, resulting in the allocation of several extent state structures. However the test never clears those ranges, resulting in memory leaks of the extent state structures. This is detected when CONFIG_BTRFS_DEBUG is set once we remove the btrfs module (rmmod btrfs): [57399.787918] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.790155] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.791941] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.793753] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.795188] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.796453] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.797765] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.799049] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.800142] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.801126] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.802106] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.803119] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 [57399.804153] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1 [57399.805196] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1 [57399.806191] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1 The start and end offsets reported correspond exactly to the ranges used by the test. So fix that by clearing all the ranges when the test finishes. Fixes: 1eaebb34 ("btrfs: Don't trim returned range based on input value in find_first_clear_extent_bit") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Replaced by the sysfs exports that provide a more fine grained interface for filesystem debugging. Signed-off-by: David Sterba <dsterba@suse.com>
-