- 14 Jun, 2013 30 commits
-
-
Miao Xie authored
We used 3 variants to track the state of the transaction, it was complex and wasted the memory space. Besides that, it was hard to understand that which types of the transaction handles should be blocked in each transaction state, so the developers often made mistakes. This patch improved the above problem. In this patch, we define 6 states for the transaction, enum btrfs_trans_state { TRANS_STATE_RUNNING = 0, TRANS_STATE_BLOCKED = 1, TRANS_STATE_COMMIT_START = 2, TRANS_STATE_COMMIT_DOING = 3, TRANS_STATE_UNBLOCKED = 4, TRANS_STATE_COMPLETED = 5, TRANS_STATE_MAX = 6, } and just use 1 variant to track those state. In order to make the blocked handle types for each state more clear, we introduce a array: unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = { [TRANS_STATE_RUNNING] = 0U, [TRANS_STATE_BLOCKED] = (__TRANS_USERSPACE | __TRANS_START), [TRANS_STATE_COMMIT_START] = (__TRANS_USERSPACE | __TRANS_START | __TRANS_ATTACH), [TRANS_STATE_COMMIT_DOING] = (__TRANS_USERSPACE | __TRANS_START | __TRANS_ATTACH | __TRANS_JOIN), [TRANS_STATE_UNBLOCKED] = (__TRANS_USERSPACE | __TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | __TRANS_JOIN_NOLOCK), [TRANS_STATE_COMPLETED] = (__TRANS_USERSPACE | __TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | __TRANS_JOIN_NOLOCK), } it is very intuitionistic. Besides that, because we remove ->in_commit in transaction structure, so the lock ->commit_lock which was used to protect it is unnecessary, remove ->commit_lock. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
We checked the commit time to avoid committing the transaction frequently, but it is unnecessary because: - It made the transaction commit spend more time, and delayed the operation of the external writers(TRANS_START/TRANS_USERSPACE). - Except the space that we have to commit transaction, such as snapshot creation, btrfs doesn't commit the transaction on its own initiative. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
We used ->num_joined track if there were some writers which join the current transaction when the committer was sleeping. If some writers joined the current transaction, we has to continue the while loop to do some necessary stuff, such as flush the ordered operations. But it is unnecessary because we will do it after the while loop. Besides that, tracking ->num_joined would make the committer drop into the while loop when there are lots of internal writers(TRANS_JOIN). So we remove ->num_joined and don't track if there are some writers which join the current transaction when the committer is sleeping. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
It is unnecessary to flush the delalloc inodes again and again because we don't care the dirty pages which are introduced after the flush, and they will be flush in the transaction commit. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
btrfs_commit_transaction has the following loop before we commit the transaction. do { // attempt to do some useful stuff and/or sleep } while (atomic_read(&cur_trans->num_writers) > 1 || (should_grow && cur_trans->num_joined != joined)); This is used to prevent from the TRANS_START to get in the way of a committing transaction. But it does not prevent from TRANS_JOIN, that is we would do this loop for a long time if some writers JOIN the current transaction endlessly. Because we need join the current transaction to do some useful stuff, we can not block TRANS_JOIN here. So we introduce a external writer counter, which is used to count the TRANS_USERSPACE/TRANS_START writers. If the external writer counter is zero, we can break the above loop. In order to make the code more clear, we don't use enum variant to define the type of the transaction handle, use bitmask instead. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
If the transaction is removed from the transaction list, it means the transaction has been committed successfully. So it is impossible to call cleanup_transaction(), otherwise there is something wrong with the code logic. Thus, we use BUG_ON() instead of the original handle. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
When we umount a fs with serious errors, we will invoke btrfs_cleanup_transactions() to clean up the residual transaction. At this time, It is impossible to start a new transaction, so we needn't assign trans_no_join to 1, and also needn't clear running transaction every time we destroy a residual transaction. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
Before applying this patch, we need flush all the delalloc inodes in the fs when we want to create a snapshot, it wastes time, and make the transaction commit be blocked for a long time. It means some other user operation would also be blocked for a long time. This patch improves this problem, we just flush the delalloc inodes that in the source trees before snapshot creation, so the transaction commit will complete quickly. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
The reason we introduce per-subvolume ordered extent list is the same as the per-subvolume delalloc inode list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
When we create a snapshot, we need flush all delalloc inodes in the fs, just flushing the inodes in the source tree is OK. So we introduce per-subvolume delalloc inode list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
The grab/put funtions will be used in the next patch, which need grab the root object and ensure it is not freed. We use reference counter instead of the srcu lock is to aovid blocking the memory reclaim task, which invokes synchronize_srcu(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
There are several functions whose code is similar, such as btrfs_find_last_root() btrfs_read_fs_root_no_radix() Besides that, some functions are invoked twice, it is unnecessary, for example, we are sure that all roots which is found in btrfs_find_orphan_roots() have their orphan items, so it is unnecessary to check the orphan item again. So cleanup it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
The snapshot/subvolume deletion might spend lots of time, it would make the remount task wait for a long time. This patch improve this problem, we will break the deletion if the fs is remounted to be R/O. It will make the users happy. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
If the fs is remounted to be R/O, it is unnecessary to call btrfs_clean_one_deleted_snapshot(), so move the R/O check out of this function. And besides that, it can make the check logic in the caller more clear. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Miao Xie authored
In order to avoid the R/O remount, we acquired ->s_umount lock during we deleted the dead snapshots and subvolumes. But it is unnecessary, because we have cleaner_mutex. We use cleaner_mutex to protect the process of the dead snapshots/subvolumes deletion. And when we remount the fs to be R/O, we also acquire this mutex to do cleanup after we change the status of the fs. That is this lock can serialize the above operations, the cleaner can be aware of the status of the fs, and if the cleaner is deleting the dead snapshots/subvolumes, the remount task will wait for it. So it is safe to remove ->s_umount in cleaner_kthread(). Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Stefan Behrens authored
btrfs_read_fs_root_no_name() already checks if btrfs_root_refs() is zero and returns ENOENT in this case. There is no need to do it again in six places. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Stefan Behrens authored
No need to check for NULL in send.c and disk-io.c. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Stefan Behrens authored
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Liu Bo authored
We don't need to copy it back to user side as it remains unchanged. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Andreas Philipp authored
Clean up the format of the definitions of BTRFS_BLOCK_GROUP_RAID5 and BTRFS_BLOCK_GROUP_RAID6. Signed-off-by: Andreas Philipp <philipp.andreas@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Tsutomu Itoh authored
sctx is removed from the argument of the function that doesn't use sctx. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Stefan Behrens authored
The size parameter to btrfs_extend_item() is the number of bytes to add to the item, not the size of the item after the operation (like it is for btrfs_truncate_item(), there the size parameter is not the number of bytes to take away, but the total size of the item after truncation). Fix it in the comment. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Jan Schmidt authored
btrfs_qgroup_wait_for_completion waits until the currently running qgroup operation completes. It returns immediately when no rescan process is in progress. This is useful to automate things around the rescan process (e.g. testing). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Wang Shilong authored
When doing qgroup accounting, we call ulist_alloc()/ulist_free() every time when we want to walk qgroup tree. By introducing 'qgroup_ulist', we only need to call ulist_alloc()/ulist_free() once. This reduce some sys time to allocate memory, see the measurements below fsstress -p 4 -n 10000 -d $dir With this patch: real 0m50.153s user 0m0.081s sys 0m6.294s real 0m51.113s user 0m0.092s sys 0m6.220s real 0m52.610s user 0m0.096s sys 0m6.125s avg 6.213 ----------------------------------------------------- Without the patch: real 0m54.825s user 0m0.061s sys 0m10.665s real 1m6.401s user 0m0.089s sys 0m11.218s real 1m13.768s user 0m0.087s sys 0m10.665s avg 10.849 we can see the sys time reduce ~43%. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
David Sterba authored
We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. Also kill version.h file that serves no purpose, we don't use any version tag for kernel module. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
David Sterba authored
And change the message level to KERN_INFO. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
David Sterba authored
The 'end' value must exactly cover the end of the interval, which means one byte less than the expected block alignment, or in case of a file smaller than one block, one byte less than the inode size. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Henrik Nordvik authored
Code checked for raid 5 flag in two else-if branches, so code would never be reached. Probably a copy-paste bug. Signed-off-by: Henrik Nordvik <henrikno@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
- 08 Jun, 2013 5 commits
-
-
Josef Bacik authored
Dave reported a panic because the extent_root->commit_root was NULL in the caching kthread. That is because we just unset it in free_root_pointers, which is not the correct thing to do, we have to either wait for the caching kthread to complete or hold the extent_commit_sem lock so we know the thread has exited. This patch makes the kthreads all stop first and then we do our cleanup. This should fix the race. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Liu Bo authored
Commit be283b2e ( Btrfs: use helper to cleanup tree roots) introduced the following bug, BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] [...] Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730) [...] Call Trace: [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs] [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs] [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs] [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs] [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs] [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [<ffffffff81068d3d>] kthread+0x8d/0x95 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] We've free'ed commit_root before actually getting to free block groups where caching thread needs valid extent_root->commit_root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
Dave reported a NULL pointer deref. This is caused because he thought he'd be smart and add sanity checks to the extent_io bit operations, but he didn't expect a tree to have a NULL mapping. To fix this we just need to init the relocation's processed_blocks with the btree_inode->i_mapping. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Naohiro Aota authored
There is a path where btrfs_drop_inode() is called with its inode's root is NULL: In btrfs_new_inode(), when btrfs_set_inode_index() fails, iput() is called. We should handle this case before taking look at the root->root_item. Signed-off-by: Naohiro Aota <naota@elisp.net> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
We get a use after free if we had a transaction to cleanup since there could be delayed inodes which refer to their respective fs_root. Thanks Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
- 18 May, 2013 5 commits
-
-
-
Chris Mason authored
Btrfs has been pointer tagging bi_private and using bi_bdev to store the stripe index and mirror number of failed IOs. As bios bubble back up through the call chain, we use these to decide if and how to retry our IOs. They are also used to count IO failures on a per device basis. Recently a bio tracepoint was added lead to crashes because we were abusing bi_bdev. This commit adds a btrfs bioset, and creates explicit fields for the mirror number and stripe index. The plan is to extend this structure for all of the fields currently in struct btrfs_bio, which will mean one less kmalloc in our IO path. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Tejun Heo <tj@kernel.org>
-
Josef Bacik authored
If we fail to load the chunk tree we'll call free_root_pointers, except we may not have assigned the roots for the dev_root/extent_root/csum_root yet, so we could NULL pointer deref at this point. Just add checks to make sure these roots are set to keep us from panicing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Stefan Behrens authored
The quota_tree was set up to use the empty_block_rsv before which would be problematic when the filesystem is filled up and ENOSPC happens during internal operations while the quota tree is updated and COWed (when the btrfs_qgroup_info_item items) are written. In fact, use_block_rsv() which is used in btrfs_cow_block() falls back to the global_block_rsv in this case. But just in order to make it more clear what is happening, change it to explicitly use the global_block_rsv. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-
Alexandre Oliva authored
end_bio_extent_readpage computes whole_page based on bv_offset and bv_len, without taking into account that blk_update_request may modify them when some of the blocks to be read into a page produce a read error. This would cause the read to unlock only part of the file range associated with the page, which would in turn leave the entire page locked, which would not only keep the process blocked instead of returning -EIO to it, but also prevent any further access to the file. It turns out that btrfs always issues whole-page reads and writes. The special handling of non-whole_page appears to be a mistake or a left-over from a time when this wasn't the case. Indeed, end_bio_extent_writepage distinguished between whole_page and non-whole_page writes but behaved identically in both cases! I've replaced the whole_page computations with warnings, just to be sure that we're not issuing partial page reads or writes. The warnings should probably just go away some time. Signed-off-by: Alexandre Oliva <oliva@gnu.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-