Commits · d1d3cd27a32b819a0b0e2b9b6b5f9e80890f9821 · Kirill Smelkov / linux · GitLab

20 Feb, 2013 38 commits

btrfs: list_entry can't return NULL · d1d3cd27

Eric Sandeen authored Jan 31, 2013

No need to test the result, we can't get a
null pointer from list_entry()
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

d1d3cd27

btrfs: remove unused fd in btrfs_ioctl_send() · b4c6f7b7

Eric Sandeen authored Jan 31, 2013

All we do is set it to NULL and test it :)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

b4c6f7b7

Btrfs: do not overcommit if we don't have enough space for global rsv · 96f1bb57

Josef Bacik authored Jan 30, 2013

Because of how little we allocate chunks now we can get really tight on
metadata space before we will allocate a new chunk. This resulted in being
unable to add device extents when allocating a new metadata chunk as we did
not have enough space. This is because we were allowed to overcommit too
much metadata without actually making sure we had enough space to make
allocations. The idea behind overcommit is that we are allowed to say "sure
you can have that reservation" when most of the free space is occupied by
reservations, not actual allocations. But in this case where a majority of
the total space is in use by actual allocations we can screw ourselves by
not being able to make real allocations when it matters. So make sure we
have enough real space for our global reserve, and if not then don't allow
overcommitting. Thanks,
Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

96f1bb57

Btrfs: remove extent mapping if we fail to add chunk · 0f5d42b2

Josef Bacik authored Jan 31, 2013

I got a double free error when unmounting a file system that failed to add a
chunk during its operation.  This is because we will kfree the mapping that
we created but leave the extent_map in the em_tree for chunks.  So to fix
this just remove the extent_map when we error out so we don't run into this
problem.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

0f5d42b2

Btrfs: fix chunk allocation error handling · 04487488

Josef Bacik authored Jan 29, 2013

If we error out allocating a dev extent we will have already created the
block group and such which will cause problems since the allocator may have
tried to allocate out of the block group that no longer exists. This will
cause BUG_ON()'s in the bio submission path. This also makes a failure to
allocate a dev extent a non-abort error, we will just clean up the dev
extents we did allocate and exit. Now if we fail to delete the dev extents
we will abort since we can't have half of the dev extents hanging around,
but this will make us much less likely to abort. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

04487488

Btrfs: use bit operation for ->fs_state · 87533c47

Miao Xie authored Jan 29, 2013

There is no lock to protect fs_info->fs_state, it will introduce
some problems, such as the value may be covered by the other task
when several tasks modify it. For example:
	Task0 - CPU0		Task1 - CPU1
	mov %fs_state rax
	or $0x1 rax
				mov %fs_state rax
				or $0x2 rax
	mov rax %fs_state
				mov rax %fs_state
The expected value is 3, but in fact, it is 2.

Though this problem doesn't happen now (because there is only one
flag currently), the code is error prone, if we add other flags,
the above problem will happen to a certainty.

Now we use bit operation for it to fix the above problem.
In this way, we can make the code more robust and be easy to
add new flags.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

87533c47

Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits · de98ced9

Miao Xie authored Jan 29, 2013

There is no lock to protect
  fs_info->avail_{data, metadata, system}_alloc_bits,
it may introduce some problem, such as the wrong profile
information, so we add a seqlock to protect them.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

de98ced9

Btrfs: use the inode own lock to protect its delalloc_bytes · df0af1a5

Miao Xie authored Jan 29, 2013

We need not use a global lock to protect the delalloc_bytes of the
inode, just use its own lock. In this way, we can reduce the lock
contention and ->delalloc_lock will just protect delalloc inode
list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

df0af1a5

Btrfs: use percpu counter for fs_info->delalloc_bytes · 963d678b

Miao Xie authored Jan 29, 2013

fs_info->delalloc_bytes is accessed very frequently, so use percpu
counter instead of the u64 variant for it to reduce the lock
contention.

This patch also fixed the problem that we access the variant
without the lock protection.At worst, we would not flush the
delalloc inodes, and just return ENOSPC error when we still have
some free space in the fs.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

963d678b

Btrfs: use percpu counter for dirty metadata count · e2d84521

Miao Xie authored Jan 29, 2013

->dirty_metadata_bytes is accessed very frequently, so use percpu
counter instead of the u64 variant to reduce the contention of
the lock.

This patch also fixed the problem that we access it without
lock protection in __btrfs_btree_balance_dirty(), which may
cause we skip the dirty pages flush.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e2d84521

Btrfs: protect fs_info->alloc_start · c018daec

Miao Xie authored Jan 29, 2013

fs_info->alloc_start is a 64bits variant, can be accessed by
multi-task, but it is not protected strictly, it can be changed
while we are accessing it. On 32bit machine, we will get wrong
value because we access it by two instructions.(In fact, it is
also possible that the same problem happens on the 64bit machine,
because the compiler may split the 64bit operation into two 32bit
operation.)

For example:
Assuming -> alloc_start is 0x0000 0000 0001 0000 at the beginning,
then we remount and set ->alloc_start to 0x0000 0100 0000 0000.
	Task0 			Task1
				load high 32 bits
	set high 32 bits
	set low 32 bits
				load low 32 bits

Task1 will get 0.

This patch fixes this problem by using two locks to protect it
	fs_info->chunk_mutex
	sb->s_umount
On the read side, we just need get one of these two locks, and on
the write side, we must lock all of them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

c018daec

Btrfs: add a comment for fs_info->max_inline · 8c6a3ee6

Miao Xie authored Jan 29, 2013

Though ->max_inline is a 64bit variant, and may be accessed by
multi-task, but it is just suggestive number, so we needn't add
anything to protect fs_info->max_inline, just add a comment to
explain wny we don't use a lock to protect it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

8c6a3ee6

Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h · 55e301fd

Filipe Brandenburger authored Jan 29, 2013

The header file will then be installed under /usr/include/linux so that
userspace applications can refer to Btrfs ioctls by name and use the same
structs used internally in the kernel.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

55e301fd

Btrfs: Check CAP_DAC_READ_SEARCH for BTRFS_IOC_INO_PATHS · 82b22ac8

Kusanagi Kouichi authored Jan 28, 2013

CAP_DAC_READ_SEARCH overrides read and search permission check on
file and directory. It seems fit for BTRFS_IOC_INO_PATHS.
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

82b22ac8

Revert "Btrfs: fix permissions of empty files not affected by umask" · fe5fafbe

Josef Bacik authored Jan 24, 2013

This reverts commit 2794ed01.

Wasn't supposed to get used in btrfs_mknod, it was supposed to be in
btrfs_create, which was done in commit
9185aa58.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

fe5fafbe

Btrfs: don't traverse the ordered operation list repeatedly · 5b947f1b

Miao Xie authored Jan 22, 2013

btrfs_run_ordered_operations() needn't traverse the ordered operation list
repeatedly, it is because the transaction commiter will invoke it again when
there is no other writer in this transaction, it can ensure that no one can
add new objects into the ordered operation list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

5b947f1b

Btrfs: traverse and flush the delalloc inodes once · 63607cc8

Miao Xie authored Jan 22, 2013

btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes
repeatedly. It is because we can regard the data that the users write after
we start delalloc inodes flush as the one which is after the delalloc inodes
flush is done, and we can flush it next time.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

63607cc8

Btrfs: check the return value of btrfs_run_ordered_operations() · eebc6084

Miao Xie authored Jan 22, 2013

We forget to check the return value of btrfs_run_ordered_operations() when
flushing all the pending stuffs, fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

eebc6084

Btrfs: check the return value of btrfs_start_delalloc_inodes() · 3edb2a68

Miao Xie authored Jan 22, 2013

We forget to check the return value of btrfs_start_delalloc_inodes(), fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

3edb2a68

Btrfs: make raid attr array more readable · e6ec716f

Miao Xie authored Jan 17, 2013

The current code of raid attr arry is hard to understand and it is easy to
introduce some problem if we modify the array. So I changed it and made it
more readable.

Cc: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

e6ec716f

Btrfs: record first logical byte in memory · a1897fdd

Liu Bo authored Dec 27, 2012

This'd save us a rbtree search which may become expensive in large filesystem.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

a1897fdd

Btrfs: save us a read_lock · 39f9d028

Liu Bo authored Dec 27, 2012

This does not change the logic of code, but can save us a read_lock.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

39f9d028

Btrfs: use token to avoid times mapping extent buffer · 51fab693

Liu Bo authored Dec 27, 2012

The API in tree log code has done sort of changes, and it proves that
we can benifit from using token, so do the same thing here.

function_graph tracer's timer shows that it costs nearly half time
of before(39.788us -> 22.391us).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

51fab693

Btrfs: kill unused argument of btrfs_pin_extent_for_log_replay · dcfac415

Liu Bo authored Dec 27, 2012

Argument 'trans' is not used any more.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

dcfac415

Btrfs: kill unused argument of update_block_group · c53d613e

Liu Bo authored Dec 27, 2012

Argument 'trans' is not used any more.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

c53d613e

Btrfs: kill unused arguments of cache_block_group · f6373bf3

Liu Bo authored Dec 27, 2012

Argument 'trans' and 'root' are not used any more.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

f6373bf3

Btrfs: remove deprecated comments · 17b85495

Liu Bo authored Dec 27, 2012

commit d53ba474
(Btrfs: use commit root when loading free space cache) has remove
the deadlock check, and the related comments can be removed as well.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

17b85495

Btrfs: don't re-enter when allocating a chunk · c6b305a8

Josef Bacik authored Dec 18, 2012

If we start running low on metadata space we will try to allocate a chunk,
which could then try to allocate a chunk to add the device entry. The thing
is we allocate a chunk before we try really hard to make the allocation, so
we should be able to find space for the device entry. Add a flag to the
trans handle so we know we're currently allocating a chunk so we can just
bail out if we try to allocate another chunk. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

c6b305a8

Btrfs: wait on ordered extents at the last possible moment · 2ab28f32

Josef Bacik authored Oct 12, 2012

Since we don't actually copy the extent information from the source tree in
the fast case we don't need to wait for ordered io to be completed in order
to fsync, we just need to wait for the io to be completed. So when we're
logging our file just attach all of the ordered extents to the log, and then
when the log syncs just wait for IO_DONE on the ordered extents and then
write the super. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

2ab28f32

Btrfs: fix trivial error in btrfs_ioctl_resize() · dfd79829

Miao Xie authored Dec 21, 2012

This patch fixes the following problem:
- improper return value
- unnecessary read-only check
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

dfd79829

Btrfs: use wrapper page_offset · 4eee4fa4

Miao Xie authored Dec 21, 2012

Use wrapper page_offset to get byte-offset into filesystem object for page.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

4eee4fa4

Btrfs: flush all dirty inodes if writeback can not start · da633a42

Miao Xie authored Dec 20, 2012

We may try to flush some dirty pages when there is no enough space to reserve.
But it is possible that this operation fails, in order to get enough space to
reserve successfully, we will sync all the delalloc file. This operation is
safe, we needn't worry about the case that the filesystem goes from r/w to r/o.
because the filesystem should guarantee all the dirty pages have been written
into the disk after it becomes readonly, so the sync operation will do nothing
if the filesystem is already readonly. Though it may waste lots of time,
as a corner case, we needn't care.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

da633a42

Btrfs: make delayed ref lock logic more readable · 093486c4

Miao Xie authored Dec 19, 2012

Locking and unlocking delayed ref mutex are in the different functions,
and the name of lock functions is not uniform, so the readability is not
so good, this patch optimizes the lock logic and makes it more readable.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

093486c4

Btrfs: fix lots of orphan inodes when the space is not enough · 0e8c36a9

Miao Xie authored Dec 19, 2012

We're running into having 50-100 orphans left over with xfstests 83
because of ENOSPC when trying to start the transaction for the inode update.
But in fact, it makes no sense in updating the inode for the new size while
we're deleting the stupid thing. This patch fixes this problem.
Reported-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

0e8c36a9

Btrfs: cleanup similar code in delayed inode · 4ea41ce0

Miao Xie authored Dec 19, 2012

The delayed item commit code in several functions is similar, so
cleanup it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

4ea41ce0

Btrfs: use common work instead of delayed work · 7892b5af

Miao Xie authored Nov 15, 2012

Since we do not want to delay the async transaction commit, we should
use common work, not delayed work.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

7892b5af

Btrfs: cleanup unnecessary clear when freeing a transaction or a trans handle · 7b5a1c53

Miao Xie authored Nov 15, 2012

We clear the transaction object and the trans handle when they are about to be
freed, it is unnecessary, cleanup it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

7b5a1c53

Btrfs: use slabs for delayed reference allocation · 78a6184a

Miao Xie authored Nov 21, 2012

The delayed reference allocation is in the fast path of the IO, so use slabs
to improve the speed of the allocation.

And besides that, it can do check for leaked objects when the module is removed.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

78a6184a

15 Feb, 2013 2 commits

btrfs: access superblock via pagecache in scan_one_device · 6f60cbd3

David Sterba authored Feb 15, 2013

btrfs_scan_one_device is calling set_blocksize() which can race
with a concurrent process making dirty page cache pages.  It can end up
dropping dirty page cache pages on the floor, which isn't very nice when
someone is just running btrfs dev scan to find filesystems on the
box.

Now that udev is registering btrfs devices as it discovers them, we can
actually end up racing with our own mkfs program too.  When this
happens, we drop some of the important blocks written by mkfs.

This commit changes scan_one_device to read the super out of the page
cache instead of trying to use bread.  This way we don't have to care
about the blocksize of the device.

This also drops the invalidate_bdev() call.  It wasn't very polite to
invalidate during the scan either.  mkfs is putting the super into the
page cache, there's no reason to invalidate at this point.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

6f60cbd3

Btrfs: fix crash in log replay with qgroups enabled · 2a745b14

Arne Jansen authored Feb 13, 2013

When replaying a log tree with qgroups enabled, tree_mod_log_rewind does a
sanity-check of the number of items against the maximum possible number.
It calculates that number with the nodesize of fs_root. Unfortunately
fs_root is not yet set at this stage. So instead use the nodesize from
tree_root, which is already initialized.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

2a745b14