Commits · 6f60cbd3ae442cb35861bb522f388db123d42ec1 · Kirill Smelkov / linux

15 Feb, 2013 2 commits

btrfs: access superblock via pagecache in scan_one_device · 6f60cbd3

David Sterba authored Feb 15, 2013

btrfs_scan_one_device is calling set_blocksize() which can race
with a concurrent process making dirty page cache pages.  It can end up
dropping dirty page cache pages on the floor, which isn't very nice when
someone is just running btrfs dev scan to find filesystems on the
box.

Now that udev is registering btrfs devices as it discovers them, we can
actually end up racing with our own mkfs program too.  When this
happens, we drop some of the important blocks written by mkfs.

This commit changes scan_one_device to read the super out of the page
cache instead of trying to use bread.  This way we don't have to care
about the blocksize of the device.

This also drops the invalidate_bdev() call.  It wasn't very polite to
invalidate during the scan either.  mkfs is putting the super into the
page cache, there's no reason to invalidate at this point.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix crash in log replay with qgroups enabled · 2a745b14

Arne Jansen authored Feb 13, 2013

When replaying a log tree with qgroups enabled, tree_mod_log_rewind does a
sanity-check of the number of items against the maximum possible number.
It calculates that number with the nodesize of fs_root. Unfortunately
fs_root is not yet set at this stage. So instead use the nodesize from
tree_root, which is already initialized.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

06 Feb, 2013 3 commits

Btrfs: move d_instantiate outside the transaction during mksubvol · 1a65e24b

Chris Mason authored Feb 06, 2013

Dave Sterba triggered a lockdep complaint about lock ordering
between the sb_internal lock and the cleaner semaphore.

btrfs_lookup_dentry() checks for orphans if we're looking up
the inode for a subvolume, and subvolume creation is triggering
the lookup with a transaction running.

This commit moves the d_instantiate after the transaction closes.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata · eb6b88d9

Jan Schmidt authored Jan 27, 2013

When btrfs_qgroup_reserve returned a failure, we were missing a counter
operation for BTRFS_I(inode)->outstanding_extents++, leading to warning
messages about outstanding extents and space_info->bytes_may_use != 0.
Additionally, the error handling code didn't take into account that we
dropped the inode lock which might require more cleanup.

Luckily, all the cleanup code we need is already there and can be shared
with reserve_metadata_bytes, which is exactly what this patch does.
Reported-by: Lev Vainblat <lev@zadarastorage.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git for-chris into for-linus · 24f8ebe9

Chris Mason authored Feb 05, 2013

05 Feb, 2013 6 commits

Btrfs: fix possible stale data exposure · 59fe4f41

Josef Bacik authored Jan 30, 2013

We specifically do not update the disk i_size if there are ordered extents
outstanding for any area between the current disk_i_size and our ordered
extent so that we do not expose stale data. The problem is the check we
have only checks if the ordered extent starts at or after the current
disk_i_size, which doesn't take into account an ordered extent that starts
before the current disk_i_size and ends past the disk_i_size. Fix this by
checking if the extent ends past the disk_i_size. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
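
The difference between the two checks can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the kernel's own; the sketch only encodes the two conditions as the message describes them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered extent covering [start, start + len). */
struct ordered_range {
    uint64_t start;
    uint64_t len;
};

/* Old check as described: only sees extents that start at or after disk_i_size. */
static bool blocks_isize_old(const struct ordered_range *r, uint64_t disk_i_size)
{
    return r->start >= disk_i_size;
}

/* Fixed check as described: any extent that ends past disk_i_size counts. */
static bool blocks_isize_new(const struct ordered_range *r, uint64_t disk_i_size)
{
    return r->start + r->len > disk_i_size;
}
```

An extent that starts at 4096 with length 8192 straddles a disk_i_size of 8192: the old check misses it, the new one catches it.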

Btrfs: fix missing i_size update · 5d1f4020

Josef Bacik authored Jan 30, 2013

If we have an ordered extent before the ordered extent we are currently
completing, and it is after the current disk_i_size, we will put our i_size
update into that ordered extent so that we do not expose stale data. The
problem is that if our disk i_size is updated past the previous ordered
extent, we won't update the i_size with the pending i_size update. So check
the pending i_size update, and if it's above the current disk i_size we need
to go ahead and try to update. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix race between snapshot deletion and getting inode · 6f1c3605

Liu Bo authored Jan 29, 2013

While running the snapshot test script created by Mitch and David,
the race between autodefrag and snapshot deletion can corrupt the
dead_roots list, so that we can crash in
btrfs_clean_old_snapshots().

Besides autodefrag, scrub also does the same thing, i.e. reads the
root first and then gets the inode.

Here is the story (taking autodefrag as an example):
(1) when we delete a snapshot or subvolume, it will set its root's
refs to zero and do an iput() on its own inode, and if this inode happens
to be the only active in-memory one in the root's inode rbtree, it will add
itself to the global dead_roots list for later cleanup.

(2) after (1), the autodefrag thread may read another inode for defrag
and the inode is in the deleted snapshot/subvolume, but all of this
happens without checking if the root is still valid (refs > 0).  So the end
result is adding the deleted snapshot/subvolume's root to the global
dead_roots list AGAIN.

Fortunately, we already have an srcu lock to avoid the race, i.e. subvol_srcu.

So all we need to do is take the lock to protect 'read root and get inode',
since we synchronize to wait for the rcu grace period before adding something
to the global dead_roots list.
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix missing release of the space/qgroup reservation in start_transaction() · 843fcf35

Miao Xie authored Jan 28, 2013

When we fail to start a transaction, we need to release the reserved free space
and qgroup space. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() · 0a3404dc

Miao Xie authored Jan 28, 2013

If the checks at the beginning of btrfs_file_aio_write() fail, we needn't
decrease ->sync_writers, because we have not increased it. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
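
The pattern being fixed can be sketched in userspace C. This is an illustration under assumptions, not the kernel function: `fake_inode` and `write_sketch` are hypothetical names, and the early checks are reduced to a single flag.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode's sync_writers counter. */
struct fake_inode {
    atomic_int sync_writers;
};

static int write_sketch(struct fake_inode *inode, bool checks_ok, bool sync)
{
    int err = 0;

    if (!checks_ok) {
        /* The counter has not been incremented yet, so leave it alone. */
        err = -EINVAL;
        goto out;
    }

    if (sync)
        atomic_fetch_add(&inode->sync_writers, 1);

    /* ... the actual write would happen here ... */

    if (sync)
        atomic_fetch_sub(&inode->sync_writers, 1);
out:
    return err;
}
```

The decrement sits before the `out` label, so a failed early check can never drive the counter below zero.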

Btrfs: do not merge logged extents if we've removed them from the tree · 222c81dc

Josef Bacik authored Jan 28, 2013

You can run into this problem where if somebody is fsyncing and writing out
the existing extents you will have removed the extent map from the em tree,
but it's still valid for the current fsync so we go ahead and write it. The
problem is we unconditionally try to merge it back into the em tree, but if
we've removed it from the em tree that will cause use after free problems.
Fix this to only merge if we are still a part of the tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

01 Feb, 2013 1 commit

btrfs: don't try to notify udev about missing devices · 3c911608

Eric Sandeen authored Jan 31, 2013

If we remove a missing device, bdev is null, and if we
send that off to btrfs_kobject_uevent we'll panic.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

24 Jan, 2013 8 commits

Btrfs: fix repeated delalloc work allocation · 1eafa6c7

Miao Xie authored Jan 22, 2013

btrfs_start_delalloc_inodes() locks the delalloc_inodes list, fetches the
first inode, unlocks the list, triggers btrfs_alloc_delalloc_work/
btrfs_queue_worker for this inode, and then it locks the list, checks the
head of the list again. But because we never remove the inode we have already
dealt with, it fetches the same inode again. As a result, this function
allocates a huge number of btrfs_delalloc_work structures, and OOM happens.

Fix this problem by splicing the delalloc list.
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
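
The splice idea can be sketched with a plain singly linked list. This is a simplified illustration, not the kernel code: the types and function names are hypothetical, locking is only indicated in comments, and processed nodes are simply detached.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the delalloc inode list. */
struct node {
    struct node *next;
    int id;
};

struct list {
    struct node *head;
};

/* Move every node from the shared list to a private chain in one step. */
static struct node *splice_all(struct list *src)
{
    struct node *first = src->head;

    src->head = NULL;
    return first;
}

/* Visit each inode exactly once; returns the number of work items queued. */
static int start_delalloc_sketch(struct list *delalloc_inodes)
{
    int queued = 0;
    struct node *n;

    /* (the list lock would be taken here) */
    n = splice_all(delalloc_inodes);
    /* (and dropped here) */

    while (n) {
        struct node *next = n->next;

        n->next = NULL;
        queued++; /* stands in for allocating and queueing one work item */
        n = next;
    }
    return queued;
}
```

Because the shared list is emptied before the walk starts, re-checking its head can never return an inode that was already handled.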

Btrfs: fix wrong max device number for single profile · c9f01bfe

Miao Xie authored Jan 16, 2013

The max device number of single profile is 1, not 0 (0 means 'as many as
possible'). Fix it.

Cc: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
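
How a "0 means unlimited" maximum behaves can be shown with a short sketch. The helper name `clamp_devs` is hypothetical and only illustrates the convention stated in the message.

```c
/*
 * Clamp the number of usable devices to a profile's maximum.
 * devs_max == 0 means "as many as possible", per the message above.
 */
static int clamp_devs(int available, int devs_max)
{
    if (devs_max && available > devs_max)
        return devs_max;
    return available;
}
```

With four devices available, a maximum of 1 (the corrected value for single) yields one device, while a maximum of 0 would let the allocation spread across all four.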

Btrfs: fix missed transaction->aborted check · 2cba30f1

Miao Xie authored Jan 15, 2013

First, though the current transaction->aborted check can stop the commit early
and avoid unnecessary operations, it happens too early: some transaction handles
have not ended yet, and those handles may set transaction->aborted after the check.

Second, when we commit the transaction, we wake up some worker threads to
flush the space cache and inode cache. Those threads also allocate transaction
handles and may set transaction->aborted if a serious error happens.

So we need more checks for ->aborted when committing the transaction. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: Add ACCESS_ONCE() to transaction->abort accesses · 8d25a086

Miao Xie authored Jan 15, 2013

We may access and update transaction->aborted on different CPUs without a
lock, so we need the ACCESS_ONCE() wrapper to prevent the compiler from creating
unsolicited accesses and to make sure we get the right value.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
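
For readers unfamiliar with the macro, here is a minimal userspace sketch. The macro body matches the definition kernels of that era carried in include/linux/compiler.h (it relies on the GNU typeof extension); the struct and the two helpers are hypothetical.

```c
/* Force a single volatile access to x; needs GCC or Clang for __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the transaction structure. */
struct fake_transaction {
    int aborted;
};

/* Read the flag exactly once, without the compiler caching or re-reading it. */
static int read_aborted(struct fake_transaction *t)
{
    return ACCESS_ONCE(t->aborted);
}

/* Store the error code with a single write. */
static void set_aborted(struct fake_transaction *t, int err)
{
    ACCESS_ONCE(t->aborted) = err;
}
```

The volatile cast constrains only the compiler; it is not a memory barrier and does not order the access against other loads and stores.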

Btrfs: put csums on the right ordered extent · e58dd74b

Josef Bacik authored Jan 22, 2013

I noticed a WARN_ON going off when adding csums because we were going over
the amount of csum bytes that should have been allowed for an ordered
extent. This is a leftover from when we used to hold the csums privately
for direct io, but now we use the normal ordered sum stuff so we need to
make sure and check if we've moved on to another extent so that the csums
are added to the right extent. Without this we could end up with csums for
bytenrs that don't have extents to cover them yet. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: use right range to find checksum for compressed extents · 192000dd

Liu Bo authored Jan 06, 2013

For compressed extents, the checksums cover the disk length, and the disk
length differs from the ram length, so we need to use the disk length
to look up the right checksum.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix panic when recovering tree log · b0175117

Josef Bacik authored Dec 18, 2012

A user reported a BUG_ON(ret) that occurred during tree log replay.  Ret was
-EAGAIN, so what I think happened is that we removed an extent that covered
a bitmap entry and an extent entry.  We remove the part from the bitmap and
return -EAGAIN and then search for the next piece we want to remove, which
happens to be an entire extent entry, so we just free the sucker and return.
The problem is ret is still set to -EAGAIN so we trip the BUG_ON().  The
user used btrfs-zero-log so I'm not 100% sure this is what happened so I've
added a WARN_ON() to catch the other possibility.  Thanks,
Reported-by: Jan Steffens <jan.steffens@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
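
The stale return value can be reproduced with a small sketch. This models only the control flow suspected in the message, not the free-space code itself; the function name and its parameters are hypothetical.

```c
#include <errno.h>

/*
 * partial[i] != 0 means pass i removed only part of the range (the bitmap
 * piece) and has to search again. reset_ret selects the fixed behaviour.
 */
static int remove_range_sketch(const int *partial, int npasses, int reset_ret)
{
    int ret = 0;

    for (int i = 0; i < npasses; i++) {
        if (reset_ret)
            ret = 0; /* fixed behaviour: every pass starts clean */
        if (partial[i]) {
            ret = -EAGAIN; /* bitmap part removed, look for the next piece */
            continue;
        }
        break; /* whole extent entry freed, done */
    }
    return ret;
}
```

With one partial pass followed by a complete one, the unfixed variant still returns -EAGAIN even though the work finished, which is the value a BUG_ON(ret) would trip over.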

Btrfs: do not allow logged extents to be merged or removed · 201a9038

Josef Bacik authored Jan 24, 2013

We drop the extent map tree lock while we're logging extents, so somebody
could come in and merge another extent into this one and screw up our
logging, or they could even remove us from the list which would keep us from
logging the extent or freeing our ref on it, so we need to make sure to not
clear LOGGING until after the extent is logged, and then we can merge it to
adjacent extents. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

22 Jan, 2013 5 commits

Btrfs: fix a regression in balance usage filter · a105bb88

Ilya Dryomov authored Jan 21, 2013

Commit 3fed40cc ("Btrfs: cleanup duplicated division functions"), which
was merged into 3.8-rc1, has introduced a regression by removing logic
that was guarding us against bad user input.  Bring it back.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Merge branch 'mutex-ops@next-for-chris' of git://github.com/idryomov/btrfs-unstable into linus · 83bfccb5

Chris Mason authored Jan 21, 2013

Merge branch 'for-chris' of... · daf2c089

Chris Mason authored Jan 21, 2013

Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into linus

Btrfs: prevent qgroup destroy when there are still relations · 2cf68703

Arne Jansen authored Jan 17, 2013

Currently you can just destroy a qgroup even though it is in use by other qgroups
or has qgroups assigned to it. This patch prevents destruction of qgroups unless
they are completely unused. Otherwise destroy will return EBUSY.
Reported-by: Eric Hopper <hopper@omnifarious.org>
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
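
The rule reads naturally as a guard at the top of the destroy path. The sketch below uses plain counters in place of the kernel's relation lists, and all names in it are hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for a qgroup and its relations. */
struct qgroup_sketch {
    int nparents; /* qgroups this one is assigned to */
    int nmembers; /* qgroups assigned to this one */
};

/* Refuse to destroy a qgroup that still takes part in any relation. */
static int qgroup_destroy_sketch(const struct qgroup_sketch *qg)
{
    if (qg->nparents > 0 || qg->nmembers > 0)
        return -EBUSY;
    return 0; /* the actual removal would happen here */
}
```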

Btrfs: ignore orphan qgroup relations · ff24858c

Arne Jansen authored Jan 17, 2013

If a qgroup that still has assignments is deleted by the user, the corresponding
relations are left in the tree. This leads to an unmountable filesystem.
With this patch, those relations are simply ignored.
Reported-by: Eric Hopper <hopper@omnifarious.org>
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

20 Jan, 2013 5 commits

Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag · 25122d15

Ilya Dryomov authored Jan 20, 2013

Operation-specific check (whether subvol is readonly or not) should go
after the mutual exclusiveness check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: fix unlock order in btrfs_ioctl_rm_dev · 4ac20c70

Ilya Dryomov authored Jan 20, 2013

Fix unlock order in btrfs_ioctl_rm_dev().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: fix unlock order in btrfs_ioctl_resize · 18f39c41

Ilya Dryomov authored Jan 20, 2013

Fix unlock order in btrfs_ioctl_resize().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Btrfs: fix "mutually exclusive op is running" error code · 2c0c9da0

Ilya Dryomov authored Jan 20, 2013

The error code that is returned in response to starting a mutually
exclusive operation when there is one already running got silently
changed from EINVAL to EINPROGRESS by 5ac00add. Returning EINPROGRESS
to, say, add_dev, when rm_dev is running is misleading. Furthermore,
the operation itself may want to use EINPROGRESS for other purposes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
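
A non-blocking exclusivity check of this kind can be sketched with a C11 atomic flag. This is an illustration, not the kernel code: the variable and function names are hypothetical, and C11 atomic_exchange stands in for the kernel's atomic primitives.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical flag: non-zero while an exclusive volume operation runs. */
static atomic_int exclusive_op_running;

/* Try to start an exclusive operation without blocking. */
static int begin_exclusive_op(void)
{
    if (atomic_exchange(&exclusive_op_running, 1))
        return -EINVAL; /* another operation is running; not EINPROGRESS */
    return 0;
}

/* Mark the exclusive operation as finished. */
static void end_exclusive_op(void)
{
    atomic_store(&exclusive_op_running, 0);
}
```

Returning -EINVAL here leaves EINPROGRESS free for an operation to report its own state.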

Btrfs: bring back balance pause/resume logic · ed0fb78f

Ilya Dryomov authored Jan 20, 2013

Balance pause/resume logic got broken by 5ac00add (went into 3.8-rc1
as part of dev-replace merge). Offending commit took a stab at making
mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
replace_dev) not block behind volume_mutex if another such operation is
in progress and instead return an error right away. Balancing front-end
relied on the blocking behaviour, so the fix is ugly, but short of a
complete rework, it's the best we can do.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

14 Jan, 2013 10 commits

btrfs: update timestamps on truncate() · 3972f260

Eric Sandeen authored Jan 12, 2013

truncate() and ftruncate() differ in the VFS; truncate()
doesn't set (ATTR_CTIME | ATTR_MTIME), and it's up to the
fs to do the timestamp updates if the size changes.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>

btrfs: fix btrfs_cont_expand() freeing IS_ERR em · f2767956

Zach Brown authored Jan 08, 2013

btrfs_cont_expand() tries to free an IS_ERR em as it gets an error from
btrfs_get_extent() and breaks out of its loop.

An instance of -EEXIST was reported in the wild:

  https://bugzilla.redhat.com/show_bug.cgi?id=874407

I have no idea if that -EEXIST is surprising, or not.  Regardless, this
error handling should be cleaned up to handle other reasonable errors
(ENOMEM, EIO; whatever).

This seemed to be the only buggy freeing of the relatively rare IS_ERR
em so I opted to fix the caller rather than teach free_extent_map() to
use IS_ERR_OR_NULL().
Signed-off-by: Zach Brown <zab@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
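
The hazard is easier to see with the error-pointer helpers spelled out. Below is a userspace sketch that reimplements ERR_PTR/PTR_ERR/IS_ERR along the lines of the kernel's include/linux/err.h; the extent map type and the two `_sketch` functions are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for an extent map. */
struct extent_map_sketch {
    int refs;
};

static struct extent_map_sketch *get_extent_sketch(int fail)
{
    struct extent_map_sketch *em;

    if (fail)
        return ERR_PTR(-EEXIST);
    em = calloc(1, sizeof(*em));
    if (!em)
        return ERR_PTR(-ENOMEM);
    em->refs = 1;
    return em;
}

static int cont_expand_sketch(int fail)
{
    struct extent_map_sketch *em = get_extent_sketch(fail);

    if (IS_ERR(em)) {
        /* em encodes an errno, not memory: return without freeing it. */
        return (int)PTR_ERR(em);
    }
    free(em);
    return 0;
}
```

Passing the error-encoded pointer to a free routine would dereference an address near the top of the address space, which is why the caller has to bail out first.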

Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents · f9e4fb53

Liu Bo authored Jan 07, 2013

xfstests case 285 complains.

This is because btrfs did not try to find unwritten delalloc
bytes (only dirty pages, not yet under writeback) behind prealloc
extents, so it ends up finding nothing when called with SEEK_DATA.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix off-by-one in lseek · 1214b53f

Liu Bo authored Jan 07, 2013

Lock end is inclusive.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
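
As a one-line illustration of the inclusive convention (the helper name is hypothetical):

```c
#include <stdint.h>

/* Inclusive end offset of a range of len bytes starting at start (len > 0). */
static uint64_t range_end_inclusive(uint64_t start, uint64_t len)
{
    return start + len - 1;
}
```

A 4096-byte range starting at offset 0 therefore ends at 4095, not 4096.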

Btrfs: reset path lock state to zero · 3268a246

Liu Bo authored Dec 28, 2012

We forgot to reset the path lock state to zero after we unlock the path block,
and this can trip the ASSERT check in the tree unlock API.
Reported-by: Slava Barinov <rayslava@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: let allocation start from the right raid type · ac5c9300

Liu Bo authored Dec 27, 2012

This avoids empty looping.

Say we have only one disk, so the metadata raid type defaults to DUP.
We do not need to start from index=0 (RAID10) and go through two empty
loops to reach index=2 (DUP).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
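
The saving can be counted with a small sketch. Only the positions of RAID10 (0) and DUP (2) come from the message; the remaining index values, the enum names and `scan_from` are assumptions made for illustration.

```c
/* Index order; only IDX_RAID10 = 0 and IDX_DUP = 2 are taken from the message. */
enum raid_index {
    IDX_RAID10 = 0,
    IDX_RAID1 = 1,  /* assumed */
    IDX_DUP = 2,
    IDX_RAID0 = 3,  /* assumed */
    IDX_SINGLE = 4, /* assumed */
    NR_RAID_TYPES = 5
};

/* Count loop iterations until a raid type with block groups is found. */
static int scan_from(int start_index, const int has_groups[NR_RAID_TYPES])
{
    int iterations = 0;

    for (int i = start_index; i < NR_RAID_TYPES; i++) {
        iterations++;
        if (has_groups[i])
            break;
    }
    return iterations;
}
```

With block groups present only at the DUP index, starting at 0 takes three iterations while starting at IDX_DUP takes one.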

Btrfs: add orphan before truncating pagecache · f3fe820c

Josef Bacik authored Jan 07, 2013

Running xfstests 83 in a loop would sometimes fail the fsck. This happens
because if we invalidate a page that already has an ordered extent setup for
it we will complete the ordered extent ourselves, assuming that the truncate
will clean everything up. The problem with this is there is plenty of time
for the truncate to fail after we've done this work. So to fix this we need
to add the orphan item first to make sure the cleanup gets done properly,
and then we can truncate the pagecache and all that stuff and be safe. This
fixes the btrfsck failures I was seeing while running 83 in a loop. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: set flushing if we're limited flushing · 72bcd99d

Josef Bacik authored Dec 18, 2012

We still need to say we're flushing if we're limit flushing to keep somebody
from coming in and stealing our reservation.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix missing write access release in btrfs_ioctl_resize() · 97547676

Miao Xie authored Dec 21, 2012

We forget to give up the write access after we find some device operation
is going on. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix resize a readonly device · dba60f3f

Miao Xie authored Dec 21, 2012

We should not resize a readonly device. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
