Commits · 7d2b4daa67379960477568abda62b8ba9ee3a8aa · nexedi / linux

25 Sep, 2008 40 commits

Btrfs: Fix the multi-bio code to save the original bio for completion · 7d2b4daa

Chris Mason authored Aug 05, 2008

The multi-bio code is responsible for duplicating blocks in raid1 and
single spindle duplication.  It has counters to make sure all of
the locations for a given extent are properly written before io completion
is returned to the higher layers.

But it didn't always complete the same bio it was given; sometimes a
clone was completed instead.  This led to problems with the async
work queues because they saved a pointer to the bio in a struct off
bi_private.

The fix is to remember the original bio and only complete that one.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
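
The counting scheme described in this message can be modelled in user-space C. This is only a sketch of the idea, not the Btrfs code: `struct multi_bio`, `multi_bio_init()` and `stripe_end_io()` are hypothetical names, the `struct bio` here is a tiny stand-in for the kernel type, and the counter is not atomic as it would need to be in a real IO path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block IO request. */
struct bio {
    void *bi_private;    /* owner's context, e.g. an async work item */
    int completed;       /* set when completion is reported upward */
};

/* Hypothetical tracker for one logical write fanned out to N copies. */
struct multi_bio {
    struct bio *orig;    /* the bio the caller handed in */
    int stripes_pending; /* copies still in flight */
};

static void multi_bio_init(struct multi_bio *m, struct bio *orig, int copies)
{
    assert(copies > 0);
    m->orig = orig;
    m->stripes_pending = copies;
}

/*
 * Called once per finished copy, whether that copy was the original bio
 * or a clone. Only the saved original is ever completed, and only after
 * the last copy finishes. Returns the bio that was completed, or NULL
 * while copies are still outstanding.
 */
static struct bio *stripe_end_io(struct multi_bio *m, struct bio *finished)
{
    (void)finished;      /* the finished copy itself is never reported upward */
    assert(m->stripes_pending > 0);
    if (--m->stripes_pending > 0)
        return NULL;
    m->orig->completed = 1;
    return m->orig;
}
```

Because the completion always goes to `m->orig`, anything stored behind the original's `bi_private` is still the right pointer when the upper layer sees the IO finish.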

7d2b4daa

Btrfs: Update clone file ioctl · ae01a0ab

Yan Zheng authored Aug 04, 2008

This patch updates the file clone ioctl for the tree locking and new
data ordered code.

---
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ae01a0ab

Btrfs: Various small fixes. · b48652c1

Yan Zheng authored Aug 04, 2008

This trivial patch contains two locking fixes and an off-by-one fix.

---
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b48652c1

btrfs_lookup_bio_sums seems broken, go back to the readpage_io_hook for now · 3de9d6b6

Chris Mason authored Aug 04, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

3de9d6b6

Btrfs: Maintain a list of inodes that are delalloc and a way to wait on them · ea8c2819

Chris Mason authored Aug 04, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

ea8c2819

Btrfs: Don't corrupt ram in shrink_extent_tree, leak it instead · d7a029a8

Chris Mason authored Aug 04, 2008

Far from the perfect fix, but these structs are small.  TODO for the
next release.  The block group cache structs are referenced in many
different places, and it isn't safe to just free them while resizing.

A real fix will be a larger change to the allocator so that it doesn't
have to carry around the block group cache structs to find good places
to search for free blocks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d7a029a8

Btrfs: fix ioctl-initiated transactions vs wait_current_trans() · 9ca9ee09

Sage Weil authored Aug 04, 2008

Commit 597:466b27332893 (btrfs_start_transaction: wait for commits in
progress) breaks the transaction start/stop ioctls by making
btrfs_start_transaction conditionally wait for the next transaction to
start.  If an application is artificially holding a transaction open,
things deadlock.

This workaround maintains a count of open ioctl-initiated transactions in
fs_info, and avoids wait_current_trans() if any are currently open (in
start_transaction() and btrfs_throttle()).  The start transaction ioctl
uses a new btrfs_start_ioctl_transaction() that _does_ call
wait_current_trans(), effectively pushing the join/wait decision to the
outer ioctl-initiated transaction.

This more or less neuters btrfs_throttle() when ioctl-initiated
transactions are in use, but that seems like a pretty fundamental
consequence of wrapping lots of write()'s in a transaction.  Btrfs has no
way to tell if the application considers a given operation as part of its
transaction.

Obviously, if the transaction start/stop ioctls aren't being used, there
is no effect on current behavior.
Signed-off-by: Sage Weil <sage@newdream.net>
---
 ctree.h       |    1 +
 ioctl.c       |   12 +++++++++++-
 transaction.c |   18 +++++++++++++-----
 transaction.h |    2 ++
 4 files changed, 27 insertions(+), 6 deletions(-)
Signed-off-by: Chris Mason <chris.mason@oracle.com>
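
A rough user-space model of the workaround, assuming a simplified `fs_info` with just a counter and a flag. The field names, the `waits` counter and the function bodies are illustrative assumptions, not the patch itself; only the decision logic follows the description above.

```c
#include <assert.h>

/* Hypothetical, simplified model of the state involved. */
struct fs_info {
    int open_ioctl_trans;   /* count of open ioctl-initiated transactions */
    int commit_in_progress; /* nonzero while a commit is running */
    int waits;              /* how many times a caller had to wait */
};

/* Stand-in for wait_current_trans(): record the wait, then let the commit end. */
static void wait_current_trans(struct fs_info *fs)
{
    if (fs->commit_in_progress) {
        fs->waits++;
        fs->commit_in_progress = 0;
    }
}

/* Ordinary start: skip the wait while any ioctl transaction is open. */
static void start_transaction(struct fs_info *fs)
{
    if (fs->open_ioctl_trans == 0)
        wait_current_trans(fs);
}

/* ioctl start: always waits, then marks an ioctl transaction as open. */
static void start_ioctl_transaction(struct fs_info *fs)
{
    wait_current_trans(fs);
    fs->open_ioctl_trans++;
}

static void end_ioctl_transaction(struct fs_info *fs)
{
    assert(fs->open_ioctl_trans > 0);
    fs->open_ioctl_trans--;
}
```

In this model the only place that can block on a running commit while an ioctl transaction exists is the outer ioctl start itself, which is the join/wait decision the message describes.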

9ca9ee09

Btrfs: Add support for HW assisted crc32c · 3117a773

Chris Mason authored Aug 04, 2008

Intel doesn't yet ship hardware to the public with this enabled, but when they
do, Btrfs will be ready.  Original code from:

Austin Zhang <austin_zhang@linux.intel.com>

It is currently disabled; edit crc32c.h to turn it on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
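
For reference, a bit-at-a-time software version of CRC32C (the Castagnoli polynomial) is sketched below; any hardware-assisted path would have to produce the same values. The function names are hypothetical, and the seed and final inversion shown are the standard CRC32C convention, which is an assumption here since the commit message does not say what Btrfs passes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Castagnoli polynomial used by CRC32C. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Bit-at-a-time update; the caller supplies the running crc value. */
static uint32_t crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(data != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and fold in the polynomial. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return crc;
}

/* Standard CRC32C framing: all-ones initial value, inverted result. */
static uint32_t crc32c_standard(const void *data, size_t len)
{
    return ~crc32c_sw(0xFFFFFFFFu, data, len);
}
```

With the standard framing, the well-known check string "123456789" yields 0xE3069283.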

3117a773

Btrfs: Hold csum mutex while reading in sums during readpages · 6dab8157

Chris Mason authored Aug 04, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

6dab8157

Btrfs: More throttle tuning · 2dd3e67b

Chris Mason authored Aug 04, 2008

* Make walk_down_tree wake up throttled tasks more often
* Make walk_down_tree call cond_resched during long loops
* As the size of the ref cache grows, wait longer in throttle
* Get rid of the reada code in walk_down_tree, the leaves don't get
  read anymore, thanks to the ref cache.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2dd3e67b

btrfs_search_slot: reduce lock contention by cowing in two stages · 65b51a00

Chris Mason authored Aug 01, 2008

A btree block cow has two parts: the first is to allocate a destination
block and the second is to copy the old block over.

The first part needs locks in the extent allocation tree, and may need to
do IO. This changeset splits that into a separate function that can be
called without any tree locks held.

btrfs_search_slot is changed to drop its path and start over if it has
to COW a contended block. This often means that many writers will
pre-alloc a new destination for the same contended block, but they
cache their prealloc for later use on lower levels in the tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
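
The drop-and-restart flow can be sketched as a small state machine. Everything here is a hypothetical model (the structs, `prealloc_dest()`, `cow_into_prealloc()` and `search_slot_sketch()` are made-up names); it only shows that the allocation stage runs with no tree locks held and that the search restarts once afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters that make the behaviour observable. */
struct search_stats {
    int restarts;          /* times the search dropped its path */
    int allocs_under_lock; /* allocations made with tree locks held */
    int allocs_unlocked;   /* allocations made with no tree locks held */
};

struct search_ctx {
    bool have_prealloc;    /* a destination block is already reserved */
    bool holding_locks;    /* true while walking the tree */
};

/* Stage one: reserve a destination block. Meant to run unlocked. */
static void prealloc_dest(struct search_ctx *c, struct search_stats *s)
{
    if (c->holding_locks)
        s->allocs_under_lock++;
    else
        s->allocs_unlocked++;
    c->have_prealloc = true;
}

/* Stage two: copy the old block into the reserved destination. */
static void cow_into_prealloc(struct search_ctx *c)
{
    assert(c->have_prealloc);
    c->have_prealloc = false;
}

static void search_slot_sketch(struct search_ctx *c, struct search_stats *s,
                               bool contended_block_needs_cow)
{
again:
    c->holding_locks = true;          /* walking down, locks held */
    if (contended_block_needs_cow) {
        if (!c->have_prealloc) {
            c->holding_locks = false; /* drop the path and its locks */
            prealloc_dest(c, s);
            s->restarts++;
            goto again;               /* start the search over */
        }
        cow_into_prealloc(c);
    }
    c->holding_locks = false;
}
```

The sketch consumes the reservation immediately; the message notes that in practice an unused prealloc is kept for later use on lower levels of the tree.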

65b51a00

Btrfs: Throttle less often waiting for snapshots to delete · 18e35e0a

Chris Mason authored Aug 01, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

18e35e0a

Btrfs: Improve and cleanup locking done by walk_down_tree · f87f057b

Chris Mason authored Aug 01, 2008

While dropping snapshots, walk_down_tree does most of the work of checking
reference counts and limiting tree traversal to just the blocks that
we are freeing.

It dropped and held the allocation mutex in strange and confusing ways.
This commit changes it to only hold the mutex while actually freeing a block.

The rest of the checks around reference counts should be safe without the lock
because we only allow one process in btrfs_drop_snapshot at a time. Other
processes dropping reference counts should not drop it to 1 because
their tree roots already have an extra ref on the block.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f87f057b

Btrfs: Hold a reference on bios during submit_bio, add some extra bio checks · 492bb6de

Chris Mason authored Jul 31, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

492bb6de

Btrfs: Drop some debugging around the extent_map pinned flag · 3ce7e67a

Chris Mason authored Jul 31, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

3ce7e67a

Btrfs: Fix streaming read performance with checksumming on · 61b49440

Chris Mason authored Jul 31, 2008

Large streaming reads make for large bios, which means each entry on the
async work queue list represents a large amount of data. IO
congestion throttling on the device was kicking in before the async
worker threads decided a single thread was busy and needed some help.

The end result was that a streaming read would result in a single CPU
running at 100% instead of balancing the work off to other CPUs.

This patch also changes the pre-IO checksum lookup done by reads to
work on a per-bio basis instead of per-page. The per-page lookup resulted
in many extra btree lookups on large streaming reads. Doing the checksum
lookup right before bio submit allows us to reuse searches while processing
adjacent offsets.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
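
The saving from the per-bio lookup can be illustrated with a toy model that counts simulated btree searches. It assumes checksums for adjacent page offsets sit next to each other in a flat array, which stands in for adjacent items in a leaf; `search_sum()`, `lookup_per_page()` and `lookup_per_bio()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u

static int tree_searches; /* counts simulated btree descents */

/* Simulated btree search: index of the checksum for a page-aligned offset. */
static size_t search_sum(unsigned long long offset)
{
    assert(offset % PAGE_SZ == 0);
    tree_searches++;
    return (size_t)(offset / PAGE_SZ);
}

/* Per-page: one full search for every page in the read. */
static void lookup_per_page(const unsigned *sums, unsigned long long start,
                            size_t npages, unsigned *out)
{
    for (size_t i = 0; i < npages; i++)
        out[i] = sums[search_sum(start + i * PAGE_SZ)];
}

/* Per-bio: one search, then walk the adjacent entries. */
static void lookup_per_bio(const unsigned *sums, unsigned long long start,
                           size_t npages, unsigned *out)
{
    size_t first = search_sum(start);

    for (size_t i = 0; i < npages; i++)
        out[i] = sums[first + i];
}
```

For a bio covering N adjacent pages the model does one search instead of N, and both paths return the same checksums.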

61b49440

Btrfs: Throttle tuning · 37d1aeee

Chris Mason authored Jul 31, 2008

This avoids waiting for transactions with pages locked by breaking out
the code to wait for the current transaction to close into a function
called by btrfs_throttle.

It also lowers the limits for where we start throttling.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

37d1aeee

Btrfs: Add missing hunk from Yan Zheng's cache reclaim patch · 47ac14fa

Chris Mason authored Jul 31, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

47ac14fa

Btrfs: Add compatibility for kernels >= 2.6.27-rc1 · 0ee0fda0

Sven Wegener authored Jul 30, 2008

Add a couple of #if's to follow API changes.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0ee0fda0

Btrfs: implement memory reclaim for leaf reference cache · bcc63abb

Yan authored Jul 30, 2008

The memory reclaiming issue happens when a snapshot exists. In that
case, some cache entries may not be used while an old snapshot is dropped,
so they will remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record its creation time.
In addition, the patch links all dead roots of a given snapshot together in
order of creation time. After an old snapshot has been completely dropped, we
check the dead root list and remove all cache entries created before the
oldest dead root in the list.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
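
The reclaim rule amounts to a filter on creation time. The sketch below uses a flat array and a made-up `struct leaf_ref_sketch` instead of the real cache structure, so it shows only the rule, not how the patch stores or walks its entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: a block address plus when it was cached. */
struct leaf_ref_sketch {
    unsigned long long bytenr;
    unsigned long long create_time;
};

/*
 * Drop every entry created before the oldest remaining dead root.
 * Compacts the array in place and returns the number of entries kept.
 */
static size_t reclaim_refs(struct leaf_ref_sketch *cache, size_t n,
                           unsigned long long oldest_dead_root_time)
{
    size_t kept = 0;

    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].create_time >= oldest_dead_root_time)
            cache[kept++] = cache[i];
    }
    return kept;
}
```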

bcc63abb

Btrfs: Fix verify_parent_transid · 33958dc6

Chris Mason authored Jul 30, 2008

It was incorrectly clearing the up to date flag on the buffer even
when the buffer properly verified.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
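
The intended behaviour is easy to state in code: the up to date flag should only be cleared when the check fails. The sketch below is an assumption about the shape of the check, using a made-up `struct buffer_sketch`; treating a parent transid of zero as "nothing to verify" is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached tree block. */
struct buffer_sketch {
    unsigned long long generation; /* transid recorded in the block header */
    bool uptodate;
};

/* Returns 0 when the buffer verifies; clears uptodate only on a mismatch. */
static int check_parent_transid(struct buffer_sketch *eb,
                                unsigned long long parent_transid)
{
    assert(eb != NULL);
    if (parent_transid == 0 || eb->generation == parent_transid)
        return 0;          /* verified: leave the flag alone */
    eb->uptodate = false;  /* stale block: force a re-read */
    return 1;
}
```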

33958dc6

Btrfs: Update and fix mount -o nodatacow · f321e491

Yan Zheng authored Jul 30, 2008

To check whether a given file extent is referenced by multiple snapshots, the
checker walks down the fs tree through the dead root and checks all tree
blocks in the path.

We can easily detect whether a given tree block is directly referenced by
another snapshot. We can also detect any indirect reference from another
snapshot by checking the reference's generation. The checker can always
detect multiple references, but can't reliably detect cases of a single
reference. So btrfs may do file data cow even when there is only one
reference.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f321e491

Btrfs: async-thread: fix possible memory leak · 3bf10418

Li Zefan authored Jul 30, 2008

When kthread_run() returns failure, this worker hasn't been
added to the list, so btrfs_stop_workers() won't free it.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
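
The ownership problem can be shown with a user-space model in which `start_thread()` is a hypothetical stand-in for kthread_run() that can be told to fail. The list handling and names are illustrative assumptions; the point is that an object not yet on the list must be freed by the code that allocated it.

```c
#include <assert.h>
#include <stdlib.h>

struct worker_sketch {
    struct worker_sketch *next;
};

struct workers_sketch {
    struct worker_sketch *list; /* workers that stop_workers() will free */
    int count;
};

/* Hypothetical stand-in for kthread_run(); fails when asked to. */
static int start_thread(struct worker_sketch *w, int should_fail)
{
    (void)w;
    return should_fail ? -1 : 0;
}

static int start_worker(struct workers_sketch *ws, int should_fail)
{
    struct worker_sketch *w = malloc(sizeof(*w));

    if (!w)
        return -1;
    if (start_thread(w, should_fail) != 0) {
        free(w);             /* not on the list yet, so free it here */
        return -1;
    }
    w->next = ws->list;      /* from here on the list owns the worker */
    ws->list = w;
    ws->count++;
    return 0;
}

/* Frees only what is on the list. */
static void stop_workers(struct workers_sketch *ws)
{
    while (ws->list) {
        struct worker_sketch *w = ws->list;

        ws->list = w->next;
        free(w);
        assert(ws->count > 0);
        ws->count--;
    }
}
```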

3bf10418

Btrfs: Throttle operations if the reference cache gets too large · ab78c84d

Chris Mason authored Jul 29, 2008

A large reference cache is directly related to a lot of work pending
for the cleaner thread.  This throttles back new operations based on
the size of the reference cache so the cleaner thread will be able to keep
up.

Overall, this actually makes the FS faster because the cleaner thread will
be more likely to find things in cache.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ab78c84d

Btrfs: Fix version.sh when used outside of an hg repo · 1a3f5d04

Chris Mason authored Jul 29, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

1a3f5d04

Btrfs: Leaf reference cache update · 017e5369

Chris Mason authored Jul 28, 2008

This changes the reference cache to make a single cache per root
instead of one cache per transaction, and to key by the byte number
of the disk block instead of the keys inside.

This makes it much less likely to have cache misses if a snapshot
or something has an extra reference on a higher node or a leaf while
the first transaction that added the leaf into the cache is dropping.

Some throttling is added to functions that free blocks heavily so they
wait for old transactions to drop.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

017e5369

Btrfs: Add a leaf reference cache · 31153d81

Yan Zheng authored Jul 28, 2008

Much of the IO done while dropping snapshots is done looking up
leaves in the filesystem trees to see if they point to any extents and
to drop the references on any extents found.

This creates a cache so that IO isn't required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

31153d81

Btrfs: Rev the disk format magic · 3a115f52

Chris Mason authored Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

3a115f52

Btrfs: Null terminate strings passed in from userspace · 5516e595

Mark Fasheh authored Jul 24, 2008

The 'char name[BTRFS_PATH_NAME_MAX]' member of struct btrfs_ioctl_vol_args
is passed directly to strlen() after being copied from user. I haven't
verified this, but in theory a userspace program could pass in an
unterminated string and cause a kernel crash as strlen walks off the end of
the array.

This patch terminates the ->name string in all btrfs ioctl functions which
currently use a 'struct btrfs_ioctl_vol_args'. Since the string is now
properly terminated, its length will never be longer than
BTRFS_PATH_NAME_MAX so that error check has been removed.

By the way, it might be better overall to just have the ioctl pass an
unterminated string + length structure but I didn't bother with that since
it'd change the kernel/user interface.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
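
A minimal user-space illustration of the fix, under stated assumptions: `PATH_NAME_MAX_SKETCH` and `struct vol_args_sketch` are made-up stand-ins, memcpy() stands in for the copy from user space, and terminating the last byte of the array is one plausible way to do it rather than a statement about the actual struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATH_NAME_MAX_SKETCH 8 /* hypothetical, kept tiny for the example */

struct vol_args_sketch {
    char name[PATH_NAME_MAX_SKETCH];
};

/*
 * Copy an untrusted argument block, then force termination before any
 * string function touches it. Returns the resulting name length.
 */
static size_t copy_and_measure(struct vol_args_sketch *dst,
                               const struct vol_args_sketch *untrusted)
{
    assert(dst != NULL && untrusted != NULL);
    memcpy(dst, untrusted, sizeof(*dst));        /* models the user copy */
    dst->name[PATH_NAME_MAX_SKETCH - 1] = '\0';  /* strlen() is now bounded */
    return strlen(dst->name);
}
```

Once the terminator is forced, the length can never exceed the array size minus one, which is why a separate length check becomes redundant.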

5516e595

Fix path slots selection in btrfs_search_forward · 9652480b

Yan authored Jul 24, 2008

We should decrease the found slot by one, as btrfs_search_slot does,
when bin_search returns 1 and node level > 0.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
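
Why the slot is stepped back: in an internal node each key is the lowest key reachable through that child, so when the search misses, the wanted key lives under the child just before the insertion point. The sketch below uses plain integers as keys and hypothetical helper names; the `slot > 0` guard is an added safety assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 0 and the matching slot when the key is present, otherwise
 * returns 1 and the slot where the key would be inserted.
 */
static int bin_search_sketch(const unsigned long long *keys, int nritems,
                             unsigned long long key, int *slot)
{
    int low = 0, high = nritems;

    assert(keys != NULL && slot != NULL);
    while (low < high) {
        int mid = low + (high - low) / 2;

        if (keys[mid] < key) {
            low = mid + 1;
        } else if (keys[mid] > key) {
            high = mid;
        } else {
            *slot = mid;
            return 0;
        }
    }
    *slot = low;
    return 1;
}

/* Pick the slot to follow (level > 0) or to report (level == 0). */
static int pick_slot(const unsigned long long *keys, int nritems,
                     unsigned long long key, int level)
{
    int slot;
    int ret = bin_search_sketch(keys, nritems, key, &slot);

    if (ret == 1 && level > 0 && slot > 0)
        slot--; /* the key lies under the previous child */
    return slot;
}
```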

9652480b

Btrfs: Fix .. lookup corner case · 445dceb7

Yan authored Jul 24, 2008

The inode ref item can be in the next leaf when we find "path->slots[0] ==
btrfs_header_nritems(...)".
Signed-off-by: Chris Mason <chris.mason@oracle.com>

445dceb7

Btrfs: Properly release lock in pin_down_bytes · 974e35a8

Yan authored Jul 24, 2008

When the buffer isn't uptodate, pin_down_bytes may leave the tree locked
after it returns.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
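
This class of bug is usually avoided by routing every exit through a single unlock. The sketch below is a generic illustration with made-up names and return values; it does not reproduce the logic of pin_down_bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that simply counts holders. */
struct tree_lock_sketch {
    int held;
};

static void lock_tree(struct tree_lock_sketch *l)
{
    l->held++;
}

static void unlock_tree(struct tree_lock_sketch *l)
{
    assert(l->held > 0);
    l->held--;
}

/*
 * Returns 1 if the caller must pin the bytes, 0 if the buffer was usable.
 * Every path, including the early exit, leaves through the unlock.
 */
static int pin_down_sketch(struct tree_lock_sketch *l, bool buffer_uptodate)
{
    int ret = 1;

    lock_tree(l);
    if (!buffer_uptodate)
        goto out;   /* early exit still releases the lock */
    ret = 0;
out:
    unlock_tree(l);
    return ret;
}
```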

974e35a8

Btrfs: Remove unused variable in fixup_tree_root_location · 45467261

Balaji Rao authored Jul 24, 2008

Remove an unused variable 'path' in fixup_tree_root_location.
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

45467261

Btrfs: Fix a few functions that exit without stopping their transaction · 8e8a1e31

Josef Bacik authored Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

8e8a1e31

Btrfs: Create orphan inode records to prevent lost files after a crash · 7b128766

Josef Bacik authored Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

7b128766

Btrfs: Add ACL support · 33268eaf

Josef Bacik authored Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

33268eaf

Btrfs: Remove unused xattr code · 6099afe8

Josef Bacik authored Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

6099afe8

Btrfs: Implement new dir index format · aec7477b

Josef Bacik authored Jul 24, 2008

Signed-off-by: Chris Mason <chris.mason@oracle.com>

aec7477b

Btrfs: Fix the defragmentation code and the block relocation code for data=ordered · 3eaa2885

Chris Mason authored Jul 24, 2008

Before setting an extent to delalloc, the code needs to wait for
pending ordered extents.

Also, the relocation code needs to wait for ordered IO before scanning
the block group again.  This is because the extents are not removed
until the IO for the new extents is finished.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3eaa2885

Btrfs: Use assert_spin_locked instead of spin_trylock · 64f26f74

David Woodhouse authored Jul 24, 2008

On UP systems spin_trylock always succeeds.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

64f26f74