Commits · 4854ddd0ed0a687fc2d7c45a529c406232e31e7b · Kirill Smelkov / linux

25 Sep, 2008 40 commits

Btrfs: Wait for kernel threads to make progress during async submission · 4854ddd0

Chris Mason authored Aug 15, 2008

Before this change, btrfs would use a bdi congestion function to make
sure there weren't too many pending async checksum work items.

This change makes the process creating async work items wait instead,
leading to fewer congestion returns from the bdi.  This improves
pdflush background_writeout scanning.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4854ddd0
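
The difference between reporting congestion and making the producer wait can be sketched in user-space C. This is only an illustration under stated assumptions: `struct async_limit`, `async_submit()`, and the other names are hypothetical, and pthread primitives stand in for the kernel's wait queues. It is not the btrfs code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the filesystem's async submission state. */
struct async_limit {
    pthread_mutex_t lock;
    pthread_cond_t room;   /* signalled whenever a work item completes */
    int pending;           /* work items queued but not yet finished */
    int max_pending;       /* producers wait at or above this count */
};

static void async_limit_init(struct async_limit *al, int max_pending)
{
    pthread_mutex_init(&al->lock, NULL);
    pthread_cond_init(&al->room, NULL);
    al->pending = 0;
    al->max_pending = max_pending;
}

/* Old style: only report congestion and leave the decision to the caller. */
static int async_limit_congested(struct async_limit *al)
{
    int congested;

    pthread_mutex_lock(&al->lock);
    congested = al->pending >= al->max_pending;
    pthread_mutex_unlock(&al->lock);
    return congested;
}

/* New style: the producer blocks until the workers have made progress. */
static void async_submit(struct async_limit *al)
{
    pthread_mutex_lock(&al->lock);
    while (al->pending >= al->max_pending)
        pthread_cond_wait(&al->room, &al->lock);
    al->pending++;
    pthread_mutex_unlock(&al->lock);
}

/* Called by a worker thread when it finishes one work item. */
static void async_complete(struct async_limit *al)
{
    pthread_mutex_lock(&al->lock);
    al->pending--;
    pthread_cond_signal(&al->room);
    pthread_mutex_unlock(&al->lock);
}
```

In this sketch a producer calls `async_submit()` before queueing each item and a worker calls `async_complete()` afterwards, so the pending count stays bounded without the caller ever seeing a congestion return.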

Btrfs: Give all the worker threads descriptive names · 5443be45
Chris Mason authored Aug 15, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
5443be45

Btrfs: Transaction commit: don't use filemap_fdatawait · 777e6bd7

Chris Mason authored Aug 15, 2008

After writing out all the remaining btree blocks in the transaction,
the commit code would use filemap_fdatawait to make sure it was all
on disk.  This means it would wait for blocks written by other procs
as well.

The new code walks the list of blocks for this transaction again
and waits only for those required by this transaction.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

777e6bd7

Btrfs: Count async bios separately from async checksum work items · 0986fe9e
Chris Mason authored Aug 15, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
0986fe9e
Btrfs: Limit the number of async bio submission kthreads to the number of devices · b720d209
Chris Mason authored Aug 15, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
b720d209

Btrfs: Init address_space->writeback_index properly · db69e0eb

Chris Mason authored Aug 15, 2008

The writeback_index field is used by write_cache_pages to pick up where
writeback on a given inode left off.  But, it is never set to a sane
value, so writeback can often start at a random offset in the file.

Kernels 2.6.28 and higher will have this fixed, but for everyone else,
we also fill in the value in btrfs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

db69e0eb

Btrfs: Change TestSetPageLocked() to trylock_page() · 2db04966

David Woodhouse authored Aug 07, 2008

Add backwards compatibility in compat.h
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
 compat.h    |    3 +++
 extent_io.c |    3 ++-
 2 files changed, 5 insertions(+), 1 deletions(-)
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2db04966
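
A compatibility shim of this kind has to invert the return value, because the old call reports whether the page was already locked while the new one reports whether the lock was taken. The sketch below is a user-space model: `struct fake_page` and the `HAVE_TRYLOCK_PAGE` feature macro are hypothetical, and the real kernel primitives are atomic bit operations, which this model is not.

```c
#include <assert.h>

/* Hypothetical stand-in for struct page with a single lock bit. */
struct fake_page {
    int locked;
};

/*
 * Model of the old API: set the lock bit and return its previous value,
 * so a nonzero result means the page was already locked.
 */
static int TestSetPageLocked(struct fake_page *page)
{
    int was_locked = page->locked;

    page->locked = 1;
    return was_locked;
}

/*
 * Compatibility wrapper for trees that lack trylock_page(): the new API
 * returns nonzero when the lock was acquired, hence the negation.
 */
#ifndef HAVE_TRYLOCK_PAGE
#define trylock_page(page) (!TestSetPageLocked(page))
#endif
```

Callers can then be written against `trylock_page()` only, regardless of which API the underlying kernel provides.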

Btrfs: fix RHEL test for ClearPageFsMisc · 5036f538

Eric Sandeen authored Aug 07, 2008

Newer RHEL5 kernels define both ClearPageFsMisc and
ClearPageChecked, so test for both before redefining.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5036f538

Btrfs: Update version.sh to v0.16 · 5707e3b6
Chris Mason authored Aug 04, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
5707e3b6
Btrfs: Avoid calling into the FS for the final iput on fake root inodes · 4ca8b41e
Chris Mason authored Aug 05, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
4ca8b41e
Btrfs: Fix nodatacow for the new data=ordered mode · 7ea394f1
Yan Zheng authored Aug 05, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
7ea394f1

Get rid of BTRFS_I(inode)->index and use local vars instead · 00e4e6b3

Chris Mason authored Aug 05, 2008

rename and link don't always have a lock on the source inode, and
our use of a per-inode index variable was racy.  This changes things to
store the index in a local variable instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

00e4e6b3

Btrfs: Fix the multi-bio code to save the original bio for completion · 7d2b4daa

Chris Mason authored Aug 05, 2008

The multi-bio code is responsible for duplicating blocks in raid1 and
single spindle duplication.  It has counters to make sure all of
the locations for a given extent are properly written before io completion
is returned to the higher layers.

But it didn't always complete the same bio it was given; sometimes a
clone was completed instead.  This led to problems with the async
work queues because they saved a pointer to the bio in a struct off
bi_private.

The fix is to remember the original bio and only complete that one.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7d2b4daa
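
The idea of remembering the original bio and completing only that one can be sketched as follows. This is a simplified user-space model, not the btrfs implementation: `struct fake_bio`, `struct multi_bio`, and `record_completion()` are hypothetical names, and there is no locking around the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down bio: a completion callback plus private data. */
struct fake_bio {
    void (*end_io)(struct fake_bio *bio);
    void *private_data;
};

/* Tracks every copy of one write and remembers the caller's bio. */
struct multi_bio {
    struct fake_bio *orig_bio;               /* the bio the caller handed in */
    void (*orig_end_io)(struct fake_bio *);  /* caller's completion handler */
    void *orig_private;                      /* caller's private pointer */
    int pending;                             /* copies still in flight */
};

/* Completion handler installed on every copy, original and clones alike. */
static void stripe_end_io(struct fake_bio *bio)
{
    struct multi_bio *multi = bio->private_data;
    struct fake_bio *orig;

    if (--multi->pending > 0)
        return;

    /* Last copy finished: always complete the original, never a clone. */
    orig = multi->orig_bio;
    orig->end_io = multi->orig_end_io;
    orig->private_data = multi->orig_private;
    orig->end_io(orig);
}

static void multi_bio_prepare(struct multi_bio *multi, struct fake_bio *orig,
                              struct fake_bio *copies[], int ncopies)
{
    int i;

    /* Save the caller's fields before any copy overwrites them. */
    multi->orig_bio = orig;
    multi->orig_end_io = orig->end_io;
    multi->orig_private = orig->private_data;
    multi->pending = ncopies;

    for (i = 0; i < ncopies; i++) {
        copies[i]->end_io = stripe_end_io;
        copies[i]->private_data = multi;
    }
}

/* Example caller-side handler that records which bio was completed. */
static struct fake_bio *last_completed;

static void record_completion(struct fake_bio *bio)
{
    last_completed = bio;
}
```

Even when a clone is the last copy to finish, the caller's handler receives the bio it originally submitted, so any pointer it stored off the private field remains valid.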

Btrfs: Update clone file ioctl · ae01a0ab

Yan Zheng authored Aug 04, 2008

This patch updates the file clone ioctl for the tree locking and new
data ordered code.

---
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ae01a0ab

Btrfs: Various small fixes. · b48652c1

Yan Zheng authored Aug 04, 2008

This trivial patch contains two locking fixes and an off-by-one fix.

---
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b48652c1

btrfs_lookup_bio_sums seems broken, go back to the readpage_io_hook for now · 3de9d6b6
Chris Mason authored Aug 04, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
3de9d6b6
Btrfs: Maintain a list of inodes that are delalloc and a way to wait on them · ea8c2819
Chris Mason authored Aug 04, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
ea8c2819

Btrfs: Don't corrupt ram in shrink_extent_tree, leak it instead · d7a029a8

Chris Mason authored Aug 04, 2008

Far from the perfect fix, but these structs are small.  TODO for the
next release.  The block group cache structs are referenced in many
different places, and it isn't safe to just free them while resizing.

A real fix will be a larger change to the allocator so that it doesn't
have to carry around the block group cache structs to find good places
to search for free blocks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d7a029a8

Btrfs: fix ioctl-initiated transactions vs wait_current_trans() · 9ca9ee09

Sage Weil authored Aug 04, 2008

Commit 597:466b27332893 (btrfs_start_transaction: wait for commits in
progress) breaks the transaction start/stop ioctls by making
btrfs_start_transaction conditionally wait for the next transaction to
start.  If an application is artificially holding a transaction open,
things deadlock.

This workaround maintains a count of open ioctl-initiated transactions in
fs_info, and avoids wait_current_trans() if any are currently open (in
start_transaction() and btrfs_throttle()).  The start transaction ioctl
uses a new btrfs_start_ioctl_transaction() that _does_ call
wait_current_trans(), effectively pushing the join/wait decision to the
outer ioctl-initiated transaction.

This more or less neuters btrfs_throttle() when ioctl-initiated
transactions are in use, but that seems like a pretty fundamental
consequence of wrapping lots of write()'s in a transaction.  Btrfs has no
way to tell if the application considers a given operation as part of its
transaction.

Obviously, if the transaction start/stop ioctls aren't being used, there
is no effect on current behavior.
Signed-off-by: Sage Weil <sage@newdream.net>
---
 ctree.h       |    1 +
 ioctl.c       |   12 +++++++++++-
 transaction.c |   18 +++++++++++++-----
 transaction.h |    2 ++
 4 files changed, 27 insertions(+), 6 deletions(-)
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9ca9ee09
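
The counting workaround described in this commit can be modelled in a few lines. The names below (`struct fs_state`, `ioctl_trans_start()`, `must_wait_for_commit()`) are hypothetical and the sketch ignores locking; it only shows the decision of whether a nested transaction start should wait.

```c
#include <assert.h>

/* Hypothetical slice of fs_info relevant to this decision. */
struct fs_state {
    int open_ioctl_trans;   /* transactions held open by the ioctl */
    int commit_running;     /* nonzero while a commit is in progress */
};

/* The start ioctl bumps the count after doing its own wait. */
static void ioctl_trans_start(struct fs_state *fs)
{
    fs->open_ioctl_trans++;
}

/* The end ioctl drops the count again. */
static void ioctl_trans_end(struct fs_state *fs)
{
    fs->open_ioctl_trans--;
}

/*
 * Ordinary transaction starts skip the wait while any ioctl-initiated
 * transaction is open, since waiting for a commit that cannot finish
 * until that transaction ends would deadlock.
 */
static int must_wait_for_commit(const struct fs_state *fs)
{
    if (fs->open_ioctl_trans > 0)
        return 0;
    return fs->commit_running;
}
```

With no ioctl-initiated transaction open, `must_wait_for_commit()` depends only on `commit_running`, which matches the commit's note that current behavior is otherwise unchanged.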

Btrfs: Add support for HW assisted crc32c · 3117a773

Chris Mason authored Aug 04, 2008

Intel doesn't yet ship hardware to the public with this enabled, but when they
do, they will be ready.  Original code from:

Austin Zhang <austin_zhang@linux.intel.com>

It is currently disabled, but edit crc32c.h to turn it on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3117a773

Btrfs: Hold csum mutex while reading in sums during readpages · 6dab8157
Chris Mason authored Aug 04, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
6dab8157

Btrfs: More throttle tuning · 2dd3e67b

Chris Mason authored Aug 04, 2008

* Make walk_down_tree wake up throttled tasks more often
* Make walk_down_tree call cond_resched during long loops
* As the size of the ref cache grows, wait longer in throttle
* Get rid of the reada code in walk_down_tree, the leaves don't get
  read anymore, thanks to the ref cache.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2dd3e67b

btrfs_search_slot: reduce lock contention by cowing in two stages · 65b51a00

Chris Mason authored Aug 01, 2008

A btree block cow has two parts, the first is to allocate a destination
block and the second is to copy the old block over.

The first part needs locks in the extent allocation tree, and may need to
do IO. This changeset splits that into a separate function that can be
called without any tree locks held.

btrfs_search_slot is changed to drop its path and start over if it has
to COW a contended block. This often means that many writers will
pre-alloc a new destination for the same contended block, but they
cache their prealloc for later use on lower levels in the tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

65b51a00
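
The two-stage split can be illustrated in user-space C: the potentially slow allocation happens with no lock held, and only the copy and pointer swap happen under the lock. All names here (`struct tree_block`, `prealloc_dest()`, `cow_block()`, `BLOCK_SIZE`) are hypothetical, `malloc` stands in for extent allocation, and a pthread mutex stands in for the tree lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 64

/* Hypothetical tree block: a lock plus a pointer to its current contents. */
struct tree_block {
    pthread_mutex_t lock;
    unsigned char *data;
};

/* Stage 1: allocate the destination with no tree locks held. */
static unsigned char *prealloc_dest(void)
{
    return malloc(BLOCK_SIZE);
}

/*
 * Stage 2: copy into the preallocated destination under the lock.
 * The prealloc is consumed (set to NULL); the old buffer is returned
 * so the caller decides when it can be released.
 */
static unsigned char *cow_block(struct tree_block *blk,
                                unsigned char **prealloc)
{
    unsigned char *old;

    pthread_mutex_lock(&blk->lock);
    memcpy(*prealloc, blk->data, BLOCK_SIZE);
    old = blk->data;
    blk->data = *prealloc;
    *prealloc = NULL;
    pthread_mutex_unlock(&blk->lock);
    return old;
}
```

A writer that loses the race for a contended block would simply keep its unused prealloc and pass it to a later `cow_block()` call on a lower level, which is the caching behavior the commit message describes.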

Btrfs: Throttle less often waiting for snapshots to delete · 18e35e0a
Chris Mason authored Aug 01, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
18e35e0a

Btrfs: Improve and cleanup locking done by walk_down_tree · f87f057b

Chris Mason authored Aug 01, 2008

While dropping snapshots, walk_down_tree does most of the work of checking
reference counts and limiting tree traversal to just the blocks that
we are freeing.

It dropped and held the allocation mutex in strange and confusing ways;
this commit changes it to only hold the mutex while actually freeing a block.

The rest of the checks around reference counts should be safe without the lock
because we only allow one process in btrfs_drop_snapshot at a time. Other
processes dropping reference counts should not drop it to 1 because
their tree roots already have an extra ref on the block.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f87f057b

Btrfs: Hold a reference on bios during submit_bio, add some extra bio checks · 492bb6de
Chris Mason authored Jul 31, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
492bb6de
Btrfs: Drop some debugging around the extent_map pinned flag · 3ce7e67a
Chris Mason authored Jul 31, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
3ce7e67a

Btrfs: Fix streaming read performance with checksumming on · 61b49440

Chris Mason authored Jul 31, 2008

Large streaming reads make for large bios, which means each entry on the
async work queues represents a large amount of data. IO
congestion throttling on the device was kicking in before the async
worker threads decided a single thread was busy and needed some help.

The end result was that a streaming read would result in a single CPU
running at 100% instead of balancing the work off to other CPUs.

This patch also changes the pre-IO checksum lookup done by reads to
work on a per-bio basis instead of a per-page basis. Per-page lookups
resulted in many extra btree lookups on large streaming reads. Doing the checksum lookup
right before bio submit allows us to reuse searches while processing
adjacent offsets.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

61b49440

Btrfs: Throttle tuning · 37d1aeee

Chris Mason authored Jul 31, 2008

This avoids waiting for transactions with pages locked by breaking out
the code to wait for the current transaction to close into a function
called by btrfs_throttle.

It also lowers the limits for where we start throttling.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

37d1aeee

Btrfs: Add missing hunk from Yan Zheng's cache reclaim patch · 47ac14fa
Chris Mason authored Jul 31, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
47ac14fa

Btrfs: Add compatibility for kernels >= 2.6.27-rc1 · 0ee0fda0

Sven Wegener authored Jul 30, 2008

Add a couple of #if's to follow API changes.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0ee0fda0

Btrfs: implement memory reclaim for leaf reference cache · bcc63abb

Yan authored Jul 30, 2008

The memory reclaiming issue happens when a snapshot exists. In that
case, some cache entries may not be used while an old snapshot is dropped,
so they will remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record create time. In addition,
the patch links all dead roots of a given snapshot together in order of
create time. After an old snapshot is completely dropped, we check the dead
root list and remove all cache entries created before the oldest dead root in
the list.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

bcc63abb
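
The reclaim rule (drop every cache entry created before the oldest remaining dead root) can be sketched with a plain linked list. `struct leaf_ref`, `add_ref()`, and `reclaim_older_than()` are hypothetical names for this illustration; the real cache is a different data structure and is protected by locks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache entry: the block it describes and its create time. */
struct leaf_ref {
    unsigned long long bytenr;
    unsigned long long created;
    struct leaf_ref *next;
};

/* Push a new entry on the front of the list; returns NULL on failure. */
static struct leaf_ref *add_ref(struct leaf_ref **head,
                                unsigned long long bytenr,
                                unsigned long long created)
{
    struct leaf_ref *ref = malloc(sizeof(*ref));

    if (!ref)
        return NULL;
    ref->bytenr = bytenr;
    ref->created = created;
    ref->next = *head;
    *head = ref;
    return ref;
}

/*
 * Free every entry created before `oldest`, the create time of the
 * oldest dead root still on the list. Returns the number freed.
 */
static int reclaim_older_than(struct leaf_ref **head,
                              unsigned long long oldest)
{
    struct leaf_ref **pp = head;
    int freed = 0;

    while (*pp) {
        struct leaf_ref *ref = *pp;

        if (ref->created < oldest) {
            *pp = ref->next;   /* unlink, then release */
            free(ref);
            freed++;
        } else {
            pp = &ref->next;
        }
    }
    return freed;
}
```

Entries created at or after the cutoff are kept, since a dead root that is still pending may yet need them.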

Btrfs: Fix verify_parent_transid · 33958dc6

Chris Mason authored Jul 30, 2008

It was incorrectly clearing the up to date flag on the buffer even
when the buffer properly verified.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

33958dc6

Btrfs: Update and fix mount -o nodatacow · f321e491

Yan Zheng authored Jul 30, 2008

To check whether a given file extent is referenced by multiple snapshots, the
checker walks down the fs tree through dead root and checks all tree blocks in
the path.

We can easily detect whether a given tree block is directly referenced by another
snapshot. We can also detect any indirect reference from another snapshot by
checking the reference's generation. The checker can always detect multiple
references, but can't reliably detect cases of a single reference. So btrfs may
do file data cow even when there is only one reference.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f321e491

Btrfs: async-thread: fix possible memory leak · 3bf10418

Li Zefan authored Jul 30, 2008

When kthread_run() returns failure, this worker hasn't been
added to the list, so btrfs_stop_workers() won't free it.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3bf10418
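
The leak pattern and its fix can be shown with a small user-space model. Everything here is hypothetical (`struct worker_pool`, `pool_add_worker()`, the `start` callback standing in for kthread_run()); the point is only that an object not yet on the list must be freed on the failure path, because the teardown code walks the list.

```c
#include <assert.h>
#include <stdlib.h>

struct worker {
    struct worker *next;
    int id;
};

/* Hypothetical pool; `start` stands in for launching the worker thread. */
struct worker_pool {
    struct worker *head;
    int (*start)(struct worker *w);
};

static int start_ok(struct worker *w)
{
    (void)w;
    return 0;
}

static int start_fail(struct worker *w)
{
    (void)w;
    return -1;
}

static int pool_add_worker(struct worker_pool *pool, int id)
{
    struct worker *w = malloc(sizeof(*w));

    if (!w)
        return -1;
    w->id = id;
    w->next = NULL;

    if (pool->start(w) != 0) {
        /* Not on the list yet, so pool_stop() would never free it. */
        free(w);
        return -1;
    }

    w->next = pool->head;
    pool->head = w;
    return 0;
}

/* Teardown only sees workers that made it onto the list. */
static void pool_stop(struct worker_pool *pool)
{
    while (pool->head) {
        struct worker *w = pool->head;

        pool->head = w->next;
        free(w);
    }
}
```

Without the `free(w)` in the failure branch, each failed start would leak one `struct worker`, which is the situation the commit message describes.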

Btrfs: Throttle operations if the reference cache gets too large · ab78c84d

Chris Mason authored Jul 29, 2008

A large reference cache is directly related to a lot of work pending
for the cleaner thread.  This throttles back new operations based on
the size of the reference cache so the cleaner thread will be able to keep
up.

Overall, this actually makes the FS faster because the cleaner thread will
be more likely to find things in cache.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ab78c84d

Btrfs: Fix version.sh when used outside of an hg repo · 1a3f5d04
Chris Mason authored Jul 29, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
1a3f5d04

Btrfs: Leaf reference cache update · 017e5369

Chris Mason authored Jul 28, 2008

This changes the reference cache to make a single cache per root
instead of one cache per transaction, and to key by the byte number
of the disk block instead of the keys inside.

This makes it much less likely to have cache misses if a snapshot
or something has an extra reference on a higher node or a leaf while
the first transaction that added the leaf into the cache is dropping.

Some throttling is added to functions that free blocks heavily so they
wait for old transactions to drop.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

017e5369

Btrfs: Add a leaf reference cache · 31153d81

Yan Zheng authored Jul 28, 2008

Much of the IO done while dropping snapshots is done looking up
leaves in the filesystem trees to see if they point to any extents and
to drop the references on any extents found.

This creates a cache so that IO isn't required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

31153d81

Btrfs: Rev the disk format magic · 3a115f52
Chris Mason authored Jul 24, 2008
```
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
3a115f52