Commits · 15e7000095e6fc9ad07e476a100c900c72c14225 · nexedi / linux

11 Jun, 2010 15 commits

Btrfs: avoid BUG when dropping root and reference in same transaction · 15e70000

Sage Weil authored May 17, 2010

If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes
with end_transaction(), the cleaner kthread may come in and
drop the root in the same transaction.  If that's the case, the
root's refs still == 1 in the tree when btrfs_del_root() deletes
the item, because commit_fs_roots() hasn't updated it yet (that
happens during the commit).

This wasn't a problem before only because
btrfs_ioctl_snap_destroy() would commit the transaction before dropping
the dentry reference, so the dead root wouldn't get queued up until
after the fs root item was updated in the btree.

Since it is not an error to drop the root reference and the root in the
same transaction, just drop the BUG_ON() in btrfs_del_root().
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

15e70000

Btrfs: prohibit a operation of changing acl's mask when noacl mount option used · 731e3d1b

Shi Weihua authored May 18, 2010

when used Posix File System Test Suite(pjd-fstest) to test btrfs,
some cases about setfacl failed when noacl mount option used.
I simplified used commands in pjd-fstest, and the following steps
can reproduce it.
------------------------
# cd btrfs-part/
# mkdir aaa
# setfacl -m m::rw aaa    <- successed, but not expected by pjd-fstest.
------------------------
I checked ext3, a warning message occured, like as:
  setfacl: aaa/: Operation not supported
Certainly, it's expected by pjd-fstest.

So, i compared acl.c of btrfs and ext3. Based on that, a patch created.
Fortunately, it works.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

731e3d1b

Btrfs: should add a permission check for setfacl · 2f26afba

Shi Weihua authored May 18, 2010

On btrfs, do the following
------------------
# su user1
# cd btrfs-part/
# touch aaa
# getfacl aaa
  # file: aaa
  # owner: user1
  # group: user1
  user::rw-
  group::rw-
  other::r--
# su user2
# cd btrfs-part/
# setfacl -m u::rwx aaa
# getfacl aaa
  # file: aaa
  # owner: user1
  # group: user1
  user::rwx           <- successed to setfacl
  group::rw-
  other::r--
------------------
but we should prohibit it that user2 changing user1's acl.
In fact, on ext3 and other fs, a message occurs:
  setfacl: aaa: Operation not permitted

This patch fixed it.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2f26afba

Btrfs: btrfs_lookup_dir_item() can return ERR_PTR · cf1e99a4

Dan Carpenter authored May 29, 2010

btrfs_lookup_dir_item() can return either ERR_PTRs or null.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cf1e99a4

Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs · 3140c9a3

Dan Carpenter authored May 29, 2010

btrfs_read_fs_root_no_name() returns ERR_PTRs on error so I added a
check for that.  It's not clear to me if it can also return NULL
pointers or not so I left the original NULL pointer check as is.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3140c9a3

Btrfs: unwind after btrfs_start_transaction() errors · d327099a

Dan Carpenter authored May 29, 2010

This was added by a22285a6: "Btrfs: Integrate metadata reservation
with start_transaction".  If we goto out here then we skip all the
unwinding and there are locks still held etc.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d327099a

Btrfs: btrfs_iget() returns ERR_PTR · 4cbd1149

Dan Carpenter authored May 29, 2010

btrfs_iget() returns an ERR_PTR() on failure and not null.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4cbd1149

Btrfs: handle kzalloc() failure in open_ctree() · 676e4c86

Dan Carpenter authored May 29, 2010

Unwind and return -ENOMEM if the allocation fails here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

676e4c86

Btrfs: handle error returns from btrfs_lookup_dir_item() · fb4f6f91

Dan Carpenter authored May 29, 2010

If btrfs_lookup_dir_item() fails, we should can just let the mount fail
with an error.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

fb4f6f91

Btrfs: Fix BUG_ON for fs converted from extN · 3bf84a5a

Yan, Zheng authored May 31, 2010

Tree blocks can live in data block groups in FS converted from extN.
So it's easy to trigger the BUG_ON.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3bf84a5a

Btrfs: Fix null dereference in relocation.c · 046f264f

Yan, Zheng authored May 31, 2010

Fix a potential null dereference in relocation.c
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Acked-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

046f264f

Btrfs: fix remap_file_pages error · 058a457e

Miao Xie authored May 20, 2010

when we use remap_file_pages() to remap a file, remap_file_pages always return
error. It is because btrfs didn't set VM_CAN_NONLINEAR for vma.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

058a457e

Btrfs: uninitialized data is check_path_shared() · 0e4dcbef

Dan Carpenter authored Jun 01, 2010

refs can be used with uninitialized data if btrfs_lookup_extent_info()
fails on the first pass through the loop.  In the original code if that
happens then check_path_shared() probably returns 1, this patch
changes it to return 1 for safety.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0e4dcbef

Btrfs: fix fallocate regression · 83609779

Josef Bacik authored Jun 07, 2010

Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we
dropped passing the mode to the function that actually does the preallocation.
This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

83609779

Btrfs: fix loop device on top of btrfs · 4a001071

Miao Xie authored Jun 07, 2010

We cannot use the loop device which has been connected to a file in the btrf

The reproduce steps is following:
 # dd if=/dev/zero of=vdev0 bs=1M count=1024
 # losetup /dev/loop0 vdev0
 # mkfs.btrfs /dev/loop0
 ...
 failed to zero device start -5

The reason is that the btrfs don't implement either ->write_begin or ->write
the VFS API, so we fix it by setting ->write to do_sync_write().
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4a001071

27 May, 2010 5 commits

Btrfs: add more error checking to btrfs_dirty_inode · 9aeead73

Chris Mason authored May 27, 2010

The ENOSPC code will now return ENOSPC to btrfs_start_transaction.
btrfs_dirty_inode needs to check for this and error out appropriately.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

9aeead73

Btrfs: allow unaligned DIO · 5a5f79b5

Chris Mason authored May 26, 2010

In order to support DIO that isn't aligned to the filesystem blocksize,
we fall back to buffered for any unaligned DIOs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5a5f79b5

Btrfs: drop verbose enospc printk · 933b585f
Chris Mason authored May 26, 2010
```
Less printk is good printk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
933b585f

Btrfs: Fix block generation verification race · 5bdd3536

Yan, Zheng authored May 26, 2010

After the path is released, the generation number got from block
pointer is no long valid. The race may cause disk corruption, because
verify_parent_transid() calls clear_extent_buffer_uptodate() when
generation numbers mismatch.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5bdd3536

Btrfs: fix preallocation and nodatacow checks in O_DIRECT · 46bfbb5c

Chris Mason authored May 26, 2010

The O_DIRECT code wasn't checking for multiple references
on preallocated or nodatacow extents.  This means it
wasn't honoring snapshots properly.

The fix here is to add an explicit check for multiple references
This also fixes the math for selecting the correct disk block,
making sure not to go past the end of the extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

46bfbb5c

26 May, 2010 3 commits

Btrfs: avoid ENOSPC errors in btrfs_dirty_inode · 94b60442

Chris Mason authored May 26, 2010

btrfs_dirty_inode tries to sneak in without much waiting or
space reservation, mostly for performance reasons.  This
usually works well but can cause problems when there are
many many writers.

When btrfs_update_inode fails with ENOSPC, we fallback
to a slower btrfs_start_transaction call that will reserve
some space.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

94b60442

Btrfs: move O_DIRECT space reservation to btrfs_direct_IO · 3f7c579c

Chris Mason authored May 26, 2010

This moves the delalloc space reservation done for O_DIRECT
into btrfs_direct_IO.  This way we don't leak reserved space
if the generic O_DIRECT write code errors out before it
calls into btrfs_direct_IO.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3f7c579c

Btrfs: rework O_DIRECT enospc handling · 4845e44f

Chris Mason authored May 25, 2010

This changes O_DIRECT write code to mark extents as delalloc
while it is processing them.  Yan Zheng has reworked the
enospc accounting based on tracking delalloc extents and
this makes it much easier to track enospc in the O_DIRECT code.

There are a few space cases with the O_DIRECT code though,
it only sets the EXTENT_DELALLOC bits, instead of doing
EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because
we don't want to mess with clearing the dirty and uptodate
bits when things go wrong.  This is important because there
are no pages in the page cache, so any extent state structs
that we put in the tree won't get freed by releasepage.  We have
to clear them ourselves as the DIO ends.

With this commit, we reserve space at in btrfs_file_aio_write,
and then as each btrfs_direct_IO call progresses it sets
EXTENT_DELALLOC on the range.

btrfs_get_blocks_direct is responsible for clearing the delalloc
at the same time it drops the extent lock.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4845e44f

25 May, 2010 17 commits

Btrfs: use async helpers for DIO write checksumming · eaf25d93

Chris Mason authored May 25, 2010

The async helper threads offload crc work onto all the
CPUs, and make streaming writes much faster.  This
changes the O_DIRECT write code to use them.  The only
small complication was that we need to pass in the
logical offset in the file for each bio, because we can't
find it in the bio's pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

eaf25d93

Btrfs: don't walk around with task->state != TASK_RUNNING · ed3b3d31

Chris Mason authored May 25, 2010

Yan Zheng noticed two places we were doing a lot of work
without task->state set to TASK_RUNNING.  This sets the state
properly after we get ready to sleep but decide not to.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ed3b3d31

Btrfs: do aio_write instead of write · 11c65dcc

Josef Bacik authored May 23, 2010

In order for AIO to work, we need to implement aio_write. This patch converts
our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and
nothing broke, and the AIO stuff magically started working. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

11c65dcc

Btrfs: add basic DIO read/write support · 4b46fce2

Josef Bacik authored May 23, 2010

This provides basic DIO support for reading and writing.  It does not do the
work to recover from mismatching checksums, that will come later.  A few design
changes have been made from Jim's code (sorry Jim!)

1) Use the generic direct-io code.  Jim originally re-wrote all the generic DIO
code in order to account for all of BTRFS's oddities, but thanks to that work it
seems like the best bet is to just ignore compression and such and just opt to
fallback on buffered IO.

2) Fallback on buffered IO for compressed or inline extents.  Jim's code did
it's own buffering to make dio with compressed extents work.  Now we just
fallback onto normal buffered IO.

3) Use ordered extents for the writes so that all of the

lock_extent()
lookup_ordered()

type checks continue to work.

4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
DIO writes.

I've tested this with fsx and everything works great.  This patch depends on my
dio and filemap.c patches to work.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4b46fce2

direct-io: do not merge logically non-contiguous requests · c2c6ca41

Josef Bacik authored May 23, 2010

Btrfs cannot handle having logically non-contiguous requests submitted.  For
example if you have

Logical:  [0-4095][HOLE][8192-12287]
Physical: [0-4095]      [4096-8191]

Normally the DIO code would put these into the same BIO's.  The problem is we
need to know exactly what offset is associated with what BIO so we can do our
checksumming and unlocking properly, so putting them in the same BIO doesn't
work.  So add another check where we submit the current BIO if the physical
blocks are not contigous OR the logical blocks are not contiguous.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c2c6ca41

direct-io: add a hook for the fs to provide its own submit_bio function · facd07b0

Josef Bacik authored May 23, 2010

Because BTRFS can do RAID and such, we need our own submit hook so we can setup
the bio's in the correct fashion, and handle checksum errors properly.  So there
are a few changes here

1) The submit_io hook.  This is straightforward, just call this instead of
submit_bio.

2) Allow the fs to return -ENOTBLK for reads.  Usually this has only worked for
writes, since writes can fallback onto buffered IO.  But BTRFS needs the option
of falling back on buffered IO if it encounters a compressed extent, since we
need to read the entire extent in and decompress it.  So if we get -ENOTBLK back
from get_block we'll return back and fallback on buffered just like the write
case.

I've tested these changes with fsx and everything seems to work.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

facd07b0

fs: allow short direct-io reads to be completed via buffered IO · 66f998f6

Josef Bacik authored May 23, 2010

This is similar to what already happens in the write case. If we have a short
read while doing O_DIRECT, instead of just returning, fallthrough and try to
read the rest via buffered IO. BTRFS needs this because if we encounter a
compressed or inline extent during DIO, we need to fallback on buffered. If the
extent is compressed we need to read the entire thing into memory and
de-compress it into the users pages. I have tested this with fsx and everything
works great. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

66f998f6

Btrfs: Metadata ENOSPC handling for balance · 3fd0a558