Commits · 04e6863b19c72279bcbeffa26d85d649ab9c8205 · Kirill Smelkov / linux

29 Apr, 2019 40 commits

btrfs: split btrfs_setxattr calls regarding transaction · 04e6863b

Anand Jain authored Apr 12, 2019

When the caller has already created the transaction handle,
btrfs_setxattr() will use it. Also add an assert in btrfs_setxattr().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
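
A minimal sketch of the split described above, using mock types rather than the kernel structures. The names mock_trans, mock_setxattr() and mock_setxattr_trans() are hypothetical, and the assumption that the added assert checks for a non-NULL handle is not confirmed by the message itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle; not the kernel struct. */
struct mock_trans {
    int updates;    /* updates recorded in this transaction */
    int committed;  /* set to 1 once the transaction has been ended */
};

/* Plays the role of btrfs_setxattr(): the caller supplies the handle. */
int mock_setxattr(struct mock_trans *trans, const char *name)
{
    assert(trans != NULL);  /* assumed form of the assert the commit adds */
    if (name == NULL)
        return -1;
    trans->updates++;
    return 0;
}

/* Plays the role of btrfs_setxattr_trans(): starts and ends its own transaction. */
int mock_setxattr_trans(struct mock_trans *storage, const char *name)
{
    int ret;

    storage->updates = 0;    /* "start" a fresh transaction */
    storage->committed = 0;
    ret = mock_setxattr(storage, name);
    storage->committed = 1;  /* "end" the transaction */
    return ret;
}
```

Callers that already hold a handle would use the first function; callers without one would use the second.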


btrfs: remove redundant readonly root check in btrfs_setxattr_trans · 353c2ea7

Anand Jain authored Apr 12, 2019

btrfs_setxattr_trans() is called by 5 functions as below and all of them
do updates. None of them would be run on a read-only root.
So it's OK to remove the readonly root check here as it's a high-level
condition.

1.
  __btrfs_set_acl()
    btrfs_init_acl()
      btrfs_init_inode_security()

2.
  __btrfs_set_acl()
    btrfs_set_acl()

3.
  btrfs_set_prop()
    btrfs_set_prop_trans()
      /                       \
      btrfs_ioctl_setflags()   btrfs_xattr_handler_set_prop()

4.
  btrfs_xattr_handler_set()

5.
  btrfs_initxattrs()
    btrfs_xattr_security_init()
      btrfs_init_inode_security()
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: export btrfs_setxattr · 3e125a74

Anand Jain authored Apr 12, 2019

Preparatory patch, as we are going to split the calls with and without
transaction to use the respective btrfs_setxattr() and
btrfs_setxattr_trans() functions. Export btrfs_setxattr() for calls
outside of xattr.c.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: rename do_setxattr to btrfs_setxattr · 2d74fa3e

Anand Jain authored Apr 12, 2019

When trans is not NULL btrfs_setxattr() calls do_setxattr() directly
with a check for readonly root. Rename do_setxattr() to btrfs_setxattr() in
preparation to call do_setxattr() directly instead.  Preparatory patch,
no functional changes.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: rename btrfs_setxattr to btrfs_setxattr_trans · cac237ae

Anand Jain authored Apr 12, 2019

Rename btrfs_setxattr() to btrfs_setxattr_trans(), so that do_setxattr()
can be renamed to btrfs_setxattr().
Preparatory patch, no functional changes.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: trace: Introduce trace events for all btrfs tree locking events · 31aab402

Qu Wenruo authored Apr 15, 2019

Unlike btrfs_tree_lock() and btrfs_tree_read_lock(), the remaining
functions in locking.c will not sleep, so it doesn't make much sense to
record their execution time.

Those events are introduced mainly for user space tools to audit and
detect lock leakage or deadlocks.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: trace: Introduce trace events for sleepable tree lock · 34e73cc9

Qu Wenruo authored Apr 15, 2019

There are two tree lock events which can sleep:
- btrfs_tree_read_lock()
- btrfs_tree_lock()

Sometimes we may need to look into the concurrency picture of the fs.
For that case, we need the execution time of the above two functions and
the owner of @eb.

Here we introduce trace events for user space tools like bcc, to get
the execution time of the above two functions, and to get detailed owner
info where eBPF code can't.

All the overhead is hidden behind the trace events, so if events are not
enabled, there is no overhead.

These trace events also output bytenr and generation, allowing them to be
paired with unlock events to pin down deadlocks.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


Btrfs: remove no longer used member num_dirty_bgs from transaction · 74f657d8

Filipe Manana authored Apr 15, 2019

The member num_dirty_bgs of struct btrfs_transaction is not used anymore:
it is set and incremented, but nothing reads its value. Its last
read use was removed by commit 64403612 ("btrfs: rework
btrfs_check_space_for_delayed_refs"). So just remove that member.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_run_dev_replace · 2b584c68

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>
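
The pattern shared by this commit and the similar ones in this series can be sketched as follows. This is only an illustration with mock types: mock_fs_info, mock_trans and both function names are hypothetical, and the real kernel structs carry many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock types standing in for the kernel structures. */
struct mock_fs_info { int dev_replace_dirty; };
struct mock_trans   { struct mock_fs_info *fs_info; };

/* Before: fs_info is passed even though trans already points to it. */
int run_dev_replace_old(struct mock_trans *trans, struct mock_fs_info *fs_info)
{
    (void)trans;
    return fs_info->dev_replace_dirty;
}

/* After: the parameter is dropped and fs_info is read from the handle. */
int run_dev_replace_new(struct mock_trans *trans)
{
    struct mock_fs_info *fs_info;

    assert(trans != NULL && trans->fs_info != NULL);
    fs_info = trans->fs_info;
    return fs_info->dev_replace_dirty;
}
```

Dropping the parameter removes the chance of passing an fs_info that does not match the transaction.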


btrfs: get fs_info from trans in btrfs_run_dev_stats · 196c9d8d

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_finish_sprout · 5c466629

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in init_first_rw_device · 6f8e0fc7

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in copy_for_split · 94f94ad9

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in insert_ptr · 6ad3cf6d

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in balance_node_right · 55d32ed8

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in push_node_left · d30a668f

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_write_out_cache · fe041534

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in create_free_space_inode · 4ca75f1b

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_set_log_full_commit · 90787766

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_need_log_full_commit · 4884b8e8

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_create_tree · 9b7a2440

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in update_block_group · 6b279408

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_write_dirty_block_groups · 5742d15f

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in btrfs_setup_space_cache · bbebb3e0

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from trans in write_one_cache_group · 39db232d

David Sterba authored Mar 20, 2019

We can read fs_info from the transaction and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: Remove redundant inode argument from btrfs_add_ordered_sum · f9756261

Nikolay Borisov authored Apr 10, 2019

Ordered csums are keyed off of a btrfs_ordered_extent, which already has
a reference to the inode. This implies that an explicit inode argument
is redundant. So remove it.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: Do mandatory tree block check before submitting bio · 8d47a0d8

Qu Wenruo authored Apr 04, 2019

There are at least 2 reports about a memory bit flip sneaking into
on-disk data.

Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time. As it's not mandatory and only runs on
builds with CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled, it doesn't help
users detect such problems.

This patch will address the hole by triggering a comprehensive check on
tree blocks before writing them back to disk.

The design points are:

- Timing of the check: Tree block write hook
  This timing is chosen to reduce the overhead.
  The comprehensive check should be as expensive as a checksum
  calculation.
  Doing a full check at btrfs_mark_buffer_dirty() time is too expensive
  for end users.

- Loose empty leaf check
  Originally for an empty leaf, tree-checker will report error if it's
  not a tree root.

  The problem for such check at write time is:
  * False alert for tree root created in current transaction
    In that case, the commit root still needs to be written to disk.
    And since current root can differ from commit root, then it will
    cause false alert.
    This happens for log tree.

  * False alert for relocated tree block
    Relocated tree block can be written to disk due to memory pressure,
    in that case an empty csum tree root can be written to disk and
    cause false alert, since csum root node hasn't been updated.

  Previous patch of removing comprehensive empty leaf owner check has
  paved the way for this patch.

The example error output will be something like:

  BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3): forced readonly
  BTRFS warning (device dm-3): Skipping commit of aborted transaction.
  BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
  BTRFS info (device dm-3): delayed_refs has NO entry
Reported-by: Leonard Lausen <leonard@lausen.nl>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
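
The "bad key order" line in the example output can be illustrated with a small key-order check. This is a sketch only, not the tree-checker: mock_key, key_cmp() and first_bad_slot() are hypothetical names, and it assumes keys compare by objectid, then type, then offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified key: (objectid, type, offset), compared in that order. */
struct mock_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Returns -1, 0 or 1 when a sorts before, equal to, or after b. */
int key_cmp(const struct mock_key *a, const struct mock_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * Returns the first slot whose key is not strictly greater than the
 * previous one, or -1 when all keys are in ascending order.
 */
int first_bad_slot(const struct mock_key *keys, int nr)
{
    assert(keys != NULL || nr == 0);
    for (int i = 1; i < nr; i++) {
        if (key_cmp(&keys[i - 1], &keys[i]) >= 0)
            return i;
    }
    return -1;
}
```

Fed the two keys from the log above, prev (10510212874240 169 0) and current (1714119868416 169 0), the check flags the second slot because its objectid is smaller than the previous one.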


btrfs: tree-checker: Remove comprehensive root owner check · ff2ac107

Qu Wenruo authored Apr 04, 2019

Commit 1ba98d08 ("Btrfs: detect corruption when non-root leaf has
zero item") introduced comprehensive root owner checker.

However, locating the owner root requires a pretty expensive tree search,
especially when the check gets reused by the mandatory read and write
time tree-checker.

This patch will remove that check, and completely rely on owner based
empty leaf check, which is much faster and still works fine for most
cases.

And since we skip the old root owner check, now write time tree check
can be merged with btrfs_check_leaf_full().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve · 39ad3173

Robbie Ko authored Mar 26, 2019

When doing fallocate, we first add the range to the reserve_list and
then reserve the quota.  If quota reservation fails, we'll release all
reserved parts of reserve_list.

However, cur_offset is not updated to indicate that this range has
already been inserted into the list.  Therefore, the same range is freed
twice: once in the list_for_each_entry loop, and once at the end of the
function.  This will result in a WARN_ON on bytes_may_use when we free
the remaining space.

At the end, under the 'out' label we have a call to:

   btrfs_free_reserved_data_space(inode, data_reserved, alloc_start, alloc_end - cur_offset);

The start offset, third argument, should be cur_offset.

Everything from alloc_start to cur_offset was freed by the
list_for_each_entry_safe loop.

Fixes: 18513091 ("btrfs: update btrfs_space_info's bytes_may_use timely")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
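
The double free described above can be modelled with a toy reservation map. This is a sketch under stated assumptions, not the kernel code: mock_space, reserve_range(), free_range() and cleanup_on_error() are hypothetical, and offsets are counted in abstract units.

```c
#include <assert.h>

#define NUNITS 16

/* Hypothetical model: reserved[i] is 1 while unit i holds a reservation. */
struct mock_space {
    unsigned char reserved[NUNITS];
};

void reserve_range(struct mock_space *s, int start, int len)
{
    assert(start >= 0 && len >= 0 && start + len <= NUNITS);
    for (int i = start; i < start + len; i++)
        s->reserved[i] = 1;
}

/* Frees [start, start + len) and returns how many units were freed twice. */
int free_range(struct mock_space *s, int start, int len)
{
    int double_frees = 0;

    assert(start >= 0 && len >= 0 && start + len <= NUNITS);
    for (int i = start; i < start + len; i++) {
        if (!s->reserved[i])
            double_frees++;
        s->reserved[i] = 0;
    }
    return double_frees;
}

/*
 * Error path: the list loop frees [alloc_start, cur_offset), then the
 * final call frees alloc_end - cur_offset units. With fixed == 0 the
 * final call starts at alloc_start (the bug); with fixed != 0 it starts
 * at cur_offset.
 */
int cleanup_on_error(struct mock_space *s, int alloc_start, int alloc_end,
                     int cur_offset, int fixed)
{
    int bad;

    assert(alloc_start <= cur_offset && cur_offset <= alloc_end);
    bad = free_range(s, alloc_start, cur_offset - alloc_start);
    bad += free_range(s, fixed ? cur_offset : alloc_start,
                      alloc_end - cur_offset);
    return bad;
}
```

In this model, starting the final free at alloc_start releases units the loop already released and leaves the tail of the range reserved, while starting at cur_offset frees each unit exactly once.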


btrfs: get fs_info from eb in read_one_dev · 17850759

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from eb in read_one_chunk · 9690ac09

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from eb in btrfs_check_chunk_valid · ddaf1d5a

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from eb in should_balance_chunk · 6ec0896c

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from eb in btrfs_check_node · 813fd1dc

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from eb in btrfs_check_leaf_relaxed · cfdaad5e

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: get fs_info from eb in btrfs_check_leaf_full · 1c4360ee

David Sterba authored Mar 20, 2019

We can read fs_info from extent buffer and can drop it from the
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit · 929be17a

Nikolay Borisov authored Mar 27, 2019

Instead of always calling the allocator to search for a free extent
that satisfies the input criteria, switch btrfs_trim_free_extents to
using find_first_clear_extent_bit. With this change it's no longer
necessary to read the device tree in order to figure out holes in
the devices.

Now the code always searches an in-memory data structure to figure out
the space range which contains the requested range, which should result
in speed improvements.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: Implement find_first_clear_extent_bit · 45bfcfc1

Nikolay Borisov authored Mar 27, 2019

This function is very similar to find_first_extent_bit except that it
locates the first contiguous span of space which does not have bits set.
Its intended use is in the freespace trimming code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
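
A minimal sketch of the idea, assuming the set bits are kept as a sorted array of non-overlapping inclusive ranges instead of the kernel's extent state tree. The names range and first_clear_span() are hypothetical, and the exact semantics of the kernel function (for example how it treats a start offset that falls inside a hole) may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range [start, end]; in set[] it marks offsets with bits set. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * set[] is sorted by start and non-overlapping. Finds the first clear
 * span at or after `start` within [0, limit]. Returns 1 and fills *out,
 * or 0 when no clear offset remains.
 */
int first_clear_span(const struct range *set, size_t n, uint64_t start,
                     uint64_t limit, struct range *out)
{
    size_t i = 0;
    uint64_t pos = start;

    assert(out != NULL);
    /* Skip set ranges that end before pos; step past ranges covering pos. */
    while (i < n && set[i].start <= pos) {
        if (set[i].end >= pos) {
            if (set[i].end >= limit)
                return 0;  /* everything up to limit is set */
            pos = set[i].end + 1;
        }
        i++;
    }
    if (pos > limit)
        return 0;
    out->start = pos;
    /* The span ends right before the next set range, or at limit. */
    out->end = (i < n && set[i].start <= limit) ? set[i].start - 1 : limit;
    return 1;
}
```

A trimming caller would loop over the returned spans, advancing start past each one.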


btrfs: Optimize unallocated chunks discard · 8811133d

Nikolay Borisov authored Mar 27, 2019

Currently unallocated chunks are always trimmed. For example
2 consecutive trims on large storage would trim freespace twice
irrespective of whether the space was actually allocated or not between
those trims.

Optimise this behavior by exploiting the newly introduced alloc_state
tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark
those unallocated chunks which have been trimmed and have not been
allocated afterwards. On chunk allocation the respective underlying devices'
physical space will have its CHUNK_TRIMMED flag cleared. This avoids
submitting discards for space which hasn't been changed since the last
time discard was issued.

This applies to the single mount period of the filesystem as the
information is not stored permanently.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
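
The bookkeeping described above can be sketched with a flat array of units instead of the alloc_state tree. Everything here is hypothetical (mock_device, UNIT_ALLOCATED, UNIT_TRIMMED and the three functions); UNIT_TRIMMED only plays the role that CHUNK_TRIMMED has in the message.

```c
#include <assert.h>

#define NUNITS 8
#define UNIT_ALLOCATED 0x1u
#define UNIT_TRIMMED   0x2u  /* plays the role of CHUNK_TRIMMED */

/* Hypothetical in-memory state, lost on unmount just like in the message. */
struct mock_device {
    unsigned flags[NUNITS];
};

/* Discards unallocated units not yet trimmed; returns discards issued. */
int trim_device(struct mock_device *dev)
{
    int discards = 0;

    for (int i = 0; i < NUNITS; i++) {
        if (dev->flags[i] & (UNIT_ALLOCATED | UNIT_TRIMMED))
            continue;  /* in use, or unchanged since the last trim */
        dev->flags[i] |= UNIT_TRIMMED;
        discards++;
    }
    return discards;
}

/* Allocation clears the trimmed mark along with marking the unit used. */
void alloc_unit(struct mock_device *dev, int i)
{
    assert(i >= 0 && i < NUNITS);
    dev->flags[i] = UNIT_ALLOCATED;
}

/* Freeing leaves the unit unallocated and untrimmed. */
void free_unit(struct mock_device *dev, int i)
{
    assert(i >= 0 && i < NUNITS);
    dev->flags[i] = 0;
}
```

Two trims in a row issue discards only on the first pass; a unit that was allocated and freed in between is discarded again on the next trim.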


btrfs: Factor out in_range macro · e74e3993

Nikolay Borisov authored Mar 27, 2019

This is used in more than one place so let's factor it out into ctree.h.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
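
For illustration, a macro of this kind is typically a half-open range test. The definition below is an assumed shape and may not match the one in ctree.h exactly; block_in_extent() is a hypothetical helper added only to exercise it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape: true when b lies in the half-open range [first, first + len). */
#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))

/* Hypothetical helper: is bytenr inside the extent starting at start? */
int block_in_extent(uint64_t bytenr, uint64_t start, uint64_t len)
{
    assert(start + len >= start);  /* reject ranges that wrap around */
    return in_range(bytenr, start, len);
}
```

The end of the range is exclusive, so the byte at first + len belongs to the next extent.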
