Commits · 131ce4367a8f37c6609148117a051d86cd55a5d9 · Kirill Smelkov / linux

16 Aug, 2017 40 commits

btrfs: account that we're waiting for IO in scrub_submit_raid56_bio_wait · 131ce436

David Sterba authored Jul 19, 2017

Correctly account for IO when waiting for a submitted bio in scrub. This
only for the accounting purposes and should not change other behaviour.
Signed-off-by: David Sterba <dsterba@suse.com>

131ce436

btrfs: account that we're waiting for DIO read · 9c17f6cd

David Sterba authored Jul 19, 2017

Correctly account for IO when waiting for a submitted DIO read, the case
when we're retrying.  This only for the accounting purposes and should
not change other behaviour.
Signed-off-by: David Sterba <dsterba@suse.com>

9c17f6cd

btrfs: drop chunk locks at the end of close_ctree · 4958aa68

David Sterba authored Jun 22, 2017

The pinned chunks might be left over so we clean them but at this point
of close_ctree, there's noone to race with, the locking can be removed.
Signed-off-by: David Sterba <dsterba@suse.com>

4958aa68

btrfs: remove trivial wrapper btrfs_force_ra · d3c0bab5

David Sterba authored Jun 22, 2017

It's a simple call page_cache_sync_readahead, same arguments in the same
order.
Signed-off-by: David Sterba <dsterba@suse.com>

d3c0bab5

btrfs: drop ancient page flag mappings · 35dc3130

David Sterba authored Jun 22, 2017

There's no PageFsMisc. Added by patch 4881ee5a in 2008, the flag is
not present in current kernels.
Signed-off-by: David Sterba <dsterba@suse.com>

35dc3130

btrfs: fix spelling of snapshotting · ea14b57f
David Sterba authored Jun 22, 2017
```
Signed-off-by: David Sterba <dsterba@suse.com>
```
ea14b57f

btrfs: Make flush_space return void · e38ae7a0

Nikolay Borisov authored Jul 25, 2017

The return value of flush_space was used to have significance in the
early days when the code was first introduced and before the ticketed
enospc rework. Since the latter got introduced the return value lost any
significance whatsoever to its callers. So let's remove it. While at it
also remove the unused ticket variable in
btrfs_async_reclaim_metadata_space. It was used in the initial version
of the ticketed ENOSPC work, however Wang Xiaoguang detected a problem
with this and fixed it in ce129655 ("btrfs: introduce tickets_id to
determine whether asynchronous metadata reclaim work makes progress").
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

e38ae7a0

btrfs: Deprecate userspace transaction ioctls · 3558d4f8

Nikolay Borisov authored Jul 26, 2017

Userspace transactions were introduced in commit 6bf13c0c ("Btrfs:
transaction ioctls") to provide semantics that Ceph's object store
required. However, things have changed significantly since then, to the
point where btrfs is no longer suitable as a backend for ceph and in
fact it's actively advised against such usages. Considering this, there
doesn't seem to be a widespread, legit use case of userspace
transaction. They also clutter the file->private pointer.

So to end the agony let's nuke the userspace transaction ioctls. As a
first step let's give time for people to voice their objection by just
WARN()ining when the userspace transaction is used.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ move the warning past perm checks, keep the has-been-printed state;
  we're ok with just one warning over all filesystems ]
Signed-off-by: David Sterba <dsterba@suse.com>

3558d4f8

btrfs: use named constant for bdev blocksize · 9f6d2510

David Sterba authored Jun 16, 2017

Superblock is read and written using buffer heads, we need to set the
bdev blocksize. The magic constant has been hardcoded in several places,
so replace it with a named constant.
Signed-off-by: David Sterba <dsterba@suse.com>

9f6d2510

btrfs: split write_dev_supers to two functions · abbb3b8e

David Sterba authored Jun 16, 2017

There are two independent parts, one that writes the superblocks and
another that waits for completion. No functional changes, but cleanups,
reformatting and comment updates.
Signed-off-by: David Sterba <dsterba@suse.com>

abbb3b8e

btrfs: refactor find_device helper · 35c70103

David Sterba authored Jun 15, 2017

Polish the helper:
* drop underscores, no special meaning here
* pass fs_devices, as this is what the API implements
* drop noinline, no apparent reason for such simple helper
* constify uuid
* add comment
Signed-off-by: David Sterba <dsterba@suse.com>

35c70103

btrfs: merge alloc_device helpers · 2dfeca9b

David Sterba authored Jun 14, 2017

There are two helpers called in chain from one location, we can merge the
functionaliy.

Originally, alloc_fs_devices could fill the device uuid randomly if we
we didn't give the uuid buffer. This happens for seed devices but the
fsid is generated in btrfs_prepare_sprout, so we can remove it.
Signed-off-by: David Sterba <dsterba@suse.com>

2dfeca9b

btrfs: merge REQ_OP and REQ_ flags to one parameter in submit_extent_page · 4b81ba48

David Sterba authored Jun 06, 2017

The function submit_extent_page has 15(!) parameters right now, op and
op_flags are effectively one value stored to bio::bi_opf, no need to
pass them separately. So it's 14 parameters now.
Signed-off-by: David Sterba <dsterba@suse.com>

4b81ba48

btrfs: cleanup types storing REQ_* · f1c77c55

David Sterba authored Jun 06, 2017

Unify types of local variables and parameters that store various
REQ_* values to unsigned int.
Signed-off-by: David Sterba <dsterba@suse.com>

f1c77c55

btrfs: get fs_info from eb in btrfs_print_tree, remove argument · abe60ba4
David Sterba authored Jun 29, 2017
```
Signed-off-by: David Sterba <dsterba@suse.com>
```
abe60ba4
btrfs: get fs_info from eb in btrfs_print_leaf, remove argument · a4f78750
David Sterba authored Jun 29, 2017
```
Signed-off-by: David Sterba <dsterba@suse.com>
```
a4f78750

btrfs: simplify btrfs_dev_replace_kthread · f1b8a1e8

David Sterba authored Jun 14, 2017

This function prints an informative message and then continues
dev-replace. The message contains a progress percentage which is read
from the status. The status is allocated dynamically, about 2600 bytes,
just to read the single value. That's an overkill. We'll use the new
helper and drop the allocation.
Signed-off-by: David Sterba <dsterba@suse.com>

f1b8a1e8

btrfs: factor reading progress out of btrfs_dev_replace_status · 74b595fe

David Sterba authored Jun 14, 2017

We'll want to read the percentage value from dev_replace elsewhere, move
the logic to a separate helper.
Signed-off-by: David Sterba <dsterba@suse.com>

74b595fe

btrfs: defrag: make readahead state allocation failure non-fatal · 0a52d108

David Sterba authored Jun 22, 2017

All sorts of readahead errors are not considered fatal. We can continue
defragmentation without it, with some potential slow down, which will
last only for the current inode.
Signed-off-by: David Sterba <dsterba@suse.com>

0a52d108

btrfs: use GFP_KERNEL in btrfs_defrag_file · 63e727ec

David Sterba authored Jun 22, 2017

We can safely use GFP_KERNEL, the function is called from two contexts:

- ioctl handler, called directly, no locks taken
- cleaner thread, running all queued defrag work, outside of any locks
Signed-off-by: David Sterba <dsterba@suse.com>

63e727ec

btrfs: use GFP_KERNEL in mount and remount · 3ec83621

David Sterba authored Jun 22, 2017

We don't need to restrict the allocation flags in btrfs_mount or
_remount. No big filesystem locks are held (possibly s_umount but that
does no count here).
Signed-off-by: David Sterba <dsterba@suse.com>

3ec83621

btrfs: Remove never reached error handling code in __add_reloc_root · e3f3ad12

Nikolay Borisov authored Jul 13, 2017

One of the error handling paths in __add_reloc_root contains btrfs_panic()
followed by some other code. As the name implies what it does is print
some error message and call BUG, naturally what follow afterwards is not
invoked. So remove this extra code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e3f3ad12

btrfs: Remove unused parameters from volume.c functions · e4ff5fb5

Nikolay Borisov authored Jul 19, 2017

This also adjusts the respective callers in other files. Those were
found with -Wunused-parameter.

btrfs_full_stripe_len's mapping_tree - introduced by 53b381b3
("Btrfs: RAID5 and RAID6") but it was never really used even in that
commit

btrfs_is_parity_mirror's mirror_num - same as above

chunk_drange_filter's chunk_offset - introduced by 94e60d5a ("Btrfs:
devid subset filter") and never used.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e4ff5fb5

btrfs: Remove unused variables · 110840bb

Nikolay Borisov authored Jul 19, 2017

clear_super - usage was removed in commit cea67ab9 ("btrfs: clean
the old superblocks before freeing the device") but that change forgot
to remove the actual variable.

max_key - commit 6174d3cb ("Btrfs: remove unused max_key arg from
btrfs_search_forward") removed the max_key parameter but it forgot to
remove references from callers.

stripe_len - this one was added by e06cd3dd ("Btrfs: add validadtion
checks for chunk loading") but even then it wasn't used.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

110840bb

btrfs: Remove find_raid56_stripe_len · 500ceed8

Nikolay Borisov authored Jul 14, 2017

find_raid56_stripe_len statically returns SZ_64K which equals BTRFS_STRIPE_LEN.
It's sole caller is __btrfs_alloc_chunk and it assigns the return value to ai
variable which is already set to BTRFS_STRIPE_LEN. So remove the function
invocation altogether and remove the function itself. Also remove the variable
since it's only aliasing BTRFS_STRIPE_LEN and use the define directly. Use
the occassion to simplify the rounding down of stripe_size now that the value
we want it to align is a power of 2.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

500ceed8

btrfs: Use explicit round_down macro in btrfs resize ioctl handler · 47f08b96

Nikolay Borisov authored Jul 18, 2017

No functional changes, just make the code more self-explanatory.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

47f08b96

btrfs: btrfs_inherit_iflags() can be static · 19aee8de

Anand Jain authored Jul 18, 2017

btrfs_new_inode() is the only consumer move it to inode.c,
from ioctl.c.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

19aee8de

btrfs: Keep one more workspace around · 26b28dce

Nick Terrell authored Jun 29, 2017

find_workspace() allocates up to num_online_cpus() + 1 workspaces.
free_workspace() will only keep num_online_cpus() workspaces. When
(de)compressing we will allocate num_online_cpus() + 1 workspaces, then
free one, and repeat. Instead, we can just keep num_online_cpus() + 1
workspaces around, and never have to allocate/free another workspace in the
common case.

I tested on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. I mounted a
BtrFS partition with -o compress-force={lzo,zlib,zstd} and logged whenever
a workspace was allocated of freed. Then I copied vmlinux (527 MB) to the
partition. Before the patch, during the copy it would allocate and free 5-6
workspaces. After, it only allocated the initial 3. This held true for lzo,
zlib, and zstd. The time it took to execute cp vmlinux /mnt/btrfs && sync
dropped from 1.70s to 1.44s with lzo compression, and from 2.04s to 1.80s
for zstd compression.
Signed-off-by: Nick Terrell <terrelln@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

26b28dce

btrfs: drop newlines from strings when using btrfs_* helpers · 913e1535

David Sterba authored Jul 13, 2017

The helpers append "\n" so we can keep the actual strings shorter. The
extra newline will print an empty line.  Some messages have been
slightly modified to be more consistent with the rest (lowercase first
letter).
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

913e1535

btrfs: qgroups: Fix BUG_ON condition in tree level check · b6e6bca5

Nikolay Borisov authored Jul 12, 2017

The current code was erroneously checking for
root_level > BTRFS_MAX_LEVEL. If we had a root_level of 8 then the check
won't trigger and we could potentially hit a buffer overflow. The
correct check should be root_level >= BTRFS_MAX_LEVEL .
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b6e6bca5

btrfs: Enhance message when a device is missing during mount · c5502451

Qu Wenruo authored Mar 09, 2017

For a missing device, btrfs will just refuse to mount with almost
meaningless kernel message like:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

This patch will print a new message about the missing device:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c5502451

btrfs: Cleanup num_tolerated_disk_barrier_failures · bc3cce23

Qu Wenruo authored Mar 09, 2017

As we use per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use.

We can now remove it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

bc3cce23

btrfs: Allow barrier_all_devices to do chunk level device check · d10b82fe

Qu Wenruo authored Jun 27, 2017

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices().
But it can be easily changed to the new per-chunk degradable check
framework.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d10b82fe

btrfs: Do chunk level check for degraded remount · b382cfe8

Qu Wenruo authored Mar 09, 2017

Just the same for mount time check, use btrfs_check_rw_degradable() to
check if we are OK to be remounted rw.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

b382cfe8

btrfs: Do chunk level check for degraded rw mount · 4330e183

Qu Wenruo authored Mar 09, 2017

Now use the btrfs_check_rw_degradable() to check if we can mount in the
degraded mode.

With this patch, we can mount in the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 As the single data chunk is only on sdb, so it's OK to mount as
 degraded, as missing one device is OK for RAID1.

But still fail in the following case as expected:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 As the data chunk is only in sdb, so it's not OK to mount it as
 degraded.
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4330e183

btrfs: Introduce a function to check if all chunks a OK for degraded rw mount · 21634a19

Qu Wenruo authored Mar 09, 2017

Introduce a new function, btrfs_check_rw_degradable(), to check if all
chunks in btrfs is OK for degraded rw mount.

It provides the new basis for accurate btrfs mount/remount and even
runtime degraded mount check other than old one-size-fit-all method.

Btrfs currently uses num_tolerated_disk_barrier_failures to do global
check for tolerated missing device.

Although the one-size-fit-all solution is quite safe, it's too strict
if data and metadata has different duplication level.

For example, if one use Single data and RAID1 metadata for 2 disks, it
means any missing device will make the fs unable to be degraded
mounted.

But in fact, some times all single chunks may be in the existing
device and in that case, we should allow it to be rw degraded mounted.

Such case can be easily reproduced using the following script:
 # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
 # wipefs -f /dev/sdc
 # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact it should allow degraded mount.

This patchset will introduce a new per-chunk degradable check for
btrfs, allow above case to succeed, and it's quite small anyway.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copied text from cover letter with more details about the problem being
  solved ]
Signed-off-by: David Sterba <dsterba@suse.com>

21634a19

Btrfs: report errors when checksum is not found · 0d1e0bea

Liu Bo authored Jul 11, 2017

When btrfs fails the checksum check, it'll fill the whole page with
"1".

However, if %csum_expected is 0 (which means there is no checksum), then
for some unknown reason, we just pretend that the read is correct, so
userspace would be confused about the dilemma that read is successful but
getting a page with all content being "1".

This can happen due to a bug in btrfs-convert.

This fixes it by always returning errors if checksum doesn't match.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0d1e0bea

btrfs: Prevent possible ERR_PTR() dereference · 69f03f13

Nikolay Borisov authored Jul 11, 2017

In btrfs_full_stripe_len/btrfs_is_parity_mirror we have similar code which
gets the chunk map for a particular range via get_chunk_map. However,
get_chunk_map can return an ERR_PTR value and while the 2 callers do catch
this with a WARN_ON they then proceed to indiscriminately dereference the
extent map. This of course leads to a crash. Fix the offenders by making the
dereference conditional on IS_ERR.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

69f03f13

btrfs: Remove redundant checks from btrfs_alloc_data_chunk_ondemand · 1174cade

Nikolay Borisov authored Jul 11, 2017

Many commits ago the data space_info in alloc_data_chunk_ondemand used to be
acquired from the inode. At that point commit
33b4d47f ("Btrfs: deal with NULL space info") got introduced to deal with
spurios cases where the space info could be null, following a rebalance.
Nowadays, however, the space info is referenced directly from the btrfs_fs_info
struct which is initialised at filesystem mount time. This makes the null
checks redundant, so remove them.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

1174cade

btrfs: Remove redundant argument of flush_space · 7bdd6277

Nikolay Borisov authored Jul 11, 2017

All callers of flush_space pass the same number for orig/num_bytes
arguments. Let's remove one of the numbers and also modify the trace
point to show only a single number - bytes requested.

Seems that last point where the two parameters were treated differently
is before the ticketed enospc rework.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

7bdd6277