Commits · adf0212396e3af238e25e7c54ecb2959f19def24 · nexedi / linux

19 Jun, 2017 40 commits

btrfs: adjust includes after vmalloc removal · adf02123

David Sterba authored May 31, 2017

As we don't use vmalloc/vzalloc/vfree directly in ctree.c, we can now
use the proper header that defines kvmalloc.
Signed-off-by: David Sterba <dsterba@suse.com>

adf02123

btrfs: use GFP_KERNEL in init_ipath · f54de068

David Sterba authored May 31, 2017

Now that init_ipath is called either from a safe context or with
memalloc_nofs protection, we can switch to GFP_KERNEL allocations in
init_path and init_data_container.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f54de068

btrfs: scrub: add memalloc_nofs protection around init_ipath · de2491fd

David Sterba authored May 31, 2017

init_ipath is called from a safe ioctl context and from scrub when
printing an error.  The protection is added for three reasons:

* init_data_container calls vmalloc and this does not work as expected
  in the GFP_NOFS context, so this silently does GFP_KERNEL and might
  deadlock in some cases
* keep the context constraint of GFP_NOFS, used by scrub
* we want to use GFP_KERNEL unconditionally inside init_ipath or its
  callees
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

de2491fd

btrfs: send: use kvmalloc in iterate_dir_item · f11f7441

David Sterba authored May 31, 2017

We use a growing buffer for xattrs larger than a page size, at some
point vmalloc is unconditionally used for larger buffers. We can still
try to avoid it using the kvmalloc helper.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

f11f7441

btrfs: replace opencoded kvzalloc with the helper · 818e010b

David Sterba authored May 31, 2017

The logic of kmalloc and vmalloc fallback is opencoded in
several places, we can now use the existing helper.
Signed-off-by: David Sterba <dsterba@suse.com>

818e010b

Btrfs: lzo: compressed data size must be less then input size · 1e9d7291

Timofey Titovets authored May 30, 2017

Logic already skips if compression makes data bigger, let's sync lzo
with zlib and also return error if compressed size is equal to
input size.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

1e9d7291

btrfs: simplify code with bio_io_error · 054ec2f6

Guoqing Jiang authored Jun 02, 2017

bio_io_error was introduced in the commit 4246a0b6
("block: add a bi_error field to struct bio"), so use it to simplify
code.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

054ec2f6

Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps · 25ff17e8

Omar Sandoval authored Jun 05, 2017

First, instead of open-coding the vmalloc() fallback, use the new
kvzalloc() helper. Second, use memalloc_nofs_{save,restore}() instead of
GFP_NOFS, as vmalloc() uses some GFP_KERNEL allocations internally which
could lead to deadlocks.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

25ff17e8

btrfs: use generic slab for for btrfs_transaction · 4b5faeac

David Sterba authored Mar 28, 2017

Observing the number of slab objects of btrfs_transaction, there's just
one active on an almost quiescent filesystem, and the number of objects
goes to about ten when sync is in progress. Then the nubmer goes down to
1.  This matches the expectations of the transaction lifetime.

For such use the separate slab cache is not justified, as we do not
reuse objects frequently. For the shortlived transaction, the generic
slab (size 512) should be ok. We can optimistically expect that the 512
slabs are not all used (fragmentation) and there are free slots to take
when we do the allocation, compared to potentially allocating a whole new
page for the separate slab.

We'll lose the stats about the object use, which could be added later if
we really need them.
Signed-off-by: David Sterba <dsterba@suse.com>

4b5faeac

btrfs: scrub: embed scrub_wr_ctx into scrub context · 3fb99303

David Sterba authored May 16, 2017

The structure scrub_wr_ctx is not used anywhere just the scrub context,
we can move the members there. The tgtdev is renamed so it's more clear
that it belongs to the "wr" part.
Signed-off-by: David Sterba <dsterba@suse.com>

3fb99303

btrfs: scrub: use fs_info::sectorsize and drop it from scrub context · 25cc1226

David Sterba authored May 16, 2017

As we now have the node/block sizes in fs_info, we can use them and can
drop the local copies.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

25cc1226

Btrfs: add statx support · 04a87e34

Yonghong Song authored May 12, 2017

Return enhanced file attributes from the btrfs, including:
  (1). inode creation time as stx_btime, and
  (2). Certain BTRFS_INODE_xxx flags are mapped to stx_attributes flags.

Example output:
	[root@localhost ~]# cat t.sh
	touch t
	chattr +aic t
	~/linux/samples/statx/test-statx t
	chattr -aic t
	touch t
	echo "========================================"
	~/linux/samples/statx/test-statx t
	/bin/rm t
	[root@localhost ~]# ./t.sh
	statx(t) = 0
	results=fff
  	  Size: 0               Blocks: 0          IO Block: 4096    regular file
	Device: 00:1c           Inode: 63962       Links: 1
	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
	Access: 2017-05-11 16:03:13.999856591-0700
	Modify: 2017-05-11 16:03:13.999856591-0700
	Change: 2017-05-11 16:03:14.000856663-0700
 	 Birth: 2017-05-11 16:03:13.999856591-0700
	Attributes: 0000000000000034 (........ ........ ........ ........ ........ ........ ........ .-ai.c..)
	========================================
	statx(t) = 0
	results=fff
	  Size: 0               Blocks: 0          IO Block: 4096    regular file
	Device: 00:1c           Inode: 63962       Links: 1
	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
	Access: 2017-05-11 16:03:14.006857097-0700
	Modify: 2017-05-11 16:03:14.006857097-0700
	Change: 2017-05-11 16:03:14.006857097-0700
 	Birth: 2017-05-11 16:03:13.999856591-0700
	Attributes: 0000000000000000 (........ ........ ........ ........ ........ ........ ........ .---.-..)
	[root@localhost ~]#
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

04a87e34

Btrfs: lzo: fix typo in error message after failed deflate · 036b0217

Timofey Titovets authored May 25, 2017

Fix copy paste typo in debug message for lzo.c, lzo is not deflate.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

036b0217

btrfs: btrfs_wait_tree_block_writeback can be void return · 3189ff77

Jeff Layton authored May 25, 2017

Nothing checks its return value.

Is it safe to skip checking return value of btrfs_wait_tree_block_writeback?

Liu Bo: I think yes, it's used in walk_log_tree which is called in two
places, free_log_tree and log replay.  For free_log_tree, it waits for
any running writeback of the extent buffer under freeing to finish in
case we need to access the eb pointer from page->private, and it's OK to
not check the return value, while for log replay, it's doesn't wait
because wc->wait is not set. So neither cares about the writeback error.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
[ added more explanation to changelog, from Liu Bo ]
Signed-off-by: David Sterba <dsterba@suse.com>

3189ff77

btrfs: remove __BTRFS_LEAF_DATA_SIZE · 118c701e

Nikolay Borisov authored May 22, 2017

__BTRFS_LAF_DATA_SIZE is used only by BTRFS_LEAF_DATA_SIZE. Make the
latter subsume the former.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

118c701e

btrfs: rename btrfs_leaf_data to BTRFS_LEAF_DATA_OFFSET · 3d9ec8c4

Nikolay Borisov authored May 29, 2017

Commit 5f39d397 ("Btrfs: Create extent_buffer interface
for large blocksizes") refactored btrfs_leaf_data function to take
extent_buffer rather than struct btrfs_leaf. However, as it turns out the
parameter being passed is never used. Furthermore this function no longer
returns the leaf data but rather the offset to it. So rename the function
to BTRFS_LEAF_DATA_OFFSET to make it consistent with other BTRFS_LEAF_*
helpers and turn it into a macro.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ removed () from the macro ]
Signed-off-by: David Sterba <dsterba@suse.com>

3d9ec8c4

btrfs: reduce arguments for decompress_bio ops · e1ddce71

Anand Jain authored May 26, 2017

struct compressed_bio pointer can be used instead.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e1ddce71

btrfs: btrfs_decompress_bio() could accept compressed_bio instead · 8140dc30

Anand Jain authored May 26, 2017

Instead of sending each argument of struct compressed_bio, send
the compressed_bio itself.

Also by having struct compressed_bio in btrfs_decompress_bio()
it would help tracing.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

8140dc30

btrfs: Refactor update_space_info · d2006e6d

Nikolay Borisov authored May 22, 2017

Following the factoring out of the creation code udpate_space_info can
only be called for already-existing space_info structs. As such it
cannot fail.  Remove superfluous error handling and make the function
return void.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

d2006e6d

btrfs: Separate space_info create/update · 2be12ef7

Nikolay Borisov authored May 22, 2017

Currently the struct space_info creation code is intermixed in the
udpate_space_info function. There are well-defined points at which the
we actually want to create brand-new space_info structs (e.g. during
mount of the filesystem as well as sometimes when adding/initialising
new chunks). In such cases update_space_info is called with 0 as the
bytes parameter. All of this makes for spaghetti code.

Fix it by factoring out the creation code in a separate
create_space_info structure. This also allows to simplify the internals.
Also remove BUG_ON from do_alloc_chunk since the callers handle errors.
Furthermore it will make the update_space_info function not fail,
allowing us to remove error handling in callers. This will come in a
follow up patch.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2be12ef7

Btrfs: let btrfs_print_leaf print more about block group · 555ba411

Liu Bo authored May 25, 2017

This adds chunk_objectid and flags, with flags we can recognize whether
the block group is about data or metadata.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

555ba411

Btrfs: skip commit transaction if we don't have enough pinned bytes · 28785f70

Liu Bo authored May 19, 2017

We commit transaction in order to reclaim space from pinned bytes because
it could process delayed refs, and in may_commit_transaction(), we check
first if pinned bytes are enough for the required space, we then check if
that plus bytes reserved for delayed insert are enough for the required
space.

This changes the code to the above logic.

Fixes: b150a4f1 ("Btrfs: use a percpu to keep track of possibly pinned bytes")
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reported-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

28785f70

btrfs: scrub: simplify cleanup of wr_ctx in scrub_free_ctx · 4e2814ef

David Sterba authored May 16, 2017

We don't need to take the mutex and zero out wr_cur_bio, as this is
called after the scrub finished.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

4e2814ef

btrfs: scrub: inline helper scrub_free_wr_ctx · e241ddeb

David Sterba authored May 16, 2017

The helper scrub_free_wr_ctx is used only once and fits into
scrub_free_ctx as it continues sctx shutdown, no need to keep it
separate.
Signed-off-by: David Sterba <dsterba@suse.com>

e241ddeb

btrfs: scrub: inline helper scrub_setup_wr_ctx · 8fcdac3f

David Sterba authored May 16, 2017

The helper scrub_setup_wr_ctx is used only once and fits into
scrub_setup_ctx as it continues intialization, no need to keep it
separate.
Signed-off-by: David Sterba <dsterba@suse.com>

8fcdac3f

btrfs: remove root usage from can_overcommit · c1c4919b

Jeff Mahoney authored May 17, 2017

can_overcommit using the root to determine the allocation profile
is the only use of a root in the call graph below reserve_metadata_bytes.

It turns out that we only need to know whether the allocation is for
the chunk root or not -- and we can pass that around as a bool instead.

This allows us to pull root usage out of the reservation path all the
way up to reserve_metadata_bytes itself, which uses it only to compare
against fs_info->chunk_root to set the bool.  In turn, this eliminates
a bunch of races where we use a particular root too early in the mount
process.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

c1c4919b

btrfs: cleanup root usage by btrfs_get_alloc_profile · 1b86826d

Jeff Mahoney authored May 17, 2017

There are two places where we don't already know what kind of alloc
profile we need before calling btrfs_get_alloc_profile, but we need
access to a root everywhere we call it.

This patch adds helpers for btrfs_{data,metadata,system}_alloc_profile()
and relegates btrfs_system_alloc_profile to a static for use in those
two cases.  The next patch will eliminate one of those.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

1b86826d

btrfs: fix bool type in btrfs_page_exists_in_range · e03733da

David Sterba authored May 12, 2017

We use only a simple bool indicator, int is not a problem here.
Signed-off-by: David Sterba <dsterba@suse.com>

e03733da

btrfs: remove unused member list from btrfs_end_io_wq · c9fed2bb

David Sterba authored Apr 13, 2017

The end io work queue items have been tracked by the work queues since
"Btrfs: Add async worker threads for pre and post IO checksumming"
(8b712842) (2008).
Signed-off-by: David Sterba <dsterba@suse.com>

c9fed2bb

btrfs: remove unused members dir_path from recorded_ref · ee4ea698

David Sterba authored Apr 13, 2017

The two members do not seem to be used since the initial commit.
Signed-off-by: David Sterba <dsterba@suse.com>

ee4ea698

btrfs: remove unused member list from async_submit_bio · b297c9f6

David Sterba authored Apr 13, 2017

The list used to track checksums in the early version (2.6.29), but I
was able not pinpoint the commit that stopped using it. Everything
apparently works without it for a long time.
Signed-off-by: David Sterba <dsterba@suse.com>

b297c9f6

btrfs: remove unused member err from reada_extent · 106204f1

David Sterba authored Apr 13, 2017

Seems to be unused since the initial commit, we ignore readahead errors
anyway, the full read will handle that if necessary.
Signed-off-by: David Sterba <dsterba@suse.com>

106204f1

btrfs: Remove unnecessary branching in free-space-tree.c · 0bef7109

Sahil Kang authored May 17, 2017

Both btrfs_create_free_space_tree and btrfs_clear_free_space_tree
contain:

  if (ret)
          return ret;

  return 0;

The if statement is only false when ret equals zero, and since we return
zero in such cases, we can safely remove the branching.
Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>
Signed-off-by: David Sterba <dsterba@suse.com>

0bef7109

Btrfs: hardcode GFP_NOFS for btrfs_bio_clone_partial · e477094f

Liu Bo authored May 16, 2017

We only pass GFP_NOFS to btrfs_bio_clone_partial, so lets hardcode it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e477094f

Btrfs: work around maybe-uninitialized warning · 3c91ee69

Arnd Bergmann authored May 18, 2017

A rewrite of btrfs_submit_direct_hook appears to have introduced a warning:

fs/btrfs/inode.c: In function 'btrfs_submit_direct_hook':
fs/btrfs/inode.c:8467:14: error: 'bio' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Where the 'bio' variable was previously initialized unconditionally, it
is now set in the "while (submit_len > 0)" loop that would never execute
if submit_len is zero.

Assuming this cannot happen in practice, we can avoid the warning
by simply replacing the while{} loop with a do{}while() loop so
the compiler knows that it will always be entered at least once.

Fixes changes introduced in "Btrfs: use bio_clone_bioset_partial to
simplify DIO submit".
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Sterba <dsterba@suse.com>

3c91ee69

Btrfs: unify naming of btrfs_io_bio · 3892ac90

Liu Bo authored Apr 17, 2017

All dio endio functions are using io_bio for struct btrfs_io_bio, this
makes btrfs_submit_direct to follow this convention.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3892ac90

Btrfs: check-integrity use bvec_iter · 11b56165

Liu Bo authored Apr 14, 2017

Some check-integrity code depends on bio->bi_vcnt, this changes it to use
bio segments because some bios passing here may not have a reliable
bi_vcnt.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

11b56165

Btrfs: record error if one block has failed to retry · 629ebf4f

Liu Bo authored May 15, 2017

In the nocsum case of dio read endio, it returns immediately if an error
gets returned when repairing, which leaves the rest blocks unrepaired. The
behavior is different from how buffered read endio works in the same case.
This changes it to record error only and go on repairing the rest blocks.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

629ebf4f

Btrfs: change how we iterate bios in endio · 17347cec

Liu Bo authored May 15, 2017

Since dio submit has used bio_clone_fast, the submitted bio may not have a
reliable bi_vcnt, for the bio vector iterations in checksum related
functions, bio->bi_iter is not modified yet and it's safe to use
bio_for_each_segment, while for those bio vector iterations in dio read's
endio, we now save a copy of bvec_iter in struct btrfs_io_bio when cloning
bios and use the helper __bio_for_each_segment with the saved bvec_iter to
access each bvec.

Also for dio reads which don't get split, we also need to save a copy of
bio iterator in btrfs_bio_clone to let __bio_for_each_segments to access
each bvec in dio read's endio. Note that it doesn't affect other calls of
btrfs_bio_clone() because they don't need to use this iterator.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

17347cec

Btrfs: use bio_clone_bioset_partial to simplify DIO submit · 725130ba

Liu Bo authored May 16, 2017

Currently when mapping bio to limit bio to a single stripe length, we
split bio by adding page to bio one by one, but later we don't modify
the vector of bio at all, thus we can use bio_clone_fast to use the
original bio vector directly.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

725130ba