- 17 Apr, 2023 40 commits
-
-
Filipe Manana authored
At space-info.c:__reserve_bytes(), instead of initializing 'ret' to 0 when it's declared and then shortly after set it to -ENOSPC under the space info's spinlock, initialize it to -ENOSPC when declaring it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
When reserving space, at space-info.c:__reserve_bytes(), we assert that either the current task is not holding a transacion handle, or, if it is, that the flush method is not BTRFS_RESERVE_FLUSH_ALL. This is because that flush method can trigger transaction commits, and therefore could lead to a deadlock. However there are other 2 flush methods that can trigger transaction commits: 1) BTRFS_RESERVE_FLUSH_ALL_STEAL 2) BTRFS_RESERVE_FLUSH_EVICT So update the assertion to check the flush method is also not one those two methods if the current task is holding a transaction handle. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The BTRFS_RESERVE_FLUSH_EVICT flush method can also commit transactions, see the definition of the evict_flush_states const array at space-info.c, but the documentation for it at space-info.h does not mention it. So update the documentation. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The block reserve passed to btrfs_block_rsv_check() is never NULL, so remove the check. In case it can ever become NULL in the future, then we'll get a pretty obvious and clear NULL pointer dereference crash and stack trace. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
At btrfs_delayed_refs_rsv_refill(), we are passing a value of 0 to the 'update_size' argument of btrfs_block_rsv_add_bytes(), which is defined as a boolean. Functionally this is fine because a 0 is, implicitly, converted to a boolean false value. However it's easier to read an explicit 'false' value, so just pass 'false' instead of 0. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
The last argument of btrfs_block_rsv_migrate() is a boolean, but we are passing an integer, with a value of 1, to it at evict_refill_and_join(). While this is not a bug, due to type conversion, it's a lot more clear to simply pass the boolean true value instead. So just do that. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Filipe Manana authored
It's not used anywhere at the moment, but it was used in earlier version of a patch that removed its use in the second version. So just remove btrfs_lru_cache_is_full(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
btrfs_add_compressed_bio_pages is needlessly complicated. Instead of iterating over the logic disk offset just to add pages to the bio use a simple offset starting at 0, which also removes most of the claiming. Additionally __bio_add_pages already takes care of the assert that the bio is always properly sized, and btrfs_submit_bio called right after asserts that the bio size is non-zero. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Adding pages to a bio has nothing to do with the sector. Move the assignment to the two callers in preparation for cleaning up btrfs_add_compressed_bio_pages. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Naohiro Aota authored
Currently, /sys/fs/btrfs/<UUID>/bg_reclaim_threshold is limited to 0 (disable) or [50 .. 100]%, so we need to fill 50% of a device to start the auto reclaim process. It is cumbersome to do so when we want to shake out possible race issues of normal write vs reclaim. Relax the threshold check under the BTRFS_DEBUG option. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
btrfs_split_bio expects a btrfs_bio as argument and always allocates one. Type both the orig_bio argument and the return value as struct btrfs_bio to improve type safety. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Return the containing struct btrfs_bio instead of the less type safe struct bio from btrfs_bio_alloc. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The bio in struct btrfs_bio_ctrl must be a btrfs_bio, so store a pointer to the btrfs_bio for better type checking. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
struct btrfs_bio now has an always valid inode pointer that can be used to find the inode in submit_one_bio, so use that and initialize all variables for which it is possible at declaration time. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The original bio must be a btrfs_bio, so store a pointer to the btrfs_bio for better type checking. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
btrfs_submit_compressed_read expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
btrfs_submit_bio expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
All algorithms have to fill the remainder of the orig_bio with zeroes, so do it in common code. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
btrfs_encoded_read_regular_fill_pages has a pretty odd control flow. Unwind it so that there is a single loop over the pages array. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The inode and file_offset members in struct btrfs_encoded_read_private are unused, so remove them. Last used in commit 7959bd44 ("btrfs: remove the start argument to check_data_csum and export") and commit 7609afac ("btrfs: handle checksum validation and repair at the storage layer"). Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The DREW lock uses percpu variable to track lock counters and for that it needs to allocate the structure. In btrfs_read_tree_root() or btrfs_init_fs_root() this may add another error case or requires the NOFS scope protection. One way is to preallocate the structure as was suggested in https://lore.kernel.org/linux-btrfs/20221214021125.28289-1-robbieko@synology.com/ We may avoid the allocation altogether if we don't use the percpu variables but an atomic for the writer counter. This should not make any difference, the DREW lock is used for truncate and NOCOW writes along with other IO operations. The percpu counter for writers has been there since the original commit 8257b2dc "Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume". The reason could be to avoid hammering the same cacheline from all the readers but then the writers do that anyway. Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
If no discard mount option is specified, including the NODISCARD option, we make the async discard the default option then we don't have to call the clear_opt again to clear the NODISCARD flag. Though this makes no difference, that the call is redundant has been pointed out several times so we better remove it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Anand Jain authored
BTRFS_FEATURE_INCOMPAT_SUPP is defined twice, once under CONFIG_BTRFS_DEBUG and once without it, resulting in repetitive code. The reason for this is to add experimental features under CONFIG_BTRFS_DEBUG. To avoid repetitive code, add a common list BTRFS_FEATURE_INCOMPAT_SUPP_STABLE, and append experimental features only under CONFIG_BTRFS_DEBUG. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
We don't need to pass the roots as arguments, reading them from the rb-tree is cheap. Thus there is really not much need to pre-fetch it and pass it all the way from scrub_stripe(). And we already have more than enough arguments in scrub_simple_mirror() and scrub_simple_stripe(), it's better to remove them and only grab those roots in scrub_simple_mirror(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The variable @path is no longer passed into any call sites after commit 18d30ab9 ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub"), thus we can remove the variable completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
[BUG] Currently btrfs can use dev-replace device as an extra mirror for read-repair. But it can lead to NODATASUM corruption in the following case: There is a RAID1 data chunk, and dev-replace is running from dev2 to dev0. |//| = Replaced data X X+1MB X+2MB Dev 2: | | | <- Source dev Dev 0: |///////| | <- Target dev Then a read on dev 2 X+2MB happens. And something wrong happened inside devid 2, causing an -EIO. In that case, read-repair would try the next mirror, and since we can use target device as an extra mirror, we will use that mirror instead. But unfortunately since the read is beyond the current replace cursor, we should not trust it at all, what we get would be just uninitialized garbage. But if this read is for NODATASUM range, then we just trust them and cause data corruption. [CAUSE] We used to have some checks to make sure we only return such extra mirror when the range is before our left cursor. The first commit introducing this behavior is ad6d620e ("Btrfs: allow repair code to include target disk when searching mirrors"). But later a fix, 22ab04e8 ("Btrfs: fix race between device replace and chunk allocation") changed the behavior, to always let btrfs_map_block() include the extra mirror to address a race in dev-replace which can cause missing writes to target device. This means, we lose the tracking of cursor for the extra mirror, thus can lead to above corruption. [FIX] The extra mirror is never a reliable one, at the beginning of dev-replace, the reliability is zero, while only at the end of the replace it's a fully reliable mirror. We either do the complex tracking, or never trust it. IMHO it's much easier to maintain if we don't trust it at all, and the extra mirror can only benefit for a limited period of time (during replace). Thus this patch would completely remove the ability to use target device as an extra mirror. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Currently open_ctree() still uses two variables for error handling, err and ret. This can be confusing and missing some errors and does not conform to current coding style. This patch will fix the problems by: - Use only ret for error handling - Add proper ret assignment Originally we rely on the default value (-EINVAL) of err to handle errors, but that doesn't really reflects the error. This will change it use the correct error number for the following call sites: * subpage_info allocation * btrfs_free_extra_devids() * btrfs_check_rw_degradable() * cleaner_kthread allocation * transaction_kthread allocation - Add an extra ASSERT() To make sure we error out instead of returning 0. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Introduce a bio_offset variable for the current offset into the bio instead of recalculating it over and over. Remove the now only used once search_len and sector_offset variables, and reduce the scope for count and cur_disk_bytenr. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
There is no need to search for a file offset in a bio, it is now always provided in bbio->file_offset (set at bio allocation time since 0d495430 ("btrfs: set bbio->file_offset in alloc_new_bio")). Just use that with the offset into the bio. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
Nowadays calc_bio_boundaries() is a relatively simple function that only guarantees the one bio equals to one ordered extent rule for uncompressed Zone Append bios. Sink it into it's only caller alloc_new_bio(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
bio_add_page always adds either the entire range passed to it or nothing. Based on that btrfs_bio_add_page can only return a length smaller than the passed in one when hitting the ordered extent limit, which can only happen for writes. Given that compressed writes never even use this code path, this means that all the special cases for compressed extent offset handling are dead code. Reflow submit_extent_page to take advantage of this by inlining btrfs_bio_add_page and handling the ordered extent limit by decrementing it for each added range and thus significantly simplifying the loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Different loop iterations in btrfs_bio_add_page not only have the same contiguity parameters, but also any non-initial operation operates on a fresh bio anyway. Factor out the contiguity check into a new btrfs_bio_is_contig and only call it once in submit_extent_page before descending into the bio_add_page loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Remove the has_error and saved_ret variables, and just jump to a goto label for error handling from the only place returning an error from the main loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
submit_extent_page always returns 0 since commit d5e4377d ("btrfs: split zone append bios in btrfs_submit_bio"). Change it to a void return type and remove all the unreachable error handling code in the callers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Update the compress_type in the btrfs_bio_ctrl after forcing out the previous bio in btrfs_do_readpage, so that alloc_new_bio can just use the compress_type member in struct btrfs_bio_ctrl instead of passing the same information redundantly as a function argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Rename this_bio_flag to compress_type to match the surrounding code and better document the intent. Also use the proper enum type instead of unsigned long. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The compress_type can only change on a per-extent basis. So instead of checking it for every page in btrfs_bio_add_page, do the check once in btrfs_do_readpage, which is the only caller of btrfs_bio_add_page and submit_extent_page that deals with compressed extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Instead of passing down the wbc pointer the deep call chain, just add it to the btrfs_bio_ctrl structure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The sync_io flag is equivalent to wbc->sync_mode == WB_SYNC_ALL, so just check for that and remove the separate flag. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
The bio op and flags never change over the life time of a bio_ctrl, so move it in there instead of passing it down the deep call chain all the way down to alloc_new_bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-