Commit 2280d425 authored by Filipe Manana, committed by David Sterba

btrfs: ignore fiemap path cache when there are multiple paths for a node

During fiemap, when walking backreferences to determine if a b+tree
node/leaf is shared, we may find a tree block (leaf or node) for which
two parents were added to the references ulist. This happens if we get
for example one direct ref (shared tree block ref) and one indirect ref
(non-shared tree block ref) for the tree block at the current level,
which can happen during relocation.

In that case the fiemap path cache can not be used since it's meant for
a single path, with one tree block at each possible level, so having
multiple references for a tree block at any level may result in the level
counter exceeding BTRFS_MAX_LEVEL and eventually triggering the warning:

   WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)

at lookup_backref_shared_cache() and at store_backref_shared_cache().
This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL; the
warning is there just to catch any unexpected case like the one described
above. However, if a user hits it, the warning may look alarming and end up
being reported.
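
To illustrate why the out-of-range level is harmless, here is a rough
user-space sketch of that guard; lookup_cached_sharedness() and the two
structs below are made-up stand-ins, not the kernel's
lookup_backref_shared_cache() and its btrfs types:

  #include <stdbool.h>
  #include <stdio.h>

  #define BTRFS_MAX_LEVEL 8

  /* Simplified stand-ins for the btrfs path cache structures. */
  struct path_cache_entry {
          unsigned long long bytenr;
          bool is_shared;
  };

  struct share_check_ctx {
          bool use_path_cache;
          struct path_cache_entry path_cache_entries[BTRFS_MAX_LEVEL];
  };

  /* Returns true on a cache hit and fills *is_shared, false otherwise. */
  static bool lookup_cached_sharedness(const struct share_check_ctx *ctx,
                                       unsigned long long bytenr, int level,
                                       bool *is_shared)
  {
          const struct path_cache_entry *entry;

          if (!ctx->use_path_cache)
                  return false;

          /*
           * Unexpected level: warn (WARN_ON_ONCE() in the kernel) and
           * treat it as a cache miss instead of indexing out of bounds.
           */
          if (level >= BTRFS_MAX_LEVEL) {
                  fprintf(stderr, "unexpected level %d\n", level);
                  return false;
          }

          entry = &ctx->path_cache_entries[level];
          if (entry->bytenr != bytenr)
                  return false;

          *is_shared = entry->is_shared;
          return true;
  }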

So just ignore the path cache once we find a tree block for which there
is more than one reference, which is the less common case, and update the
cache with the sharedness check result for all levels below the level at
which we found multiple references.
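
The shape of the fix (shown in full in the diff below) can be sketched in
isolation as follows; check_level_refs(), fixup_path_cache() and the
simplified structs are hypothetical stand-ins for the logic that
btrfs_is_data_extent_shared() gains in this change, not the actual kernel
code:

  #include <stdbool.h>

  #define BTRFS_MAX_LEVEL 8

  struct path_cache_entry {
          unsigned long long bytenr;
          bool is_shared;
  };

  struct share_check_ctx {
          bool use_path_cache;
          unsigned long refs_nnodes;      /* stand-in for ctx->refs.nnodes */
          struct path_cache_entry path_cache_entries[BTRFS_MAX_LEVEL];
  };

  /*
   * Called once per level, right after the backref resolution step has
   * added this level's parents to the ulist; prev_ref_count is the node
   * count sampled just before that step.
   */
  static void check_level_refs(struct share_check_ctx *ctx,
                               unsigned long prev_ref_count)
  {
          /* More than one parent added for this tree block: multiple paths. */
          if ((ctx->refs_nnodes - prev_ref_count) > 1)
                  ctx->use_path_cache = false;
  }

  /*
   * Called when the walk ends with sharedness result 'shared'. If the cache
   * was disabled along the way, re-enable it only when there is a level
   * whose result can still be trusted, store the result at that level and
   * clear the entries for all levels above it.
   */
  static void fixup_path_cache(struct share_check_ctx *ctx, int level,
                               unsigned long long bytenr, bool shared)
  {
          int i = 0;

          if (ctx->use_path_cache)
                  return;

          if (level >= 0 && level < BTRFS_MAX_LEVEL) {
                  ctx->use_path_cache = true;
                  ctx->path_cache_entries[level].bytenr = bytenr;
                  ctx->path_cache_entries[level].is_shared = shared;
                  i = level + 1;
          }
          for ( ; i < BTRFS_MAX_LEVEL; i++)
                  ctx->path_cache_entries[i].bytenr = 0;
  }
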
Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/
Fixes: 12a824dc ("btrfs: speedup checking for extent sharedness during fiemap")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
parent 2d82a40a
@@ -1921,8 +1921,7 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
 	level = -1;
 	ULIST_ITER_INIT(&uiter);
 	while (1) {
-		bool is_shared;
-		bool cached;
+		const unsigned long prev_ref_count = ctx->refs.nnodes;
 
 		walk_ctx.bytenr = bytenr;
 		ret = find_parent_nodes(&walk_ctx, &shared);
@@ -1940,21 +1939,36 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
 		ret = 0;
 
 		/*
-		 * If our data extent was not directly shared (without multiple
-		 * reference items), than it might have a single reference item
-		 * with a count > 1 for the same offset, which means there are 2
-		 * (or more) file extent items that point to the data extent -
-		 * this happens when a file extent item needs to be split and
-		 * then one item gets moved to another leaf due to a b+tree leaf
-		 * split when inserting some item. In this case the file extent
-		 * items may be located in different leaves and therefore some
-		 * of the leaves may be referenced through shared subtrees while
-		 * others are not. Since our extent buffer cache only works for
-		 * a single path (by far the most common case and simpler to
-		 * deal with), we can not use it if we have multiple leaves
-		 * (which implies multiple paths).
+		 * More than one extent buffer (bytenr) may have been added to
+		 * the ctx->refs ulist, in which case we have to check multiple
+		 * tree paths in case the first one is not shared, so we can not
+		 * use the path cache which is made for a single path. Multiple
+		 * extent buffers at the current level happen when:
+		 *
+		 * 1) level -1, the data extent: If our data extent was not
+		 *    directly shared (without multiple reference items), then
+		 *    it might have a single reference item with a count > 1 for
+		 *    the same offset, which means there are 2 (or more) file
+		 *    extent items that point to the data extent - this happens
+		 *    when a file extent item needs to be split and then one
+		 *    item gets moved to another leaf due to a b+tree leaf split
+		 *    when inserting some item. In this case the file extent
+		 *    items may be located in different leaves and therefore
+		 *    some of the leaves may be referenced through shared
+		 *    subtrees while others are not. Since our extent buffer
+		 *    cache only works for a single path (by far the most common
+		 *    case and simpler to deal with), we can not use it if we
+		 *    have multiple leaves (which implies multiple paths).
+		 *
+		 * 2) level >= 0, a tree node/leaf: We can have a mix of direct
+		 *    and indirect references on a b+tree node/leaf, so we have
+		 *    to check multiple paths, and the extent buffer (the
+		 *    current bytenr) may be shared or not. One example is
+		 *    during relocation as we may get a shared tree block ref
+		 *    (direct ref) and a non-shared tree block ref (indirect
+		 *    ref) for the same node/leaf.
 		 */
-		if (level == -1 && ctx->refs.nnodes > 1)
+		if ((ctx->refs.nnodes - prev_ref_count) > 1)
 			ctx->use_path_cache = false;
 
 		if (level >= 0)
@@ -1964,18 +1978,45 @@ int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
 		if (!node)
 			break;
 		bytenr = node->val;
-		level++;
-		cached = lookup_backref_shared_cache(ctx, root, bytenr, level,
-						     &is_shared);
-		if (cached) {
-			ret = (is_shared ? 1 : 0);
-			break;
+		if (ctx->use_path_cache) {
+			bool is_shared;
+			bool cached;
+
+			level++;
+			cached = lookup_backref_shared_cache(ctx, root, bytenr,
+							     level, &is_shared);
+			if (cached) {
+				ret = (is_shared ? 1 : 0);
+				break;
+			}
 		}
 		shared.share_count = 0;
 		shared.have_delayed_delete_refs = false;
 		cond_resched();
 	}
 
+	/*
+	 * If the path cache is disabled, then it means at some tree level we
+	 * got multiple parents due to a mix of direct and indirect backrefs or
+	 * multiple leaves with file extent items pointing to the same data
+	 * extent. We have to invalidate the cache and cache only the sharedness
+	 * result for the levels where we got only one node/reference.
+	 */
+	if (!ctx->use_path_cache) {
+		int i = 0;
+
+		level--;
+		if (ret >= 0 && level >= 0) {
+			bytenr = ctx->path_cache_entries[level].bytenr;
+			ctx->use_path_cache = true;
+			store_backref_shared_cache(ctx, root, bytenr, level, ret);
+			i = level + 1;
+		}
+
+		for ( ; i < BTRFS_MAX_LEVEL; i++)
+			ctx->path_cache_entries[i].bytenr = 0;
+	}
+
 	/*
 	 * Cache the sharedness result for the data extent if we know our inode
 	 * has more than 1 file extent item that refers to the data extent.
......