Commits · 494036d862dfff1de9782492692da225479b7146 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: BCH_WATERMARK_reclaim · 494036d8

Kent Overstreet authored Jun 27, 2023

Add another watermark for journal reclaim - this is needed for the next
patches, which unify BCH_WATERMARK with JOURNAL_WATERMARK.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

494036d8

bcachefs: struct bch_extent_rebalance · 2766876d

Kent Overstreet authored Jun 27, 2023

This adds the extent entry for extents that rebalance needs to do
something with.

We're adding this ahead of the main rebalance_work patchset, because
adding new extent entries can't be done in a forwards-compatible way.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2766876d

bcachefs: Expand BTREE_NODE_ID · 4e1430a7

Kent Overstreet authored Jun 27, 2023

We now have 20 bits for the btree ID in the on disk format - sufficient
for 1 million distinct btrees.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4e1430a7
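
A quick check of the arithmetic in the BTREE_NODE_ID commit above: a 20-bit field holds 2^20 = 1,048,576 distinct values. The sketch below assumes a hypothetical layout in which the ID sits in the low 20 bits of a 64-bit flags word; the macro and function names are made up for illustration, and the real on-disk layout is not shown in this log.

```c
#include <stdint.h>

/* Hypothetical layout: the btree ID occupies the low 20 bits of a flags word. */
#define EXAMPLE_BTREE_ID_BITS	20
#define EXAMPLE_BTREE_ID_MASK	((UINT64_C(1) << EXAMPLE_BTREE_ID_BITS) - 1)

/* Number of distinct IDs a field of the given width can represent. */
static inline uint64_t example_id_space(unsigned bits)
{
	return UINT64_C(1) << bits;
}

/* Store an ID in the field, leaving the other flag bits untouched. */
static inline uint64_t example_set_btree_id(uint64_t flags, uint32_t id)
{
	return (flags & ~EXAMPLE_BTREE_ID_MASK) | (id & EXAMPLE_BTREE_ID_MASK);
}

/* Read the ID back out of the field. */
static inline uint32_t example_get_btree_id(uint64_t flags)
{
	return (uint32_t)(flags & EXAMPLE_BTREE_ID_MASK);
}
```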

bcachefs: Fix btree node write error message · e4eb661d

Kent Overstreet authored Jun 27, 2023

Error messages should include the error code, when available.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e4eb661d

bcachefs: fsck: Break walk_inode() up into multiple functions · 06dcca51

Kent Overstreet authored Jun 25, 2023

Some refactoring, prep work for algorithm improvements related to
snapshots.

We need to add a bitmap to the list of inodes for "seen this snapshot";
for this bitmap to be correctly available, we'll need to gather the list
of inodes first, and later look up the inode for a given snapshot.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

06dcca51

bcachefs: Fix leak in backpointers fsck · 1fa3e87a

Kent Overstreet authored Jun 27, 2023

We were forgetting to exit a printbuf - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1fa3e87a

bcachefs: unregister_shrinker() now safe on not-registered shrinker · b3591acc

Kent Overstreet authored Jun 26, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b3591acc

bcachefs: Add a missing rhashtable_destroy() call · 0ce4e0e7

Kent Overstreet authored Jun 26, 2023

Fixes https://lore.kernel.org/linux-bcachefs/784c3e6a-75bd-e6ca-535a-43b3e1daf643@kernel.dk/T/#mbf7caf005f960018eba23b58795d06c06c947411
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0ce4e0e7

bcachefs: Improve bch2_bkey_make_mut() · 0fb3355d

Kent Overstreet authored Jun 26, 2023

bch2_bkey_make_mut() now takes the bkey_s_c by reference and points it
at the new, mutable key.

This helps in some fsck paths that may have multiple repair operations
on the same key.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0fb3355d
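
The "takes the key by reference and repoints it" idea from the bch2_bkey_make_mut() commit above can be sketched in plain C. This is only an illustration under stated assumptions: `struct example_key` and `example_make_mut()` are hypothetical, and `malloc()` stands in for whatever allocation the real function performs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical key type, for illustration only. */
struct example_key {
	unsigned long long	offset;
	unsigned		val;
};

/*
 * Copy *k into freshly allocated memory and repoint the caller's
 * read-only view at the copy, so later code sees earlier repairs.
 * Returns the mutable copy, or NULL on allocation failure.
 */
static struct example_key *example_make_mut(const struct example_key **k)
{
	struct example_key *mut = malloc(sizeof(*mut));

	if (!mut)
		return NULL;
	memcpy(mut, *k, sizeof(*mut));
	*k = mut;
	return mut;
}
```

Because the caller's view now aliases the mutable copy, a second repair on the same key starts from the already-repaired value rather than from the stale original.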

bcachefs: Reduce stack frame size of bch2_check_alloc_info() · 298ac24e

Kent Overstreet authored Jun 26, 2023

Excessive inlining may (on some versions of gcc?) cause excessive stack
usage; this turns off some inlining in bch2_check_alloc_info.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

298ac24e
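
To illustrate the stack-frame commit above: when helpers with large locals are inlined, their buffers can all end up in the caller's frame. The sketch below uses the GCC/Clang `noinline` attribute (which the kernel wraps as `noinline_for_stack`); the helper names and buffer sizes are hypothetical, and the log does not say which annotation the commit actually uses.

```c
#include <string.h>

/* GCC/Clang attribute that forbids inlining of the annotated function. */
#define example_noinline __attribute__((noinline))

/*
 * Each helper has its own scratch buffer. Kept out of line, the buffer
 * lives in the helper's frame rather than being merged into the caller's.
 */
static example_noinline int example_check_a(int key)
{
	char scratch[256];

	memset(scratch, key & 0xff, sizeof(scratch));
	return scratch[0] == (char)(key & 0xff);
}

static example_noinline int example_check_b(int key)
{
	char scratch[256];

	memset(scratch, 0, sizeof(scratch));
	return scratch[key & 0xff] == 0;
}

/* The caller only needs room for its own locals. */
static int example_check_all(int key)
{
	return example_check_a(key) && example_check_b(key);
}
```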

bcachefs: fsck needs BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE · 75da9764

Kent Overstreet authored Jun 25, 2023

A few fsck paths weren't using BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

75da9764

bcachefs: Improve error message for overlapping extents · 454377d8

Kent Overstreet authored Jun 24, 2023

We now print out the full previous extent we're overlapping with, to aid in
debugging and searching through the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

454377d8

bcachefs: Fix check_pos_snapshot_overwritten() · 8f507f89

Kent Overstreet authored Jun 24, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8f507f89

bcachefs: Rename enum alloc_reserve -> bch_watermark · e53a961c

Kent Overstreet authored Jun 24, 2023

This is prep work for consolidating with JOURNAL_WATERMARK.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e53a961c

bcachefs: BCH_ERR_fsck -> EINVAL · e9d01723

Kent Overstreet authored Jun 24, 2023

When we return errors outside of bcachefs, we need to return a standard
error code - fix this for BCH_ERR_fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e9d01723
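
The rule in the BCH_ERR_fsck commit above (private error codes must become standard ones before leaving the filesystem) can be sketched as a small translation function. The enum values and the name `example_err_to_errno()` are hypothetical; only the fsck to EINVAL mapping comes from the commit title.

```c
#include <errno.h>

/* Hypothetical private error codes, numbered above the standard errno range. */
enum example_err {
	EXAMPLE_ERR_START = 2048,
	EXAMPLE_ERR_fsck,
	EXAMPLE_ERR_MAX,
};

/*
 * Translate a negative private code into a negative standard errno.
 * Anything that is not a known private code is passed through unchanged.
 */
static int example_err_to_errno(int err)
{
	switch (-err) {
	case EXAMPLE_ERR_fsck:
		return -EINVAL;
	default:
		return err;
	}
}
```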

bcachefs: bch2_trans_mark_pointer() refactoring · 3a63b32f

Kent Overstreet authored Jun 24, 2023

bch2_bucket_backpointer_mod() doesn't need to update the alloc key, we
can exit the alloc iter earlier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3a63b32f

bcachefs: Fix more lockdep splats in debug.c · 9473cff9

Kent Overstreet authored Jun 21, 2023

Similar to previous fixes, we can't incur page faults while holding
btree locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9473cff9

bcachefs: Fix lockdep splat in bch2_readdir · 462f494b

Kent Overstreet authored Jun 21, 2023

dir_emit() can fault (taking mmap_lock); thus we can't be holding btree
locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

462f494b

bcachefs: Check for ERR_PTR() from filemap_lock_folio() · b6898917

Kent Overstreet authored Jun 21, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b6898917

bcachefs: New error message helpers · 1bb3c2a9

Kent Overstreet authored Jun 20, 2023

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
 - bch_err_fn
 - bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1bb3c2a9
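
A userspace sketch of what helpers like those in the commit above might look like: macros that prefix the message with `__func__` and append an error string. The names `example_err_fn`, `example_err_msg` and `example_err_str()` are hypothetical stand-ins, `strerror()` replaces bch2_err_str(), and the output goes to a caller-supplied buffer rather than the kernel log.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for bch2_err_str(): here we simply defer to strerror(). */
static const char *example_err_str(int err)
{
	return strerror(-err);
}

/* Format "<function>(): error <error string>" into buf. */
#define example_err_fn(buf, len, err)					\
	snprintf(buf, len, "%s(): error %s", __func__, example_err_str(err))

/* Same, with a caller-supplied message before the error string. */
#define example_err_msg(buf, len, err, msg)				\
	snprintf(buf, len, "%s(): error %s: %s", __func__, msg,	\
		 example_err_str(err))

/* Example caller: the function name is filled in automatically. */
static int example_mount(char *buf, size_t len)
{
	int ret = -ENOMEM;

	example_err_fn(buf, len, ret);
	return ret;
}
```

Since the function name and error string are supplied by the macro, call sites no longer need hand-written strings that can drift out of date or misreport the failure.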

bcachefs: fiemap: Fix a lockdep splat · a83e108f

Kent Overstreet authored Jun 19, 2023

As with the previous patch, we generally can't hold btree locks while
copying to userspace, as that may incur a page fault and require
mmap_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a83e108f

bcachefs: seqmutex; fix a lockdep splat · a5b696ee

Kent Overstreet authored Jun 19, 2023

We can't be holding btree_trans_lock while copying to user space, which
might incur a page fault. To fix this, convert it to a seqmutex so we
can unlock/relock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a5b696ee
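
The unlock/relock idea behind the seqmutex commit above can be sketched in userspace: the lock carries a sequence number that changes on every unlock, so a caller that dropped the lock can tell whether anyone else held it in the meantime. Everything here is an assumption made for illustration: the `example_` names are hypothetical, and an `atomic_flag` spin lock stands in for the kernel mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: an atomic_flag plays the role of the mutex. */
struct example_seqmutex {
	atomic_flag	locked;
	uint32_t	seq;	/* bumped on unlock; only touched with the lock held */
};

static void example_seqmutex_init(struct example_seqmutex *l)
{
	atomic_flag_clear(&l->locked);
	l->seq = 0;
}

static bool example_seqmutex_trylock(struct example_seqmutex *l)
{
	return !atomic_flag_test_and_set(&l->locked);
}

static void example_seqmutex_lock(struct example_seqmutex *l)
{
	while (!example_seqmutex_trylock(l))
		;	/* spin; a real mutex would sleep */
}

/* Drop the lock and return the sequence number to pass to relock(). */
static uint32_t example_seqmutex_unlock(struct example_seqmutex *l)
{
	uint32_t seq = ++l->seq;

	atomic_flag_clear(&l->locked);
	return seq;
}

/* Retake the lock only if nobody locked and unlocked it in the meantime. */
static bool example_seqmutex_relock(struct example_seqmutex *l, uint32_t seq)
{
	if (!example_seqmutex_trylock(l))
		return false;
	if (l->seq != seq) {
		atomic_flag_clear(&l->locked);
		return false;
	}
	return true;
}
```

A list walker would unlock before copying to user space, then call relock with the saved sequence number and restart its walk if relock fails.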

bcachefs: Don't call lock_graph_descend() with wait lock held · 6547ebab

Kent Overstreet authored Jun 19, 2023

This fixes a deadlock:

01305 WARNING: possible circular locking dependency detected
01305 6.3.0-ktest-gf4de9bee61af #5305 Tainted: G        W
01305 ------------------------------------------------------
01305 cat/14658 is trying to acquire lock:
01305 ffffffc00982f460 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x48/0x278
01305
01305 but task is already holding lock:
01305 ffffff8011aaf040 (&lock->wait_lock){+.+.}-{2:2}, at: bch2_check_for_deadlock+0x4b8/0xa58
01305
01305 which lock already depends on the new lock.
01305
01305
01305 the existing dependency chain (in reverse order) is:
01305
01305 -> #2 (&lock->wait_lock){+.+.}-{2:2}:
01305        _raw_spin_lock+0x54/0x70
01305        __six_lock_wakeup+0x40/0x1b0
01305        six_unlock_ip+0xe8/0x248
01305        bch2_btree_key_cache_scan+0x720/0x940
01305        shrink_slab.constprop.0+0x284/0x770
01305        shrink_node+0x390/0x828
01305        balance_pgdat+0x390/0x6d0
01305        kswapd+0x2e4/0x718
01305        kthread+0x184/0x1a8
01305        ret_from_fork+0x10/0x20
01305
01305 -> #1 (&c->lock#2){+.+.}-{3:3}:
01305        __mutex_lock+0x104/0x14a0
01305        mutex_lock_nested+0x30/0x40
01305        bch2_btree_key_cache_scan+0x5c/0x940
01305        shrink_slab.constprop.0+0x284/0x770
01305        shrink_node+0x390/0x828
01305        balance_pgdat+0x390/0x6d0
01305        kswapd+0x2e4/0x718
01305        kthread+0x184/0x1a8
01305        ret_from_fork+0x10/0x20
01305
01305 -> #0 (fs_reclaim){+.+.}-{0:0}:
01305        __lock_acquire+0x19d0/0x2930
01305        lock_acquire+0x1dc/0x458
01305        fs_reclaim_acquire+0x9c/0xe0
01305        __kmem_cache_alloc_node+0x48/0x278
01305        __kmalloc_node_track_caller+0x5c/0x278
01305        krealloc+0x94/0x180
01305        bch2_printbuf_make_room.part.0+0xac/0x118
01305        bch2_prt_printf+0x150/0x1e8
01305        bch2_btree_bkey_cached_common_to_text+0x170/0x298
01305        bch2_btree_trans_to_text+0x244/0x348
01305        print_cycle+0x7c/0xb0
01305        break_cycle+0x254/0x528
01305        bch2_check_for_deadlock+0x59c/0xa58
01305        bch2_btree_deadlock_read+0x174/0x200
01305        full_proxy_read+0x94/0xf0
01305        vfs_read+0x15c/0x3a8
01305        ksys_read+0xb8/0x148
01305        __arm64_sys_read+0x48/0x60
01305        invoke_syscall.constprop.0+0x64/0x138
01305        do_el0_svc+0x84/0x138
01305        el0_svc+0x34/0x80
01305        el0t_64_sync_handler+0xb0/0xb8
01305        el0t_64_sync+0x14c/0x150
01305
01305 other info that might help us debug this:
01305
01305 Chain exists of:
01305   fs_reclaim --> &c->lock#2 --> &lock->wait_lock
01305
01305  Possible unsafe locking scenario:
01305
01305        CPU0                    CPU1
01305        ----                    ----
01305   lock(&lock->wait_lock);
01305                                lock(&c->lock#2);
01305                                lock(&lock->wait_lock);
01305   lock(fs_reclaim);
01305
01305  *** DEADLOCK ***
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6547ebab

bcachefs: Fix bch2_check_discard_freespace_key() · e96f5a61

Kent Overstreet authored Jun 18, 2023

We weren't correctly checking the freespace btree - it's an extents
btree, which means we need to iterate over each bucket in a freespace
extent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e96f5a61
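
As a rough sketch of the point in the freespace commit above: an extent covers a range of buckets, so a check has to visit each bucket in the range rather than only the extent's position. The callback type and function names below are hypothetical, and the half-open range [start, end) is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bucket predicate: true if the bucket really is free. */
typedef bool (*example_bucket_ok_fn)(uint64_t bucket, void *ctx);

/*
 * Check every bucket in the half-open range [start, end) covered by one
 * freespace extent. Returns the number of inconsistent buckets.
 */
static uint64_t example_check_freespace_extent(uint64_t start, uint64_t end,
					       example_bucket_ok_fn ok,
					       void *ctx)
{
	uint64_t bad = 0;

	for (uint64_t b = start; b < end; b++)
		if (!ok(b, ctx))
			bad++;
	return bad;
}

/* Toy predicate: the single bucket whose number is stored in ctx is in use. */
static bool example_all_free_except(uint64_t bucket, void *ctx)
{
	return bucket != *(uint64_t *)ctx;
}
```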

bcachefs: bch2_trans_unlock_noassert() · 25aa8c21

Kent Overstreet authored Jun 18, 2023

This fixes a spurious assert in the btree node read path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

25aa8c21

bcachefs: Fix bch2_btree_update_start() · 45a1ab57

Kent Overstreet authored Jun 16, 2023

The calculation for number of nodes to allocate in
bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the
small nodes test.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

45a1ab57

bcachefs: bch2_extent_ptr_desired_durability() · 91ecd41b

Kent Overstreet authored Jun 13, 2023

This adds a new helper for getting a pointer's durability irrespective
of the device state, and uses it in the data update path.

This fixes a bug where we do a data update but request 0 replicas to be
allocated, because the replica being rewritten is on a device marked as
failed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

91ecd41b

bcachefs: snapshot_to_text() includes snapshot tree · 253748a2

Kent Overstreet authored Jun 13, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

253748a2

bcachefs: Fix try_decrease_writepoints() · 995f9128

Kent Overstreet authored Mar 16, 2023

 - We may need to drop btree locks before taking the writepoint_lock, as
   is done in other places.
 - We should be using open_bucket_free_unused(), so that we don't waste
   space.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

995f9128

bcachefs: Delete weird hacky transaction restart injection · 25c70097

Kent Overstreet authored Jun 11, 2023

Since we currently don't have a good fault injection library,
bch2_btree_insert_node() was randomly injecting faults based on
local_clock().

At the very least this should have been a debug mode only thing, but
this is a brittle method so let's just delete it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

25c70097

bcachefs: Write buffer flush needs BTREE_INSERT_NOCHECK_RW · 8e5b1115

Kent Overstreet authored Jun 11, 2023

btree write buffer flush is only invoked from contexts that already hold
a write ref, and checking if we're still RW could cause us to fail to
completely flush the write buffer when shutting down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8e5b1115

bcachefs: New assertions when marking filesystem clean · 7724664f

Kent Overstreet authored Jun 11, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7724664f

bcachefs: ec: Fix a lost wakeup · 99a3d398

Kent Overstreet authored Jun 10, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

99a3d398

bcachefs: fix NULL pointer dereference in try_alloc_bucket · 954ed17e

Mikulas Patocka authored May 30, 2023

On Mon, 29 May 2023, Mikulas Patocka wrote:

> The oops happens in set_btree_iter_dontneed and it is caused by the fact
> that iter->path is NULL. The code in try_alloc_bucket is buggy because it
> sets "struct btree_iter iter = { NULL };" and then jumps to the "err"
> label that tries to dereference values in "iter".

Here I'm sending a patch for it.

From: Mikulas Patocka <mpatocka@redhat.com>

The function try_alloc_bucket sets the variable "iter" to NULL and then
(on various error conditions) jumps to the label "err". On the "err"
label, it calls "set_btree_iter_dontneed" that tries to dereference
"iter->trans" and "iter->path".

So, we get an oops on error condition.

This patch fixes the crash by testing that iter.trans and iter.path are
non-zero before calling set_btree_iter_dontneed.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

954ed17e
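
The shape of the try_alloc_bucket fix above, reduced to a standalone sketch: the error path must not call a helper that dereferences the iterator's fields unless those fields were set. The structs and function names are hypothetical simplifications; only the guard on `trans` and `path` comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the btree iterator types. */
struct example_trans { int unused; };
struct example_path { bool preserve; };
struct example_iter {
	struct example_trans	*trans;
	struct example_path	*path;
};

/* Stand-in for set_btree_iter_dontneed(): dereferences iter->path unconditionally. */
static void example_set_dontneed(struct example_iter *iter)
{
	iter->path->preserve = false;
}

/*
 * Error-path cleanup: only touch the iterator if it was ever initialized.
 * Returns true if the hint was applied.
 */
static bool example_cleanup(struct example_iter *iter)
{
	if (iter->trans && iter->path) {
		example_set_dontneed(iter);
		return true;
	}
	return false;
}
```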

bcachefs: Fix subvol deletion deadlock · b0e8c75e

Kent Overstreet authored Jun 09, 2023

d_prune_aliases() may call bch2_evict_inode(), which needs
c->vfs_inodes_lock.

Fix this by always calling igrab() before putting the inodes onto our
disposal list, and then calling d_prune_aliases() with
c->vfs_inodes_lock dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b0e8c75e

bcachefs: don't spin in rebalance when background target is not usable · 5bc74082

Brian Foster authored May 30, 2023

If a bcachefs filesystem is configured with a background device
(disk group), rebalance will relocate data to this device in the
background by checking extent keys for whether they currently reside
in the specified target. For keys that do not, rebalance performs a
read/write cycle to allow the write path to properly relocate data.

If the background target is not usable (read-only, for example),
however, the write path doesn't actually move data to another
device. Instead, rebalance spins indefinitely reading and rewriting
the same data over and over to the same device. If the background
target is made available again, the rebalance picks this up,
relocates the data, and eventually terminates.

To avoid this spinning behavior, update the rebalance background
target logic to not only check whether the extent is not in the
target, but whether the target is actually usable as well. If not,
then don't mark the key for rewrite.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5bc74082
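
The decision described in the rebalance commit above can be modeled with device bitmasks. This is a sketch under assumptions, not the actual rebalance code: one bit per device, with hypothetical masks for the devices holding the extent, the devices in the background target, and the devices that are currently writable.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, one bit per device:
 *   extent_devs - devices the extent currently resides on
 *   target_devs - devices that make up the background target
 *   rw_devs     - devices that are currently writable
 *
 * Mark the extent for rewrite only if it is outside the target AND the
 * target has at least one usable device; otherwise rebalance would keep
 * rewriting the same data without ever moving it.
 */
static bool example_rebalance_needs_rewrite(uint64_t extent_devs,
					    uint64_t target_devs,
					    uint64_t rw_devs)
{
	bool in_target     = (extent_devs & target_devs) != 0;
	bool target_usable = (target_devs & rw_devs) != 0;

	return !in_target && target_usable;
}
```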

bcachefs: push rcu lock down into bch2_target_to_mask() · a1dd428b

Brian Foster authored May 30, 2023

We have one caller that cycles the rcu lock solely for this call
(via target_rw_devs()), and we'd like to add another. Simplify
things by pushing the rcu lock down into bch2_target_to_mask(),
similar to how bch2_dev_in_target() works.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a1dd428b

bcachefs: create internal disk_groups sysfs file · fec4fc82

Brian Foster authored May 30, 2023

We have bch2_sb_disk_groups_to_text() to dump disk group labels, but
no good information on device group membership at runtime. Add
bch2_disk_groups_to_text() and an associated 'disk_groups' sysfs
file to print group and device relationships.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fec4fc82

bcachefs: Clean up tests code · 28551613

Kent Overstreet authored Jun 05, 2023

 - delete redundant error messages
 - convert various code to bch2_trans_run
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

28551613

bcachefs: Improve backpointers error message · bc166d71

Kent Overstreet authored Jun 05, 2023

The error message here dated from when backpointers could be stored in
alloc keys; now, we should always print the full key.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bc166d71