Commits · e057a290ef715d2765560778625e1660b7352994 · Kirill Smelkov / linux

28 Sep, 2024 29 commits

Alan Huang authored Aug 27, 2024

If the reader acquires the read lock and then the writer enters the slow
path, while the reader proceeds to the unlock path, the following scenario
can occur without the change:

writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0)
reader: this_cpu_dec(*lock->readers)
reader: smp_mb()
reader: state = atomic_read(&lock->state) (there is no waiting flag set)
writer: six_set_bitmask()

then the writer will sleep forever.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e057a290

bcachefs: Check for logged ops when clean · d50d7a5f

Kent Overstreet authored Sep 26, 2024

If we shut down successfully, there shouldn't be any logged ops to
resume.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d50d7a5f

bcachefs: BCH_FS_clean_recovery · 1c0ee43b

Kent Overstreet authored Sep 26, 2024

Add a filesystem flag to indicate whether we did a clean recovery -
using c->sb.clean after we've got rw is incorrect, since c->sb is
updated whenever we write the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1c0ee43b
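
A minimal sketch of the idea behind this flag, assuming hypothetical `demo_*` names rather than the real bcachefs structures. It also assumes that going read-write rewrites the superblock as not clean. The point is only that the "was recovery clean" answer has to be captured once, before the live superblock copy changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the in-memory superblock copy and fs state. */
struct demo_sb { bool clean; };
struct demo_fs {
	struct demo_sb sb;      /* refreshed on every superblock write */
	bool clean_recovery;    /* captured once, during recovery */
	bool rw;
};

static void demo_recovery(struct demo_fs *c)
{
	assert(!c->rw);                  /* recovery runs before going rw */
	c->clean_recovery = c->sb.clean; /* snapshot the answer now */
}

static void demo_go_rw(struct demo_fs *c)
{
	c->rw = true;
	c->sb.clean = false;             /* assumed: superblock rewritten as dirty */
}

/* Buggy query: reads the live field, which no longer describes recovery. */
static bool demo_was_clean_buggy(const struct demo_fs *c) { return c->sb.clean; }

/* Fixed query: reads the value captured during recovery. */
static bool demo_was_clean(const struct demo_fs *c) { return c->clean_recovery; }
```

After `demo_recovery()` followed by `demo_go_rw()`, the buggy query reports a dirty recovery even though the filesystem was shut down cleanly.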

bcachefs: Convert disk accounting BUG_ON() to WARN_ON() · 9773547b

Kent Overstreet authored Sep 27, 2024

We had a bug where disk accounting keys didn't always have their version
field set in journal replay; change the BUG_ON() to a WARN(), and
exclude this case since it's now checked for elsewhere (in the bkey
validate function).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9773547b

bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply · a3581ca3

Kent Overstreet authored Sep 26, 2024

This was added to avoid double-counting accounting keys in journal
replay. But applied incorrectly (easily done since it applies to the
transaction commit, not a particular update), it leads to skipping
in-mem accounting for real accounting updates, and failure to give them
a version number - which leads to journal replay becoming very confused
the next time around.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a3581ca3

bcachefs: Check for accounting keys with bversion=0 · f8911ad8

Kent Overstreet authored Sep 26, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f8911ad8

bcachefs: rename version -> bversion · cf49f8a8

Kent Overstreet authored Sep 26, 2024

give bversions a more distinct name, to aid in grepping
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cf49f8a8

bcachefs: Don't delete unlinked inodes before logged op resume · fd65378d

Kent Overstreet authored Sep 26, 2024

Previously, check_inode() would delete unlinked inodes if they weren't
on the deleted list - this code dating from before there was a deleted
list.

But, if we crash during a logged op (truncate or finsert/fcollapse) of
an unlinked file, logged op resume will get confused if the inode has
already been deleted - instead, just add it to the deleted list if it
needs to be there; delete_dead_inodes runs after logged op resume.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fd65378d

bcachefs: Fix BCH_SB_ERRS() so we can reorder · 8d65b15f

Kent Overstreet authored Sep 26, 2024

BCH_SB_ERRS() has a field for the actual enum val so that we can reorder
to reorganize, but the way BCH_SB_ERR_MAX was defined didn't allow for
this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8d65b15f
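
A sketch of the x-macro pattern this commit refers to, using a hypothetical `DEMO_ERRS()` table rather than the real `BCH_SB_ERRS()` list. It shows why a maximum defined as "one past the last listed entry" breaks once entries are reordered, and one possible order-independent bound. Counting the entries assumes the explicit values are dense from 0; the actual fix may differ.

```c
#include <assert.h>

/* Hypothetical error table. Each entry carries its explicit value,
 * so the list order is free to change. */
#define DEMO_ERRS()			\
	x(journal_bad,		2)	\
	x(clean_but_dirty,	0)	\
	x(btree_bad_node,	1)

enum demo_err_id {
#define x(name, val) DEMO_ERR_##name = val,
	DEMO_ERRS()
#undef x
	/* Order-dependent: one past whichever entry is listed last. */
	DEMO_ERR_MAX_NAIVE
};

/* Order-independent: count the entries (assumes dense values from 0). */
enum {
#define x(name, val) + 1
	DEMO_ERR_NR = 0 DEMO_ERRS()
#undef x
};

/* A table sized by the order-independent bound covers every id. */
static const char *demo_err_str(enum demo_err_id id)
{
	static const char * const names[DEMO_ERR_NR] = {
#define x(name, val) [val] = #name,
		DEMO_ERRS()
#undef x
	};

	assert((unsigned) id < DEMO_ERR_NR);
	return names[id];
}
```

With the list order above, `DEMO_ERR_MAX_NAIVE` is smaller than the largest id, so an array sized by it would be too short for `journal_bad`.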

bcachefs: Fix fsck warnings from bkey validation · 5612daaf

Kent Overstreet authored Sep 26, 2024

__bch2_fsck_err() warns if the current task has a btree_trans object and
it wasn't passed in, because if it has to prompt for user input it has
to be able to unlock it.

But plumbing the btree_trans through bkey_validate(), as well as
transaction restarts, is problematic - so instead make bkey fsck errors
FSCK_AUTOFIX, which doesn't need to warn.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5612daaf

bcachefs: Move transaction commit path validation to as late as possible · 7c980a43

Kent Overstreet authored Sep 26, 2024

In order to check for accounting keys with version=0, we need to run
validation after they've been assigned version numbers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7c980a43

bcachefs: Fix disk accounting attempting to mark invalid replicas entry · 431312b5

Kent Overstreet authored Sep 25, 2024

This fixes the following bug, where a disk accounting key has an invalid
replicas entry, and we attempt to add it to the superblock:

bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,nopromote_whole_extents,verbose,fsck,fix_errors=yes
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): recovering from clean shutdown, journal seq 15211644
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): accounting_read...
accounting not marked in superblock replicas
replicas cached: 1/1 [0], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0]
replicas_v0 (size 88):
user: 2 [3 5] user: 2 [1 4] cached: 1 [2] btree: 2 [1 2] user: 2 [2 5] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 2 [1 2] user: 2 [2 3] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 3] user: 2 [1 5] user: 2 [2 4]

bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): inconsistency detected - emergency read only at journal seq 15211644
accounting not marked in superblock replicas
replicas user: 1/1 [3], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0]
replicas_v0 (size 96):
user: 2 [3 5] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5]

accounting not marked in superblock replicas
replicas user: 1/2 [3 7], fixing
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 7 in entry user: 1/2 [3 7]
replicas_v0 (size 96):
user: 2 [3 7] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] user: 2 [3 5]

done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): alloc_read... done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): stripes_read... done
bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): snapshots_read... done
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

431312b5

bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate() · 49fd90b2

Kent Overstreet authored Sep 25, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

49fd90b2

bcachefs: Fix accounting read + device removal · 9104fc19

Kent Overstreet authored Sep 25, 2024

Accounting read was checking whether accounting replicas entries were
marked in the superblock before applying accounting from the journal,
which meant that a recently removed device could spuriously trigger a
"not marked in superblock" error (when journal entries zero out the
offending counter).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9104fc19

bcachefs: bch_accounting_mode · 1e0272ef

Kent Overstreet authored Sep 24, 2024

Minor refactoring - replace multiple bool arguments with an enum; prep
work for fixing a bug in accounting read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1e0272ef

bcachefs: fix transaction restart handling in check_extents(), check_dirents() · 3672bda8

Kent Overstreet authored Sep 23, 2024

Dealing with outside state within a btree transaction is always tricky.

check_extents() and check_dirents() have to accumulate counters for
i_sectors and i_nlink (for subdirectories). There were two bugs:

- transaction commit may return a restart; therefore we have to commit
  before accumulating to those counters
- get_inode_all_snapshots() may return a transaction restart, before
  updating w->last_pos; then, on the restart,
  check_i_sectors()/check_subdir_count() would see inodes that were not
  for w->last_pos
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3672bda8
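
The first bug above can be sketched outside the kernel as follows. All `demo_*` names are hypothetical, and the restart is simulated by a counter. The sketch only illustrates the ordering rule: state that lives outside the transaction must be updated after the commit has succeeded, because a restart re-runs the whole body.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_RESTART = 1 };

/* Hypothetical transaction whose commit restarts a fixed number of times. */
struct demo_trans { int restarts_left; };

static int demo_commit(struct demo_trans *t)
{
	if (t->restarts_left > 0) {
		t->restarts_left--;
		return DEMO_RESTART;
	}
	return 0;
}

/* Buggy order: the outside counter is bumped on every attempt. */
static int demo_check_buggy(struct demo_trans *t, uint64_t *i_sectors, uint64_t sectors)
{
	*i_sectors += sectors;
	return demo_commit(t);
}

/* Fixed order: commit first, accumulate only once it has succeeded. */
static int demo_check_fixed(struct demo_trans *t, uint64_t *i_sectors, uint64_t sectors)
{
	int ret = demo_commit(t);

	if (ret)
		return ret;
	*i_sectors += sectors;
	return 0;
}

typedef int (*demo_check_fn)(struct demo_trans *, uint64_t *, uint64_t);

/* Retry loop standing in for the transaction restart machinery. */
static int demo_run(demo_check_fn fn, struct demo_trans *t,
		    uint64_t *i_sectors, uint64_t sectors)
{
	int ret;

	assert(fn);
	do {
		ret = fn(t, i_sectors, sectors);
	} while (ret == DEMO_RESTART);
	return ret;
}
```

With two simulated restarts, the buggy variant counts the same extent three times, while the fixed variant counts it once.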

bcachefs: kill inode_walker_entry.seen_this_pos · 22a507d6

Kent Overstreet authored Sep 23, 2024

dead code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

22a507d6

bcachefs: Fix incorrect IS_ERR_OR_NULL usage · b29c30ab

Kent Overstreet authored Sep 24, 2024

Returning a positive integer instead of an error code causes error paths
to become very confused.

Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b29c30ab

bcachefs: fix the memory leak in exception case · dc5bfdf8

Hongbo Li authored Sep 24, 2024

The pointer clean points to memory allocated by kmemdup. When the
return value of bch2_sb_clean_validate_late is not zero, the memory
pointed to by clean is leaked, so we should free it in this case.

Fixes: a37ad1a3 ("bcachefs: sb-clean.c")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dc5bfdf8
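
The shape of this fix, as a userspace sketch with hypothetical `demo_*` names. `malloc()` plus `memcpy()` stands in for kmemdup, and the validation rule is invented for illustration. The sketch reports failure as NULL for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical clean section; the magic value is made up for the sketch. */
struct demo_clean { unsigned magic; unsigned journal_seq; };

static int demo_validate(const struct demo_clean *c)
{
	return c->magic == 0xC1EA ? 0 : -1;
}

/* Duplicate the section, validate the copy, and free it on failure. */
static struct demo_clean *demo_read_clean(const struct demo_clean *on_disk)
{
	struct demo_clean *clean;

	assert(on_disk);
	clean = malloc(sizeof(*clean));
	if (!clean)
		return NULL;
	memcpy(clean, on_disk, sizeof(*clean));

	if (demo_validate(clean)) {
		free(clean);	/* without this, the copy leaks on the error path */
		return NULL;
	}
	return clean;
}
```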

bcachefs: fast exit when darray_make_room failed · 3125c95e

Hongbo Li authored Sep 24, 2024

In downgrade_table_extra, the return value of darray_make_room needs to
be checked. When it fails, we should exit immediately.

Fixes: 7773df19 ("bcachefs: metadata version bucket_stripe_sectors")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3125c95e

bcachefs: Fix iterator leak in check_subvol() · 951dd86e

Kent Overstreet authored Sep 23, 2024

A couple small error handling fixes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

951dd86e

bcachefs: Add snapshot to bch_inode_unpacked · 2a1df873

Kent Overstreet authored Sep 23, 2024

this allows for various cleanups in fsck
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2a1df873

bcachefs: assign return error when iterating through layout · 40d40c6b

Diogo Jahchan Koike authored Sep 23, 2024

syzbot reported a null ptr deref in __copy_user [0]

In __bch2_read_super, when a corrupt backup superblock matches the
default opts offset, no error is assigned to ret and the freed superblock
gets through, possibly being assigned as the best sb in bch2_fs_open and
being later dereferenced, causing a fault. Assign EINVALID to ret when
iterating through layout.

[0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876

Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

40d40c6b

bcachefs: Fix srcu warning in check_topology · c6040447

Kent Overstreet authored Sep 23, 2024

check_topology doesn't need the srcu lock and doesn't use normal btree
transactions - we can just drop the srcu lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c6040447

bcachefs: Fix error path in check_dirent_inode_dirent() · 18c520f4

Kent Overstreet authored Sep 23, 2024

fsck_err() jumps to the fsck_err label when bailing out; need to make
sure bp_iter was initialized...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

18c520f4

bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping · 0696a18a

Piotr Zalewski authored Sep 22, 2024

Zero-initialize the part of the allocated bounce buffer that wasn't
touched by the subsequent bch2_key_sort_fix_overlapping, to mitigate a
later uninit-value use KMSAN bug[1].

After applying the patch, the reproducer still triggers a stack
overflow[2], but it seems unrelated to the uninit-value use warning.
After further investigation it was found that the stack overflow occurs
because KMSAN adds too many function calls[3]. A backtrace of where the
stack magic number gets smashed was added as a reply to the syzkaller
thread[3].

It was confirmed that the task's stack magic number gets smashed after
the code path where KMSAN detects the uninit-value use is executed, so it
can be assumed that it doesn't contribute in any way to the uninit-value
use detection.

[1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
[2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com
[3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me

Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718
Fixes: ec4edd7b ("bcachefs: Prep work for variable size btree node buffers")
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0696a18a
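
The zeroing step can be sketched as below. `demo_zero_tail()` is a hypothetical helper, not a bcachefs function. It assumes the compaction pass wrote `used` bytes at the start of a buffer of `size` bytes and leaves no uninitialized bytes behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero everything in buf[used..size) that the earlier pass did not write. */
static void demo_zero_tail(uint8_t *buf, size_t used, size_t size)
{
	assert(buf && used <= size);
	memset(buf + used, 0, size - used);
}
```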

bcachefs: Improve bch2_is_inode_open() warning message · 51b7cc7c

Kent Overstreet authored Sep 23, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

51b7cc7c

bcachefs: Add extra padding in bkey_make_mut_noupdate() · 4a8f8faf

Kent Overstreet authored Sep 23, 2024

This fixes a kasan splat in propagate_key_to_snapshot_leaves() -
varint_decode_fast() does reads (that it never uses) up to 7 bytes past
the end of the integer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4a8f8faf
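
Why a fast decoder needs trailing padding, as a sketch. The encoding here is invented for illustration and is not the bcachefs varint format: one length byte (1 to 8) followed by that many payload bytes. A little-endian host is assumed. The decoder always loads 8 bytes and masks off what it does not need, so it can read up to 7 bytes past the end of the encoded integer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_VARINT_PAD 7	/* worst-case over-read of the fast path */

/* Hypothetical format: in[0] = payload length n, in[1..n] = value (LE). */
static uint64_t demo_decode_fast(const uint8_t *in, size_t *consumed)
{
	unsigned n = in[0];
	uint64_t v;

	assert(n >= 1 && n <= 8);
	memcpy(&v, in + 1, sizeof(v));	/* always 8 bytes, even when n < 8 */
	if (n < 8)
		v &= (UINT64_C(1) << (8 * n)) - 1;	/* discard the over-read */
	*consumed = 1 + n;
	return v;
}

/* Allocate room for the encoded bytes plus zeroed padding for the over-read. */
static uint8_t *demo_alloc_key(size_t bytes)
{
	return calloc(1, bytes + DEMO_VARINT_PAD);
}
```

The extra bytes are never used in the result, but they must be addressable, which is what the padding guarantees.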

bcachefs: Mark inode errors as autofix · f890c851

Kent Overstreet authored Sep 23, 2024

Most or all errors will be autofix in the future, we're currently just
doing the ones that we know are well tested.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f890c851

23 Sep, 2024 2 commits

bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves() · 7eb4a319

Kent Overstreet authored Sep 23, 2024

As we iterate we need to mark that we no longer need iterators -
otherwise we'll loop forever via the "too many iters" check when
there are many snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7eb4a319

bcachefs: Ensure BCH_FS_accounting_replay_done is always set · 6d12d7ac

Kent Overstreet authored Sep 22, 2024

if it doesn't get set we'll never be able to flush the btree write
buffer; this only happens in fake rw mode, but prevents us from shutting
down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6d12d7ac

21 Sep, 2024 9 commits

bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol() · 39c3aad4

Ahmed Ehab authored Sep 22, 2024

Syzbot reports a problem that a warning is triggered due to suspicious
use of rcu_dereference_check(). That is triggered by a call of
bch2_snapshot_tree_oldest_subvol().

The cause of the warning is that inside
bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls
rcu_dereference() that requires a read lock to be held. Also, the call
of bch2_snapshot_tree_next() eventually calls snapshot_t().

To fix this, call rcu_read_lock() before calling snapshot_t(). Then,
release the lock after the termination of the while loop.

Reported-by: <syzbot+f7c41a878676b72c16a6@syzkaller.appspotmail.com>
Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

39c3aad4

bcachefs: return err ptr instead of null in read sb clean · 025c55a4

Diogo Jahchan Koike authored Sep 10, 2024

syzbot reported a null-ptr-deref in bch2_fs_start. [0]

When a sb is marked clean but doesn't have a clean section,
bch2_read_superblock_clean returns NULL, which PTR_ERR_OR_ZERO
lets through, eventually leading to a null ptr dereference down
the line. Adjust read sb clean to return an ERR_PTR indicating the
invalid clean section.

[0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543

Reported-by: syzbot+1cecc37d87c4286e5543@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543
Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

025c55a4
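
A userspace sketch of why NULL slips past this kind of check. The `demo_*` helpers are simplified stand-ins written for this example, modelled on the kernel's ERR_PTR, IS_ERR and PTR_ERR_OR_ZERO. The reader functions and the choice of EINVAL are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *demo_err_ptr(long err)
{
	assert(err < 0 && err >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static inline bool demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* NULL is not an error pointer, so it maps to 0 ("success"). */
static inline int demo_ptr_err_or_zero(const void *p)
{
	return demo_is_err(p) ? (int)(intptr_t)p : 0;
}

struct demo_clean { int journal_seq; };

/* Before: a missing section is reported as NULL. */
static struct demo_clean *demo_read_clean_old(struct demo_clean *section)
{
	return section;
}

/* After: a missing section is reported as an error pointer. */
static struct demo_clean *demo_read_clean_new(struct demo_clean *section)
{
	return section ? section : demo_err_ptr(-EINVAL);
}
```

A caller that only checks `demo_ptr_err_or_zero()` treats the old NULL return as success and dereferences it later. The error-pointer return is caught by the same check.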

bcachefs: Remove duplicated include in backpointers.c · abb43dd6

Yang Li authored Sep 09, 2024

The header file bbpos.h is included twice in backpointers.c,
so one of the inclusions can be removed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

abb43dd6

bcachefs: Don't drop devices with stripe pointers · d5c5b337

Kent Overstreet authored Sep 06, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d5c5b337

bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices · 035d72f7

Kent Overstreet authored Sep 06, 2024

This factors out ec_stripe_head_devs_update(), which initializes the
bitmap of devices we're allocating from, and runs it every time
c->rw_devs_change_count changes.

We also cancel pending, not yet allocated stripes, since they may refer
to devices that are no longer available.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

035d72f7

bcachefs: bch_fs.rw_devs_change_count · 83ccd9b3

Kent Overstreet authored Sep 06, 2024

Add a counter that's incremented whenever rw devices change; this will
be used for erasure coding so that it can keep ec_stripe_head in sync
and not deadlock on a new stripe when a device it wants goes away.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

83ccd9b3
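
The change-counter pattern described here, sketched with hypothetical `demo_*` types. The real ec_stripe_head and device bitmap are more involved. A consumer remembers the counter value it last saw and rebuilds its cached device set only when the counter has moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filesystem state: a device mask plus a change counter. */
struct demo_fs {
	uint64_t rw_devs_mask;
	unsigned rw_devs_change_count;
};

/* Hypothetical consumer with a cached copy of the device set. */
struct demo_stripe_head {
	uint64_t devs;
	unsigned seen_change_count;
};

/* Every change to the rw device set bumps the counter. */
static void demo_set_rw_devs(struct demo_fs *c, uint64_t mask)
{
	c->rw_devs_mask = mask;
	c->rw_devs_change_count++;
}

/* Returns true if the cached device set was stale and has been rebuilt. */
static bool demo_head_devs_update(const struct demo_fs *c, struct demo_stripe_head *h)
{
	assert(c && h);
	if (h->seen_change_count == c->rw_devs_change_count)
		return false;

	h->devs = c->rw_devs_mask;
	h->seen_change_count = c->rw_devs_change_count;
	return true;
}
```

Comparing one counter is cheaper than comparing device sets, and a stale head is detected on its next use rather than waiting on a device that has gone away.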

bcachefs: bch2_dev_remove_stripes() · ad8d1f77

Kent Overstreet authored Sep 01, 2024

We can now correctly force-remove a device that has stripes on it; this
uses the new BCH_SB_MEMBER_INVALID sentinel value.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ad8d1f77

bcachefs: bch2_trigger_ptr() calculates sectors even when no device · 934137b0

Kent Overstreet authored Sep 07, 2024

This is necessary for erasure coded pointers to devices that have been
removed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

934137b0

bcachefs: improve error messages in bch2_ec_read_extent() · 2aee59eb

Kent Overstreet authored Sep 07, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2aee59eb