Commits · b5cbb42dc59f519fa3cf49b9afbd5ee4805be01b · Kirill Smelkov / linux

29 Jun, 2024 6 commits

bcachefs: Repair fragmentation_lru in alloc_write_key() · b5cbb42d

Kent Overstreet authored Jun 29, 2024

fragmentation_lru derives from dirty_sectors, and wasn't being checked.
Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b5cbb42d

bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref() · d39881d2

Kent Overstreet authored Jun 29, 2024

We need to make sure we're not missing any fragmentation entries in the
LRU BTREE after repairing ALLOC BTREE.

Also, use the new bch2_btree_write_buffer_maybe_flush() helper; this was
only working without it before since bucket invalidation (usually)
wasn't happening while fsck was running.
Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d39881d2

bcachefs: bch2_btree_write_buffer_maybe_flush() · 92e1c29a

Kent Overstreet authored Jun 29, 2024

Add a new helper for checking references to write buffer btrees, where
we need a flush before we definitively know we have an inconsistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

92e1c29a

bcachefs: Add missing printbuf_tabstops_reset() calls · ef05bdf5

Kent Overstreet authored Jun 29, 2024

Fixes warnings from bch2_print_allocator_stuck()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ef05bdf5

bcachefs: Fix loop restart in bch2_btree_transactions_read() · 67c56411

Kent Overstreet authored Jun 28, 2024

Accidental infinite loop; also fix btree_deadlock_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

67c56411

bcachefs: Fix bch2_read_retry_nodecode() · 1539bdf5

Kent Overstreet authored Jun 28, 2024

BCH_READ_NODECODE mode - used by the move paths - really wants to use
only the original rbio, but the retry path really wants to clone - oof.

Make sure to copy the crc of the pointer we read from back to the
original rbio, or we'll see spurious checksum errors later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1539bdf5

28 Jun, 2024 5 commits

bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs · 44ec5990

Kent Overstreet authored Jun 28, 2024

On a new filesystem or device we have to allocate the journal with a
bump allocator, because allocation info isn't ready yet - but when
hot-adding a device that doesn't have a journal, we don't want to use
that path.

Reported-by: syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

44ec5990

bcachefs: Fix shift greater than integer size · a0bd30e4

Kent Overstreet authored Jun 28, 2024

Reported-by: syzbot+e5292b50f1957164a4b6@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a0bd30e4

bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning · 600b8be5

Kent Overstreet authored Jun 28, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

600b8be5

bcachefs: Delete old faulty bch2_trans_unlock() call · 84db6000

Kent Overstreet authored Jun 28, 2024

the unlock is now in read_extent, this fixes an assertion pop in
read_from_stale_dirty_pointer()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

84db6000

bcachefs: Switch online_reserved shutdown assert to WARN() · 759b2e80

Kent Overstreet authored Jun 28, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

759b2e80

26 Jun, 2024 1 commit

bcachefs: Fix kmalloc bug in __snapshot_t_mut · 64cd7de9

Pei Li authored Jun 25, 2024

When asked to allocate too large a snapshot table, we should fail gracefully
in __snapshot_t_mut() instead of failing in kmalloc().

Reported-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=770e99b65e26fa023ab1
Tested-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com
Signed-off-by: Pei Li <peili.dev@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

64cd7de9
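The idea of failing gracefully before the allocator is reached can be sketched in userspace C. This is only an illustration under stated assumptions: `model_table_alloc()` and the `MODEL_MAX_ALLOC` cap are hypothetical names and values, not the check that the commit adds to `__snapshot_t_mut()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap standing in for the allocator's own size limit. */
#define MODEL_MAX_ALLOC ((size_t)1 << 30)

/*
 * Allocate a zeroed table of 'nr' entries of 'entry_size' bytes each.
 * Returns NULL, without calling the allocator, when the request is
 * larger than the cap (this also rules out multiplication overflow).
 */
static void *model_table_alloc(size_t nr, size_t entry_size)
{
	assert(entry_size > 0);

	if (nr > MODEL_MAX_ALLOC / entry_size)
		return NULL;	/* caller handles this as an ordinary error */

	return calloc(nr, entry_size);
}
```

A caller would treat the NULL return like any other allocation failure, for example by returning an error code up the stack.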

25 Jun, 2024 3 commits

bcachefs: Discard, invalidate workers are now per device · 64ee1431

Kent Overstreet authored Jun 23, 2024

There's no reason for discards to be single threaded across all devices;
this will improve performance on multi device setups.

Additionally, making them per-device simplifies the refcounting on
bch_dev->io_ref; we now hold it for the duration that the discard path
is running, which fixes a race between the discard path and device
removal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

64ee1431

bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gc · 472237b6

Pei Li authored Jun 25, 2024

This series fixes the shift-out-of-bounds issue in
bch2_blacklist_entries_gc().

Instead of passing 0 to eytzinger0_first() when iterating the entries,
we explicitly check 0 and initialize i to be 0.

syzbot has tested the proposed patch and the reproducer did not trigger
any issue:

Reported-and-tested-by: syzbot+835d255ad6bc7f29ee12@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=835d255ad6bc7f29ee12
Signed-off-by: Pei Li <peili.dev@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

472237b6
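Why a size of 0 is a problem can be sketched as follows, assuming the first in-order slot of a 1-based Eytzinger layout is the largest power of two not exceeding the size (for size 0 no such power exists, so a shift-based computation goes out of bounds). All names here are hypothetical models, not the kernel's `eytzinger0_first()`.

```c
#include <assert.h>

/* Largest power of two <= n; only defined for n > 0. */
static unsigned model_rounddown_pow2(unsigned n)
{
	unsigned p = 1;

	assert(n > 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* 0-based index of the first in-order element; requires size > 0. */
static unsigned model_eytzinger0_first(unsigned size)
{
	return model_rounddown_pow2(size) - 1;
}

/*
 * Guarded variant in the spirit of the commit message: check for 0
 * explicitly and start the index at 0 instead of calling the helper.
 */
static unsigned model_first_or_zero(unsigned size)
{
	return size ? model_eytzinger0_first(size) : 0;
}
```

With an empty table the iteration then starts at index 0 and terminates immediately, rather than computing a first slot from an invalid size.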

bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpu · 211c581d

Pei Li authored Jun 25, 2024

Acquire fsck_error_counts_lock before accessing the critical section
protected by this lock.

syzbot has tested the proposed patch and the reproducer did not trigger
any issue.

Reported-by: syzbot+a2bc0e838efd7663f4d9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a2bc0e838efd7663f4d9
Signed-off-by: Pei Li <peili.dev@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

211c581d

23 Jun, 2024 8 commits

bcachefs: Add missing bch2_journal_do_writes() call · 89d21b69

Kent Overstreet authored Jun 23, 2024

This fixes a rare deadlock when we're doing an emergency shutdown due to
failure to do a journal write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

89d21b69

bcachefs: Fix null ptr deref in journal_pins_to_text() · d6b52f68

Kent Overstreet authored Jun 23, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d6b52f68

bcachefs: Add missing recalc_capacity() call · 36da8e38

Kent Overstreet authored Jun 23, 2024

This fixes filesystem size not changing on device removal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

36da8e38

bcachefs: Fix btree_trans list ordering · 1aaf5cb4

Kent Overstreet authored Jun 22, 2024

The debug code relies on btree_trans_list being ordered so that it can
resume on subsequent calls or lock restarts.

However, it was using trans->locking_wait.task.pid, which is incorrect
since btree_trans objects are cached and reused - typically by different
tasks.

Fix this by switching to pointer order, and also sort them lazily when
required - speeding up the btree_trans_get() fastpath.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1aaf5cb4

bcachefs: Fix race between trans_put() and btree_transactions_read() · de611ab6

Kent Overstreet authored Jun 22, 2024

debug.c was using closure_get() on a different thread's closure where
we don't know if the object being refcounted is alive.

We keep btree_trans objects on a list so they can be printed by debug
code, and because it is cost prohibitive to touch the btree_trans list
every time we allocate and free btree_trans objects, cached objects are
also on this list.

However, we do not want the debug code to see cached but not in use
btree_trans objects - critically because the btree_paths array will have
been freed (if it was reallocated).

closure_get() is also incorrect to use when that get may race with it
hitting zero, i.e. we must already have a ref on the object or know the
ref can't currently hit 0 for other reasons (as used in the cycle
detector).

To fix this, use the previously introduced closure_get_not_zero(),
closure_return_sync(), and closure_init_stack_release(); the debug code
now can only take a ref on a trans object if it's alive and in use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

de611ab6

closures: closure_get_not_zero(), closure_return_sync() · 06efa5f3

Kent Overstreet authored Jun 22, 2024

Provide new primitives for solving a lifetime issue with bcachefs
btree_trans objects.

closure_return_sync(): like closure_sync(), wait synchronously for any
outstanding gets. Like closure_return(), the closure is considered
"finished" and the ref left at 0.

closure_get_not_zero(): get a ref on a closure if it's alive, i.e. the
ref is not zero.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

06efa5f3
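The "get a ref only if it's alive" primitive described above is commonly built from a compare-and-swap loop. The following is a minimal userspace sketch using C11 atomics; `struct ref_obj`, `ref_get_not_zero()` and `ref_put()` are hypothetical names and do not reproduce the kernel's `struct closure` or its flag bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; not the kernel's struct closure. */
struct ref_obj {
	atomic_int remaining;
};

/* Take a reference only if the count is currently nonzero. */
static bool ref_get_not_zero(struct ref_obj *obj)
{
	int old = atomic_load(&obj->remaining);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->remaining, &old, old + 1))
			return true;
	}
	return false;	/* object already finished; do not touch it */
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(struct ref_obj *obj)
{
	int old = atomic_fetch_sub(&obj->remaining, 1);

	assert(old > 0);
	return old == 1;
}
```

An unconditional increment would resurrect a count that has already hit zero, which is the race the commit message describes for a plain get.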

bcachefs: Make btree_deadlock_to_text() clearer · 18e92841

Kent Overstreet authored Jun 22, 2024

btree_deadlock_to_text() searches the list of btree transactions to find
a deadlock - when it finds one it's done; it's not like other *_read()
functions that print each object.

Factor out btree_deadlock_to_text() to make this clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

18e92841

bcachefs: fix seqmutex_relock() · f44cc269

Kent Overstreet authored Jun 22, 2024

We were grabbing the sequence number before unlock incremented it - fix
this by moving the increment to seqmutex_lock() (so the seqmutex_relock()
failure path skips the mutex_trylock()), and returning the sequence
number from unlock(), to make the API simpler and safer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f44cc269
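The locking scheme described above (increment on lock, return the sequence number from unlock, check it before and after the trylock in relock) can be modeled in userspace. This is a sketch under assumptions: an atomic flag stands in for the kernel mutex, the `model_*` names are hypothetical, and the unlocked read of `seq` is only safe here because the model is exercised from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a mutex paired with a sequence number. */
struct seqlock_model {
	atomic_bool held;
	uint32_t    seq;	/* modified only while 'held' is set */
};

static void model_init(struct seqlock_model *l)
{
	atomic_init(&l->held, false);
	l->seq = 0;
}

static bool model_trylock(struct seqlock_model *l)
{
	return !atomic_exchange(&l->held, true);
}

static void model_lock(struct seqlock_model *l)
{
	while (!model_trylock(l))
		;		/* spin; a real mutex would sleep */
	l->seq++;		/* every full lock bumps the sequence */
}

/* Unlock and hand back the sequence number to pass to relock. */
static uint32_t model_unlock(struct seqlock_model *l)
{
	uint32_t seq = l->seq;

	assert(atomic_load(&l->held));
	atomic_store(&l->held, false);
	return seq;
}

/* Succeeds only if nobody took the lock since 'seq' was returned. */
static bool model_relock(struct seqlock_model *l, uint32_t seq)
{
	if (l->seq != seq || !model_trylock(l))
		return false;	/* stale seq skips the trylock entirely */

	if (l->seq != seq) {	/* recheck now that the lock is held */
		atomic_store(&l->held, false);
		return false;
	}
	return true;
}
```

Because relock does not increment the sequence, a caller that drops and successfully retakes the lock can keep using the same number.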

22 Jun, 2024 1 commit

bcachefs: Fix freeing of error pointers · 9bd01500

Kent Overstreet authored Jun 22, 2024

This fixes incorrect/missing checking of strndup_user() returns.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9bd01500
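The class of bug here is freeing a return value that encodes an error rather than an allocation. A userspace model of the error-pointer convention is sketched below; the `model_*` helpers are hypothetical stand-ins written for illustration, not the kernel's `ERR_PTR()`, `IS_ERR()` or `strndup_user()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical copy helper: a heap string, or an encoded errno. */
static char *model_strndup(const char *src, size_t max)
{
	size_t len = strlen(src);
	char *copy;

	if (len >= max)
		return model_err_ptr(-EINVAL);

	copy = malloc(len + 1);
	if (!copy)
		return model_err_ptr(-ENOMEM);

	memcpy(copy, src, len + 1);
	return copy;
}

/* Returns 0 or a negative errno; frees only real allocations. */
static int model_use_string(const char *src, size_t max)
{
	char *s = model_strndup(src, max);

	assert(src != NULL);
	if (model_is_err(s))
		return (int)model_ptr_err(s);	/* nothing to free */

	free(s);
	return 0;
}
```

The key point is that the error check comes before any cleanup path that would pass the pointer to the free routine.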

21 Jun, 2024 7 commits

bcachefs: Move the ei_flags setting to after initialization · bd4da046

Youling Tang authored Jun 04, 2024

`inode->ei_flags` setting and cleaning should be done after initialization,
otherwise the operation is invalid.

Fixes: 9ca4853b ("bcachefs: Fix quota support for snapshots")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bd4da046

bcachefs: Fix a UAF after write_super() · 2fe79ce7

Kent Overstreet authored Jun 20, 2024

write_super() may reallocate the superblock buffer - but
bch_sb_field_ext was referencing it; don't use it after the write_super
call.

Reported-by: syzbot+8992fc10a192067b8d8a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2fe79ce7

bcachefs: Use bch2_print_string_as_lines for long err · e6b3a655

Kent Overstreet authored Jun 20, 2024

printk strings get truncated to 1024 bytes; if we have a long error
message (journal debug info) we need to use a helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e6b3a655
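Printing a long message one line at a time keeps each individual print call short. A minimal userspace sketch of that approach follows; `emit_as_lines()` is a hypothetical name and uses `fprintf()` in place of printk, so it is not the body of `bch2_print_string_as_lines()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write 'text' to 'out' with one output call per line, so no single
 * call carries the whole message. Returns the number of lines emitted.
 */
static int emit_as_lines(FILE *out, const char *text)
{
	int lines = 0;

	assert(text != NULL);
	while (*text) {
		size_t len = strcspn(text, "\n");	/* length up to newline */

		fprintf(out, "%.*s\n", (int)len, text);
		lines++;

		text += len;
		if (*text == '\n')
			text++;				/* skip the separator */
	}
	return lines;
}
```

A single line longer than the per-call limit would still be truncated; the sketch only addresses messages that are long because they contain many lines.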

bcachefs: Fix I_NEW warning in race path in bch2_inode_insert() · dd908648

Kent Overstreet authored Jun 20, 2024

discard_new_inode() is the correct interface for tearing down an inode
that was fully created but not made visible to other threads, but it
expects I_NEW to be set, which we don't use.

Reported-by: https://github.com/koverstreet/bcachefs/issues/690
Fixes: bcachefs: Fix race path in bch2_inode_insert()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dd908648

bcachefs: Replace bare EEXIST with private error codes · 50479406

Kent Overstreet authored May 26, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

50479406

bcachefs: Fix missing alloc_data_type_set() · f648b6c1

Kent Overstreet authored Jun 20, 2024

Incorrect bucket state transition in the discard path; when incrementing
a bucket's generation number that had already been discarded, we were
forgetting to check if it should be need_gc_gens, not free.

This was caught by the .invalid checks in the transaction commit path,
causing us to go emergency read only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f648b6c1

closures: Change BUG_ON() to WARN_ON() · 339b84ab

Kent Overstreet authored Jun 20, 2024

If a BUG_ON() can be hit in the wild, it shouldn't be a BUG_ON()

For reference, this has popped up once in the CI, and we'll need more
info to debug it:

03240 ------------[ cut here ]------------
03240 kernel BUG at lib/closure.c:21!
03240 kernel BUG at lib/closure.c:21!
03240 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
03240 Modules linked in:
03240 CPU: 15 PID: 40534 Comm: kworker/u80:1 Not tainted 6.10.0-rc4-ktest-ga56da697 #25570
03240 Hardware name: linux,dummy-virt (DT)
03240 Workqueue: btree_update btree_interior_update_work
03240 pstate: 00001005 (nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
03240 pc : closure_put+0x224/0x2a0
03240 lr : closure_put+0x24/0x2a0
03240 sp : ffff0000d12071c0
03240 x29: ffff0000d12071c0 x28: dfff800000000000 x27: ffff0000d1207360
03240 x26: 0000000000000040 x25: 0000000000000040 x24: 0000000000000040
03240 x23: ffff0000c1f20180 x22: 0000000000000000 x21: ffff0000c1f20168
03240 x20: 0000000040000000 x19: ffff0000c1f20140 x18: 0000000000000001
03240 x17: 0000000000003aa0 x16: 0000000000003ad0 x15: 1fffe0001c326974
03240 x14: 0000000000000a1e x13: 0000000000000000 x12: 1fffe000183e402d
03240 x11: ffff6000183e402d x10: dfff800000000000 x9 : ffff6000183e402e
03240 x8 : 0000000000000001 x7 : 00009fffe7c1bfd3 x6 : ffff0000c1f2016b
03240 x5 : ffff0000c1f20168 x4 : ffff6000183e402e x3 : ffff800081391954
03240 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 00000000a8000000
03240 Call trace:
03240  closure_put+0x224/0x2a0
03240  bch2_check_for_deadlock+0x910/0x1028
03240  bch2_six_check_for_deadlock+0x1c/0x30
03240  six_lock_slowpath.isra.0+0x29c/0xed0
03240  six_lock_ip_waiter+0xa8/0xf8
03240  __bch2_btree_node_lock_write+0x14c/0x298
03240  bch2_trans_lock_write+0x6d4/0xb10
03240  __bch2_trans_commit+0x135c/0x5520
03240  btree_interior_update_work+0x1248/0x1c10
03240  process_scheduled_works+0x53c/0xd90
03240  worker_thread+0x370/0x8c8
03240  kthread+0x258/0x2e8
03240  ret_from_fork+0x10/0x20
03240 Code: aa1303e0 d63f0020 a94363f7 17ffff8c (d4210000)
03240 ---[ end trace 0000000000000000 ]---
03240 Kernel panic - not syncing: Oops - BUG: Fatal exception
03240 SMP: stopping secondary CPUs
03241 SMP: failed to stop secondary CPUs 13,15
03241 Kernel Offset: disabled
03241 CPU features: 0x00,00000003,80000008,4240500b
03241 Memory Limit: none
03241 ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
03246 ========= FAILED TIMEOUT copygc_torture_no_checksum in 7200s
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

339b84ab

20 Jun, 2024 2 commits

bcachefs: fix alignment of VMA for memory mapped files on THP · c6cab97c

Youling Tang authored Jun 20, 2024

With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs
for read-only mmapped files, such as shared libraries. However, the
kernel makes no attempt to actually align those mappings on 2MB
boundaries, which makes it impossible to use those THPs most of the
time. This issue applies to general file mapping THP as well as
existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily
fixed by using thp_get_unmapped_area for the unmapped_area function
in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use.

Similar to commit b0c58223 ("btrfs: fix alignment of VMA for
memory mapped files on THP").
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c6cab97c

bcachefs: Fix safe errors by default · 33dfafa9

Kent Overstreet authored Jun 19, 2024

i.e. the start of automatic self healing:

If errors=continue or fix_safe, we now automatically fix simple errors
without user intervention.

New error action option: fix_safe

This replaces the existing errors=ro option, which gets a new slot, i.e.
existing errors=ro users now get errors=fix_safe.

This is currently only enabled for a limited set of errors - initially
just disk accounting; errors we would never not want to fix, and we
don't want to require user intervention (i.e. to make sure a bug report
gets filed).

Errors will still be counted in the superblock, so we (developers) will
still know they've been occurring if a bug report gets filed (as bug
reports typically include the errors superblock section).

Eventually we'll be enabling this for a much wider set of errors, after
we've done thorough error injection testing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

33dfafa9

19 Jun, 2024 7 commits

bcachefs: Fix bch2_trans_put() · a56da697

Kent Overstreet authored Jun 19, 2024

reference: https://github.com/koverstreet/bcachefs/issues/692

trans->ref is the reference used by the cycle detector, which walks
btree_trans objects of other threads to walk the graph of held locks and
issue wakeups when an abort is required.

We have to wait for the ref to go to 1 before freeing trans->paths or
clearing trans->locking_wait.task.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a56da697

bcachefs: set_worker_desc() for delete_dead_snapshots · 0a2a507d

Kent Overstreet authored Jun 19, 2024

this is long running - help users see what's going on
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0a2a507d

bcachefs: Fix bch2_sb_downgrade_update() · ddd118ab

Kent Overstreet authored Jun 17, 2024

Missing enum conversion
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ddd118ab

bcachefs: Handle cached data LRU wraparound · 2e9940d4

Kent Overstreet authored Jun 17, 2024

We only have 48 bits for the LRU time field, which is insufficient to
prevent wraparound.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2e9940d4

bcachefs: Guard against overflowing LRU_TIME_BITS · cff07e27

Kent Overstreet authored Jun 17, 2024

LRUs only have 48 bits for the time field (i.e. LRU order); thus we need
overflow checks and guards.

Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cff07e27

bcachefs: delete_dead_snapshots() doesn't need to go RW · 1ba44217

Kent Overstreet authored Jun 17, 2024

We've been moving away from going RW lazily; if we want to go RW we do
that in set_may_go_rw(), and if we didn't go RW we don't need to delete
dead snapshots.

Reported-by: syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1ba44217

bcachefs: Fix early init error path in journal code · dbf4d79b

Kent Overstreet authored Jun 17, 2024

We shouldn't be running the journal shutdown sequence if we never fully
initialized the journal.

Reported-by: syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dbf4d79b