Commits · 3e48999816b1d1dba3ca40b1d7dbc324adb72fe2 · Kirill Smelkov / linux

14 Mar, 2024 40 commits

bcachefs: Prefer struct_size over open coded arithmetic · 3e489998

Erick Archer authored Mar 10, 2024

This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].

As the "op" variable is a pointer to "struct promote_op" and this
structure ends in a flexible array:

struct promote_op {
	[...]
	struct bio_vec bi_inline_vecs[];
};

and the "t" variable is a pointer to "struct journal_seq_blacklist_table"
and this structure also ends in a flexible array:

struct journal_seq_blacklist_table {
	[...]
	struct journal_seq_blacklist_table_entry {
		u64		start;
		u64		end;
		bool		dirty;
	}			entries[];
};

the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the argument "size + size * count" in the
kzalloc() functions.

This way, the code is more readable and safer.
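
As a rough userspace illustration of why the helper is safer, the sketch below computes the same "header plus count elements" size with explicit overflow checks. It is not the kernel's struct_size() implementation: flex_size(), struct demo_table and demo_table_alloc() are hypothetical names, and the overflow builtins are assumed to be available (GCC or Clang). Like the kernel helper, the sketch saturates to SIZE_MAX on overflow, so the allocation fails instead of being silently undersized.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct demo_table {
	size_t nr;
	struct demo_entry {
		uint64_t start;
		uint64_t end;
	} entries[];
};

/*
 * Hypothetical userspace stand-in for struct_size(): returns
 * header + elem * count, or SIZE_MAX if the arithmetic overflows.
 */
static size_t flex_size(size_t header, size_t elem, size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, count, &bytes) ||
	    __builtin_add_overflow(bytes, header, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a zeroed table with room for nr entries, or NULL on failure. */
static struct demo_table *demo_table_alloc(size_t nr)
{
	size_t bytes = flex_size(sizeof(struct demo_table),
				 sizeof(struct demo_entry), nr);
	struct demo_table *t;

	if (bytes == SIZE_MAX)
		return NULL;

	t = calloc(1, bytes);
	if (t)
		t->nr = nr;
	return t;
}
```

With the open-coded form, a huge count could wrap around and yield a small allocation; here the same input is rejected before calloc() is reached.
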

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3e489998

bcachefs: Kill unused flags argument to btree_split() · 1fdb9685
Kent Overstreet authored Mar 08, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
1fdb9685

bcachefs: Check for writing superblocks with nonsense member seq fields · c4200645

Kent Overstreet authored Mar 08, 2024

We're seeing some unmountable filesystems due to split brain detection
going awry; it seems we somehow wrote out superblocks where we updated
the superblock seq without updating any member seq fields.

A given device's superblock should always have the main seq equal to
its member seq field, so this is easy to check for.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c4200645

bcachefs: fix bch2_journal_buf_to_text() · 5e105fb8
Kent Overstreet authored Mar 08, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
5e105fb8

lib/generic-radix-tree.c: Make nodes more reasonably sized · 3a319a24

Kent Overstreet authored Mar 07, 2024

This code originally used the page allocator directly, but most code
shouldn't do that - PAGE_SIZE varies with architecture, and slab is
faster.

4k is also on the large side for typical usage; 512 bytes is a better
choice for usage that might be somewhat sparse.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3a319a24

bcachefs: copy_(to|from)_user_errcode() · d6454799

Kent Overstreet authored Mar 02, 2024

We've got some helpers that return errors sanely; move them to a more
common location for use in fs-ioctl.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d6454799

bcachefs: Split out bkey_types.h · ba81523e

Kent Overstreet authored Mar 01, 2024

We're going to need bkey_types.h in bcachefs_ioctl.h in a future patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba81523e

bcachefs: fix lost journal buf wakeup due to improved pipelining · ada02c20

Brian Foster authored Mar 01, 2024

The journal_write_done() handler was reworked into a loop in commit
746a33c96b7a ("bcachefs: better journal pipelining"). As part of this,
the journal buffer wake was factored into a post-loop branch that
executes if at least one journal buffer has completed.

The journal buffer processing loop iterates on the journal buffer
pointer, however. This means that w refers to the last buffer processed
by the loop, which may or may not be done. This also means that if
multiple buffers are processed by the loop, only the last is awoken.
This lost wakeup behavior has led to stalling problems in various CI
and fstests, such as generic/703.

Lift the wake into the loop so each done buffer sees a wake call as
it is processed.
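
The difference between the two loop shapes can be sketched in plain C. This is a simplified model under stated assumptions, not the bcachefs code: struct demo_buf, demo_wake() and both process_*() functions are hypothetical, and demo_wake() merely records that a wake happened.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a journal buffer. */
struct demo_buf {
	bool write_done;	/* I/O for this buffer has completed */
	bool woken;		/* waiters on this buffer were woken */
};

/* Stand-in for waking the waiters of one buffer. */
static void demo_wake(struct demo_buf *w)
{
	w->woken = true;
}

/*
 * Buggy shape: the wake happens after the loop, on whatever buffer the
 * loop pointer refers to last, which may not even be done.
 */
static void process_wake_after_loop(struct demo_buf *bufs, size_t nr)
{
	struct demo_buf *w = NULL;
	bool completed = false;

	for (size_t i = 0; i < nr; i++) {
		w = &bufs[i];
		if (!w->write_done)
			break;
		completed = true;
	}

	if (completed)
		demo_wake(w);
}

/* Fixed shape: every completed buffer is woken as it is processed. */
static void process_wake_in_loop(struct demo_buf *bufs, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		struct demo_buf *w = &bufs[i];

		if (!w->write_done)
			break;
		demo_wake(w);
	}
}
```

With two completed buffers followed by an incomplete one, the first variant wakes only the incomplete buffer, while the second wakes exactly the two completed ones.
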
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ada02c20

bcachefs: intercept mountoption value for bool type · 2a68d611

Hongbo Li authored Mar 01, 2024

For a mount option of bool type, the value must be 0 or 1 (see
bch2_opt_parse). But this does not seem to be intercepted properly,
because for other values (like 2...), it returns an unexpected return
code with an error message printed.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2a68d611

bcachefs: avoid returning private error code in bch2_xattr_bcachefs_set · 7e23c174

Hongbo Li authored Mar 01, 2024

Avoid returning the private error code to the caller. The error code
should be transformed into a general error code.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7e23c174

bcachefs: Buffered write path now can avoid the inode lock · 7e64c86c

Kent Overstreet authored Feb 28, 2024

Non-append, non-extending buffered writes can now avoid taking the inode
lock.

To ensure atomicity of writes w.r.t. other writes, we lock every folio
that we'll be writing to, and if this fails we fall back to taking the
inode lock.

Extensive comments are provided as to corner cases.

Link: https://lore.kernel.org/linux-fsdevel/Zdkxfspq3urnrM6I@bombadil.infradead.org/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7e64c86c

fs: file_remove_privs_flags() · 66a67c86

Kent Overstreet authored Feb 28, 2024

Rename and export __file_remove_privs(); for a buffered write path that
doesn't take the inode lock we need to be able to check if the operation
needs to do work first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>

66a67c86

bcachefs: Fix bch2_journal_noflush_seq() · 7efa2875

Kent Overstreet authored Feb 28, 2024

Improved journal pipelining broke journal_noflush_seq(); it implicitly
assumed only the oldest outstanding journal buf could be in flight, but
that's no longer true.

Make this more straightforward by just setting buf->must_flush whenever
we know a journal buf is going to be a flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7efa2875

bcachefs: fix the error code when mounting with incorrect options. · 79162e82

Hongbo Li authored Feb 19, 2024

When mounting with incorrect options, such as:
"mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/",
mount reports the error "mount: /mnt/bcachefs: permission denied."
because bch2_parse_mount_opts returns -1 (which is -EPERM) and
bch2_mount passes it up. This is unreasonable.

The error message should instead look like this:
"mount: /mnt/bcachefs: wrong fs type, bad option, bad
superblock on /dev/loop1, missing codepage or helper program,
or other error."

Add three private error codes for mount errors:
  - BCH_ERR_mount_option as the parent class for option error.
  - BCH_ERR_option_name represents the invalid option name.
  - BCH_ERR_option_value represents the invalid option value.
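
A parent/child arrangement of private error codes like the one listed above can be sketched as follows. This is a minimal illustration, not the bcachefs error-code table: every DEMO_* name and both helper functions are hypothetical, and the choice of EINVAL as the root of the class is an assumption made for the example.

```c
#include <errno.h>

/* Hypothetical private error codes, numbered above the standard errno range. */
enum demo_err {
	DEMO_ERR_START		= 2048,
	DEMO_ERR_mount_option	= DEMO_ERR_START,
	DEMO_ERR_option_name,
	DEMO_ERR_option_value,
	DEMO_ERR_MAX
};

/* Parent of each private code; assumption: the class root maps to EINVAL. */
static const int demo_err_parent[] = {
	[DEMO_ERR_mount_option - DEMO_ERR_START] = EINVAL,
	[DEMO_ERR_option_name  - DEMO_ERR_START] = DEMO_ERR_mount_option,
	[DEMO_ERR_option_value - DEMO_ERR_START] = DEMO_ERR_mount_option,
};

/* Does the positive error code err equal cls, or descend from it? */
static int demo_err_matches(int err, int cls)
{
	while (err >= DEMO_ERR_START && err < DEMO_ERR_MAX && err != cls)
		err = demo_err_parent[err - DEMO_ERR_START];
	return err == cls;
}

/* Downcast a private code to the standard errno value shown to callers. */
static int demo_err_to_errno(int err)
{
	while (err >= DEMO_ERR_START && err < DEMO_ERR_MAX)
		err = demo_err_parent[err - DEMO_ERR_START];
	return err;
}
```

Internal code can then test for the whole class with demo_err_matches(err, DEMO_ERR_mount_option), while callers outside the filesystem only ever see the standard value returned by demo_err_to_errno().
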
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

79162e82

bcachefs: split out ignore_blacklisted, ignore_not_dirty · 2cce3752
Kent Overstreet authored Feb 25, 2024
```
prep work for replaying the journal backwards
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
2cce3752

bcachefs: improve move_gap() · 69426613
Kent Overstreet authored Feb 23, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
69426613

bcachefs: journal_keys now uses darray helpers · 95ffc7fb
Kent Overstreet authored Feb 24, 2024
```
nice bit of code cleanup
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
95ffc7fb

bcachefs: Rename journal_keys.d -> journal_keys.data · 894d0622

Kent Overstreet authored Feb 24, 2024

This will let us use some darray helpers in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

894d0622

bcachefs: jset_entry for loops declare loop iter · 0b5961b0
Kent Overstreet authored Feb 23, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
0b5961b0

bcachefs: Errcode tracepoint, documentation · eb386617

Kent Overstreet authored Feb 21, 2024

Add a tracepoint for downcasting private errors to standard errors, so
they can be recovered even when not logged; also, add some
documentation.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

eb386617

bcachefs: remove redundant assignment to variable ret · 150194cd

Colin Ian King authored Feb 21, 2024

Variable ret is being assigned a value that is never read; it is
being re-assigned a couple of statements later on. The assignment
is redundant and can be removed.

Cleans up clang scan build warning:
fs/bcachefs/super-io.c:806:2: warning: Value stored to 'ret' is
never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

150194cd

bcachefs: Silence gcc warnings about arm arch ABI drift · c7cad231

Calvin Owens authored Feb 18, 2024

32-bit arm builds emit a lot of spam like this:

fs/bcachefs/backpointers.c: In function ‘extent_matches_bp’:
fs/bcachefs/backpointers.c:15:13: note: parameter passing for argument of type ‘struct bch_backpointer’ changed in GCC 9.1

Apply the change from commit ebcc5928 ("arm64: Silence gcc warnings
about arch ABI drift") to fs/bcachefs/ to silence them.
Signed-off-by: Calvin Owens <jcalvinowens@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c7cad231

bcachefs: Add journal.blocked to journal_debug_to_text() · 90aa35c4
Kent Overstreet authored Feb 17, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
90aa35c4

bcachefs: Fix journal_buf bitfield accesses · d9290c99

Kent Overstreet authored Feb 17, 2024

All journal_buf bitfield updates must happen under the journal lock -
perhaps we should just switch these to atomic bit flags.
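
The underlying issue is that adjacent C bitfields share a storage unit, so updating one field is a read-modify-write that can clobber a concurrent update to its neighbour. A minimal sketch of the "atomic bit flags" alternative, using C11 atomics in userspace, might look like this. The flag names and helpers are hypothetical and are not bcachefs fields; in the kernel the equivalent would more likely be built on set_bit()/test_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits; the names are illustrative only. */
enum demo_buf_flag {
	DEMO_BUF_noflush	= 1u << 0,
	DEMO_BUF_must_flush	= 1u << 1,
	DEMO_BUF_write_done	= 1u << 2,
};

struct demo_journal_buf {
	atomic_uint flags;
};

/* Atomically set one flag without disturbing the others; no lock needed. */
static void demo_buf_set(struct demo_journal_buf *buf, enum demo_buf_flag f)
{
	atomic_fetch_or(&buf->flags, (unsigned)f);
}

/* Atomically clear one flag. */
static void demo_buf_clear(struct demo_journal_buf *buf, enum demo_buf_flag f)
{
	atomic_fetch_and(&buf->flags, ~(unsigned)f);
}

/* Read the current value of one flag. */
static bool demo_buf_test(struct demo_journal_buf *buf, enum demo_buf_flag f)
{
	return (atomic_load(&buf->flags) & (unsigned)f) != 0;
}
```
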
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d9290c99

bcachefs: Split out discard fastpath · a393f331

Kent Overstreet authored Feb 16, 2024

Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.

Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.

We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache means those writes only cost bus bandwidth.

So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a393f331

bcachefs: improve bch2_journal_buf_to_text() · 06d493fe
Kent Overstreet authored Feb 17, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
06d493fe

bcachefs: Drop redundant btree_path_downgrade()s · 29e11f96

Kent Overstreet authored Feb 16, 2024

If a path doesn't have any active references, we shouldn't downgrade it;
it'll either be reused, possibly with intent refs again, or dropped at
bch2_trans_begin() time.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

29e11f96

bcachefs: rebalance_status now shows correct units · ba78af9e

Daniel Hill authored Jan 19, 2024

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba78af9e

bcachefs: more informative write path error message · 3235e04a
Kent Overstreet authored Feb 16, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
3235e04a

bcachefs: check_path() now only needs to walk up to subvolume root · 74406f66

Kent Overstreet authored Feb 15, 2024

Now that checking subvolume structure is a separate pass, the main
check_directory_connectivity() pass only needs to walk up to a given
inode's subvolume root.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

74406f66

bcachefs: bch2_check_subvolume_structure() · 663db5a5

Kent Overstreet authored Feb 15, 2024

Now that we've got bch_subvolume.fs_path_parent, it's easy to write
subvolume
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

663db5a5

bcachefs: omit alignment attribute on big endian struct bkey · b07ce726

Thomas Bertschinger authored Feb 15, 2024

This is needed for building Rust bindings on big endian architectures
like s390x. Currently this is only done in userspace, but it might
happen in-kernel in the future. When creating a Rust binding for struct
bkey, the "packed" attribute is needed to get a type with the correct
member offsets in the big endian case. However, rustc does not allow
types to have both a "packed" and "align" attribute. Thus, in order to
get a Rust type compatible with the C type, we must omit the "aligned"
attribute in C.

This does not affect the struct's size or member offsets, only its
toplevel alignment, which should be an acceptable impact.

The little endian version can have the "align" attribute because the
"packed" attr is redundant, and rust-bindgen will omit the "packed" attr
when an "align" attr is present and it can do so without changing a
type's layout.
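
The claim that the "aligned" attribute changes only the alignment of the type as a whole, not its size or member offsets, can be checked with a small compile-time sketch. The two structs below are hypothetical 16-byte layouts chosen for illustration (they are not struct bkey), and the attribute syntax assumes GCC or Clang with C11.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte key header; field names are illustrative only. */
struct demo_key_packed {
	uint8_t  u64s;
	uint8_t  format;
	uint16_t type;
	uint32_t size;
	uint64_t offset;
} __attribute__((packed));

/* Same members, with an explicit 8-byte alignment added. */
struct demo_key_packed_aligned {
	uint8_t  u64s;
	uint8_t  format;
	uint16_t type;
	uint32_t size;
	uint64_t offset;
} __attribute__((packed, aligned(8)));

/* Size and member offsets are identical with and without aligned(8)... */
_Static_assert(sizeof(struct demo_key_packed) ==
	       sizeof(struct demo_key_packed_aligned), "same size");
_Static_assert(offsetof(struct demo_key_packed, offset) ==
	       offsetof(struct demo_key_packed_aligned, offset), "same offsets");

/* ...only the alignment of the type as a whole differs. */
_Static_assert(_Alignof(struct demo_key_packed) == 1, "packed only");
_Static_assert(_Alignof(struct demo_key_packed_aligned) == 8, "packed + aligned");
```

The equal-size result relies on the members adding up to a multiple of the requested alignment; a packed struct whose members did not would be padded up when "aligned" is added.
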
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b07ce726

bcachefs: bch2_trigger_alloc() handles state changes better · 6e9d0558

Kent Overstreet authored Feb 15, 2024

bch2_trigger_alloc() kicks off certain tasks on bucket state changes;
e.g. triggering the bucket discard worker and the invalidate worker.

We've observed the discard worker running too often - most runs it
doesn't do any work, according to the tracepoint - so clearly, we're
kicking it off too often.

This adds an explicit statechange() macro to make these checks more
precise.
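
One way such a state-change check could be written is as an edge detector: the condition must be false for the old key and true for the new one. The sketch below is a hypothetical illustration, not the macro added by this patch; the demo_* names and the simplified bucket states are assumptions, and the macro relies on GCC/Clang statement expressions.

```c
#include <stdbool.h>

/* Hypothetical, simplified bucket states. */
enum demo_bucket_state {
	DEMO_BUCKET_free,
	DEMO_BUCKET_need_discard,
	DEMO_BUCKET_dirty,
};

struct demo_alloc {
	enum demo_bucket_state state;
};

/*
 * Hypothetical edge-detection helpers: demo_eval() binds "a" to the key
 * under evaluation, and demo_statechange() is true only when the predicate
 * is false for the old key and true for the new one.
 */
#define demo_eval(_a, expr)	({ const struct demo_alloc *a = (_a); (expr); })
#define demo_statechange(_old, _new, expr) \
	(!demo_eval(_old, expr) && demo_eval(_new, expr))

/* Should the discard worker be kicked for this update? */
static bool demo_should_queue_discard(const struct demo_alloc *old_a,
				      const struct demo_alloc *new_a)
{
	return demo_statechange(old_a, new_a,
				a->state == DEMO_BUCKET_need_discard);
}
```

An update that leaves a bucket in the need_discard state it was already in no longer queues the worker; only the transition into that state does.
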
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6e9d0558

bcachefs: bch2_print_opts() · b63570f7

Kent Overstreet authored Feb 12, 2024

Make sure early error messages get redirected, for
kernel-fsck-from-userland.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b63570f7

bcachefs: Improve error messages in device remove path · 130d229f
Kent Overstreet authored Feb 12, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
130d229f

bcachefs: Use kvzalloc() when dynamically allocating btree paths · 5ca8ff15

Kent Overstreet authored Feb 12, 2024

This silences a mm/page_alloc.c warning about allocating more than a
page with GFP_NOFAIL - and there's no reason for this to not have a
vmalloc fallback anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5ca8ff15

bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter() · 83bd5985
Kent Overstreet authored Feb 09, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
83bd5985

bcachefs: Save key_cache_path in peek_slot() · 3254c1b0

Kent Overstreet authored Feb 09, 2024

When bch2_btree_iter_peek_slot() clones the iterator to search for the
next key, and then discovers that the key from the cloned iterator is
the key we want to return, we also want to save
iter->key_cache_path for the update path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3254c1b0

bcachefs: Pin btree cache in ram for random access in fsck · 91dcad18

Kent Overstreet authored Jan 23, 2024

Various phases of fsck involve checking references from one btree to
another: this means doing a sequential scan of one btree, and then
mostly random access into the second.

This is particularly painful for checking extents <-> backpointers; we
can prefetch btree node access on the sequential scan, but not on the
random access portion. It hurts most on spinning rust, where we'd like
to keep the pipeline fairly full of btree node reads so that the
elevator can reduce seeking.

This patch implements prefetching and pinning of the portion of the
btree that we'll be doing random access to. We already calculate how
much of the random access btree will fit in memory so it's a fairly
straightforward change.

This will put more pressure on system memory usage, so we introduce a
new option, fsck_memory_usage_percent, which is the percentage of total
system ram that fsck is allowed to pin.
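
The arithmetic behind such a percentage limit can be sketched as below. This is only an illustration of the calculation: both helper names are hypothetical, and how the kernel actually derives the budget and the per-node cost is not shown in this message.

```c
#include <stdint.h>

/* Hypothetical helper: bytes of RAM fsck may pin for a given percentage. */
static uint64_t demo_fsck_pin_limit(uint64_t total_ram_bytes, unsigned percent)
{
	if (percent > 100)
		percent = 100;
	/* Divide first so the multiplication cannot overflow 64 bits. */
	return total_ram_bytes / 100 * percent;
}

/* Hypothetical helper: how many btree nodes of node_bytes fit in the budget. */
static uint64_t demo_fsck_pinnable_nodes(uint64_t total_ram_bytes,
					 unsigned percent, uint64_t node_bytes)
{
	if (!node_bytes)
		return 0;
	return demo_fsck_pin_limit(total_ram_bytes, percent) / node_bytes;
}
```

Dividing before multiplying trades a small rounding loss for freedom from overflow on very large memory sizes.
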
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

91dcad18

bcachefs: Check for subvolume children when deleting subvolumes · 835cd3e1

Kent Overstreet authored Feb 09, 2024

Recursively destroying subvolumes isn't allowed yet.

Fixes: https://github.com/koverstreet/bcachefs/issues/634
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

835cd3e1