Commit 16dbfae8 authored by Linus Torvalds

Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs updates from Kent Overstreet:

 - More safety fixes, primarily found by syzbot

 - Run the upgrade/downgrade paths in nochanges mode. Nochanges mode is
   primarily for testing fsck/recovery in dry run mode, so it shouldn't
   change anything besides disabling writes and holding dirty metadata
   in memory.

   The idea here was to reduce the amount of activity if we can't write
   anything out, so that bringing up a filesystem in "super ro" mode
   would be more likely to work for data recovery - but norecovery is
   the correct option for this.

 - btree_trans->locked; we now track whether a btree_trans has any btree
   nodes locked, and this is used for improved assertions related to
   trans_unlock() and trans_relock(). We'll also be using it for
   improving how we work with lockdep in the future: we don't want
   lockdep to be tracking individual btree node locks because we take
   too many for lockdep to track, and it's not necessary since we have a
   cycle detector.

 - Trigger improvements that are prep work for online fsck

 - BTREE_TRIGGER_check_repair; this regularizes how we do some repair
   work for extents that goes with running triggers in fsck, and fixes
   some subtle issues with transaction restarts there.

 - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot
   equivalence classes are for when snapshot deletion leaves behind
   redundant snapshot nodes, but snapshot deletion now cleans this up
   right away, so the abstraction doesn't need to leak.

 - Improvements to how we resume writing to the journal in recovery. The
   code for picking the new place to write when reading the journal is
   greatly simplified and we also store the position in the superblock
   for when we don't read the journal; this means that we preserve more
   of the journal for list_journal debugging.

 - Improvements to sysfs btree_cache and btree_node_cache, for debugging
   memory reclaim.

 - We now detect when we've blocked for 10 seconds on the allocator in
   the write path and dump some useful info.

 - Safety fixes for device references: this is a big series that
   changes almost all device lookups to properly check if the device
   exists and take a reference to it; the new lookup pattern is
   sketched after this list.

   Previously we assumed that if a bkey exists that references a device
   then the device must exist, and this was enforced in .invalid
   methods, but this was incorrect because it meant device removal
   relied on accounting being correct to not leave keys pointing to
   invalid devices, and that's not something we can assume.

   Getting the "pointer to invalid device" checks out of our .invalid()
   methods fixes some long standing device removal bugs; the only
   outstanding bug with device removal now is a race between the discard
   path and deleting alloc info, which should be easily fixed.

 - The allocator now prefers not to expand the new
   member_info.btree_allocated bitmap, meaning if repair ever requires
   scanning for btree nodes (because of a corrupt interior node) we
   won't have to scan the whole device(s).

 - New coding style document, which among other things talks about the
   correct usage of assertions
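
Illustration for the device references series above, adapted from the
bch2_dev_bucket_exists() hunk in alloc_background.h further down this page:
lookups now go through an RCU helper that can return NULL, and callers handle
the missing device instead of assuming it exists.

    static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos)
    {
            rcu_read_lock();
            /* returns NULL if there is no device with this index: */
            struct bch_dev *ca = bch2_dev_rcu(c, pos.inode);
            bool ret = ca && bucket_valid(ca, pos.offset);
            rcu_read_unlock();
            return ret;
    }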

* tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits)
  bcachefs: add no_invalid_checks flag
  bcachefs: add counters for failed shrinker reclaim
  bcachefs: Fix sb_field_downgrade validation
  bcachefs: Plumb bch_validate_flags to sb_field_ops.validate()
  bcachefs: s/bkey_invalid_flags/bch_validate_flags
  bcachefs: fsync() should not return -EROFS
  bcachefs: Invalid devices are now checked for by fsck, not .invalid methods
  bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs()
  bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio()
  bcachefs: bch2_dev_get_ioref() checks for device not present
  bcachefs: bch2_dev_get_ioref2(); io_read.c
  bcachefs: bch2_dev_get_ioref2(); debug.c
  bcachefs: bch2_dev_get_ioref2(); journal_io.c
  bcachefs: bch2_dev_get_ioref2(); io_write.c
  bcachefs: bch2_dev_get_ioref2(); btree_io.c
  bcachefs: bch2_dev_get_ioref2(); backpointers.c
  bcachefs: bch2_dev_get_ioref2(); alloc_background.c
  bcachefs: for_each_bset() declares loop iter
  bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h
  bcachefs: Improve sysfs internal/btree_cache
  ...
parents a90f1cd1 07f9a27f
.. SPDX-License-Identifier: GPL-2.0

bcachefs coding style
=====================

Good development is like gardening, and codebases are our gardens. Tend to them
every day; look for little things that are out of place or in need of tidying.
A little weeding here and there goes a long way; don't wait until things have
spiraled out of control.

Things don't always have to be perfect - nitpicking often does more harm than
good. But appreciate beauty when you see it - and let people know.

The code that you are afraid to touch is the code most in need of refactoring.

A little organizing here and there goes a long way.

Put real thought into how you organize things.

Good code is readable code, where the structure is simple and leaves nowhere
for bugs to hide.
Assertions are one of our most important tools for writing reliable code. If in
the course of writing a patchset you encounter a condition that shouldn't
happen (and will have unpredictable or undefined behaviour if it does), or
you're not sure if it can happen and not sure how to handle it yet - make it a
BUG_ON(). Don't leave undefined or unspecified behavior lurking in the codebase.

By the time you finish the patchset, you should understand better which
assertions need to be handled and turned into checks with error paths, and
which should be logically impossible. Leave the BUG_ON()s in for the ones which
are logically impossible. (Or, make them debug mode assertions if they're
expensive - but don't turn everything into a debug mode assertion, so that
we're not stuck debugging undefined behaviour should it turn out that you were
wrong).
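
For example - a sketch, not code from this tree, though BUG_ON(), EBUG_ON(),
BTREE_MAX_DEPTH, BKEY_U64s and the error code are real names - the distinction
looks like::

  /* Logically impossible if callers are correct: keep the assertion. */
  BUG_ON(level >= BTREE_MAX_DEPTH);

  /* Expensive invariant check (hypothetical helper): debug builds only. */
  EBUG_ON(!btree_node_keys_sorted(b));

  /* A condition we learned can actually happen gets a real error path: */
  if (k.k->u64s < BKEY_U64s)
          return -BCH_ERR_invalid;
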
Assertions are documentation that can't go out of date. Good assertions are
wonderful.

Good assertions drastically and dramatically reduce the amount of testing
required to shake out bugs.

Good assertions are based on state, not logic. To write good assertions, you
have to think about what the invariants on your state are.

Good invariants and assertions will hold everywhere in your codebase. This
means that you can run them in only a few places in the checked in version, but
should you need to debug something that caused the assertion to fail, you can
quickly shotgun them everywhere to find the codepath that broke the invariant.
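
As a sketch (the struct and its invariant are hypothetical; EBUG_ON() is the
tree's debug-mode assertion), an invariant check is a small helper that can be
called from as many or as few places as debugging requires::

  struct bucket_counts {
          u32     dirty_sectors;
          u32     cached_sectors;
          u32     nr_sectors;     /* bucket size */
  };

  static inline void bucket_counts_check(const struct bucket_counts *b)
  {
          /* an invariant on the state itself, not on one codepath's logic: */
          EBUG_ON(b->dirty_sectors + b->cached_sectors > b->nr_sectors);
  }
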
A good assertion checks something that the compiler could check for us, and
elide - if we were working in a language with embedded correctness proofs that
the compiler could check. This is something that exists today, but it'll likely
still be a few decades before it comes to systems programming languages. But we
can still incorporate that kind of thinking into our code and document the
invariants with runtime checks - much like the way people working in
dynamically typed languages may add type annotations, gradually making their
code statically typed.

Looking for ways to make your assertions simpler - and higher level - will
often nudge you towards making the entire system simpler and more robust.

Good code is code where you can poke around and see what it's doing -
introspection. We can't debug anything if we can't see what's going on.

Whenever we're debugging, and the solution isn't immediately obvious, if the
issue is that we don't know where the issue is because we can't see what's
going on - fix that first.

We have the tools to make anything visible at runtime, efficiently - RCU and
percpu data structures among them. Don't let things stay hidden.

The most important tool for introspection is the humble pretty printer - in
bcachefs, this means `*_to_text()` functions, which output to printbufs.

Pretty printers are wonderful, because they compose and you can use them
everywhere. Having functions to print whatever object you're working with will
make your error messages much easier to write (therefore they will actually
exist) and much more informative. And they can be used from sysfs/debugfs, as
well as tracepoints.
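
For example (a sketch with a made-up struct; struct printbuf and prt_printf()
are the real interfaces)::

  struct frobnicator {
          u32     id;
          u32     refs;
          u64     nr_errors;
  };

  static void frobnicator_to_text(struct printbuf *out, const struct frobnicator *f)
  {
          prt_printf(out, "id:\t%u\n",       f->id);
          prt_printf(out, "refs:\t%u\n",     f->refs);
          prt_printf(out, "errors:\t%llu\n", f->nr_errors);
  }
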
Runtime info and debugging tools should come with clear descriptions and
labels, and good structure - we don't want files with a list of bare integers,
like in procfs. Part of the job of the debugging tools is to educate users and
new developers as to how the system works.

Error messages should, whenever possible, tell you everything you need to debug
the issue. It's worth putting effort into them.

Tracepoints shouldn't be the first thing you reach for. They're an important
tool, but always look for more immediate ways to make things visible. When we
have to rely on tracing, we have to know which tracepoints we're looking for,
and then we have to run the troublesome workload, and then we have to sift
through logs. This is a lot of steps to go through when a user is hitting
something, and if it's intermittent it may not even be possible.

The humble counter is an incredibly useful tool. They're cheap and simple to
use, and many complicated internal operations with lots of things that can
behave weirdly (anything involving memory reclaim, for example) become
shockingly easy to debug once you have counters on every distinct codepath.

Persistent counters are even better.
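
The shrinker accounting added with this series is an example of the pattern -
one counter per distinct reason a btree node could not be freed (condensed from
the btree_cache.c changes in this series)::

  #define BTREE_CACHE_NOT_FREED_INCREMENT(counter)      \
  do {                                                  \
          if (shrinker_counter)                         \
                  bc->not_freed_##counter++;            \
  } while (0)

          if (!six_trylock_intent(&b->c.lock)) {
                  BTREE_CACHE_NOT_FREED_INCREMENT(lock_intent);
                  return -BCH_ERR_ENOMEM_btree_node_reclaim;
          }

The not_freed_* counters are then reported by the btree_cache pretty printer,
so a shrinker that isn't making progress explains why.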
When debugging, try to get the most out of every bug you come across; don't
rush to fix the initial issue. Look for things that will make related bugs
easier the next time around - introspection, new assertions, better error
messages, new debug tools, and do those first. Look for ways to make the system
better behaved; often one bug will uncover several other bugs through
downstream effects.

Fix all that first, and then the original bug last - even if that means keeping
a user waiting. They'll thank you in the long run, and when they understand
what you're doing you'll be amazed at how patient they're happy to be. Users
like to help - otherwise they wouldn't be reporting the bug in the first place.

Talk to your users. Don't isolate yourself.

Users notice all sorts of interesting things, and by just talking to them and
interacting with them you can benefit from their experience.

Spend time doing support and helpdesk stuff. Don't just write code - code isn't
finished until it's being used trouble free.

This will also motivate you to make your debugging tools as good as possible,
and perhaps even your documentation, too. Like anything else in life, the more
time you spend at it the better you'll get, and you the developer are the
person most able to improve the tools to make debugging quick and easy.
Be wary of how you take on and commit to big projects. Don't let development
become product-manager focused. Often an idea is a good one but needs to wait
for its proper time - but you won't know if it's the proper time for an idea
until you start writing code.

Expect to throw a lot of things away, or leave them half finished for later.
Nobody writes all perfect code that all gets shipped, and you'll be much more
productive in the long run if you notice this early and shift to something
else. The experience gained and lessons learned will be valuable for all the
other work you do.
But don't be afraid to tackle projects that require significant rework of
existing code. Sometimes these can be the best projects, because they can lead
us to make existing code more general, more flexible, more multipurpose and
perhaps more robust. Just don't hesitate to abandon the idea if it looks like
it's going to make a mess of things.

Complicated features can often be done as a series of refactorings, with the
final change that actually implements the feature as a quite small patch at the
end. It's wonderful when this happens, especially when those refactorings are
things that improve the codebase in their own right. When that happens there's
much less risk of wasted effort if the feature you were going for doesn't work
out.

Always strive to work incrementally. Always strive to turn the big projects
into little bite sized projects that can prove their own merits.

Instead of always tackling those big projects, look for little things that
will be useful, and make the big projects easier.

The question of what's likely to be useful is where junior developers most
often go astray - doing something because it seems like it'll be useful often
leads to overengineering. Knowing what's useful comes from many years of
experience, or talking with people who have that experience - or from simply
reading lots of code and looking for common patterns and issues. Don't be
afraid to throw things away and do something simpler.

Talk about your ideas with your fellow developers; often times the best things
come from relaxed conversations where people aren't afraid to say "what if?".
Don't neglect your tools.

The most important tools (besides the compiler and our text editor) are the
tools we use for testing. The shortest possible edit/test/debug cycle is
essential for working productively. We learn, gain experience, and discover the
errors in our thinking by running our code and seeing what happens. If your
time is being wasted because your tools are bad or too slow - don't accept it,
fix it.
Put effort into your documentation, commit messages, and code comments - but
don't go overboard. A good commit message is wonderful - but if the information
was important enough to go in a commit message, ask yourself if it would be
even better as a code comment.

A good code comment is wonderful, but even better is the comment that didn't
need to exist because the code was so straightforward as to be obvious;
organized into small clean and tidy modules, with clear and descriptive names
for functions and variables, where every line of code has a clear purpose.
......@@ -8,4 +8,5 @@ bcachefs Documentation
:maxdepth: 2
:numbered:
CodingStyle
errorcodes
......@@ -282,18 +282,12 @@ struct posix_acl *bch2_get_acl(struct mnt_idmap *idmap,
struct btree_trans *trans = bch2_trans_get(c);
struct btree_iter iter = { NULL };
struct posix_acl *acl = NULL;
struct bkey_s_c k;
int ret;
retry:
bch2_trans_begin(trans);
ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
&hash, inode_inum(inode), &search, 0);
if (ret)
goto err;
k = bch2_btree_iter_peek_slot(&iter);
ret = bkey_err(k);
struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
&hash, inode_inum(inode), &search, 0);
int ret = bkey_err(k);
if (ret)
goto err;
......@@ -366,7 +360,7 @@ int bch2_set_acl(struct mnt_idmap *idmap,
ret = bch2_subvol_is_ro_trans(trans, inode->ei_subvol) ?:
bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode),
BTREE_ITER_INTENT);
BTREE_ITER_intent);
if (ret)
goto btree_err;
......@@ -414,39 +408,30 @@ int bch2_acl_chmod(struct btree_trans *trans, subvol_inum inum,
struct bch_hash_info hash_info = bch2_hash_info_init(trans->c, inode);
struct xattr_search_key search = X_SEARCH(KEY_TYPE_XATTR_INDEX_POSIX_ACL_ACCESS, "", 0);
struct btree_iter iter;
struct bkey_s_c_xattr xattr;
struct bkey_i_xattr *new;
struct posix_acl *acl = NULL;
struct bkey_s_c k;
int ret;
ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
&hash_info, inum, &search, BTREE_ITER_INTENT);
struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
&hash_info, inum, &search, BTREE_ITER_intent);
int ret = bkey_err(k);
if (ret)
return bch2_err_matches(ret, ENOENT) ? 0 : ret;
k = bch2_btree_iter_peek_slot(&iter);
ret = bkey_err(k);
if (ret)
goto err;
xattr = bkey_s_c_to_xattr(k);
struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
acl = bch2_acl_from_disk(trans, xattr_val(xattr.v),
le16_to_cpu(xattr.v->x_val_len));
ret = PTR_ERR_OR_ZERO(acl);
if (IS_ERR_OR_NULL(acl))
if (ret)
goto err;
ret = allocate_dropping_locks_errcode(trans,
__posix_acl_chmod(&acl, _gfp, mode));
ret = allocate_dropping_locks_errcode(trans, __posix_acl_chmod(&acl, _gfp, mode));
if (ret)
goto err;
new = bch2_acl_to_xattr(trans, acl, ACL_TYPE_ACCESS);
if (IS_ERR(new)) {
ret = PTR_ERR(new);
struct bkey_i_xattr *new = bch2_acl_to_xattr(trans, acl, ACL_TYPE_ACCESS);
ret = PTR_ERR_OR_ZERO(new);
if (ret)
goto err;
}
new->k.p = iter.pos;
ret = bch2_trans_update(trans, &iter, &new->k_i, 0);
......
This diff is collapsed.
......@@ -8,21 +8,18 @@
#include "debug.h"
#include "super.h"
enum bkey_invalid_flags;
enum bch_validate_flags;
/* How out of date a pointer gen is allowed to be: */
#define BUCKET_GC_GEN_MAX 96U
static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos)
{
struct bch_dev *ca;
if (!bch2_dev_exists2(c, pos.inode))
return false;
ca = bch_dev_bkey_exists(c, pos.inode);
return pos.offset >= ca->mi.first_bucket &&
pos.offset < ca->mi.nbuckets;
rcu_read_lock();
struct bch_dev *ca = bch2_dev_rcu(c, pos.inode);
bool ret = ca && bucket_valid(ca, pos.offset);
rcu_read_unlock();
return ret;
}
static inline u64 bucket_to_u64(struct bpos bucket)
......@@ -40,38 +37,50 @@ static inline u8 alloc_gc_gen(struct bch_alloc_v4 a)
return a.gen - a.oldest_gen;
}
static inline enum bch_data_type __alloc_data_type(u32 dirty_sectors,
u32 cached_sectors,
u32 stripe,
struct bch_alloc_v4 a,
enum bch_data_type data_type)
static inline void alloc_to_bucket(struct bucket *dst, struct bch_alloc_v4 src)
{
if (stripe)
return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe;
if (dirty_sectors)
return data_type;
if (cached_sectors)
return BCH_DATA_cached;
if (BCH_ALLOC_V4_NEED_DISCARD(&a))
return BCH_DATA_need_discard;
if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX)
return BCH_DATA_need_gc_gens;
return BCH_DATA_free;
dst->gen = src.gen;
dst->data_type = src.data_type;
dst->dirty_sectors = src.dirty_sectors;
dst->cached_sectors = src.cached_sectors;
dst->stripe = src.stripe;
}
static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a,
enum bch_data_type data_type)
static inline void __bucket_m_to_alloc(struct bch_alloc_v4 *dst, struct bucket src)
{
return __alloc_data_type(a.dirty_sectors, a.cached_sectors,
a.stripe, a, data_type);
dst->gen = src.gen;
dst->data_type = src.data_type;
dst->dirty_sectors = src.dirty_sectors;
dst->cached_sectors = src.cached_sectors;
dst->stripe = src.stripe;
}
static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b)
{
struct bch_alloc_v4 ret = {};
__bucket_m_to_alloc(&ret, b);
return ret;
}
static inline enum bch_data_type bucket_data_type(enum bch_data_type data_type)
{
return data_type == BCH_DATA_stripe ? BCH_DATA_user : data_type;
switch (data_type) {
case BCH_DATA_cached:
case BCH_DATA_stripe:
return BCH_DATA_user;
default:
return data_type;
}
}
static inline bool bucket_data_type_mismatch(enum bch_data_type bucket,
enum bch_data_type ptr)
{
return !data_type_is_empty(bucket) &&
bucket_data_type(bucket) != bucket_data_type(ptr);
}
static inline unsigned bch2_bucket_sectors(struct bch_alloc_v4 a)
static inline unsigned bch2_bucket_sectors_total(struct bch_alloc_v4 a)
{
return a.dirty_sectors + a.cached_sectors;
}
......@@ -89,6 +98,27 @@ static inline unsigned bch2_bucket_sectors_fragmented(struct bch_dev *ca,
return d ? max(0, ca->mi.bucket_size - d) : 0;
}
static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a,
enum bch_data_type data_type)
{
if (a.stripe)
return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe;
if (a.dirty_sectors)
return data_type;
if (a.cached_sectors)
return BCH_DATA_cached;
if (BCH_ALLOC_V4_NEED_DISCARD(&a))
return BCH_DATA_need_discard;
if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX)
return BCH_DATA_need_gc_gens;
return BCH_DATA_free;
}
static inline void alloc_data_type_set(struct bch_alloc_v4 *a, enum bch_data_type data_type)
{
a->data_type = alloc_data_type(*a, data_type);
}
static inline u64 alloc_lru_idx_read(struct bch_alloc_v4 a)
{
return a.data_type == BCH_DATA_cached ? a.io_time[READ] : 0;
......@@ -147,7 +177,9 @@ static inline void set_alloc_v4_u64s(struct bkey_i_alloc_v4 *a)
}
struct bkey_i_alloc_v4 *
bch2_trans_start_alloc_update(struct btree_trans *, struct btree_iter *, struct bpos);
bch2_trans_start_alloc_update_noupdate(struct btree_trans *, struct btree_iter *, struct bpos);
struct bkey_i_alloc_v4 *
bch2_trans_start_alloc_update(struct btree_trans *, struct bpos);
void __bch2_alloc_to_v4(struct bkey_s_c, struct bch_alloc_v4 *);
......@@ -173,13 +205,13 @@ struct bkey_i_alloc_v4 *bch2_alloc_to_v4_mut(struct btree_trans *, struct bkey_s
int bch2_bucket_io_time_reset(struct btree_trans *, unsigned, size_t, int);
int bch2_alloc_v1_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_alloc_v2_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_alloc_v3_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_alloc_v4_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
void bch2_alloc_v4_swab(struct bkey_s);
void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
......@@ -213,7 +245,7 @@ void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
})
int bch2_bucket_gens_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
void bch2_bucket_gens_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
#define bch2_bkey_ops_bucket_gens ((struct bkey_ops) { \
......@@ -233,7 +265,8 @@ static inline bool bkey_is_alloc(const struct bkey *k)
int bch2_alloc_read(struct bch_fs *);
int bch2_trigger_alloc(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s, unsigned);
struct bkey_s_c, struct bkey_s,
enum btree_iter_update_trigger_flags);
int bch2_check_alloc_info(struct bch_fs *);
int bch2_check_alloc_to_lru_refs(struct bch_fs *);
void bch2_do_discards(struct bch_fs *);
......
This diff is collapsed.
......@@ -30,8 +30,14 @@ void bch2_dev_stripe_increment(struct bch_dev *, struct dev_stripe_state *);
long bch2_bucket_alloc_new_fs(struct bch_dev *);
static inline struct bch_dev *ob_dev(struct bch_fs *c, struct open_bucket *ob)
{
return bch2_dev_have_ref(c, ob->dev);
}
struct open_bucket *bch2_bucket_alloc(struct bch_fs *, struct bch_dev *,
enum bch_watermark, struct closure *);
enum bch_watermark, enum bch_data_type,
struct closure *);
static inline void ob_push(struct bch_fs *c, struct open_buckets *obs,
struct open_bucket *ob)
......@@ -184,7 +190,7 @@ bch2_alloc_sectors_append_ptrs_inlined(struct bch_fs *c, struct write_point *wp,
wp->sectors_allocated += sectors;
open_bucket_for_each(c, &wp->ptrs, ob, i) {
struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev);
struct bch_dev *ca = ob_dev(c, ob);
struct bch_extent_ptr ptr = bch2_ob_ptr(c, ob);
ptr.cached = cached ||
......@@ -221,4 +227,9 @@ void bch2_open_buckets_partial_to_text(struct printbuf *, struct bch_fs *);
void bch2_write_points_to_text(struct printbuf *, struct bch_fs *);
void bch2_fs_alloc_debug_to_text(struct printbuf *, struct bch_fs *);
void bch2_dev_alloc_debug_to_text(struct printbuf *, struct bch_dev *);
void bch2_print_allocator_stuck(struct bch_fs *);
#endif /* _BCACHEFS_ALLOC_FOREGROUND_H */
......@@ -9,11 +9,18 @@
#include "fifo.h"
struct bucket_alloc_state {
enum {
BTREE_BITMAP_NO,
BTREE_BITMAP_YES,
BTREE_BITMAP_ANY,
} btree_bitmap;
u64 buckets_seen;
u64 skipped_open;
u64 skipped_need_journal_commit;
u64 skipped_nocow;
u64 skipped_nouse;
u64 skipped_mi_btree_bitmap;
};
#define BCH_WATERMARKS() \
......
This diff is collapsed.
......@@ -6,6 +6,7 @@
#include "btree_iter.h"
#include "btree_update.h"
#include "buckets.h"
#include "error.h"
#include "super.h"
static inline u64 swab40(u64 x)
......@@ -18,7 +19,7 @@ static inline u64 swab40(u64 x)
}
int bch2_backpointer_invalid(struct bch_fs *, struct bkey_s_c k,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
void bch2_backpointer_to_text(struct printbuf *, const struct bch_backpointer *);
void bch2_backpointer_k_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
void bch2_backpointer_swab(struct bkey_s);
......@@ -36,15 +37,29 @@ void bch2_backpointer_swab(struct bkey_s);
* Convert from pos in backpointer btree to pos of corresponding bucket in alloc
* btree:
*/
static inline struct bpos bp_pos_to_bucket(const struct bch_fs *c,
struct bpos bp_pos)
static inline struct bpos bp_pos_to_bucket(const struct bch_dev *ca, struct bpos bp_pos)
{
struct bch_dev *ca = bch_dev_bkey_exists(c, bp_pos.inode);
u64 bucket_sector = bp_pos.offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT;
return POS(bp_pos.inode, sector_to_bucket(ca, bucket_sector));
}
static inline bool bp_pos_to_bucket_nodev_noerror(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket)
{
rcu_read_lock();
struct bch_dev *ca = bch2_dev_rcu(c, bp_pos.inode);
if (ca)
*bucket = bp_pos_to_bucket(ca, bp_pos);
rcu_read_unlock();
return ca != NULL;
}
static inline bool bp_pos_to_bucket_nodev(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket)
{
return !bch2_fs_inconsistent_on(!bp_pos_to_bucket_nodev_noerror(c, bp_pos, bucket),
c, "backpointer for missing device %llu", bp_pos.inode);
}
static inline struct bpos bucket_pos_to_bp_noerror(const struct bch_dev *ca,
struct bpos bucket,
u64 bucket_offset)
......@@ -57,32 +72,32 @@ static inline struct bpos bucket_pos_to_bp_noerror(const struct bch_dev *ca,
/*
* Convert from pos in alloc btree + bucket offset to pos in backpointer btree:
*/
static inline struct bpos bucket_pos_to_bp(const struct bch_fs *c,
static inline struct bpos bucket_pos_to_bp(const struct bch_dev *ca,
struct bpos bucket,
u64 bucket_offset)
{
struct bch_dev *ca = bch_dev_bkey_exists(c, bucket.inode);
struct bpos ret = bucket_pos_to_bp_noerror(ca, bucket, bucket_offset);
EBUG_ON(!bkey_eq(bucket, bp_pos_to_bucket(c, ret)));
EBUG_ON(!bkey_eq(bucket, bp_pos_to_bucket(ca, ret)));
return ret;
}
int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bpos bucket,
struct bch_backpointer, struct bkey_s_c, bool);
int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bch_dev *,
struct bpos bucket, struct bch_backpointer, struct bkey_s_c, bool);
static inline int bch2_bucket_backpointer_mod(struct btree_trans *trans,
struct bch_dev *ca,
struct bpos bucket,
struct bch_backpointer bp,
struct bkey_s_c orig_k,
bool insert)
{
if (unlikely(bch2_backpointers_no_use_write_buffer))
return bch2_bucket_backpointer_mod_nowritebuffer(trans, bucket, bp, orig_k, insert);
return bch2_bucket_backpointer_mod_nowritebuffer(trans, ca, bucket, bp, orig_k, insert);
struct bkey_i_backpointer bp_k;
bkey_backpointer_init(&bp_k.k_i);
bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset);
bp_k.k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset);
bp_k.v = bp;
if (!insert) {
......@@ -120,7 +135,7 @@ static inline enum bch_data_type bch2_bkey_ptr_data_type(struct bkey_s_c k,
}
}
static inline void bch2_extent_ptr_to_bp(struct bch_fs *c,
static inline void bch2_extent_ptr_to_bp(struct bch_fs *c, struct bch_dev *ca,
enum btree_id btree_id, unsigned level,
struct bkey_s_c k, struct extent_ptr_decoded p,
const union bch_extent_entry *entry,
......@@ -130,7 +145,7 @@ static inline void bch2_extent_ptr_to_bp(struct bch_fs *c,
s64 sectors = level ? btree_sectors(c) : k.k->size;
u32 bucket_offset;
*bucket_pos = PTR_BUCKET_POS_OFFSET(c, &p.ptr, &bucket_offset);
*bucket_pos = PTR_BUCKET_POS_OFFSET(ca, &p.ptr, &bucket_offset);
*bp = (struct bch_backpointer) {
.btree_id = btree_id,
.level = level,
......@@ -142,7 +157,7 @@ static inline void bch2_extent_ptr_to_bp(struct bch_fs *c,
};
}
int bch2_get_next_backpointer(struct btree_trans *, struct bpos, int,
int bch2_get_next_backpointer(struct btree_trans *, struct bch_dev *ca, struct bpos, int,
struct bpos *, struct bch_backpointer *, unsigned);
struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *, struct btree_iter *,
struct bpos, struct bch_backpointer,
......
......@@ -359,6 +359,8 @@ do { \
#define BCH_DEBUG_PARAMS_ALWAYS() \
BCH_DEBUG_PARAM(key_merging_disabled, \
"Disables merging of extents") \
BCH_DEBUG_PARAM(btree_node_merging_disabled, \
"Disables merging of btree nodes") \
BCH_DEBUG_PARAM(btree_gc_always_rewrite, \
"Causes mark and sweep to compact and rewrite every " \
"btree node it traverses") \
......@@ -468,6 +470,7 @@ enum bch_time_stats {
#include "quota_types.h"
#include "rebalance_types.h"
#include "replicas_types.h"
#include "sb-members_types.h"
#include "subvolume_types.h"
#include "super_types.h"
#include "thread_with_file_types.h"
......@@ -516,8 +519,8 @@ enum gc_phase {
struct gc_pos {
enum gc_phase phase;
u16 level;
struct bpos pos;
unsigned level;
};
struct reflink_gc {
......@@ -534,7 +537,13 @@ struct io_count {
struct bch_dev {
struct kobject kobj;
#ifdef CONFIG_BCACHEFS_DEBUG
atomic_long_t ref;
bool dying;
unsigned long last_put;
#else
struct percpu_ref ref;
#endif
struct completion ref_completion;
struct percpu_ref io_ref;
struct completion io_ref_completion;
......@@ -560,14 +569,11 @@ struct bch_dev {
struct bch_devs_mask self;
/* biosets used in cloned bios for writing multiple replicas */
struct bio_set replica_set;
/*
* Buckets:
* Per-bucket arrays are protected by c->mark_lock, bucket_lock and
* gc_lock, for device resize - holding any is sufficient for access:
* Or rcu_read_lock(), but only for ptr_stale():
* Or rcu_read_lock(), but only for dev_ptr_stale():
*/
struct bucket_array __rcu *buckets_gc;
struct bucket_gens __rcu *bucket_gens;
......@@ -581,7 +587,7 @@ struct bch_dev {
/* Allocator: */
u64 new_fs_bucket_idx;
u64 alloc_cursor;
u64 alloc_cursor[3];
unsigned nr_open_buckets;
unsigned nr_btree_reserve;
......@@ -627,12 +633,12 @@ struct bch_dev {
x(clean_shutdown) \
x(fsck_running) \
x(initial_gc_unfixed) \
x(need_another_gc) \
x(need_delete_dead_snapshots) \
x(error) \
x(topology_error) \
x(errors_fixed) \
x(errors_not_fixed)
x(errors_not_fixed) \
x(no_invalid_checks)
enum bch_fs_flags {
#define x(n) BCH_FS_##n,
......@@ -715,6 +721,7 @@ struct btree_trans_buf {
x(discard_fast) \
x(invalidate) \
x(delete_dead_snapshots) \
x(gc_gens) \
x(snapshot_delete_pagecache) \
x(sysfs) \
x(btree_write_buffer)
......@@ -926,7 +933,6 @@ struct bch_fs {
/* JOURNAL SEQ BLACKLIST */
struct journal_seq_blacklist_table *
journal_seq_blacklist_table;
struct work_struct journal_seq_blacklist_gc_work;
/* ALLOCATOR */
spinlock_t freelist_lock;
......@@ -957,8 +963,7 @@ struct bch_fs {
struct work_struct discard_fast_work;
/* GARBAGE COLLECTION */
struct task_struct *gc_thread;
atomic_t kick_gc;
struct work_struct gc_gens_work;
unsigned long gc_count;
enum btree_id gc_gens_btree;
......@@ -988,6 +993,7 @@ struct bch_fs {
struct bio_set bio_read;
struct bio_set bio_read_split;
struct bio_set bio_write;
struct bio_set replica_set;
struct mutex bio_bounce_pages_lock;
mempool_t bio_bounce_pages;
struct bucket_nocow_lock_table
......@@ -1115,7 +1121,6 @@ struct bch_fs {
u64 counters_on_mount[BCH_COUNTER_NR];
u64 __percpu *counters;
unsigned btree_gc_periodic:1;
unsigned copy_gc_enabled:1;
bool promote_whole_extents;
......@@ -1250,11 +1255,6 @@ static inline s64 bch2_current_time(const struct bch_fs *c)
return timespec_to_bch2_time(c, now);
}
static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev)
{
return dev < c->sb.nr_devices && c->devs[dev];
}
static inline struct stdio_redirect *bch2_fs_stdio_redirect(struct bch_fs *c)
{
struct stdio_redirect *stdio = c->stdio;
......
......@@ -76,6 +76,7 @@
#include <asm/byteorder.h>
#include <linux/kernel.h>
#include <linux/uuid.h>
#include <uapi/linux/magic.h>
#include "vstructs.h"
#ifdef __KERNEL__
......@@ -589,6 +590,13 @@ struct bch_member {
__le64 errors_reset_time;
__le64 seq;
__le64 btree_allocated_bitmap;
/*
* On recovery from a clean shutdown we don't normally read the journal,
* but we still want to resume writing from where we left off so we
* don't overwrite more than is necessary, for list journal debugging:
*/
__le32 last_journal_bucket;
__le32 last_journal_bucket_offset;
};
/*
......@@ -1283,7 +1291,7 @@ enum bch_compression_opts {
UUID_INIT(0xc68573f6, 0x66ce, 0x90a9, \
0xd9, 0x6a, 0x60, 0xcf, 0x80, 0x3d, 0xf7, 0xef)
#define BCACHEFS_STATFS_MAGIC 0xca451a4e
#define BCACHEFS_STATFS_MAGIC BCACHEFS_SUPER_MAGIC
#define JSET_MAGIC __cpu_to_le64(0x245235c1a3625032ULL)
#define BSET_MAGIC __cpu_to_le64(0x90135c78b99e07f5ULL)
......
......@@ -640,7 +640,7 @@ struct bkey_format bch2_bkey_format_done(struct bkey_format_state *s)
int bch2_bkey_format_invalid(struct bch_fs *c,
struct bkey_format *f,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
unsigned i, bits = KEY_PACKED_BITS_START;
......@@ -656,20 +656,17 @@ int bch2_bkey_format_invalid(struct bch_fs *c,
* unpacked format:
*/
for (i = 0; i < f->nr_fields; i++) {
if (!c || c->sb.version_min >= bcachefs_metadata_version_snapshot) {
if ((!c || c->sb.version_min >= bcachefs_metadata_version_snapshot) &&
bch2_bkey_format_field_overflows(f, i)) {
unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i];
u64 unpacked_max = ~((~0ULL << 1) << (unpacked_bits - 1));
u64 packed_max = f->bits_per_field[i]
? ~((~0ULL << 1) << (f->bits_per_field[i] - 1))
: 0;
u64 field_offset = le64_to_cpu(f->field_offset[i]);
if (packed_max + field_offset < packed_max ||
packed_max + field_offset > unpacked_max) {
prt_printf(err, "field %u too large: %llu + %llu > %llu",
i, packed_max, field_offset, unpacked_max);
return -BCH_ERR_invalid;
}
prt_printf(err, "field %u too large: %llu + %llu > %llu",
i, packed_max, le64_to_cpu(f->field_offset[i]), unpacked_max);
return -BCH_ERR_invalid;
}
bits += f->bits_per_field[i];
......
......@@ -9,10 +9,10 @@
#include "util.h"
#include "vstructs.h"
enum bkey_invalid_flags {
BKEY_INVALID_WRITE = (1U << 0),
BKEY_INVALID_COMMIT = (1U << 1),
BKEY_INVALID_JOURNAL = (1U << 2),
enum bch_validate_flags {
BCH_VALIDATE_write = (1U << 0),
BCH_VALIDATE_commit = (1U << 1),
BCH_VALIDATE_journal = (1U << 2),
};
#if 0
......@@ -574,8 +574,31 @@ static inline void bch2_bkey_format_add_key(struct bkey_format_state *s, const s
void bch2_bkey_format_add_pos(struct bkey_format_state *, struct bpos);
struct bkey_format bch2_bkey_format_done(struct bkey_format_state *);
static inline bool bch2_bkey_format_field_overflows(struct bkey_format *f, unsigned i)
{
unsigned f_bits = f->bits_per_field[i];
unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i];
u64 unpacked_mask = ~((~0ULL << 1) << (unpacked_bits - 1));
u64 field_offset = le64_to_cpu(f->field_offset[i]);
if (f_bits > unpacked_bits)
return true;
if ((f_bits == unpacked_bits) && field_offset)
return true;
u64 f_mask = f_bits
? ~((~0ULL << (f_bits - 1)) << 1)
: 0;
if (((field_offset + f_mask) & unpacked_mask) < field_offset)
return true;
return false;
}
int bch2_bkey_format_invalid(struct bch_fs *, struct bkey_format *,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
void bch2_bkey_format_to_text(struct printbuf *, const struct bkey_format *);
#endif /* _BCACHEFS_BKEY_H */
......@@ -27,7 +27,7 @@ const char * const bch2_bkey_types[] = {
};
static int deleted_key_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
return 0;
}
......@@ -41,7 +41,7 @@ static int deleted_key_invalid(struct bch_fs *c, struct bkey_s_c k,
})
static int empty_val_key_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
int ret = 0;
......@@ -58,7 +58,7 @@ static int empty_val_key_invalid(struct bch_fs *c, struct bkey_s_c k,
})
static int key_type_cookie_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
return 0;
}
......@@ -82,7 +82,7 @@ static void key_type_cookie_to_text(struct printbuf *out, struct bch_fs *c,
})
static int key_type_inline_data_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
return 0;
}
......@@ -123,9 +123,12 @@ const struct bkey_ops bch2_bkey_null_ops = {
};
int bch2_bkey_val_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
if (test_bit(BCH_FS_no_invalid_checks, &c->flags))
return 0;
const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
int ret = 0;
......@@ -159,9 +162,12 @@ const char *bch2_btree_node_type_str(enum btree_node_type type)
int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
enum btree_node_type type,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
if (test_bit(BCH_FS_no_invalid_checks, &c->flags))
return 0;
int ret = 0;
bkey_fsck_err_on(k.k->u64s < BKEY_U64s, c, err,
......@@ -172,7 +178,7 @@ int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
return 0;
bkey_fsck_err_on(k.k->type < KEY_TYPE_MAX &&
(type == BKEY_TYPE_btree || (flags & BKEY_INVALID_COMMIT)) &&
(type == BKEY_TYPE_btree || (flags & BCH_VALIDATE_commit)) &&
!(bch2_key_types_allowed[type] & BIT_ULL(k.k->type)), c, err,
bkey_invalid_type_for_btree,
"invalid key type for btree %s (%s)",
......@@ -224,7 +230,7 @@ int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
int bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
enum btree_node_type type,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
return __bch2_bkey_invalid(c, k, type, flags, err) ?:
......
......@@ -22,14 +22,15 @@ extern const struct bkey_ops bch2_bkey_null_ops;
*/
struct bkey_ops {
int (*key_invalid)(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err);
enum bch_validate_flags flags, struct printbuf *err);
void (*val_to_text)(struct printbuf *, struct bch_fs *,
struct bkey_s_c);
void (*swab)(struct bkey_s);
bool (*key_normalize)(struct bch_fs *, struct bkey_s);
bool (*key_merge)(struct bch_fs *, struct bkey_s, struct bkey_s_c);
int (*trigger)(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s, unsigned);
struct bkey_s_c, struct bkey_s,
enum btree_iter_update_trigger_flags);
void (*compat)(enum btree_id id, unsigned version,
unsigned big_endian, int write,
struct bkey_s);
......@@ -48,11 +49,11 @@ static inline const struct bkey_ops *bch2_bkey_type_ops(enum bch_bkey_type type)
}
int bch2_bkey_val_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int __bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c, enum btree_node_type,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c, enum btree_node_type,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_bkey_in_btree_node(struct bch_fs *, struct btree *,
struct bkey_s_c, struct printbuf *);
......@@ -76,56 +77,10 @@ static inline bool bch2_bkey_maybe_mergable(const struct bkey *l, const struct b
bool bch2_bkey_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);
enum btree_update_flags {
__BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE = __BTREE_ITER_FLAGS_END,
__BTREE_UPDATE_NOJOURNAL,
__BTREE_UPDATE_KEY_CACHE_RECLAIM,
__BTREE_TRIGGER_NORUN,
__BTREE_TRIGGER_TRANSACTIONAL,
__BTREE_TRIGGER_ATOMIC,
__BTREE_TRIGGER_GC,
__BTREE_TRIGGER_INSERT,
__BTREE_TRIGGER_OVERWRITE,
__BTREE_TRIGGER_BUCKET_INVALIDATE,
};
#define BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE (1U << __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE)
#define BTREE_UPDATE_NOJOURNAL (1U << __BTREE_UPDATE_NOJOURNAL)
#define BTREE_UPDATE_KEY_CACHE_RECLAIM (1U << __BTREE_UPDATE_KEY_CACHE_RECLAIM)
/* Don't run triggers at all */
#define BTREE_TRIGGER_NORUN (1U << __BTREE_TRIGGER_NORUN)
/*
* If set, we're running transactional triggers as part of a transaction commit:
* triggers may generate new updates
*
* If cleared, and either BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE are set,
* we're running atomic triggers during a transaction commit: we have our
* journal reservation, we're holding btree node write locks, and we know the
* transaction is going to commit (returning an error here is a fatal error,
* causing us to go emergency read-only)
*/
#define BTREE_TRIGGER_TRANSACTIONAL (1U << __BTREE_TRIGGER_TRANSACTIONAL)
#define BTREE_TRIGGER_ATOMIC (1U << __BTREE_TRIGGER_ATOMIC)
/* We're in gc/fsck: running triggers to recalculate e.g. disk usage */
#define BTREE_TRIGGER_GC (1U << __BTREE_TRIGGER_GC)
/* @new is entering the btree */
#define BTREE_TRIGGER_INSERT (1U << __BTREE_TRIGGER_INSERT)
/* @old is leaving the btree */
#define BTREE_TRIGGER_OVERWRITE (1U << __BTREE_TRIGGER_OVERWRITE)
/* signal from bucket invalidate path to alloc trigger */
#define BTREE_TRIGGER_BUCKET_INVALIDATE (1U << __BTREE_TRIGGER_BUCKET_INVALIDATE)
static inline int bch2_key_trigger(struct btree_trans *trans,
enum btree_id btree, unsigned level,
struct bkey_s_c old, struct bkey_s new,
unsigned flags)
enum btree_iter_update_trigger_flags flags)
{
const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new.k->type);
......@@ -135,8 +90,9 @@ static inline int bch2_key_trigger(struct btree_trans *trans,
}
static inline int bch2_key_trigger_old(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s_c old, unsigned flags)
enum btree_id btree_id, unsigned level,
struct bkey_s_c old,
enum btree_iter_update_trigger_flags flags)
{
struct bkey_i deleted;
......@@ -144,12 +100,13 @@ static inline int bch2_key_trigger_old(struct btree_trans *trans,
deleted.k.p = old.k->p;
return bch2_key_trigger(trans, btree_id, level, old, bkey_i_to_s(&deleted),
BTREE_TRIGGER_OVERWRITE|flags);
BTREE_TRIGGER_overwrite|flags);
}
static inline int bch2_key_trigger_new(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s new, unsigned flags)
enum btree_id btree_id, unsigned level,
struct bkey_s new,
enum btree_iter_update_trigger_flags flags)
{
struct bkey_i deleted;
......@@ -157,7 +114,7 @@ static inline int bch2_key_trigger_new(struct btree_trans *trans,
deleted.k.p = new.k->p;
return bch2_key_trigger(trans, btree_id, level, bkey_i_to_s_c(&deleted), new,
BTREE_TRIGGER_INSERT|flags);
BTREE_TRIGGER_insert|flags);
}
void bch2_bkey_renumber(enum btree_node_type, struct bkey_packed *, int);
......
......@@ -6,9 +6,9 @@
#include "bset.h"
#include "extents.h"
typedef int (*sort_cmp_fn)(struct btree *,
struct bkey_packed *,
struct bkey_packed *);
typedef int (*sort_cmp_fn)(const struct btree *,
const struct bkey_packed *,
const struct bkey_packed *);
static inline bool sort_iter_end(struct sort_iter *iter)
{
......@@ -70,9 +70,9 @@ static inline struct bkey_packed *sort_iter_next(struct sort_iter *iter,
/*
* If keys compare equal, compare by pointer order:
*/
static inline int key_sort_fix_overlapping_cmp(struct btree *b,
struct bkey_packed *l,
struct bkey_packed *r)
static inline int key_sort_fix_overlapping_cmp(const struct btree *b,
const struct bkey_packed *l,
const struct bkey_packed *r)
{
return bch2_bkey_cmp_packed(b, l, r) ?:
cmp_int((unsigned long) l, (unsigned long) r);
......@@ -154,46 +154,59 @@ bch2_sort_repack(struct bset *dst, struct btree *src,
return nr;
}
static inline int sort_keys_cmp(struct btree *b,
struct bkey_packed *l,
struct bkey_packed *r)
static inline int keep_unwritten_whiteouts_cmp(const struct btree *b,
const struct bkey_packed *l,
const struct bkey_packed *r)
{
return bch2_bkey_cmp_packed_inlined(b, l, r) ?:
(int) bkey_deleted(r) - (int) bkey_deleted(l) ?:
(int) l->needs_whiteout - (int) r->needs_whiteout;
(long) l - (long) r;
}
unsigned bch2_sort_keys(struct bkey_packed *dst,
struct sort_iter *iter,
bool filter_whiteouts)
#include "btree_update_interior.h"
/*
* For sorting in the btree node write path: whiteouts not in the unwritten
* whiteouts area are dropped, whiteouts in the unwritten whiteouts area are
* dropped if overwritten by real keys:
*/
unsigned bch2_sort_keys_keep_unwritten_whiteouts(struct bkey_packed *dst, struct sort_iter *iter)
{
const struct bkey_format *f = &iter->b->format;
struct bkey_packed *in, *next, *out = dst;
sort_iter_sort(iter, sort_keys_cmp);
sort_iter_sort(iter, keep_unwritten_whiteouts_cmp);
while ((in = sort_iter_next(iter, sort_keys_cmp))) {
bool needs_whiteout = false;
while ((in = sort_iter_next(iter, keep_unwritten_whiteouts_cmp))) {
if (bkey_deleted(in) && in < unwritten_whiteouts_start(iter->b))
continue;
if (bkey_deleted(in) &&
(filter_whiteouts || !in->needs_whiteout))
if ((next = sort_iter_peek(iter)) &&
!bch2_bkey_cmp_packed_inlined(iter->b, in, next))
continue;
while ((next = sort_iter_peek(iter)) &&
!bch2_bkey_cmp_packed_inlined(iter->b, in, next)) {
BUG_ON(in->needs_whiteout &&
next->needs_whiteout);
needs_whiteout |= in->needs_whiteout;
in = sort_iter_next(iter, sort_keys_cmp);
}
bkey_p_copy(out, in);
out = bkey_p_next(out);
}
if (bkey_deleted(in)) {
memcpy_u64s_small(out, in, bkeyp_key_u64s(f, in));
set_bkeyp_val_u64s(f, out, 0);
} else {
bkey_p_copy(out, in);
}
out->needs_whiteout |= needs_whiteout;
return (u64 *) out - (u64 *) dst;
}
/*
* Main sort routine for compacting a btree node in memory: we always drop
* whiteouts because any whiteouts that need to be written are in the unwritten
* whiteouts area:
*/
unsigned bch2_sort_keys(struct bkey_packed *dst, struct sort_iter *iter)
{
struct bkey_packed *in, *out = dst;
sort_iter_sort(iter, bch2_bkey_cmp_packed_inlined);
while ((in = sort_iter_next(iter, bch2_bkey_cmp_packed_inlined))) {
if (bkey_deleted(in))
continue;
bkey_p_copy(out, in);
out = bkey_p_next(out);
}
......
......@@ -48,7 +48,7 @@ bch2_sort_repack(struct bset *, struct btree *,
struct btree_node_iter *,
struct bkey_format *, bool);
unsigned bch2_sort_keys(struct bkey_packed *,
struct sort_iter *, bool);
unsigned bch2_sort_keys_keep_unwritten_whiteouts(struct bkey_packed *, struct sort_iter *);
unsigned bch2_sort_keys(struct bkey_packed *, struct sort_iter *);
#endif /* _BCACHEFS_BKEY_SORT_H */
......@@ -103,8 +103,6 @@ void bch2_dump_bset(struct bch_fs *c, struct btree *b,
void bch2_dump_btree_node(struct bch_fs *c, struct btree *b)
{
struct bset_tree *t;
console_lock();
for_each_bset(b, t)
bch2_dump_bset(c, b, bset(b, t), t - b->set);
......@@ -136,7 +134,6 @@ void bch2_dump_btree_node_iter(struct btree *b,
struct btree_nr_keys bch2_btree_node_count_keys(struct btree *b)
{
struct bset_tree *t;
struct bkey_packed *k;
struct btree_nr_keys nr = {};
......@@ -198,7 +195,6 @@ void bch2_btree_node_iter_verify(struct btree_node_iter *iter,
{
struct btree_node_iter_set *set, *s2;
struct bkey_packed *k, *p;
struct bset_tree *t;
if (bch2_btree_node_iter_end(iter))
return;
......@@ -213,12 +209,14 @@ void bch2_btree_node_iter_verify(struct btree_node_iter *iter,
/* Verify that set->end is correct: */
btree_node_iter_for_each(iter, set) {
for_each_bset(b, t)
if (set->end == t->end_offset)
if (set->end == t->end_offset) {
BUG_ON(set->k < btree_bkey_first_offset(t) ||
set->k >= t->end_offset);
goto found;
}
BUG();
found:
BUG_ON(set->k < btree_bkey_first_offset(t) ||
set->k >= t->end_offset);
do {} while (0);
}
/* Verify iterator is sorted: */
......@@ -377,11 +375,9 @@ static struct bkey_float *bkey_float(const struct btree *b,
return ro_aux_tree_base(b, t)->f + idx;
}
static void bset_aux_tree_verify(const struct btree *b)
static void bset_aux_tree_verify(struct btree *b)
{
#ifdef CONFIG_BCACHEFS_DEBUG
const struct bset_tree *t;
for_each_bset(b, t) {
if (t->aux_data_offset == U16_MAX)
continue;
......@@ -685,20 +681,20 @@ static __always_inline void make_bfloat(struct btree *b, struct bset_tree *t,
}
/* bytes remaining - only valid for last bset: */
static unsigned __bset_tree_capacity(const struct btree *b, const struct bset_tree *t)
static unsigned __bset_tree_capacity(struct btree *b, const struct bset_tree *t)
{
bset_aux_tree_verify(b);
return btree_aux_data_bytes(b) - t->aux_data_offset * sizeof(u64);
}
static unsigned bset_ro_tree_capacity(const struct btree *b, const struct bset_tree *t)
static unsigned bset_ro_tree_capacity(struct btree *b, const struct bset_tree *t)
{
return __bset_tree_capacity(b, t) /
(sizeof(struct bkey_float) + sizeof(u8));
}
static unsigned bset_rw_tree_capacity(const struct btree *b, const struct bset_tree *t)
static unsigned bset_rw_tree_capacity(struct btree *b, const struct bset_tree *t)
{
return __bset_tree_capacity(b, t) / sizeof(struct rw_aux_tree);
}
......@@ -1374,8 +1370,6 @@ void bch2_btree_node_iter_init(struct btree_node_iter *iter,
void bch2_btree_node_iter_init_from_start(struct btree_node_iter *iter,
struct btree *b)
{
struct bset_tree *t;
memset(iter, 0, sizeof(*iter));
for_each_bset(b, t)
......@@ -1481,7 +1475,6 @@ struct bkey_packed *bch2_btree_node_iter_prev_all(struct btree_node_iter *iter,
{
struct bkey_packed *k, *prev = NULL;
struct btree_node_iter_set *set;
struct bset_tree *t;
unsigned end = 0;
if (bch2_expensive_debug_checks)
......@@ -1550,9 +1543,7 @@ struct bkey_s_c bch2_btree_node_iter_peek_unpack(struct btree_node_iter *iter,
void bch2_btree_keys_stats(const struct btree *b, struct bset_stats *stats)
{
const struct bset_tree *t;
for_each_bset(b, t) {
for_each_bset_c(b, t) {
enum bset_aux_tree_type type = bset_aux_tree_type(t);
size_t j;
......
......@@ -206,7 +206,10 @@ static inline size_t btree_aux_data_u64s(const struct btree *b)
}
#define for_each_bset(_b, _t) \
for (_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++)
for (struct bset_tree *_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++)
#define for_each_bset_c(_b, _t) \
for (const struct bset_tree *_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++)
#define bset_tree_for_each_key(_b, _t, _k) \
for (_k = btree_bkey_first(_b, _t); \
......@@ -294,7 +297,6 @@ static inline struct bset_tree *
bch2_bkey_to_bset_inlined(struct btree *b, struct bkey_packed *k)
{
unsigned offset = __btree_node_key_to_offset(b, k);
struct bset_tree *t;
for_each_bset(b, t)
if (offset <= t->end_offset) {
......
......@@ -16,6 +16,12 @@
#include <linux/prefetch.h>
#include <linux/sched/mm.h>
#define BTREE_CACHE_NOT_FREED_INCREMENT(counter) \
do { \
if (shrinker_counter) \
bc->not_freed_##counter++; \
} while (0)
const char * const bch2_btree_node_flags[] = {
#define x(f) #f,
BTREE_FLAGS()
......@@ -162,6 +168,9 @@ void bch2_btree_node_hash_remove(struct btree_cache *bc, struct btree *b)
/* Cause future lookups for this node to fail: */
b->hash_val = 0;
if (b->c.btree_id < BTREE_ID_NR)
--bc->used_by_btree[b->c.btree_id];
}
int __bch2_btree_node_hash_insert(struct btree_cache *bc, struct btree *b)
......@@ -169,8 +178,11 @@ int __bch2_btree_node_hash_insert(struct btree_cache *bc, struct btree *b)
BUG_ON(b->hash_val);
b->hash_val = btree_ptr_hash_val(&b->key);
return rhashtable_lookup_insert_fast(&bc->table, &b->hash,
bch_btree_cache_params);
int ret = rhashtable_lookup_insert_fast(&bc->table, &b->hash,
bch_btree_cache_params);
if (!ret && b->c.btree_id < BTREE_ID_NR)
bc->used_by_btree[b->c.btree_id]++;
return ret;
}
int bch2_btree_node_hash_insert(struct btree_cache *bc, struct btree *b,
......@@ -190,6 +202,35 @@ int bch2_btree_node_hash_insert(struct btree_cache *bc, struct btree *b,
return ret;
}
void bch2_btree_node_update_key_early(struct btree_trans *trans,
enum btree_id btree, unsigned level,
struct bkey_s_c old, struct bkey_i *new)
{
struct bch_fs *c = trans->c;
struct btree *b;
struct bkey_buf tmp;
int ret;
bch2_bkey_buf_init(&tmp);
bch2_bkey_buf_reassemble(&tmp, c, old);
b = bch2_btree_node_get_noiter(trans, tmp.k, btree, level, true);
if (!IS_ERR_OR_NULL(b)) {
mutex_lock(&c->btree_cache.lock);
bch2_btree_node_hash_remove(&c->btree_cache, b);
bkey_copy(&b->key, new);
ret = __bch2_btree_node_hash_insert(&c->btree_cache, b);
BUG_ON(ret);
mutex_unlock(&c->btree_cache.lock);
six_unlock_read(&b->c.lock);
}
bch2_bkey_buf_exit(&tmp, c);
}
__flatten
static inline struct btree *btree_cache_find(struct btree_cache *bc,
const struct bkey_i *k)
......@@ -203,7 +244,7 @@ static inline struct btree *btree_cache_find(struct btree_cache *bc,
* this version is for btree nodes that have already been freed (we're not
* reaping a real btree node)
*/
static int __btree_node_reclaim(struct bch_fs *c, struct btree *b, bool flush)
static int __btree_node_reclaim(struct bch_fs *c, struct btree *b, bool flush, bool shrinker_counter)
{
struct btree_cache *bc = &c->btree_cache;
int ret = 0;
......@@ -225,38 +266,64 @@ static int __btree_node_reclaim(struct bch_fs *c, struct btree *b, bool flush)
if (b->flags & ((1U << BTREE_NODE_dirty)|
(1U << BTREE_NODE_read_in_flight)|
(1U << BTREE_NODE_write_in_flight))) {
if (!flush)
if (!flush) {
if (btree_node_dirty(b))
BTREE_CACHE_NOT_FREED_INCREMENT(dirty);
else if (btree_node_read_in_flight(b))
BTREE_CACHE_NOT_FREED_INCREMENT(read_in_flight);
else if (btree_node_write_in_flight(b))
BTREE_CACHE_NOT_FREED_INCREMENT(write_in_flight);
return -BCH_ERR_ENOMEM_btree_node_reclaim;
}
/* XXX: waiting on IO with btree cache lock held */
bch2_btree_node_wait_on_read(b);
bch2_btree_node_wait_on_write(b);
}
if (!six_trylock_intent(&b->c.lock))
if (!six_trylock_intent(&b->c.lock)) {
BTREE_CACHE_NOT_FREED_INCREMENT(lock_intent);
return -BCH_ERR_ENOMEM_btree_node_reclaim;
}
if (!six_trylock_write(&b->c.lock))
if (!six_trylock_write(&b->c.lock)) {
BTREE_CACHE_NOT_FREED_INCREMENT(lock_write);
goto out_unlock_intent;
}
/* recheck under lock */
if (b->flags & ((1U << BTREE_NODE_read_in_flight)|
(1U << BTREE_NODE_write_in_flight))) {
if (!flush)
if (!flush) {
if (btree_node_read_in_flight(b))
BTREE_CACHE_NOT_FREED_INCREMENT(read_in_flight);
else if (btree_node_write_in_flight(b))
BTREE_CACHE_NOT_FREED_INCREMENT(write_in_flight);
goto out_unlock;
}
six_unlock_write(&b->c.lock);
six_unlock_intent(&b->c.lock);
goto wait_on_io;
}
if (btree_node_noevict(b) ||
btree_node_write_blocked(b) ||
btree_node_will_make_reachable(b))
if (btree_node_noevict(b)) {
BTREE_CACHE_NOT_FREED_INCREMENT(noevict);
goto out_unlock;
}
if (btree_node_write_blocked(b)) {
BTREE_CACHE_NOT_FREED_INCREMENT(write_blocked);
goto out_unlock;
}
if (btree_node_will_make_reachable(b)) {
BTREE_CACHE_NOT_FREED_INCREMENT(will_make_reachable);
goto out_unlock;
}
if (btree_node_dirty(b)) {
if (!flush)
if (!flush) {
BTREE_CACHE_NOT_FREED_INCREMENT(dirty);
goto out_unlock;
}
/*
* Using the underscore version because we don't want to compact
* bsets after the write, since this node is about to be evicted
......@@ -286,14 +353,14 @@ static int __btree_node_reclaim(struct bch_fs *c, struct btree *b, bool flush)
goto out;
}
static int btree_node_reclaim(struct bch_fs *c, struct btree *b)
static int btree_node_reclaim(struct bch_fs *c, struct btree *b, bool shrinker_counter)
{
return __btree_node_reclaim(c, b, false);
return __btree_node_reclaim(c, b, false, shrinker_counter);
}
static int btree_node_write_and_reclaim(struct bch_fs *c, struct btree *b)
{
return __btree_node_reclaim(c, b, true);
return __btree_node_reclaim(c, b, true, false);
}
static unsigned long bch2_btree_cache_scan(struct shrinker *shrink,
......@@ -341,11 +408,12 @@ static unsigned long bch2_btree_cache_scan(struct shrinker *shrink,
if (touched >= nr)
goto out;
if (!btree_node_reclaim(c, b)) {
if (!btree_node_reclaim(c, b, true)) {
btree_node_data_free(c, b);
six_unlock_write(&b->c.lock);
six_unlock_intent(&b->c.lock);
freed++;
bc->freed++;
}
}
restart:
......@@ -354,9 +422,11 @@ static unsigned long bch2_btree_cache_scan(struct shrinker *shrink,
if (btree_node_accessed(b)) {
clear_btree_node_accessed(b);
} else if (!btree_node_reclaim(c, b)) {
bc->not_freed_access_bit++;
} else if (!btree_node_reclaim(c, b, true)) {
freed++;
btree_node_data_free(c, b);
bc->freed++;
bch2_btree_node_hash_remove(bc, b);
six_unlock_write(&b->c.lock);
......@@ -564,7 +634,7 @@ static struct btree *btree_node_cannibalize(struct bch_fs *c)
struct btree *b;
list_for_each_entry_reverse(b, &bc->live, list)
if (!btree_node_reclaim(c, b))
if (!btree_node_reclaim(c, b, false))
return b;
while (1) {
......@@ -600,7 +670,7 @@ struct btree *bch2_btree_node_mem_alloc(struct btree_trans *trans, bool pcpu_rea
* disk node. Check the freed list before allocating a new one:
*/
list_for_each_entry(b, freed, list)
if (!btree_node_reclaim(c, b)) {
if (!btree_node_reclaim(c, b, false)) {
list_del_init(&b->list);
goto got_node;
}
......@@ -626,7 +696,7 @@ struct btree *bch2_btree_node_mem_alloc(struct btree_trans *trans, bool pcpu_rea
* the list. Check if there's any freed nodes there:
*/
list_for_each_entry(b2, &bc->freeable, list)
if (!btree_node_reclaim(c, b2)) {
if (!btree_node_reclaim(c, b2, false)) {
swap(b->data, b2->data);
swap(b->aux_data, b2->aux_data);
btree_node_to_freedlist(bc, b2);
......@@ -846,7 +916,6 @@ static struct btree *__bch2_btree_node_get(struct btree_trans *trans, struct btr
struct bch_fs *c = trans->c;
struct btree_cache *bc = &c->btree_cache;
struct btree *b;
struct bset_tree *t;
bool need_relock = false;
int ret;
......@@ -966,7 +1035,6 @@ struct btree *bch2_btree_node_get(struct btree_trans *trans, struct btree_path *
{
struct bch_fs *c = trans->c;
struct btree *b;
struct bset_tree *t;
int ret;
EBUG_ON(level >= BTREE_MAX_DEPTH);
......@@ -1043,7 +1111,6 @@ struct btree *bch2_btree_node_get_noiter(struct btree_trans *trans,
struct bch_fs *c = trans->c;
struct btree_cache *bc = &c->btree_cache;
struct btree *b;
struct bset_tree *t;
int ret;
EBUG_ON(level >= BTREE_MAX_DEPTH);
......@@ -1240,9 +1307,39 @@ void bch2_btree_node_to_text(struct printbuf *out, struct bch_fs *c, const struc
stats.failed);
}
void bch2_btree_cache_to_text(struct printbuf *out, const struct bch_fs *c)
static void prt_btree_cache_line(struct printbuf *out, const struct bch_fs *c,
const char *label, unsigned nr)
{
prt_printf(out, "nr nodes:\t\t%u\n", c->btree_cache.used);
prt_printf(out, "nr dirty:\t\t%u\n", atomic_read(&c->btree_cache.dirty));
prt_printf(out, "cannibalize lock:\t%p\n", c->btree_cache.alloc_lock);
prt_printf(out, "%s\t", label);
prt_human_readable_u64(out, nr * c->opts.btree_node_size);
prt_printf(out, " (%u)\n", nr);
}
void bch2_btree_cache_to_text(struct printbuf *out, const struct btree_cache *bc)
{
struct bch_fs *c = container_of(bc, struct bch_fs, btree_cache);
if (!out->nr_tabstops)
printbuf_tabstop_push(out, 32);
prt_btree_cache_line(out, c, "total:", bc->used);
prt_btree_cache_line(out, c, "nr dirty:", atomic_read(&bc->dirty));
prt_printf(out, "cannibalize lock:\t%p\n", bc->alloc_lock);
prt_newline(out);
for (unsigned i = 0; i < ARRAY_SIZE(bc->used_by_btree); i++)
prt_btree_cache_line(out, c, bch2_btree_id_str(i), bc->used_by_btree[i]);
prt_newline(out);
prt_printf(out, "freed:\t%u\n", bc->freed);
prt_printf(out, "not freed:\n");
prt_printf(out, " dirty\t%u\n", bc->not_freed_dirty);
prt_printf(out, " write in flight\t%u\n", bc->not_freed_write_in_flight);
prt_printf(out, " read in flight\t%u\n", bc->not_freed_read_in_flight);
prt_printf(out, " lock intent failed\t%u\n", bc->not_freed_lock_intent);
prt_printf(out, " lock write failed\t%u\n", bc->not_freed_lock_write);
prt_printf(out, " access bit\t%u\n", bc->not_freed_access_bit);
prt_printf(out, " no evict failed\t%u\n", bc->not_freed_noevict);
prt_printf(out, " write blocked\t%u\n", bc->not_freed_write_blocked);
prt_printf(out, " will make reachable\t%u\n", bc->not_freed_will_make_reachable);
}
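A minimal sketch of how a caller might drive the reworked bch2_btree_cache_to_text(), which now takes the btree_cache directly, assuming the usual bcachefs printbuf helpers; the helper name and the pr_info() sink are hypothetical, not the actual sysfs show path:

/* hypothetical helper, for illustration only */
static void dump_btree_cache_stats(struct bch_fs *c)
{
	struct printbuf buf = PRINTBUF;

	bch2_btree_cache_to_text(&buf, &c->btree_cache);
	pr_info("%s", buf.buf);		/* one line per counter, tab-aligned */
	printbuf_exit(&buf);
}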
......@@ -17,6 +17,9 @@ int __bch2_btree_node_hash_insert(struct btree_cache *, struct btree *);
int bch2_btree_node_hash_insert(struct btree_cache *, struct btree *,
unsigned, enum btree_id);
void bch2_btree_node_update_key_early(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_i *);
void bch2_btree_cache_cannibalize_unlock(struct btree_trans *);
int bch2_btree_cache_cannibalize_lock(struct btree_trans *, struct closure *);
......@@ -131,6 +134,6 @@ static inline struct btree *btree_node_root(struct bch_fs *c, struct btree *b)
const char *bch2_btree_id_str(enum btree_id);
void bch2_btree_pos_to_text(struct printbuf *, struct bch_fs *, const struct btree *);
void bch2_btree_node_to_text(struct printbuf *, struct bch_fs *, const struct btree *);
void bch2_btree_cache_to_text(struct printbuf *, const struct bch_fs *);
void bch2_btree_cache_to_text(struct printbuf *, const struct btree_cache *);
#endif /* _BCACHEFS_BTREE_CACHE_H */
......@@ -6,10 +6,7 @@
#include "btree_types.h"
int bch2_check_topology(struct bch_fs *);
int bch2_gc(struct bch_fs *, bool, bool);
int bch2_gc_gens(struct bch_fs *);
void bch2_gc_thread_stop(struct bch_fs *);
int bch2_gc_thread_start(struct bch_fs *);
int bch2_check_allocations(struct bch_fs *);
/*
* For concurrent mark and sweep (with other index updates), we define a total
......@@ -37,16 +34,16 @@ static inline struct gc_pos gc_phase(enum gc_phase phase)
{
return (struct gc_pos) {
.phase = phase,
.pos = POS_MIN,
.level = 0,
.pos = POS_MIN,
};
}
static inline int gc_pos_cmp(struct gc_pos l, struct gc_pos r)
{
return cmp_int(l.phase, r.phase) ?:
bpos_cmp(l.pos, r.pos) ?:
cmp_int(l.level, r.level);
return cmp_int(l.phase, r.phase) ?:
-cmp_int(l.level, r.level) ?:
bpos_cmp(l.pos, r.pos);
}
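The comparator above now orders GC positions by phase, then by level in descending order, and only then by key position, so within the same phase a higher-level position sorts before a lower-level one regardless of key. A self-contained check of that ordering, with simplified stand-ins for the bcachefs types and hypothetical values, might look like:

#include <stdio.h>

/*
 * Simplified stand-ins for the bcachefs types and helpers, for illustration
 * only; the values below are made up.  Uses the GNU "a ?: b" extension, as
 * kernel code does.
 */
struct bpos   { unsigned long long inode, offset; };
struct gc_pos { int phase; unsigned level; struct bpos pos; };

#define cmp_int(l, r)	(((l) > (r)) - ((l) < (r)))

static int bpos_cmp(struct bpos l, struct bpos r)
{
	return cmp_int(l.inode, r.inode) ?: cmp_int(l.offset, r.offset);
}

static int gc_pos_cmp(struct gc_pos l, struct gc_pos r)
{
	/* phase first, then higher levels sort earlier, then key position */
	return cmp_int(l.phase, r.phase) ?:
	       -cmp_int(l.level, r.level) ?:
	       bpos_cmp(l.pos, r.pos);
}

int main(void)
{
	struct gc_pos interior = { .phase = 1, .level = 1, .pos = { 42, 0 } };
	struct gc_pos leaf     = { .phase = 1, .level = 0, .pos = {  7, 0 } };

	/* same phase: the higher-level position compares as earlier */
	printf("%d\n", gc_pos_cmp(interior, leaf) < 0);	/* prints 1 */
	return 0;
}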
static inline enum gc_phase btree_id_to_gc_phase(enum btree_id id)
......@@ -60,13 +57,13 @@ static inline enum gc_phase btree_id_to_gc_phase(enum btree_id id)
}
}
static inline struct gc_pos gc_pos_btree(enum btree_id id,
struct bpos pos, unsigned level)
static inline struct gc_pos gc_pos_btree(enum btree_id btree, unsigned level,
struct bpos pos)
{
return (struct gc_pos) {
.phase = btree_id_to_gc_phase(id),
.pos = pos,
.phase = btree_id_to_gc_phase(btree),
.level = level,
.pos = pos,
};
}
......@@ -76,19 +73,7 @@ static inline struct gc_pos gc_pos_btree(enum btree_id id,
*/
static inline struct gc_pos gc_pos_btree_node(struct btree *b)
{
return gc_pos_btree(b->c.btree_id, b->key.k.p, b->c.level);
}
/*
* GC position of the pointer to a btree root: we don't use
* gc_pos_pointer_to_btree_node() here to avoid a potential race with
* btree_split() increasing the tree depth - the new root will have level > the
* old root and thus have a greater gc position than the old root, but that
* would be incorrect since once gc has marked the root it's not coming back.
*/
static inline struct gc_pos gc_pos_btree_root(enum btree_id id)
{
return gc_pos_btree(id, SPOS_MAX, BTREE_MAX_DEPTH);
return gc_pos_btree(b->c.btree_id, b->c.level, b->key.k.p);
}
static inline bool gc_visited(struct bch_fs *c, struct gc_pos pos)
......@@ -104,11 +89,8 @@ static inline bool gc_visited(struct bch_fs *c, struct gc_pos pos)
return ret;
}
static inline void bch2_do_gc_gens(struct bch_fs *c)
{
atomic_inc(&c->kick_gc);
if (c->gc_thread)
wake_up_process(c->gc_thread);
}
int bch2_gc_gens(struct bch_fs *);
void bch2_gc_gens_async(struct bch_fs *);
void bch2_fs_gc_init(struct bch_fs *);
#endif /* _BCACHEFS_BTREE_GC_H */
......@@ -81,8 +81,6 @@ static inline bool should_compact_bset_lazy(struct btree *b,
static inline bool bch2_maybe_compact_whiteouts(struct bch_fs *c, struct btree *b)
{
struct bset_tree *t;
for_each_bset(b, t)
if (should_compact_bset_lazy(b, t))
return bch2_compact_whiteouts(c, b, COMPACT_LAZY);
......
......@@ -216,9 +216,13 @@ int __must_check bch2_btree_path_traverse_one(struct btree_trans *,
btree_path_idx_t,
unsigned, unsigned long);
static inline void bch2_trans_verify_not_unlocked(struct btree_trans *);
static inline int __must_check bch2_btree_path_traverse(struct btree_trans *trans,
btree_path_idx_t path, unsigned flags)
{
bch2_trans_verify_not_unlocked(trans);
if (trans->paths[path].uptodate < BTREE_ITER_NEED_RELOCK)
return 0;
......@@ -227,6 +231,9 @@ static inline int __must_check bch2_btree_path_traverse(struct btree_trans *tran
btree_path_idx_t bch2_path_get(struct btree_trans *, enum btree_id, struct bpos,
unsigned, unsigned, unsigned, unsigned long);
btree_path_idx_t bch2_path_get_unlocked_mut(struct btree_trans *, enum btree_id,
unsigned, struct bpos);
struct bkey_s_c bch2_btree_path_peek_slot(struct btree_path *, struct bkey *);
/*
......@@ -283,7 +290,6 @@ int bch2_trans_relock(struct btree_trans *);
int bch2_trans_relock_notrace(struct btree_trans *);
void bch2_trans_unlock(struct btree_trans *);
void bch2_trans_unlock_long(struct btree_trans *);
bool bch2_trans_locked(struct btree_trans *);
static inline int trans_was_restarted(struct btree_trans *trans, u32 restart_count)
{
......@@ -309,6 +315,14 @@ static inline void bch2_trans_verify_not_in_restart(struct btree_trans *trans)
bch2_trans_in_restart_error(trans);
}
void __noreturn bch2_trans_unlocked_error(struct btree_trans *);
static inline void bch2_trans_verify_not_unlocked(struct btree_trans *trans)
{
if (!trans->locked)
bch2_trans_unlocked_error(trans);
}
__always_inline
static int btree_trans_restart_nounlock(struct btree_trans *trans, int err)
{
......@@ -386,10 +400,10 @@ static inline void bch2_btree_iter_set_pos(struct btree_iter *iter, struct bpos
if (unlikely(iter->update_path))
bch2_path_put(trans, iter->update_path,
iter->flags & BTREE_ITER_INTENT);
iter->flags & BTREE_ITER_intent);
iter->update_path = 0;
if (!(iter->flags & BTREE_ITER_ALL_SNAPSHOTS))
if (!(iter->flags & BTREE_ITER_all_snapshots))
new_pos.snapshot = iter->snapshot;
__bch2_btree_iter_set_pos(iter, new_pos);
......@@ -397,7 +411,7 @@ static inline void bch2_btree_iter_set_pos(struct btree_iter *iter, struct bpos
static inline void bch2_btree_iter_set_pos_to_extent_start(struct btree_iter *iter)
{
BUG_ON(!(iter->flags & BTREE_ITER_IS_EXTENTS));
BUG_ON(!(iter->flags & BTREE_ITER_is_extents));
iter->pos = bkey_start_pos(&iter->k);
}
......@@ -416,20 +430,20 @@ static inline unsigned __bch2_btree_iter_flags(struct btree_trans *trans,
unsigned btree_id,
unsigned flags)
{
if (!(flags & (BTREE_ITER_ALL_SNAPSHOTS|BTREE_ITER_NOT_EXTENTS)) &&
if (!(flags & (BTREE_ITER_all_snapshots|BTREE_ITER_not_extents)) &&
btree_id_is_extents(btree_id))
flags |= BTREE_ITER_IS_EXTENTS;
flags |= BTREE_ITER_is_extents;
if (!(flags & __BTREE_ITER_ALL_SNAPSHOTS) &&
if (!(flags & BTREE_ITER_snapshot_field) &&
!btree_type_has_snapshot_field(btree_id))
flags &= ~BTREE_ITER_ALL_SNAPSHOTS;
flags &= ~BTREE_ITER_all_snapshots;
if (!(flags & BTREE_ITER_ALL_SNAPSHOTS) &&
if (!(flags & BTREE_ITER_all_snapshots) &&
btree_type_has_snapshots(btree_id))
flags |= BTREE_ITER_FILTER_SNAPSHOTS;
flags |= BTREE_ITER_filter_snapshots;
if (trans->journal_replay_not_finished)
flags |= BTREE_ITER_WITH_JOURNAL;
flags |= BTREE_ITER_with_journal;
return flags;
}
......@@ -439,10 +453,10 @@ static inline unsigned bch2_btree_iter_flags(struct btree_trans *trans,
unsigned flags)
{
if (!btree_id_cached(trans->c, btree_id)) {
flags &= ~BTREE_ITER_CACHED;
flags &= ~BTREE_ITER_WITH_KEY_CACHE;
} else if (!(flags & BTREE_ITER_CACHED))
flags |= BTREE_ITER_WITH_KEY_CACHE;
flags &= ~BTREE_ITER_cached;
flags &= ~BTREE_ITER_with_key_cache;
} else if (!(flags & BTREE_ITER_cached))
flags |= BTREE_ITER_with_key_cache;
return __bch2_btree_iter_flags(trans, btree_id, flags);
}
......@@ -494,18 +508,7 @@ void bch2_trans_node_iter_init(struct btree_trans *, struct btree_iter *,
unsigned, unsigned, unsigned);
void bch2_trans_copy_iter(struct btree_iter *, struct btree_iter *);
static inline void set_btree_iter_dontneed(struct btree_iter *iter)
{
struct btree_trans *trans = iter->trans;
if (!iter->path || trans->restarted)
return;
struct btree_path *path = btree_iter_path(trans, iter);
path->preserve = false;
if (path->ref == 1)
path->should_be_locked = false;
}
void bch2_set_btree_iter_dontneed(struct btree_iter *);
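set_btree_iter_dontneed() is no longer an inline helper: bch2_set_btree_iter_dontneed() is declared here and presumably defined out of line (likely in btree_iter.c) with essentially the body removed above, roughly:

/* sketch of the out-of-line version, mirroring the removed inline above */
void bch2_set_btree_iter_dontneed(struct btree_iter *iter)
{
	struct btree_trans *trans = iter->trans;

	if (!iter->path || trans->restarted)
		return;

	struct btree_path *path = btree_iter_path(trans, iter);

	path->preserve		= false;
	if (path->ref == 1)
		path->should_be_locked	= false;
}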
void *__bch2_trans_kmalloc(struct btree_trans *, size_t);
......@@ -619,14 +622,14 @@ u32 bch2_trans_begin(struct btree_trans *);
static inline struct bkey_s_c bch2_btree_iter_peek_prev_type(struct btree_iter *iter,
unsigned flags)
{
return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) :
return flags & BTREE_ITER_slots ? bch2_btree_iter_peek_slot(iter) :
bch2_btree_iter_peek_prev(iter);
}
static inline struct bkey_s_c bch2_btree_iter_peek_type(struct btree_iter *iter,
unsigned flags)
{
return flags & BTREE_ITER_SLOTS ? bch2_btree_iter_peek_slot(iter) :
return flags & BTREE_ITER_slots ? bch2_btree_iter_peek_slot(iter) :
bch2_btree_iter_peek(iter);
}
......@@ -634,7 +637,7 @@ static inline struct bkey_s_c bch2_btree_iter_peek_upto_type(struct btree_iter *
struct bpos end,
unsigned flags)
{
if (!(flags & BTREE_ITER_SLOTS))
if (!(flags & BTREE_ITER_slots))
return bch2_btree_iter_peek_upto(iter, end);
if (bkey_gt(iter->pos, end))
......@@ -699,16 +702,12 @@ transaction_restart: \
_ret2 ?: trans_was_restarted(_trans, _restart_count); \
})
#define for_each_btree_key_upto(_trans, _iter, _btree_id, \
_start, _end, _flags, _k, _do) \
#define for_each_btree_key_upto_continue(_trans, _iter, \
_end, _flags, _k, _do) \
({ \
struct btree_iter _iter; \
struct bkey_s_c _k; \
int _ret3 = 0; \
\
bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \
\
do { \
_ret3 = lockrestart_do(_trans, ({ \
(_k) = bch2_btree_iter_peek_upto_type(&(_iter), \
......@@ -724,6 +723,21 @@ transaction_restart: \
_ret3; \
})
#define for_each_btree_key_continue(_trans, _iter, _flags, _k, _do) \
for_each_btree_key_upto_continue(_trans, _iter, SPOS_MAX, _flags, _k, _do)
#define for_each_btree_key_upto(_trans, _iter, _btree_id, \
_start, _end, _flags, _k, _do) \
({ \
bch2_trans_begin(trans); \
\
struct btree_iter _iter; \
bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \
\
for_each_btree_key_upto_continue(_trans, _iter, _end, _flags, _k, _do);\
})
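A hypothetical caller of the reworked for_each_btree_key_upto() might look like the sketch below; the function name, the inode number, the btree choice, and the loop body are all made up for illustration. Note that the macro now begins a fresh transaction restart point itself and declares the iterator and key internally, so the caller only supplies names:

/* hypothetical helper, as it might appear in a bcachefs source file */
static int list_xattr_keys(struct btree_trans *trans, u64 inum)
{
	return for_each_btree_key_upto(trans, iter, BTREE_ID_xattrs,
				       POS(inum, 0), POS(inum, U64_MAX),
				       0, k, ({
		pr_info("xattr key at %llu:%llu\n",
			k.k->p.inode, k.k->p.offset);
		0;	/* nonzero would stop the walk */
	}));
}

In practice such a helper would typically be run inside a transaction, e.g. via bch2_trans_run(c, list_xattr_keys(trans, inum)).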
#define for_each_btree_key(_trans, _iter, _btree_id, \
_start, _flags, _k, _do) \
for_each_btree_key_upto(_trans, _iter, _btree_id, _start, \
......@@ -794,14 +808,6 @@ __bch2_btree_iter_peek_and_restart(struct btree_trans *trans,
return k;
}
#define for_each_btree_key_old(_trans, _iter, _btree_id, \
_start, _flags, _k, _ret) \
for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
(_start), (_flags)); \
(_k) = __bch2_btree_iter_peek_and_restart((_trans), &(_iter), _flags),\
!((_ret) = bkey_err(_k)) && (_k).k; \
bch2_btree_iter_advance(&(_iter)))
#define for_each_btree_key_upto_norestart(_trans, _iter, _btree_id, \
_start, _end, _flags, _k, _ret) \
for (bch2_trans_iter_init((_trans), &(_iter), (_btree_id), \
......@@ -861,6 +867,7 @@ __bch2_btree_iter_peek_and_restart(struct btree_trans *trans,
})
void bch2_trans_updates_to_text(struct printbuf *, struct btree_trans *);
void bch2_btree_path_to_text(struct printbuf *, struct btree_trans *, btree_path_idx_t);
void bch2_trans_paths_to_text(struct printbuf *, struct btree_trans *);
void bch2_dump_trans_updates(struct btree_trans *);
void bch2_dump_trans_paths_updates(struct btree_trans *);
......
......@@ -70,4 +70,6 @@ void bch2_shoot_down_journal_keys(struct bch_fs *, enum btree_id,
unsigned, unsigned,
struct bpos, struct bpos);
void bch2_journal_keys_dump(struct bch_fs *);
#endif /* _BCACHEFS_BTREE_JOURNAL_ITER_H */