- 14 Jul, 2024 40 commits
-
-
Kent Overstreet authored
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Pankaj Raghav authored
Set the preferred folio order in the fgp_flags by calling fgf_set_order(). Page cache will try to allocate large folio of the preferred order whenever possible instead of allocating multiple 0 order folios. This improves the buffered write performance up to 1.25x with default mount options and up to 1.57x when mounted with no_data_io option with the following fio workload: fio --name=bcachefs --filename=/mnt/test --size=100G \ --ioengine=io_uring --iodepth=16 --rw=write --bs=128k Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Pankaj Raghav authored
Use FGP_WRITEBEGIN to avoid repeating the individual FGP flags before starting a buffered write. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
gc_lock is now only for synchronization between check_alloc_info and interior btree updates - nothing else Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Brian Foster authored
Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
this is an internal implementation detail - and we're improving key cache coherency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
new helper - small refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
we have a separate helper for releasing write locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
gc_pos is now based on keys, not nodes, for invariantness w.r.t. splits and merges Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
The transaction commit path takes mark_lock, so we shouldn't be holding it; use a bpos as an iterator so that we can drop and retake. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Add a new helper to free zeroed out accounting entries, and use it in bch2_replicas_gc2(); bch2_replicas_gc2() was killing superblock replicas entries if their corresponding accounting counters were nonzero, but that's incorrect - the superblock replicas entry needs to exist if the accounting entry exists, not if it's nonzero, because we check and create the replicas entry when creating the new accounting entry - we don't know when it's becoming nonzero. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Break up the percpu counter allocations into individual allocations for each disk accounting counter; this fixes an issue on large systems where we have too many replica entries to for the percpu allocator's max practical size. Also, use just one eytzinger tree for the normal set of counters and the gc counters; this simplifies accounting_gc_done() where we need the same set of counters to be present in both tables. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Brian Foster authored
smatch warns that the copy of arg to userspace is a potential data leak by virtue of arg.pad not being checked or zeroed. This was introduced by the commit referenced below that switched arg from being a zeroed runtime allocation to living on the stack. Fix by simply zero initializing the structure. Fixes: cde738a61e65 ("bcachefs: Convert bch2_ioctl_fs_usage() to new accounting") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
bch2_accounting_mem_insert() drops and retakes mark_lock; thus, we need to check if the entry in question has already been inserted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Ariel Miculas authored
The commit 65bd4423 ("bcachefs: bch2_btree_insert_trans() no longer specifies BTREE_ITER_cached") removes BTREE_ITER_cached from bch2_btree_insert_trans, which causes the update_inode function from bcachefs-tools to take a long time (~20s). Add an iter_flags parameter to bch2_btree_insert, so the users can specify iter update trigger flags, such as BTREE_ITER_cached. Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Add a new ioctl that can return the new accounting counter types; it takes as input a bitmask of accounting types to return. This will be used for returning e.g. compression accounting and rebalance_work accounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Reed Riley authored
By removing the early-exit when REMAP_FILE_DEDUP is set, we should be able to support the fideduperange ioctl, albeit less efficiently than if we handled some of the extent locking and comparison logic inside bcachefs. Extent comparison logic already exists inside of `__generic_remap_file_range_prep`. Signed-off-by: Reed Riley <reed@riley.engineer> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Hongbo Li authored
Implement support for FS_IOC_SETFSLABEL ioctl to set filesystem label. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Hongbo Li authored
Implement support for FS_IOC_GETFSLABEL ioctl to read filesystem label. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Hongbo Li authored
In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from inode, after that, users can list file's generation number by using "lsattr". Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We can't hold locks while waiting for user input, that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Youling Tang authored
Add trace_bch2_sync_fs() and trace_bch2_fsync() implementations. The output in trace is as follows: sync-29779 [000] ..... 193.700935: bch2_sync_fs: dev 254,16 wait 1 <...>-40027 [002] ..... 342.535227: bch2_fsync: dev 254,32 ino 4099 parent 4096 datasync 1 Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Youling Tang authored
We already using mapping_set_error() in bch2_writepage_io_done(), so all we need to do is to use file_check_and_advance_wb_err() when handling fsync() requests in bch2_fsync(). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Ariel Miculas authored
Commit 0c0cbfdb dropped the ctx->pos update before the call to dir_emit. This breaks the userspace implementation, causing the directory reads to be stuck in an infinite loop. This doesn't happen in the kernel because the vfs handles the updates to ctx->pos, but in the fuse implementation nobody updates it. Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We now read the line from the buffer atomically, which means we have to allow the buffer to grow past STDIO_REDIRECT_BUFSIZE if we're waiting for a full line - this behaviour is necessary for stdio_redirect_readline_timeout() in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
fsck_err() now optionally takes a btree_trans; if the current thread has one, it is required that it be passed. The next patch will use this to unlock when waiting for user input. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Make things more consistent and ensure that we're using u64 bitfields - key types and btree ids are already around 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
the order in which btree_gc walks keys have changed, so we no longer have the sort of issues with online fsck this assertion was warning about. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Needed for online fsck; we need the trigger to initialize newly allocated buckets and generation number changes while gc is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Next change will move gc_alloc_start initialization into the alloc trigger, so we have to mark those first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Blocking the journal was needed to finish checking old style accounting, but that code is gone and it's not needed in the alloc rewrite, mark_lock is sufficient for synchronization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
- improve error paths - call bch2_fs_start() separately, after applying late-parsed options Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Fold into bch2_fs_get_tree() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
The btree write buffer takes as input keys from the journal, sorts them, deduplicates them, and flushes them back to the btree in sorted order. The disk space accounting rewrite is moving accounting to normal btree keys, with update (in this case deltas) accumulated in the write buffer and then flushed to the btree; but this is going to increase the number of keys handled by the write buffer by perhaps as much as a factor of 3x-5x. The overhead from copying around and sorting this many keys would cause a significant performance regression, but: there is huge locality in updates to accounting keys that we can take advantage of. Instead of appending accounting keys to the list of keys to be sorted, this patch adds an eytzinger search tree of recently seen accounting keys. We look up the accounting key in the eytzinger search tree and apply the delta directly, adding it if it doesn't exist, and periodically prune the eytzinger tree of unused entries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-