Commits · 65d22e911bfc4f46cda4751f1b1926b43c316c14 · nexedi / linux

11 Nov, 2013 40 commits

bcache: Move spinlock into struct time_stats · 65d22e91

Kent Overstreet authored Jul 31, 2013

Minor cleanup.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Kill sequential_merge option · 8aee1220

Kent Overstreet authored Jul 30, 2013

It never really made sense to expose this, so just kill it.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Kill bch_next_recurse_key() · 50310164

Kent Overstreet authored Sep 10, 2013

This dates from before the btree iterator, and now it's finally gone.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Avoid deadlocking in garbage collection · bc9389ee

Kent Overstreet authored Sep 10, 2013

Not a complete fix - we could still deadlock if btree_insert_node() has
to split...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Incremental gc · a1f0358b

Kent Overstreet authored Sep 10, 2013

Big garbage collection rewrite; now, garbage collection uses the same
mechanisms as used elsewhere for inserting/updating btree node pointers,
instead of rewriting interior btree nodes in place.

This makes the code significantly cleaner and less fragile, and means we
can now make garbage collection incremental - it doesn't have to hold a
write lock on the root of the btree for the entire duration of garbage
collection.

This means that there's less of a latency hit for doing garbage
collection, which means we can gc more frequently (and do a better job
of reclaiming from the cache), and we can coalesce across more btree
nodes (improving our space efficiency).
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Add make_btree_freeing_key() · 8835c123

Kent Overstreet authored Jul 24, 2013

Refactoring, prep work for incremental garbage collection.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Add btree_node_write_sync() · f269af5a

Kent Overstreet authored Jul 23, 2013

More refactoring - mostly making the interfaces more explicit about what
we actually want to do.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: PRECEDING_KEY() · 0eacac22

Kent Overstreet authored Jul 01, 2013

btree_insert_key() was open coding this; this is just refactoring.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
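
To illustrate what a "preceding key" helper computes, here is a minimal userspace sketch. It assumes keys are ordered by (inode, offset) and that the predecessor is found by decrementing the offset, borrowing from the inode on underflow. `struct toy_key` and `toy_preceding_key()` are hypothetical names for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified key: ordered by inode first, then offset. */
struct toy_key {
	uint64_t inode;
	uint64_t offset;
};

/*
 * Write the key that sorts immediately before *k into *out.
 * Returns false when *k is the smallest possible key (0, 0),
 * which has no predecessor.
 */
static bool toy_preceding_key(const struct toy_key *k, struct toy_key *out)
{
	assert(k && out);

	if (!k->inode && !k->offset)
		return false;

	*out = *k;
	if (!out->offset)
		out->inode--;	/* borrow from the inode */
	out->offset--;		/* wraps to UINT64_MAX after a borrow */
	return true;
}
```

Having one helper for this keeps the borrow logic in a single place instead of repeating it at each call site.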


bcache: bch_(btree|extent)_ptr_invalid() · d5cc66e9

Kent Overstreet authored Jul 24, 2013

Trying to treat btree pointers and leaf node pointers the same way was a
mistake - going to start being more explicit about the type of
key/pointer we're dealing with. This is the first part of that
refactoring; this patch shouldn't change any actual behaviour.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Don't bother with bucket refcount for btree node allocations · 3a3b6a4e

Kent Overstreet authored Jul 24, 2013

The bucket refcount (dropped with bkey_put()) is only needed to prevent
the newly allocated bucket from being garbage collected until we've
added a pointer to it somewhere. But for btree node allocations, the
fact that we have btree nodes locked is enough to guard against races
with garbage collection.

Eventually the per bucket refcount is going to be replaced with
something specific to bch_alloc_sectors().
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Debug code improvements · 280481d0

Kent Overstreet authored Oct 24, 2013

Couple changes:
 * Consolidate bch_check_keys() and bch_check_key_order(), and move the
   checks that only check_key_order() could do to bch_btree_iter_next().

 * Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in
   when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to
   flip on the EDEBUG checks at runtime.

 * Dropped an old, not terribly useful check in rw_unlock(), and
   refactored/improved some of the other debug code.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
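
The move from a compile-time option to a runtime switch can be sketched roughly as below. This is a userspace illustration only: `toy_expensive_checks` stands in for whatever the sysfs file toggles, and `toy_check_node()` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime switch; in this sketch it stands in for the sysfs file. */
static bool toy_expensive_checks;

/* Returns true if keys[0..n) are in non-decreasing order. */
static bool toy_keys_in_order(const uint64_t *keys, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (keys[i - 1] > keys[i])
			return false;
	return true;
}

/*
 * Returns true when the node passes validation. The O(n) ordering check
 * only runs when the runtime flag is set, so it costs one branch otherwise.
 */
static bool toy_check_node(const uint64_t *keys, size_t n)
{
	assert(keys || n == 0);

	if (!toy_expensive_checks)
		return true;
	return toy_keys_in_order(keys, n);
}
```

With this shape, a debug build always carries the expensive checks, and they can be switched on for a failing workload without rebuilding.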


bcache: Fix bch_ptr_bad() · e58ff155

Kent Overstreet authored Jul 24, 2013

Previously, bch_ptr_bad() could return false when there was a pointer to
a nonexistent device... it only filtered out keys with PTR_CHECK_DEV
pointers.

This behaviour was intended for multiple cache device support; for that,
just because the device for one of the pointers has gone away doesn't
mean we want to filter out the rest of the pointers.

But we don't yet explicitly filter/check individual pointers, so without
that this behaviour was wrong - a corrupt bkey with a bad device pointer
could cause us to deref a bad pointer. Doh.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
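
The corrected rule, that a key is bad if any of its pointers names a missing device, can be sketched as follows. The types and the `toy_ptr_bad()` name are hypothetical simplifications for this example; the real bkey and cache set layouts differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PTRS 3

/* Hypothetical simplified key carrying up to TOY_MAX_PTRS device pointers. */
struct toy_key {
	size_t nr_ptrs;
	unsigned dev[TOY_MAX_PTRS];	/* index into the device table */
};

/* Hypothetical device table: present[i] is false once device i has gone away. */
struct toy_cache_set {
	size_t nr_devs;
	const bool *present;
};

/* A key is bad if any of its pointers names a device that does not exist. */
static bool toy_ptr_bad(const struct toy_cache_set *c, const struct toy_key *k)
{
	assert(c && k && k->nr_ptrs <= TOY_MAX_PTRS);

	for (size_t i = 0; i < k->nr_ptrs; i++)
		if (k->dev[i] >= c->nr_devs || !c->present[k->dev[i]])
			return true;
	return false;
}
```

Rejecting the whole key is the conservative choice while individual pointers are not yet checked on their own: nothing downstream can dereference a device slot that is out of range or gone.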


bcache: Pull on disk data structures out into a separate header · 81ab4190

Kent Overstreet authored Oct 31, 2013

Now, the on disk data structures are in a header that can be exported to
userspace - and having them all centralized is nice too.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Move sector allocator to alloc.c · 2599b53b

Kent Overstreet authored Jul 24, 2013

Just reorganizing things a bit.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Break up struct search · 220bb38c

Kent Overstreet authored Sep 10, 2013

With all the recent refactoring around struct btree_op, struct search
has gotten rather large.

But we can now easily break it up in a different way - we break out
struct btree_insert_op which is for inserting data into the cache, and
that's now what the copying gc code uses - struct search is now specific
to request.c.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes() · cc7b8819

Kent Overstreet authored Jul 24, 2013

Last of the btree_map() conversions. Main visible effect is
bch_btree_insert() is no longer taking a struct btree_op as an argument
anymore - there's no fancy state machine stuff going on, it's just a
normal function.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Don't use op->insert_collision · 6054c6d4

Kent Overstreet authored Jul 24, 2013

When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we
won't be passing struct btree_op to bch_btree_insert() anymore - so we
need a different way of returning whether there was a collision (really,
a replace collision).
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Kill op->replace · 1b207d80

Kent Overstreet authored Sep 10, 2013

This is prep work for converting bch_btree_insert to
bch_btree_map_leaf_nodes() - we have to convert all its arguments to
actual arguments. Bunch of churn, but should be straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Drop some closure stuff · faadf0c9

Kent Overstreet authored Nov 01, 2013

With the recent bcache refactoring, some of the closure code isn't
needed anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Kill op->cl · b54d6934

Kent Overstreet authored Jul 24, 2013

This isn't used for waiting asynchronously anymore - so this is a fairly
trivial refactoring.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Prune struct btree_op · c18536a7

Kent Overstreet authored Jul 24, 2013

Eventual goal is for struct btree_op to contain only what is necessary
for traversing the btree.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Clean up cache_lookup_fn · cc231966

Kent Overstreet authored Jul 24, 2013

There was some looping in submit_partial_cache_hit() and
submit_partial_cache_miss() that isn't needed anymore - originally, we
wouldn't necessarily process the full hit or miss all at once because
when splitting the bio, we took into account the restrictions of the
device we were sending it to.

But, device bio size restrictions are now handled elsewhere, with a
wrapper around generic_make_request() - so that looping has been
unnecessary for a while now and we can now do quite a bit of cleanup.

And if we trim the key we're reading from to match the subset we're
actually reading, we don't have to explicitly calculate bi_sector
anymore. Neat.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Convert bch_btree_read_async() to bch_btree_map_keys() · 2c1953e2

Kent Overstreet authored Jul 24, 2013

This is a fairly straightforward conversion, mostly reshuffling -
op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the
code for handling cache hits and misses wasn't really btree code, so it
gets moved to request.c.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Move some stuff to btree.c · df8e8970

Kent Overstreet authored Jul 24, 2013

With the new btree_map() functions, we don't need to export the stuff
needed for traversing the btree anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Add btree_map() functions · 48dad8ba

Kent Overstreet authored Sep 10, 2013

Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.

This adds two new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction is a
big improvement w.r.t. overall code quality.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
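
The general shape of a map-style traversal, where the caller supplies a callback and the helper does the walking, might look like the sketch below. It is a toy userspace model, not the bcache code: `struct toy_node`, `toy_map_keys()` and `toy_find()` are hypothetical, and the numeric values given to MAP_DONE and MAP_CONTINUE (names used elsewhere in this log) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes named in the log; the values here are assumptions. */
#define MAP_DONE	0
#define MAP_CONTINUE	1

#define TOY_FANOUT 4

/* Hypothetical toy node: interior nodes hold children, leaves hold keys. */
struct toy_node {
	int is_leaf;
	size_t nr;
	int keys[TOY_FANOUT];			/* used when is_leaf */
	struct toy_node *child[TOY_FANOUT];	/* used when !is_leaf */
};

typedef int (*toy_map_keys_fn)(void *ctx, int key);

/*
 * Visit every key in order, calling fn on each. The walk stops as soon as
 * fn returns anything other than MAP_CONTINUE, and that value is returned.
 */
static int toy_map_keys(const struct toy_node *n, toy_map_keys_fn fn, void *ctx)
{
	int ret = MAP_CONTINUE;

	assert(n && fn && n->nr <= TOY_FANOUT);

	for (size_t i = 0; i < n->nr && ret == MAP_CONTINUE; i++)
		ret = n->is_leaf ? fn(ctx, n->keys[i])
				 : toy_map_keys(n->child[i], fn, ctx);
	return ret;
}

/* Example callback: count keys, stopping once a target key is seen. */
struct toy_search {
	int target;
	size_t visited;
};

static int toy_find(void *ctx, int key)
{
	struct toy_search *s = ctx;

	s->visited++;
	return key == s->target ? MAP_DONE : MAP_CONTINUE;
}
```

A caller fills in a `struct toy_search` and passes `toy_find` to `toy_map_keys()`; the callback never has to know how nodes are laid out, which is the abstraction gain the message describes.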


bcache: Convert writeback to a kthread · 5e6926da

Kent Overstreet authored Jul 24, 2013

This simplifies the writeback flow control quite a bit - previously, it
was conceptually two coroutines, refill_dirty() and read_dirty(). This
makes the code quite a bit more straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Convert gc to a kthread · 72a44517

Kent Overstreet authored Oct 24, 2013

We needed a dedicated rescuer workqueue for gc anyways... and gc was
conceptually a dedicated thread, just one that wasn't running all the
time. Switch it to a dedicated thread to make the code a bit more
straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Convert bucket_wait to wait_queue_head_t · 35fcd848

Kent Overstreet authored Jul 24, 2013

At one point we did do fancy asynchronous waiting stuff with
bucket_wait, but that's all gone (and bucket_wait is used a lot less
than it used to be). So use the standard primitives.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Convert try_wait to wait_queue_head_t · e8e1d468

Kent Overstreet authored Jul 24, 2013

We never waited on c->try_wait asynchronously, so just use the standard
primitives.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Move keylist out of btree_op · 0b93207a

Kent Overstreet authored Jul 24, 2013

Slowly working on pruning struct btree_op - the aim is for it to only
contain things that are actually necessary for traversing the btree.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Refactor journalling flow control · a34a8bfd

Kent Overstreet authored Oct 24, 2013

Making things less asynchronous that don't need to be - bch_journal()
only has to block when the journal or journal entry is full, which is
emphatically not a fast path. So make it a normal function that just
returns when it finishes, to make the code and control flow easier to
follow.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Refactor read request code a bit · cdd972b1

Kent Overstreet authored Sep 10, 2013

More refactoring, and renaming.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Refactor request_write() · 84f0db03

Kent Overstreet authored Jul 24, 2013

Try to improve some of the naming a bit to be more consistent, and also
improve the flow of control in request_write() a bit.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Clean up keylist code · c2f95ae2

Kent Overstreet authored Jul 24, 2013

More random refactoring.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Add explicit keylist arg to btree_insert() · 4f3d4014

Kent Overstreet authored Sep 10, 2013

Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Convert btree_insert_check_key() to btree_insert_node() · e7c590eb

Kent Overstreet authored Sep 10, 2013

This was the main point of all this refactoring - now,
btree_insert_check_key() won't fail just because the leaf node happened
to be full.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Insert multiple keys at a time · 403b6cde

Kent Overstreet authored Jul 24, 2013

We'll often end up with a list of adjacent keys to insert -
because bch_data_insert() may have to fragment the data it writes.

Originally, to simplify things and avoid having to deal with corner
cases bch_btree_insert() would pass keys from this list one at a time to
btree_insert_recurse() - mainly because the list of keys might span leaf
nodes, so it was easier this way.

With the btree_insert_node() refactoring, it's now a lot easier to just
pass down the whole list and have btree_insert_recurse() iterate over
leaf nodes until it's done.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
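
The idea of handing down the whole key list and walking leaves until it is empty can be sketched as below. This is a simplified userspace model under stated assumptions: leaves are a flat array ordered by `max_key`, keys are plain integers appended in sorted order, and a full leaf simply stops the walk where a real btree would split. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LEAF_CAP 8

/* Hypothetical leaf covering all keys <= max_key not covered by earlier leaves. */
struct toy_leaf {
	int max_key;
	size_t nr;
	int keys[TOY_LEAF_CAP];
};

/*
 * Insert as many keys from the sorted list as belong in this leaf.
 * Returns how many keys were consumed from the front of the list.
 */
static size_t toy_leaf_insert(struct toy_leaf *l, const int *keys, size_t n)
{
	size_t done = 0;

	while (done < n && keys[done] <= l->max_key && l->nr < TOY_LEAF_CAP)
		l->keys[l->nr++] = keys[done++];
	return done;
}

/*
 * Hand the whole sorted key list down and walk the leaves until the list
 * is empty. Returns the number of keys left over (0 on success).
 */
static size_t toy_insert_keys(struct toy_leaf *leaves, size_t nr_leaves,
			      const int *keys, size_t n)
{
	assert(leaves && (keys || n == 0));

	for (size_t i = 0; i < nr_leaves && n; i++) {
		size_t done = toy_leaf_insert(&leaves[i], keys, n);

		keys += done;
		n -= done;
		if (n && keys[0] <= leaves[i].max_key)
			break;	/* leaf is full; a real btree would split here */
	}
	return n;
}
```

Compared with passing keys one at a time, the caller makes a single call and each leaf is visited once, even when the list spans several leaves.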


bcache: Add btree_insert_node() · 26c949f8

Kent Overstreet authored Sep 10, 2013

The flow of control in the old btree insertion code was rather
backwards: we'd recurse down the btree (in btree_insert_recurse()), and
then, if we needed to split, the keys to be inserted into the parent
node would effectively be returned up to btree_insert_recurse(), which
would notice there was more work to do and finish the insertion.

The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse(); if you'd gotten to a
btree leaf some other way and had a key to insert, if it turned out that
node needed to be split you were SOL.

This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.

This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Explicitly track btree node's parent · d6fd3b11

Kent Overstreet authored Jul 24, 2013

This is prep work for the reworked btree insertion code.

The way we set b->parent is ugly and hacky... the problem is, when
btree_split() or garbage collection splits or rewrites a btree node, the
parent changes for all its (potentially already cached) children.

I may change this later and add some code to look through the btree node
cache and find all our cached child nodes and change the parent pointer
then...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Remove unnecessary check in should_split() · 8304ad4d

Kent Overstreet authored Jul 24, 2013

Checking i->seq was redundant, because since ages ago we always
initialize the new bset when advancing b->written.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
