Commits · 539f1af037858b905c50c560f2a608555d8457ff · mirror / ccan

01 Dec, 2010 7 commits

tdb2: use magic freetable value rather than different magic for coalescing · 539f1af0

Rusty Russell authored Dec 01, 2010

We have to unlock during coalescing, so we mark records specially to indicate
to tdb_check that they're not on any list, and to prevent other coalescers
from grabbing them.

Use a special free list number, rather than a new magic.

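A minimal sketch of the idea, assuming a hypothetical layout where each free record stores an 8-bit free-table number next to its magic. `FREE_MAGIC`, `FTABLE_COALESCING`, `struct free_rec` and the helpers are illustrative names, not tdb2's definitions. The record keeps its ordinary free magic; only the reserved table number tells tdb_check and other coalescers to leave it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define FREE_MAGIC 0xFEu        /* the one "this record is free" magic */
#define FTABLE_COALESCING 255u  /* reserved: free, but on no free table */

struct free_rec {
    uint64_t magic;   /* stays FREE_MAGIC even while being coalesced */
    uint8_t ftable;   /* which free table (list) the record is on */
};

/* Called with the free-list lock held, just before dropping it to coalesce. */
static void mark_coalescing(struct free_rec *r)
{
    r->ftable = FTABLE_COALESCING;
}

/* Used both by a tdb_check-style pass ("should this be on a list?") and by
 * other coalescers ("may I grab this?"): a marked record answers no to both. */
static bool on_free_list(const struct free_rec *r)
{
    return r->magic == FREE_MAGIC && r->ftable != FTABLE_COALESCING;
}
```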

tdb2: compare the extra 11 hash bits in the header. · 11564894

Rusty Russell authored Dec 01, 2010

We already have 10 hash bits encoded in the offset itself; we only get
here incorrectly about 1 time in 1000, so it's a pretty minor
optimization at best.

Nonetheless, we have the information, so let's check it before
accessing the key.  This reduces the probability of a false keycmp by
another factor of 2000.

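The arithmetic: 10 bits in the offset leave a false match about 1 time in 2^10 = 1024, and 11 more bits cut that by a further 2^11 = 2048. A minimal sketch of checking the cheap header bits before touching the key, assuming a hypothetical `struct used_rec` whose `extra_hash` field holds hash bits 10 to 20; the shift, field names and the `key_compares` counter are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the offset already encodes the low 10 hash bits; the record
 * header keeps the next 11 (bits 10..20). */
#define EXTRA_HASH_SHIFT 10
#define EXTRA_HASH_BITS 11

struct used_rec {
    uint16_t extra_hash;  /* 11 hash bits stored in the header */
    size_t key_len;
    const char *key;      /* in a real database, reading this costs I/O */
};

static unsigned key_compares;  /* how many full key comparisons we did */

static uint16_t extra_bits(uint64_t hash)
{
    return (uint16_t)((hash >> EXTRA_HASH_SHIFT) & ((1u << EXTRA_HASH_BITS) - 1));
}

static bool rec_matches(const struct used_rec *rec, uint64_t hash,
                        const char *key, size_t key_len)
{
    /* Cheap header checks first: a mismatch here means a different key. */
    if (rec->extra_hash != extra_bits(hash) || rec->key_len != key_len)
        return false;
    key_compares++;
    return memcmp(rec->key, key, key_len) == 0;
}
```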

tdb2: add comparison stats · f2c286c9

Rusty Russell authored Dec 01, 2010

tdb2: clean up logging · 4e185ad8

Rusty Russell authored Dec 01, 2010

Logged errors should always set tdb->ecode before the log function is called, and
there's little reason to have a sprintf-style logging function since
we can do the formatting internally.

Change the tdb_log attribute to just take a "const char *", and create
a new tdb_logerr() helper which sets ecode and calls it.  As a bonus,
mark it COLD so the compiler can optimize appropriately knowing that
it's unlikely to be invoked.

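A minimal sketch of such a helper, assuming a hypothetical `struct tdb_context` with an `ecode` field and a `log_fn` callback that takes an already-formatted string; the error codes, buffer size and `remember_log()` are illustrative, and `COLD` maps to GCC/Clang's `__attribute__((cold))` where available.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#if defined(__GNUC__)
#define COLD __attribute__((cold))  /* hint: rarely executed */
#else
#define COLD
#endif

/* Hypothetical error codes and context, for illustration only. */
enum tdb_error { TDB_SUCCESS = 0, TDB_ERR_IO = -1, TDB_ERR_CORRUPT = -2 };

struct tdb_context {
    enum tdb_error ecode;
    /* The log attribute now takes a plain, already-formatted string. */
    void (*log_fn)(struct tdb_context *tdb, const char *message);
};

/* Set ecode first, do the formatting internally, then hand the string over. */
static COLD void tdb_logerr(struct tdb_context *tdb, enum tdb_error ecode,
                            const char *fmt, ...)
{
    char message[256];
    va_list ap;

    tdb->ecode = ecode;
    if (!tdb->log_fn)
        return;
    va_start(ap, fmt);
    vsnprintf(message, sizeof(message), fmt, ap);
    va_end(ap);
    tdb->log_fn(tdb, message);
}

/* Example log function: remembers the last message. */
static char last_message[256];

static void remember_log(struct tdb_context *tdb, const char *message)
{
    (void)tdb;
    strncpy(last_message, message, sizeof(last_message) - 1);
}
```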

tdb2: remove truncated bit. · 57680260

Rusty Russell authored Dec 01, 2010

There was an idea that we would use a bit to indicate that we didn't have
the full hash value; this would allow us to move records down when we
expanded a hash without rehashing them.

There's little evidence that rehashing in this case is particularly
expensive, so remove the bit.  We use that bit simply to indicate that
an offset refers to a subhash instead.

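A minimal sketch of using one stolen offset bit as a subhash flag; the choice of bit 63 and the helper names are assumptions for illustration, not tdb2's actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bit 63 of a stored offset is stolen as a flag. It was once
 * meant to say "hash truncated"; now it only says "this is a subhash". */
#define OFF_SUBHASH_FLAG ((uint64_t)1 << 63)

static uint64_t encode_subhash(uint64_t off)
{
    return off | OFF_SUBHASH_FLAG;
}

static bool is_subhash(uint64_t stored)
{
    return (stored & OFF_SUBHASH_FLAG) != 0;
}

static uint64_t decode_off(uint64_t stored)
{
    return stored & ~OFF_SUBHASH_FLAG;
}
```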

tdb2: use direct access for tdb_read_off/tdb_write_off · b44978e1

Rusty Russell authored Dec 01, 2010

This is one case where getting rid of tdb_get() cost us.  Also, we
add more read-only checks.

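Roughly, "direct access" here means reading the 8-byte offset straight out of the memory map instead of going through a generic copy routine. A minimal sketch, assuming a hypothetical `struct db` in which `file` stands in for the on-disk file (real code would use pread()/pwrite()); endian conversion is omitted and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context: a memory map when available, plus a slow path. */
struct db {
    unsigned char *map;     /* NULL if the file is not mapped */
    uint64_t map_size;
    bool read_only;
    unsigned slow_reads;    /* how often the copying path was used */
    unsigned char *file;    /* stands in for the on-disk file */
};

static int slow_read(struct db *db, uint64_t off, void *buf, size_t len)
{
    db->slow_reads++;
    memcpy(buf, db->file + off, len);  /* real code: pread() */
    return 0;
}

/* Read an 8-byte offset, straight from the map when it covers the range. */
static int read_off(struct db *db, uint64_t off, uint64_t *val)
{
    if (db->map && off + sizeof(*val) <= db->map_size) {
        memcpy(val, db->map + off, sizeof(*val));
        return 0;
    }
    return slow_read(db, off, val, sizeof(*val));
}

static int write_off(struct db *db, uint64_t off, uint64_t val)
{
    if (db->read_only)
        return -1;  /* read-only check before any write */
    if (db->map && off + sizeof(val) <= db->map_size) {
        memcpy(db->map + off, &val, sizeof(val));
        return 0;
    }
    memcpy(db->file + off, &val, sizeof(val));  /* real code: pwrite() */
    return 0;
}
```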
Before we removed tdb_get:
Adding 1000000 records:  6480 ns (59900296 bytes)
Finding 1000000 records:  2839 ns (59900296 bytes)
Missing 1000000 records:  2485 ns (59900296 bytes)
Traversing 1000000 records:  2598 ns (59900296 bytes)
Deleting 1000000 records:  5342 ns (59900296 bytes)
Re-adding 1000000 records:  5613 ns (59900296 bytes)
Appending 1000000 records:  12194 ns (93594224 bytes)
Churning 1000000 records:  14549 ns (93594224 bytes)

Now:
Adding 1000000 records:  6307 ns (59900296 bytes)
Finding 1000000 records:  2801 ns (59900296 bytes)
Missing 1000000 records:  2515 ns (59900296 bytes)
Traversing 1000000 records:  2579 ns (59900296 bytes)
Deleting 1000000 records:  5225 ns (59900296 bytes)
Re-adding 1000000 records:  5878 ns (59900296 bytes)
Appending 1000000 records:  12665 ns (93594224 bytes)
Churning 1000000 records:  16090 ns (93594224 bytes)

tdb2: remove tdb_get() · 96b169e9

Rusty Russell authored Dec 01, 2010

We have four internal helpers for reading data from the database:
1) tdb_read_convert() - read (and convert) into a buffer.
2) tdb_read_off() - read (and convert) an offset.
3) tdb_access_read() - malloc or direct access to the database.
4) tdb_get() - copy into a buffer or direct access to the database.

The last one doesn't really buy us anything, so remove it (except for
tdb_read_off/tdb_write_off, see next patch).

Before:
Adding 1000000 records:  6480 ns (59900296 bytes)
Finding 1000000 records:  2839 ns (59900296 bytes)
Missing 1000000 records:  2485 ns (59900296 bytes)
Traversing 1000000 records:  2598 ns (59900296 bytes)
Deleting 1000000 records:  5342 ns (59900296 bytes)
Re-adding 1000000 records:  5613 ns (59900296 bytes)
Appending 1000000 records:  12194 ns (93594224 bytes)
Churning 1000000 records:  14549 ns (93594224 bytes)

After:
Adding 1000000 records:  6497 ns (59900296 bytes)
Finding 1000000 records:  2854 ns (59900296 bytes)
Missing 1000000 records:  2563 ns (59900296 bytes)
Traversing 1000000 records:  2735 ns (59900296 bytes)
Deleting 1000000 records:  11357 ns (59900296 bytes)
Re-adding 1000000 records:  8145 ns (59900296 bytes)
Appending 1000000 records:  10939 ns (93594224 bytes)
Churning 1000000 records:  18479 ns (93594224 bytes)

23 Nov, 2010 1 commit

tdb2: trivial optimization for free list · afc3c1e7

Rusty Russell authored Nov 23, 2010

We currently only have one, so shortcut the case where we want our current
one.

01 Dec, 2010 1 commit

tdb2: Add stats attribute. · fe55330a

Rusty Russell authored Dec 01, 2010

This is good for deep debugging.

22 Nov, 2010 2 commits

tdb2: relax locking to allow two free list locks at once · a5b66d70

Rusty Russell authored Nov 22, 2010

As long as they are in descending order.  This avoids the common sequence of:

1) Grab lock for bucket.
2) Remove entry from bucket.
3) Drop lock for bucket.
4) Grab lock for bucket for leftover.
5) Add leftover entry to bucket.
6) Drop lock for leftover bucket.

In particular it's quite common for the leftover bucket to be the same
as the entry bucket (when it's the largest bucket); if it's not, we are
no worse than before.

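A minimal sketch of the ordering rule, assuming buckets are indexed by size so that a leftover (always smaller than the block it came from) lands in the same or a lower-numbered bucket. The `held` array stands in for real free-list locks; `alloc_with_leftover()` and the other names are hypothetical. The saving in lock operations shows up when the leftover lands in the same bucket (2 operations instead of 4).

```c
#include <assert.h>
#include <stdbool.h>

#define NBUCKETS 8

/* Hypothetical lock bookkeeping standing in for real free-list locks. */
static bool held[NBUCKETS];
static int lock_ops;  /* counts lock + unlock calls */

static void lock_bucket(int b)   { assert(!held[b]); held[b] = true;  lock_ops++; }
static void unlock_bucket(int b) { assert(held[b]);  held[b] = false; lock_ops++; }

/* A second free-list lock may only be taken at a lower bucket index than the
 * one already held, so two lockers can never wait on each other. */
static bool may_lock_second(int held_bucket, int new_bucket)
{
    return new_bucket < held_bucket;
}

/* Take an entry from 'bucket' and put the leftover into 'leftover_bucket'. */
static void alloc_with_leftover(int bucket, int leftover_bucket)
{
    lock_bucket(bucket);
    /* ... remove entry from bucket ... */
    if (leftover_bucket == bucket) {
        /* Common case: we already hold the right lock. */
    } else if (may_lock_second(bucket, leftover_bucket)) {
        lock_bucket(leftover_bucket);
        /* ... add leftover entry ... */
        unlock_bucket(leftover_bucket);
    } else {
        /* Wrong order: fall back to drop-then-relock. */
        unlock_bucket(bucket);
        lock_bucket(leftover_bucket);
        /* ... add leftover entry ... */
        unlock_bucket(leftover_bucket);
        return;
    }
    unlock_bucket(bucket);
}
```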
Current results of speed test:
$ ./speed 1000000
Adding 1000000 records:  13194 ns (60083192 bytes)
Finding 1000000 records:  2438 ns (60083192 bytes)
Traversing 1000000 records:  2167 ns (60083192 bytes)
Deleting 1000000 records:  9265 ns (60083192 bytes)
Re-adding 1000000 records:  10241 ns (60083192 bytes)
Appending 1000000 records:  17692 ns (93879992 bytes)
Churning 1000000 records:  26077 ns (93879992 bytes)

Previous:
$ ./speed 1000000
Adding 1000000 records:  23210 ns (59193360 bytes)
Finding 1000000 records:  2387 ns (59193360 bytes)
Traversing 1000000 records:  2150 ns (59193360 bytes)
Deleting 1000000 records:  13392 ns (59193360 bytes)
Re-adding 1000000 records:  11546 ns (59193360 bytes)
Appending 1000000 records:  29327 ns (91193360 bytes)
Churning 1000000 records:  33026 ns (91193360 bytes)

tdb2: use expansion heuristics from tdb1 · 20defbbc

Rusty Russell authored Nov 22, 2010

This reduces the amount of expansion we do.

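For reference, a sketch of a tdb1-style growth rule as we recall it from tdb1's tdb_expand(): grow enough for at least 100 more records of the requested size, and by at least 25% of the current size, rounded up to a page. Treat the constants and the function name `expand_amount()` as assumptions. For example, a 1000-byte file needing a 100-byte record would grow by 11288 bytes (to 12288, three 4096-byte pages).

```c
#include <assert.h>
#include <stdint.h>

/* How many bytes to add to a file of map_size bytes so that a record of
 * 'size' bytes fits, using the (assumed) tdb1-style heuristic. */
static uint64_t expand_amount(uint64_t map_size, uint64_t size,
                              uint64_t page_size)
{
    uint64_t for_records = map_size + size * 100;    /* ~100 more records */
    uint64_t for_percent = map_size + map_size / 4;  /* at least 25% more */
    uint64_t new_size = for_records > for_percent ? for_records : for_percent;

    assert(page_size != 0);
    /* Round up to a multiple of the page size. */
    new_size = (new_size + page_size - 1) / page_size * page_size;
    return new_size - map_size;
}
```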
Before:
$ ./speed 1000000
Adding 1000000 records:  23210 ns (59193360 bytes)
Finding 1000000 records:  2387 ns (59193360 bytes)
Traversing 1000000 records:  2150 ns (59193360 bytes)
Deleting 1000000 records:  13392 ns (59193360 bytes)
Re-adding 1000000 records:  11546 ns (59193360 bytes)
Appending 1000000 records:  29327 ns (91193360 bytes)
Churning 1000000 records:  33026 ns (91193360 bytes)

After:
$ ./speed 1000000
Adding 1000000 records:  17480 ns (61472904 bytes)
Finding 1000000 records:  2431 ns (61472904 bytes)
Traversing 1000000 records:  2194 ns (61472904 bytes)
Deleting 1000000 records:  10948 ns (61472904 bytes)
Re-adding 1000000 records:  11247 ns (61472904 bytes)
Appending 1000000 records:  21826 ns (96051424 bytes)
Churning 1000000 records:  27242 ns (96051424 bytes)

01 Dec, 2010 1 commit

tdb2: shrink free header from 32 to 24 bytes. · 5e30abc6

Rusty Russell authored Dec 01, 2010

This reduces our minimum key+data length to 8 bytes; we do this by packing
the prev pointer where we used to put the flist pointer, and storing the
flist as an 8-bit index (meaning we can only have 256 free tables).

Note that this has a perverse effect on the size of the database: our
4-byte key and 4-byte data now fit perfectly in a minimal record, so
appending causes us to allocate new records which are 50% larger,
since we detect that the record is growing.

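The size arithmetic, assuming a 16-byte used-record header: a freed record must be able to hold the free header, so key+data must be at least 24 - 16 = 8 bytes (previously 32 - 16 = 16). A minimal sketch of one possible 24-byte layout that steals the top 8 bits of two words for the magic and the free-table index (2^8 = 256 tables); the field and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 24-byte free-record header: the top 8 bits of the first two
 * words are stolen for the magic and the free-table index. */
#define UPPER_STEAL 8
#define LOW_MASK ((UINT64_C(1) << (64 - UPPER_STEAL)) - 1)

struct free_header {
    uint64_t magic_and_prev;  /* 8-bit magic | 56-bit prev offset */
    uint64_t ftable_and_len;  /* 8-bit free-table index | 56-bit length */
    uint64_t next;            /* next free record */
};

_Static_assert(sizeof(struct free_header) == 24, "free header must be 24 bytes");

static uint64_t pack(uint8_t top, uint64_t low)
{
    return ((uint64_t)top << (64 - UPPER_STEAL)) | (low & LOW_MASK);
}

static uint8_t top_bits(uint64_t w)  { return (uint8_t)(w >> (64 - UPPER_STEAL)); }
static uint64_t low_bits(uint64_t w) { return w & LOW_MASK; }
```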
Current results of speed test:
$ ./speed 1000000
Adding 1000000 records:  23210 ns (59193360 bytes)
Finding 1000000 records:  2387 ns (59193360 bytes)
Traversing 1000000 records:  2150 ns (59193360 bytes)
Deleting 1000000 records:  13392 ns (59193360 bytes)
Re-adding 1000000 records:  11546 ns (59193360 bytes)
Appending 1000000 records:  29327 ns (91193360 bytes)
Churning 1000000 records:  33026 ns (91193360 bytes)

Previous:
$ ./speed 1000000
Adding 1000000 records:  28324 ns (67232528 bytes)
Finding 1000000 records:  2468 ns (67232528 bytes)
Traversing 1000000 records:  2200 ns (67232528 bytes)
Deleting 1000000 records:  13083 ns (67232528 bytes)
Re-adding 1000000 records:  16433 ns (67232528 bytes)
Appending 1000000 records:  2511 ns (67232528 bytes)
Churning 1000000 records:  31068 ns (67570448 bytes)

23 Nov, 2010 1 commit

tdb2: rename set_header to the more appropriate set_used_header. · dfae76fd

Rusty Russell authored Nov 23, 2010

01 Dec, 2010 5 commits

tdb2: Add speed test to tdb and tdb2 · 076c398e

Rusty Russell authored Dec 01, 2010

Current results of speed test:
$ ./speed 1000000
Adding 1000000 records:  14726 ns (67244816 bytes)
Finding 1000000 records:  2844 ns (67244816 bytes)
Missing 1000000 records:  2528 ns (67244816 bytes)
Traversing 1000000 records:  2572 ns (67244816 bytes)
Deleting 1000000 records:  5358 ns (67244816 bytes)
Re-adding 1000000 records:  9176 ns (67244816 bytes)
Appending 1000000 records:  3035 ns (67244816 bytes)
Churning 1000000 records:  18139 ns (67565840 bytes)
$ ./speed 100000
Adding 100000 records:  13270 ns (14349584 bytes)
Finding 100000 records:  2769 ns (14349584 bytes)
Missing 100000 records:  2422 ns (14349584 bytes)
Traversing 100000 records:  2595 ns (14349584 bytes)
Deleting 100000 records:  5331 ns (14349584 bytes)
Re-adding 100000 records:  5875 ns (14349584 bytes)
Appending 100000 records:  2751 ns (14349584 bytes)
Churning 100000 records:  20666 ns (25771280 bytes)

vs tdb1 (with hashsize 100003):
$ ./speed 1000000
Adding 1000000 records:  8547 ns (44306432 bytes)
Finding 1000000 records:  5595 ns (44306432 bytes)
Missing 1000000 records:  3469 ns (44306432 bytes)
Traversing 1000000 records:  4571 ns (44306432 bytes)
Deleting 1000000 records:  12115 ns (44306432 bytes)
Re-adding 1000000 records:  10505 ns (44306432 bytes)
Appending 1000000 records:  10610 ns (44306432 bytes)
Churning 1000000 records:  28697 ns (44306432 bytes)
$ ./speed 100000
Adding 100000 records:  6030 ns (4751360 bytes)
Finding 100000 records:  3141 ns (4751360 bytes)
Missing 100000 records:  3143 ns (4751360 bytes)
Traversing 100000 records:  4659 ns (4751360 bytes)
Deleting 100000 records:  7891 ns (4751360 bytes)
Re-adding 100000 records:  5913 ns (4751360 bytes)
Appending 100000 records:  4242 ns (4751360 bytes)
Churning 100000 records:  15300 ns (4751360 bytes)

tdb2: fix tdb_check() return when free table entries missing. · dbbde019

Rusty Russell authored Dec 01, 2010

It mistakenly returned -1 meaning "success".

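The message doesn't show the code. One common way for an error value to read as success in C is a checker declared to return `bool`: `return -1` converts to `true`. A minimal sketch under that assumption; both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checker: true means "free tables are consistent". */
static bool check_free_table_buggy(int missing_entries)
{
    if (missing_entries)
        return -1;  /* BUG: -1 converts to true, i.e. "success" */
    return true;
}

static bool check_free_table_fixed(int missing_entries)
{
    if (missing_entries)
        return false;
    return true;
}
```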
tdb2: make tdb_check call check() function. · c84c6577

Rusty Russell authored Dec 01, 2010

tdb2: make summary command handle recovery "dead zone" · ec88af5d

Rusty Russell authored Dec 01, 2010

We can run summary with a recovery area, or a dead zone.

tdb2: cancel transactions on tdb_close · d95645d5

Rusty Russell authored Dec 01, 2010

Otherwise we leak memory.

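A minimal sketch of the pattern, assuming a hypothetical `struct db` that owns an in-progress `struct transaction`; the names are illustrative, not tdb2's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle with an optional in-progress transaction. */
struct transaction { void *blocks; };
struct db { struct transaction *transaction; };

static int cancelled;  /* counts cancellations, for the example */

static void transaction_cancel(struct db *db)
{
    free(db->transaction->blocks);
    free(db->transaction);
    db->transaction = NULL;
    cancelled++;
}

static void db_close(struct db *db)
{
    /* Without this, an open transaction's buffers are never freed. */
    if (db->transaction)
        transaction_cancel(db);
    free(db);
}
```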
23 Nov, 2010 6 commits

tdb2: enable transactions in tdbtorture · 04b2feef

Rusty Russell authored Nov 23, 2010

tdb2: stricter ordering on expansion lock · 554f3856

Rusty Russell authored Nov 23, 2010

It's problematic for transaction commit to get the expansion lock, but
in fact we always grab a hash lock before the transaction lock, so it
doesn't really need it (the transaction locks the entire database).

Assert that this is true, and fix up a few low-level tests where it wasn't.

tdb2: remove all the dead code · 6520c831

Rusty Russell authored Nov 23, 2010

I left much tdb1 code in various files for inspiration, and in case I needed
it later.  Now that we have all the major features implemented, remove it.

tdb2: transaction support · 5e8b9af5

Rusty Russell authored Nov 23, 2010

This adds transactions to tdb2; the code is taken from tdb1 with minimal
modifications, as are the unit tests.

tdb2: allow nesting of read locks on top of write locks. · 49c1b2e3

Rusty Russell authored Nov 23, 2010

If we have a write lock and ask for a read lock, that's OK, but not the
other way around.  tdb_nest_lock() allowed both, tdb_allrecord_lock() allowed
neither.

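A minimal sketch of the nesting rule, assuming that nesting a lock of the same type is allowed (as counted nested locks usually are); `enum lock_type` and `may_nest()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_type { LOCK_NONE, LOCK_READ, LOCK_WRITE };

/* May a new lock of type 'want' be nested on top of one of type 'held'? */
static bool may_nest(enum lock_type held, enum lock_type want)
{
    if (held == LOCK_NONE)
        return true;            /* not nesting at all */
    if (want == LOCK_READ)
        return true;            /* read on read, or read on write: OK */
    return held == LOCK_WRITE;  /* write only on top of write */
}
```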

ccanlint: fix -x core dump · dde92439

Rusty Russell authored Nov 23, 2010

This wasn't fixed when we converted to ccan/opt in 8d706678.

Unfortunately, unistd.h declares optarg, so the compiler didn't catch
this.

17 Nov, 2010 5 commits

tdb2: handle chains of free tables · ef9dec60

Rusty Russell authored Nov 17, 2010

This adds chains of free tables: we choose one at random at the start and
we iterate through them when they are exhausted.

Currently there is no code to actually add a new free table, but we test
that we can handle it if we add one in future.

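A minimal sketch of the selection loop, simplified to an array walked with wraparound rather than an on-disk chain; `struct free_table`, `free_bytes` and `pick_free_table()` are hypothetical. The caller would choose `start` at random when opening the database, e.g. `rand() % ntables`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: each free table reports how much free space it holds. */
struct free_table { uint64_t free_bytes; };

/* Start at 'start' and walk the tables, wrapping around, until one can
 * satisfy 'len'. Returns its index, or -1 if every table is exhausted. */
static int pick_free_table(const struct free_table *tables, int ntables,
                           int start, uint64_t len)
{
    for (int i = 0; i < ntables; i++) {
        int t = (start + i) % ntables;
        if (tables[t].free_bytes >= len)
            return t;
    }
    return -1;
}
```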

tdb2: get rid of zones · d70577b6

Rusty Russell authored Nov 17, 2010

Zones were a bad idea.  They mean we can't simply add stuff to the end
of the file (which transactions relied upon), and there's no good heuristic
for how to size them.

This patch reverts us to a single free table.

tdb2: fix bucket search · 2ecf943a

Rusty Russell authored Nov 17, 2010

We were previously jumping straight from the first bucket to the end.

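As we read the message, the search checked the starting bucket and then looked only at the last one. A minimal sketch contrasting that with a scan of every bucket in between; the bucket count and function names are hypothetical.

```c
#include <assert.h>

#define NBUCKETS 16

/* The bug as described: check 'start', then skip straight to the end. */
static int find_bucket_buggy(const unsigned counts[NBUCKETS], int start)
{
    if (counts[start] != 0)
        return start;
    return counts[NBUCKETS - 1] != 0 ? NBUCKETS - 1 : -1;
}

/* Fixed: the first non-empty bucket at or above 'start', or -1 if none. */
static int find_bucket(const unsigned counts[NBUCKETS], int start)
{
    for (int b = start; b < NBUCKETS; b++)
        if (counts[b] != 0)
            return b;
    return -1;
}
```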
tdb2: only adjust size once when growing · e984ef66

Rusty Russell authored Nov 17, 2010

We were adding 50% to datalen twice, so move it out of adjust_size and
make the callers do it.

We also add a test that the heuristic is working at all.

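A minimal sketch of the resulting split, assuming a hypothetical 8-byte minimum and 8-byte alignment: `adjust_size()` only rounds, and a caller that expects growth adds the 50% once. Applying it twice turned 100 bytes of data into 225 rather than 150.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_DATA_LEN 8  /* hypothetical minimum key+data length */

/* Round key+data up to the minimum and to an 8-byte boundary.
 * It no longer adds any growth room itself. */
static uint64_t adjust_size(uint64_t keylen, uint64_t datalen)
{
    uint64_t size = keylen + datalen;

    if (size < MIN_DATA_LEN)
        size = MIN_DATA_LEN;
    return (size + 7) & ~(uint64_t)7;
}

/* A caller that expects the record to grow asks for 50% extra data, once. */
static uint64_t size_for_record(uint64_t keylen, uint64_t datalen, int growing)
{
    if (growing)
        datalen += datalen / 2;
    return adjust_size(keylen, datalen);
}
```

For a 4-byte key and 100 bytes of data, a growing record asks for 160 bytes (154 rounded up) and a non-growing one for 104.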

tdb2: remove tailer · d1383862

Rusty Russell authored Nov 17, 2010

We don't actually need it.

15 Nov, 2010 9 commits

tdb2: fix tdb_chainlock · c5e3f07a

Rusty Russell authored Nov 15, 2010

We can't enlarge the lock without risking deadlock, so tdb_chainlock() can't
simply grab a one-byte lock; it needs to grab the lock we'd need to protect
the hash.

In theory, tdb_chainlock_read() could use a one-byte lock though.

tdb2: fix coalesce race #3 · 06a5b1a8

Rusty Russell authored Nov 15, 2010

When we're coalescing, we need to drop the lock on the current free list, as
we've enlarged the block and it may now belong in a different list.

Unfortunately (as shown by repeated tdbtorture -n 8) another coalescing run
can do the coalescing while we've dropped the lock. So for this case, we
use the TDB_COALESCING_MAGIC flag so it doesn't look free.

tdb2: add TDB_COALESCING_MAGIC to solve coalescing race. · 56ea2c52

Rusty Russell authored Nov 15, 2010

A new special record marker to indicate coalescing is in progress.

tdb2: fix coalesce race #2 · d2a4d6b4

Rusty Russell authored Nov 15, 2010

When we find a free block, we need to mark it as used before we drop the
free lock, even though we've removed it from the list. Otherwise the
coalescing code can find it.

This means handing the information we need down to lock_and_alloc, which
also means we know when we're grabbing a "growing" entry, and can relax
the heuristics about finding a good-sized block in that case.

tdb2: coalescing race fix #1 · b5479009

Rusty Russell authored Nov 15, 2010

When coalescing, we check the adjacent entry then lock its free list: we
need to *recheck* after locking, to make sure it's still in that free list.


tdb2: minor optimization for set_header · 8afb9681

Rusty Russell authored Nov 15, 2010

We actually only need the bottom 5 bits of the hash value, so don't waste
8 bytes passing it.

tdb2: hoist adjust_size · 590eee6f

Rusty Russell authored Nov 15, 2010

We're going to want it in get_free() in the next patch, so move it upwards.
Trivial changes, too: add to size before min length check, and rename growing
to want_extra.

tdb2: clean up makefile for tools · fdba839b

Rusty Russell authored Nov 15, 2010

tdb2: extra debugging checks · b371060f

Rusty Russell authored Nov 15, 2010

17 Nov, 2010 2 commits

ccanlint: add ccanlint section to _info · a791eb1e

Rusty Russell authored Nov 17, 2010

This replaces the previous Fails: section with a more general set of
lines of the form:

      <testname> <option>...

The special <option> "FAIL" means we know we fail this test.
We accept options to valgrind-tests; in particular tdb2 wants
--partial-loads-ok=yes passed to valgrind.

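A minimal sketch of parsing one such line, assuming whitespace-separated fields; `struct test_opts`, its buffer sizes, `parse_ccanlint_line()` and the test name `tests_pass` are hypothetical, not ccanlint's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of one "<testname> <option>..." line. */
struct test_opts {
    char name[64];
    char options[256];  /* remaining options, space-separated */
    bool expect_fail;   /* the special "FAIL" option */
};

/* Returns false for a blank line. Uses strtok, so not thread-safe. */
static bool parse_ccanlint_line(const char *line, struct test_opts *out)
{
    char buf[320];
    char *tok;

    memset(out, 0, sizeof(*out));
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    tok = strtok(buf, " \t\n");
    if (!tok)
        return false;
    strncpy(out->name, tok, sizeof(out->name) - 1);

    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        if (strcmp(tok, "FAIL") == 0) {
            out->expect_fail = true;
            continue;
        }
        /* Anything else is passed through to the test (e.g. valgrind). */
        if (out->options[0])
            strncat(out->options, " ",
                    sizeof(out->options) - strlen(out->options) - 1);
        strncat(out->options, tok,
                sizeof(out->options) - strlen(out->options) - 1);
    }
    return true;
}
```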

ccanlint: override _info's Fails: with --target · 7ddfc669

Rusty Russell authored Nov 17, 2010

I wanted to see what happened with tdb2's valgrind test (suppressed in the
_info file).