1. 07 Dec, 2010 1 commit
  2. 06 Dec, 2010 8 commits
  3. 03 Dec, 2010 1 commit
  4. 01 Dec, 2010 13 commits
    • Merge branch 'tdb2' · 51a56b52
      Rusty Russell authored
    • tdb2: update documentation · a42bba8e
      Rusty Russell authored
      Specifically, document the linked free tables, and record the status
      of each point of the design document.
    • tdb2: use separate magic constants for chain, htable and ftable entries · 2491b65a
      Rusty Russell authored
      Rather than overloading TDB_USED_MAGIC and the hash value as we do now.
      We also rename "free list" to the more-accurate "free table" everywhere.
    • tdb2: direct access during transactions. · f5087965
      Rusty Russell authored
      Currently we fall back to copying data during a transaction, but we don't
      need to in many cases.  Grant direct access in those cases.
      
      Before:
      $ ./speed --transaction 1000000
      Adding 1000000 records:  2409 ns (59916680 bytes)
      Finding 1000000 records:  1156 ns (59916680 bytes)
      Missing 1000000 records:  604 ns (59916680 bytes)
      Missing 1000000 records:  604 ns (59916680 bytes)
      Traversing 1000000 records:  1226 ns (59916680 bytes)
      Deleting 1000000 records:  1556 ns (119361928 bytes)
      Re-adding 1000000 records:  2326 ns (119361928 bytes)
      Appending 1000000 records:  3269 ns (246656880 bytes)
      Churning 1000000 records:  5613 ns (338235248 bytes)
      
      After:
      $ ./speed --transaction 1000000
      Adding 1000000 records:  1902 ns (59916680 bytes)
      Finding 1000000 records:  1032 ns (59916680 bytes)
      Missing 1000000 records:  606 ns (59916680 bytes)
      Missing 1000000 records:  606 ns (59916680 bytes)
      Traversing 1000000 records:  741 ns (59916680 bytes)
      Deleting 1000000 records:  1347 ns (119361928 bytes)
      Re-adding 1000000 records:  1727 ns (119361928 bytes)
      Appending 1000000 records:  2561 ns (246656880 bytes)
      Churning 1000000 records:  4403 ns (338235248 bytes)
    • tdb2: add write flag to tdb_direct · a56db4a5
      Rusty Russell authored
      This is a precursor to direct access during transactions: they care about
      whether we are going to read or write to the file.
    • tdb2: hash chaining · 2523e67f
      Rusty Russell authored
      If we get enough hash collisions, we can run out of hash bits; this is
      almost certainly caused by a deliberate attempt to break the tdb (hash
      bombing).
      
      Implement chained records for this case; they're slow but will keep the
      rest of the database functioning.
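A minimal sketch of the chained fallback described above (struct layout and names are illustrative, not tdb2's on-disk format): once every hash bit has been consumed by successive subhash levels, colliding records sit on a linear list, and lookup degrades to walking it with direct key comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: all entries in a chain share the same (exhausted)
 * full hash, so only a direct key comparison can tell them apart. */
struct chain_entry {
    uint64_t full_hash;        /* identical for every entry in the chain */
    const char *key;
    struct chain_entry *next;  /* linear chain: slow, but always works */
};

/* O(chain length) lookup; correctness does not depend on hash bits. */
static const struct chain_entry *
chain_lookup(const struct chain_entry *head, const char *key)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->key, key) == 0)
            return head;
    return NULL;
}
```

This keeps a hash-bombed database functioning, at the cost of O(n) access to the bombed bucket only.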
    • tdb2: use magic freetable value rather than different magic for coalescing · 539f1af0
      Rusty Russell authored
      We have to unlock during coalescing, so we mark records specially to indicate
      to tdb_check that they're not on any list, and to prevent other coalescers
      from grabbing them.
      
      Use a special free list number, rather than a new magic.
    • tdb2: compare the extra 11 hash bits in the header. · 11564894
      Rusty Russell authored
      We already have 10 hash bits encoded in the offset itself; we only get
      here incorrectly about 1 time in 1000, so it's a pretty minor
      optimization at best.
      
      Nonetheless, we have the information, so let's check it before
      accessing the key.  This reduces the probability of a false keycmp by
      another factor of 2000.
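The arithmetic above can be sketched as follows (bit positions, mask and names are hypothetical, not tdb2's actual header layout): with 10 bits already implied by the offset, checking 11 more header bits makes a false keycmp 2^11 = 2048 (roughly 2000) times rarer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the offset encodes hash bits 0..9, and the record
 * header carries the next 11 bits (10..20) of the 64-bit hash. */
#define EXTRA_HASH_SHIFT 10
#define EXTRA_HASH_BITS  11
#define EXTRA_HASH_MASK  ((1ULL << EXTRA_HASH_BITS) - 1)

/* Cheap pre-filter run before the (expensive) key comparison. */
static bool extra_bits_match(uint64_t header_bits, uint64_t hash)
{
    return (header_bits & EXTRA_HASH_MASK) ==
           ((hash >> EXTRA_HASH_SHIFT) & EXTRA_HASH_MASK);
}
```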
    • tdb2: add comparison stats · f2c286c9
      Rusty Russell authored
    • tdb2: clean up logging · 4e185ad8
      Rusty Russell authored
      Logged errors should always set tdb->ecode before the log function is
      called, and there's little reason to have a sprintf-style logging
      function since we can do the formatting internally.
      
      Change the tdb_log attribute to just take a "const char *", and create
      a new tdb_logerr() helper which sets ecode and calls it.  As a bonus,
      mark it COLD so the compiler can optimize appropriately knowing that
      it's unlikely to be invoked.
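A sketch of that helper, assuming GCC's cold attribute (the struct, enum values and buffer size here are illustrative, not tdb2's real definitions): tdb_logerr() sets ecode first, does the formatting internally, and hands a finished string to a plain const char * callback.

```c
#include <stdarg.h>
#include <stdio.h>

enum TDB_ERROR { TDB_SUCCESS = 0, TDB_ERR_CORRUPT = -1 };

/* Illustrative context: just the error code and the log callback. */
struct tdb_context {
    enum TDB_ERROR ecode;
    void (*log_fn)(struct tdb_context *tdb, const char *message);
};

/* cold: tells the compiler this path is rare, so callers are laid out
 * for the non-logging fast path. */
static void __attribute__((cold))
tdb_logerr(struct tdb_context *tdb, enum TDB_ERROR ecode,
           const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    tdb->ecode = ecode;          /* always set, even with no logger */
    if (!tdb->log_fn)
        return;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    tdb->log_fn(tdb, buf);       /* callback sees a plain string */
}
```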
    • tdb2: remove truncated bit. · 57680260
      Rusty Russell authored
      There was an idea that we would use a bit to indicate that we didn't have
      the full hash value; this would allow us to move records down when we
      expanded a hash without rehashing them.
      
      There's little evidence that rehashing in this case is particularly
      expensive, so remove the bit.  We use that bit simply to indicate that
      an offset refers to a subhash instead.
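With the truncated meaning gone, the bit's one remaining job can be sketched like this (the bit position and names are hypothetical, not tdb2's real constants):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the top bit of a stored offset now has exactly one
 * meaning - "this offset points at a subhash table, not a record". */
#define SUBHASH_BIT (1ULL << 63)

static bool is_subhash(uint64_t off)
{
    return (off & SUBHASH_BIT) != 0;
}

static uint64_t subhash_offset(uint64_t off)
{
    return off & ~SUBHASH_BIT;   /* strip the flag to get the real offset */
}
```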
    • tdb2: use direct access for tdb_read_off/tdb_write_off · b44978e1
      Rusty Russell authored
      This is one case where getting rid of tdb_get() cost us.  Also, we
      add more read-only checks.
      
      Before we removed tdb_get:
      Adding 1000000 records:  6480 ns (59900296 bytes)
      Finding 1000000 records:  2839 ns (59900296 bytes)
      Missing 1000000 records:  2485 ns (59900296 bytes)
      Traversing 1000000 records:  2598 ns (59900296 bytes)
      Deleting 1000000 records:  5342 ns (59900296 bytes)
      Re-adding 1000000 records:  5613 ns (59900296 bytes)
      Appending 1000000 records:  12194 ns (93594224 bytes)
      Churning 1000000 records:  14549 ns (93594224 bytes)
      
      Now:
      Adding 1000000 records:  6307 ns (59900296 bytes)
      Finding 1000000 records:  2801 ns (59900296 bytes)
      Missing 1000000 records:  2515 ns (59900296 bytes)
      Traversing 1000000 records:  2579 ns (59900296 bytes)
      Deleting 1000000 records:  5225 ns (59900296 bytes)
      Re-adding 1000000 records:  5878 ns (59900296 bytes)
      Appending 1000000 records:  12665 ns (93594224 bytes)
      Churning 1000000 records:  16090 ns (93594224 bytes)
    • tdb2: remove tdb_get() · 96b169e9
      Rusty Russell authored
      We have four internal helpers for reading data from the database:
      1) tdb_read_convert() - read (and convert) into a buffer.
      2) tdb_read_off() - read (and convert) an offset.
      3) tdb_access_read() - malloc or direct access to the database.
      4) tdb_get() - copy into a buffer or direct access to the database.
      
      The last one doesn't really buy us anything, so remove it (except for
      tdb_read_off/tdb_write_off, see next patch).
      
      Before:
      Adding 1000000 records:  6480 ns (59900296 bytes)
      Finding 1000000 records:  2839 ns (59900296 bytes)
      Missing 1000000 records:  2485 ns (59900296 bytes)
      Traversing 1000000 records:  2598 ns (59900296 bytes)
      Deleting 1000000 records:  5342 ns (59900296 bytes)
      Re-adding 1000000 records:  5613 ns (59900296 bytes)
      Appending 1000000 records:  12194 ns (93594224 bytes)
      Churning 1000000 records:  14549 ns (93594224 bytes)
      
      After:
      Adding 1000000 records:  6497 ns (59900296 bytes)
      Finding 1000000 records:  2854 ns (59900296 bytes)
      Missing 1000000 records:  2563 ns (59900296 bytes)
      Traversing 1000000 records:  2735 ns (59900296 bytes)
      Deleting 1000000 records:  11357 ns (59900296 bytes)
      Re-adding 1000000 records:  8145 ns (59900296 bytes)
      Appending 1000000 records:  10939 ns (93594224 bytes)
      Churning 1000000 records:  18479 ns (93594224 bytes)
  5. 23 Nov, 2010 2 commits
  6. 01 Dec, 2010 1 commit
  7. 23 Nov, 2010 6 commits
  8. 22 Nov, 2010 2 commits
    • tdb2: relax locking to allow two free list locks at once · a5b66d70
      Rusty Russell authored
      As long as they are taken in descending order. This avoids the common
      sequence of:
      
      1) Grab lock for bucket.
      2) Remove entry from bucket.
      3) Drop lock for bucket.
      4) Grab lock for bucket for leftover.
      5) Add leftover entry to bucket.
      6) Drop lock for leftover bucket.
      
      In particular it's quite common for the leftover bucket to be the same
      as the entry bucket (when it's the largest bucket); if it's not, we are
      no worse than before.
      
      Current results of speed test:
      $ ./speed 1000000
      Adding 1000000 records:  13194 ns (60083192 bytes)
      Finding 1000000 records:  2438 ns (60083192 bytes)
      Traversing 1000000 records:  2167 ns (60083192 bytes)
      Deleting 1000000 records:  9265 ns (60083192 bytes)
      Re-adding 1000000 records:  10241 ns (60083192 bytes)
      Appending 1000000 records:  17692 ns (93879992 bytes)
      Churning 1000000 records:  26077 ns (93879992 bytes)
      
      Previous:
      $ ./speed 1000000
      Adding 1000000 records:  23210 ns (59193360 bytes)
      Finding 1000000 records:  2387 ns (59193360 bytes)
      Traversing 1000000 records:  2150 ns (59193360 bytes)
      Deleting 1000000 records:  13392 ns (59193360 bytes)
      Re-adding 1000000 records:  11546 ns (59193360 bytes)
      Appending 1000000 records:  29327 ns (91193360 bytes)
      Churning 1000000 records:  33026 ns (91193360 bytes)
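The ordering rule can be sketched as a predicate (illustrative, not tdb2's actual lock API): a second free-table bucket lock may be taken while one is held only if its bucket number is strictly lower. Since every thread then acquires any pair of buckets in the same (descending) order, no deadlock cycle can form, and the remove-entry / reinsert-leftover sequence above needs only one locking round.

```c
#include <stdbool.h>

/* Hypothetical helper: decide whether a second bucket lock may be taken
 * while bucket `held` is already locked.  Descending order only. */
static bool may_lock_second(unsigned held, unsigned want)
{
    /* the same bucket is already covered by the lock we hold,
     * so no second lock is ever needed for it */
    return want < held;
}
```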
    • tdb2: use expansion heuristics from tdb1 · 20defbbc
      Rusty Russell authored
      This reduces the amount of expansion we do.
      
      Before:
      ./speed 1000000
      Adding 1000000 records:  23210 ns (59193360 bytes)
      Finding 1000000 records:  2387 ns (59193360 bytes)
      Traversing 1000000 records:  2150 ns (59193360 bytes)
      Deleting 1000000 records:  13392 ns (59193360 bytes)
      Re-adding 1000000 records:  11546 ns (59193360 bytes)
      Appending 1000000 records:  29327 ns (91193360 bytes)
      Churning 1000000 records:  33026 ns (91193360 bytes)
      
      After:
      $ ./speed 1000000
      Adding 1000000 records:  17480 ns (61472904 bytes)
      Finding 1000000 records:  2431 ns (61472904 bytes)
      Traversing 1000000 records:  2194 ns (61472904 bytes)
      Deleting 1000000 records:  10948 ns (61472904 bytes)
      Re-adding 1000000 records:  11247 ns (61472904 bytes)
      Appending 1000000 records:  21826 ns (96051424 bytes)
      Churning 1000000 records:  27242 ns (96051424 bytes)
  9. 01 Dec, 2010 1 commit
    • tdb2: shrink free header from 32 to 24 bytes. · 5e30abc6
      Rusty Russell authored
      This reduces our minimum key+data length to 8 bytes; we do this by packing
      the prev pointer where we used to put the flist pointer, and storing the
      flist as an 8 bit index (meaning we can only have 256 free tables).
      
      Note that this has a perverse effect on the size of the database, as our
      4-byte key and 4-byte data now fit perfectly in a minimal record, so
      appending causes us to allocate new records which are 50% larger,
      since we detect that the record is growing.
      
      Current results of speed test:
      $ ./speed 1000000
      Adding 1000000 records:  23210 ns (59193360 bytes)
      Finding 1000000 records:  2387 ns (59193360 bytes)
      Traversing 1000000 records:  2150 ns (59193360 bytes)
      Deleting 1000000 records:  13392 ns (59193360 bytes)
      Re-adding 1000000 records:  11546 ns (59193360 bytes)
      Appending 1000000 records:  29327 ns (91193360 bytes)
      Churning 1000000 records:  33026 ns (91193360 bytes)
      
      Previous:
      $ ./speed 1000000
      Adding 1000000 records:  28324 ns (67232528 bytes)
      Finding 1000000 records:  2468 ns (67232528 bytes)
      Traversing 1000000 records:  2200 ns (67232528 bytes)
      Deleting 1000000 records:  13083 ns (67232528 bytes)
      Re-adding 1000000 records:  16433 ns (67232528 bytes)
      Appending 1000000 records:  2511 ns (67232528 bytes)
      Churning 1000000 records:  31068 ns (67570448 bytes)
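A sketch of the 24-byte header under those constraints (field names and bit assignment are hypothetical, not the real on-disk layout): the separate flist pointer is gone; an 8-bit table index is packed into the word holding the prev offset, which is exactly why the number of free tables is capped at 256.

```c
#include <stdint.h>

/* Hypothetical packing: the low 8 bits of the first word hold the
 * free-table index; the remaining bits hold magic/prev information. */
struct free_record {
    uint64_t magic_and_prev;  /* magic + prev offset + 8-bit ftable index */
    uint64_t len;             /* size of this free region */
    uint64_t next;            /* next free record in the same bucket */
};

#define FTABLE_MASK 0xffULL

static unsigned ftable_of(const struct free_record *r)
{
    return (unsigned)(r->magic_and_prev & FTABLE_MASK);
}
```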
  10. 23 Nov, 2010 1 commit
  11. 01 Dec, 2010 4 commits
    • tdb2: Add speed test to tdb and tdb2 · 076c398e
      Rusty Russell authored
      Current results of speed test:
      $ ./speed 1000000
      Adding 1000000 records:  14726 ns (67244816 bytes)
      Finding 1000000 records:  2844 ns (67244816 bytes)
      Missing 1000000 records:  2528 ns (67244816 bytes)
      Traversing 1000000 records:  2572 ns (67244816 bytes)
      Deleting 1000000 records:  5358 ns (67244816 bytes)
      Re-adding 1000000 records:  9176 ns (67244816 bytes)
      Appending 1000000 records:  3035 ns (67244816 bytes)
      Churning 1000000 records:  18139 ns (67565840 bytes)
      $ ./speed 100000
      Adding 100000 records:  13270 ns (14349584 bytes)
      Finding 100000 records:  2769 ns (14349584 bytes)
      Missing 100000 records:  2422 ns (14349584 bytes)
      Traversing 100000 records:  2595 ns (14349584 bytes)
      Deleting 100000 records:  5331 ns (14349584 bytes)
      Re-adding 100000 records:  5875 ns (14349584 bytes)
      Appending 100000 records:  2751 ns (14349584 bytes)
      Churning 100000 records:  20666 ns (25771280 bytes)
      
      vs tdb1 (with hashsize 100003):
      $ ./speed 1000000
      Adding 1000000 records:  8547 ns (44306432 bytes)
      Finding 1000000 records:  5595 ns (44306432 bytes)
      Missing 1000000 records:  3469 ns (44306432 bytes)
      Traversing 1000000 records:  4571 ns (44306432 bytes)
      Deleting 1000000 records:  12115 ns (44306432 bytes)
      Re-adding 1000000 records:  10505 ns (44306432 bytes)
      Appending 1000000 records:  10610 ns (44306432 bytes)
      Churning 1000000 records:  28697 ns (44306432 bytes)
      $ ./speed 100000
      Adding 100000 records:  6030 ns (4751360 bytes)
      Finding 100000 records:  3141 ns (4751360 bytes)
      Missing 100000 records:  3143 ns (4751360 bytes)
      Traversing 100000 records:  4659 ns (4751360 bytes)
      Deleting 100000 records:  7891 ns (4751360 bytes)
      Re-adding 100000 records:  5913 ns (4751360 bytes)
      Appending 100000 records:  4242 ns (4751360 bytes)
      Churning 100000 records:  15300 ns (4751360 bytes)
    • tdb2: fix tdb_check() return when free table entries missing. · dbbde019
      Rusty Russell authored
      It mistakenly returned -1 meaning "success".
    • c84c6577
    • tdb2: make summary command handle recovery "dead zone" · ec88af5d
      Rusty Russell authored
      We can run summary with a recovery area, or a dead zone.