1. 01 Dec, 2010 6 commits
    • Rusty Russell's avatar
      tdb2: compare the extra 11 hash bits in the header. · 11564894
      Rusty Russell authored
      We already have 10 hash bits encoded in the offset itself; we only get
      here incorrectly about 1 time in 1000, so it's a pretty minor
      optimization at best.
      
      Nonetheless, we have the information, so let's check it before
      accessing the key.  This reduces the probability of a false keycmp by
      another factor of 2000.
      11564894
    • Rusty Russell's avatar
      tdb2: add comparison stats · f2c286c9
      Rusty Russell authored
      f2c286c9
    • Rusty Russell's avatar
      tdb2: clean up logging · 4e185ad8
      Rusty Russell authored
      Logged errors should always set tdb->ecode before they are called, and
      there's little reason to have a sprintf-style logging function since
      we can do the formatting internally.
      
      Change the tdb_log attribute to just take a "const char *", and create
      a new tdb_logerr() helper which sets ecode and calls it.  As a bonus,
      mark it COLD so the compiler can optimize appropriately knowing that
      it's unlikely to be invoked.
      4e185ad8
    • Rusty Russell's avatar
      tdb2: remove truncated bit. · 57680260
      Rusty Russell authored
      There was an idea that we would use a bit to indicate that we didn't have
      the full hash value; this would allow us to move records down when we
      expanded a hash without rehashing them.
      
      There's little evidence that rehashing in this case is particularly
      expensive, so remove the bit.  We use that bit simply to indicate that
      an offset refers to a subhash instead.
      57680260
    • Rusty Russell's avatar
      tdb2: use direct access for tdb_read_off/tdb_write_off · b44978e1
      Rusty Russell authored
      This is one case where getting rid of tdb_get() cost us.  Also, we
      add more read-only checks.
      
      Before we removed tdb_get:
      Adding 1000000 records:  6480 ns (59900296 bytes)
      Finding 1000000 records:  2839 ns (59900296 bytes)
      Missing 1000000 records:  2485 ns (59900296 bytes)
      Traversing 1000000 records:  2598 ns (59900296 bytes)
      Deleting 1000000 records:  5342 ns (59900296 bytes)
      Re-adding 1000000 records:  5613 ns (59900296 bytes)
      Appending 1000000 records:  12194 ns (93594224 bytes)
      Churning 1000000 records:  14549 ns (93594224 bytes)
      
      Now:
      Adding 1000000 records:  6307 ns (59900296 bytes)
      Finding 1000000 records:  2801 ns (59900296 bytes)
      Missing 1000000 records:  2515 ns (59900296 bytes)
      Traversing 1000000 records:  2579 ns (59900296 bytes)
      Deleting 1000000 records:  5225 ns (59900296 bytes)
      Re-adding 1000000 records:  5878 ns (59900296 bytes)
      Appending 1000000 records:  12665 ns (93594224 bytes)
      Churning 1000000 records:  16090 ns (93594224 bytes)
      b44978e1
    • Rusty Russell's avatar
      tdb2: remove tdb_get() · 96b169e9
      Rusty Russell authored
      We have four internal helpers for reading data from the database:
      1) tdb_read_convert() - read (and convert) into a buffer.
      2) tdb_read_off() - read (and convert) and offset.
      3) tdb_access_read() - malloc or direct access to the database.
      4) tdb_get() - copy into a buffer or direct access to the database.
      
      The last one doesn't really buy us anything, so remove it (except for
      tdb_read_off/tdb_write_off, see next patch).
      
      Before:
      Adding 1000000 records:  6480 ns (59900296 bytes)
      Finding 1000000 records:  2839 ns (59900296 bytes)
      Missing 1000000 records:  2485 ns (59900296 bytes)
      Traversing 1000000 records:  2598 ns (59900296 bytes)
      Deleting 1000000 records:  5342 ns (59900296 bytes)
      Re-adding 1000000 records:  5613 ns (59900296 bytes)
      Appending 1000000 records:  12194 ns (93594224 bytes)
      Churning 1000000 records:  14549 ns (93594224 bytes)
      
      After:
      Adding 1000000 records:  6497 ns (59900296 bytes)
      Finding 1000000 records:  2854 ns (59900296 bytes)
      Missing 1000000 records:  2563 ns (59900296 bytes)
      Traversing 1000000 records:  2735 ns (59900296 bytes)
      Deleting 1000000 records:  11357 ns (59900296 bytes)
      Re-adding 1000000 records:  8145 ns (59900296 bytes)
      Appending 1000000 records:  10939 ns (93594224 bytes)
      Churning 1000000 records:  18479 ns (93594224 bytes)
      96b169e9
  2. 23 Nov, 2010 1 commit
  3. 01 Dec, 2010 1 commit
  4. 22 Nov, 2010 2 commits
    • Rusty Russell's avatar
      tdb2: relax locking to allow two free list locks at once · a5b66d70
      Rusty Russell authored
      As long as they are in descending order.  This prevents the common case of:
      
      1) Grab lock for bucket.
      2) Remove entry from bucket.
      3) Drop lock for bucket.
      4) Grab lock for bucket for leftover.
      5) Add leftover entry to bucket.
      6) Drop lock for leftover bucket.
      
      In particular it's quite common for the leftover bucket to be the same
      as the entry bucket (when it's the largest bucket); if it's not, we are
      no worse than before.
      
      Current results of speed test:
      $ ./speed 1000000
      Adding 1000000 records:  13194 ns (60083192 bytes)
      Finding 1000000 records:  2438 ns (60083192 bytes)
      Traversing 1000000 records:  2167 ns (60083192 bytes)
      Deleting 1000000 records:  9265 ns (60083192 bytes)
      Re-adding 1000000 records:  10241 ns (60083192 bytes)
      Appending 1000000 records:  17692 ns (93879992 bytes)
      Churning 1000000 records:  26077 ns (93879992 bytes)
      
      Previous:
      $ ./speed 1000000
      Adding 1000000 records:  23210 ns (59193360 bytes)
      Finding 1000000 records:  2387 ns (59193360 bytes)
      Traversing 1000000 records:  2150 ns (59193360 bytes)
      Deleting 1000000 records:  13392 ns (59193360 bytes)
      Re-adding 1000000 records:  11546 ns (59193360 bytes)
      Appending 1000000 records:  29327 ns (91193360 bytes)
      Churning 1000000 records:  33026 ns (91193360 bytes)
      a5b66d70
    • Rusty Russell's avatar
      tdb2: use expansion heuristics from tdb1 · 20defbbc
      Rusty Russell authored
      This reduces the amount of expansion we do.
      
      Before:
      ./speed 1000000
      Adding 1000000 records:  23210 ns (59193360 bytes)
      Finding 1000000 records:  2387 ns (59193360 bytes)
      Traversing 1000000 records:  2150 ns (59193360 bytes)
      Deleting 1000000 records:  13392 ns (59193360 bytes)
      Re-adding 1000000 records:  11546 ns (59193360 bytes)
      Appending 1000000 records:  29327 ns (91193360 bytes)
      Churning 1000000 records:  33026 ns (91193360 bytes)
      
      After:
      $ ./speed 1000000
      Adding 1000000 records:  17480 ns (61472904 bytes)
      Finding 1000000 records:  2431 ns (61472904 bytes)
      Traversing 1000000 records:  2194 ns (61472904 bytes)
      Deleting 1000000 records:  10948 ns (61472904 bytes)
      Re-adding 1000000 records:  11247 ns (61472904 bytes)
      Appending 1000000 records:  21826 ns (96051424 bytes)
      Churning 1000000 records:  27242 ns (96051424 bytes)
      20defbbc
  5. 01 Dec, 2010 1 commit
    • Rusty Russell's avatar
      tdb2: shrink free header from 32 to 24 bytes. · 5e30abc6
      Rusty Russell authored
      This reduces our minimum key+data length to 8 bytes; we do this by packing
      the prev pointer where we used to put the flist pointer, and storing the
      flist as an 8 bit index (meaning we can only have 256 free tables).
      
      Note that this has a perverse result on the size of the database, as our
      4-byte key and 4-byte data now fit perfectly in a minimal record, so
      appeding causes us to allocate new records which are 50% larger,
      since we detect growing.
      
      Current results of speed test:
      $ ./speed 1000000
      Adding 1000000 records:  23210 ns (59193360 bytes)
      Finding 1000000 records:  2387 ns (59193360 bytes)
      Traversing 1000000 records:  2150 ns (59193360 bytes)
      Deleting 1000000 records:  13392 ns (59193360 bytes)
      Re-adding 1000000 records:  11546 ns (59193360 bytes)
      Appending 1000000 records:  29327 ns (91193360 bytes)
      Churning 1000000 records:  33026 ns (91193360 bytes)
      
      Previous:
      $ ./speed 1000000
      Adding 1000000 records:  28324 ns (67232528 bytes)
      Finding 1000000 records:  2468 ns (67232528 bytes)
      Traversing 1000000 records:  2200 ns (67232528 bytes)
      Deleting 1000000 records:  13083 ns (67232528 bytes)
      Re-adding 1000000 records:  16433 ns (67232528 bytes)
      Appending 1000000 records:  2511 ns (67232528 bytes)
      Churning 1000000 records:  31068 ns (67570448 bytes)
      5e30abc6
  6. 23 Nov, 2010 1 commit
  7. 01 Dec, 2010 5 commits
    • Rusty Russell's avatar
      tdb2: Add speed test to tdb and tdb2 · 076c398e
      Rusty Russell authored
      Current results of speed test:
      $ ./speed 1000000
      Adding 1000000 records:  14726 ns (67244816 bytes)
      Finding 1000000 records:  2844 ns (67244816 bytes)
      Missing 1000000 records:  2528 ns (67244816 bytes)
      Traversing 1000000 records:  2572 ns (67244816 bytes)
      Deleting 1000000 records:  5358 ns (67244816 bytes)
      Re-adding 1000000 records:  9176 ns (67244816 bytes)
      Appending 1000000 records:  3035 ns (67244816 bytes)
      Churning 1000000 records:  18139 ns (67565840 bytes)
      $ ./speed 100000
      Adding 100000 records:  13270 ns (14349584 bytes)
      Finding 100000 records:  2769 ns (14349584 bytes)
      Missing 100000 records:  2422 ns (14349584 bytes)
      Traversing 100000 records:  2595 ns (14349584 bytes)
      Deleting 100000 records:  5331 ns (14349584 bytes)
      Re-adding 100000 records:  5875 ns (14349584 bytes)
      Appending 100000 records:  2751 ns (14349584 bytes)
      Churning 100000 records:  20666 ns (25771280 bytes)
      
      vs tdb1 (with hashsize 100003):
      $ ./speed 1000000
      Adding 1000000 records:  8547 ns (44306432 bytes)
      Finding 1000000 records:  5595 ns (44306432 bytes)
      Missing 1000000 records:  3469 ns (44306432 bytes)
      Traversing 1000000 records:  4571 ns (44306432 bytes)
      Deleting 1000000 records:  12115 ns (44306432 bytes)
      Re-adding 1000000 records:  10505 ns (44306432 bytes)
      Appending 1000000 records:  10610 ns (44306432 bytes)
      Churning 1000000 records:  28697 ns (44306432 bytes)
      $ ./speed 100000
      Adding 100000 records:  6030 ns (4751360 bytes)
      Finding 100000 records:  3141 ns (4751360 bytes)
      Missing 100000 records:  3143 ns (4751360 bytes)
      Traversing 100000 records:  4659 ns (4751360 bytes)
      Deleting 100000 records:  7891 ns (4751360 bytes)
      Re-adding 100000 records:  5913 ns (4751360 bytes)
      Appending 100000 records:  4242 ns (4751360 bytes)
      Churning 100000 records:  15300 ns (4751360 bytes)
      076c398e
    • Rusty Russell's avatar
      tdb2: fix tdb_check() return when free table entries missing. · dbbde019
      Rusty Russell authored
      It mistakenly returned -1 meaning "success".
      dbbde019
    • Rusty Russell's avatar
      c84c6577
    • Rusty Russell's avatar
      tdb2: make summary command handle recovery "dead zone" · ec88af5d
      Rusty Russell authored
      We can run summary with a recovery area, or a dead zone.
      ec88af5d
    • Rusty Russell's avatar
      tdb2: cancel transactions on tdb_close · d95645d5
      Rusty Russell authored
      Otherwise we leak memory.
      d95645d5
  8. 23 Nov, 2010 6 commits
  9. 17 Nov, 2010 5 commits
    • Rusty Russell's avatar
      tdb2: handle chains of free tables · ef9dec60
      Rusty Russell authored
      This adds chains of free tables: we choose one at random at the start and
      we iterate through them when they are exhausted.
      
      Currently there is no code to actually add a new free table, but we test
      that we can handle it if we add one in future.
      ef9dec60
    • Rusty Russell's avatar
      tdb2: get rid of zones · d70577b6
      Rusty Russell authored
      Zones were a bad idea.  They mean we can't simply add stuff to the end
      of the file (which transactions relied upon), and there's no good heuristic
      in how to size them.
      
      This patch reverts us to a single free table.
      d70577b6
    • Rusty Russell's avatar
      tdb2: fix bucket search · 2ecf943a
      Rusty Russell authored
      We were previously jumping straight from the first bucket to the end.
      2ecf943a
    • Rusty Russell's avatar
      tdb2: only adjust size once when growing · e984ef66
      Rusty Russell authored
      We were adding 50% to datalen twice, so move it out of adjust_size and
      make the callers do it.
      
      We also add a test that the heuristic is working at all.
      e984ef66
    • Rusty Russell's avatar
      tdb2: remove tailer · d1383862
      Rusty Russell authored
      We don't actually need it.
      d1383862
  10. 15 Nov, 2010 9 commits
    • Rusty Russell's avatar
      tdb2: fix tdb_chainlock · c5e3f07a
      Rusty Russell authored
      We can't enlarge the lock without risking deadlock, so tdb_chainlock() can't
      simply grab a one-byte lock; it needs to grab the lock we'd need to protect
      the hash.
      
      In theory, tdb_chainlock_read() could use a one-byte lock though.
      c5e3f07a
    • Rusty Russell's avatar
      tdb2: fix coalesce race #3 · 06a5b1a8
      Rusty Russell authored
      When we're coalescing, we need to drop the lock on the current free list, as
      we've enlarged the block and it may now belong in a different list.
      
      Unfortunately (as shown by repeated tdbtorture -n 8) another coalescing run
      can do the coalescing while we've dropped the lock.  So for this case, we
      use the TDB_COALESCING_MAGIC flag so it doesn't look free.
      06a5b1a8
    • Rusty Russell's avatar
      tdb2: add TDB_COALESCING_MAGIC to solve coalescing race. · 56ea2c52
      Rusty Russell authored
      A new special record marker to indicate coalescing is in progress.
      56ea2c52
    • Rusty Russell's avatar
      tdb2: fix coalesce race #2 · d2a4d6b4
      Rusty Russell authored
      When we find a free block, we need to mark it as used before we drop the
      free lock, even though we've removed it from the list.  Otherwise the
      coalescing code can find it.
      
      This means handing the information we need down to lock_and_alloc, which
      also means we know when we're grabbing a "growing" entry, and can relax
      the heuristics about finding a good-sized block in that case.
      d2a4d6b4
    • Rusty Russell's avatar
      tdb2: coalescing race fix #1 · b5479009
      Rusty Russell authored
      When coalescing, we check the adjacent entry then lock its free list: we
      need to *recheck* after locking, to make sure it's still in that free list.
      b5479009
    • Rusty Russell's avatar
      tdb2: minor optimization for set_header · 8afb9681
      Rusty Russell authored
      We actually only need the bottom 5 bits of the hash value, so don't waste
      8 bytes passing it.
      8afb9681
    • Rusty Russell's avatar
      tdb2: hoist adjust_size · 590eee6f
      Rusty Russell authored
      We're going to want it in get_free() in the next patch, so move it upwards.
      Trivial changes, too: add to size before min length check, and rename growing
      to want_extra.
      590eee6f
    • Rusty Russell's avatar
      tdb2: clean up makefile for tools · fdba839b
      Rusty Russell authored
      fdba839b
    • Rusty Russell's avatar
      tdb2: extra debugging checks · b371060f
      Rusty Russell authored
      b371060f
  11. 17 Nov, 2010 3 commits