Commit fc962b10 authored by Rich Prohaska, committed by Yoni Fogel

refs #6040 update the README

git-svn-id: file:///svn/toku/tokudb@53420 c7de825b-a66e-492c-adef-691d508d4ae1
parent 3400f1d1
@@ -5,8 +5,14 @@ you cannot access a database created by Berkeley DB using the Tokutek
DB, or vice-versa.
db-insert is a program that inserts random key-value pairs into a database.
db-scan is a program that scans through the key-value pairs, reading every row, from a database.
db-update is a program that upserts key-value pairs into a database. If the key already exists, it increments a count in the value.
db-insert-multiple is a program that inserts key-value pairs into multiple databases. This is how TokuDB maintains consistent
secondary databases.
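The upsert behavior described for db-update can be illustrated without any database library. The following is a minimal sketch, not the db-update source: it assumes a small fixed-size open-addressing table as a stand-in for the database, and `toy_upsert`, `toy_lookup`, and `TABLE_SIZE` are hypothetical names that do not come from TokuDB or Berkeley DB.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a database: a tiny open-addressing hash table.
#define TABLE_SIZE 1024

struct slot {
    int used;
    uint64_t key;
    uint64_t count;   // the "value": how many times the key has been upserted
};

static struct slot table[TABLE_SIZE];

// Insert the key with count 1, or increment the count if the key already exists.
// Returns the new count, or 0 if the table is full.
uint64_t toy_upsert(uint64_t key) {
    size_t start = (size_t)(key % TABLE_SIZE);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct slot *s = &table[(start + i) % TABLE_SIZE];
        if (!s->used) {
            s->used = 1;
            s->key = key;
            s->count = 1;
            return 1;
        }
        if (s->key == key) {
            return ++s->count;
        }
    }
    return 0;
}

// Return the count stored for the key, or 0 if the key is absent.
uint64_t toy_lookup(uint64_t key) {
    size_t start = (size_t)(key % TABLE_SIZE);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        const struct slot *s = &table[(start + i) % TABLE_SIZE];
        if (!s->used) {
            return 0;   // no deletions in this sketch, so an empty slot ends the probe
        }
        if (s->key == key) {
            return s->count;
        }
    }
    return 0;
}
```

Calling `toy_upsert(42)` twice and `toy_upsert(7)` once leaves key 42 with a count of 2 and key 7 with a count of 1. In a real database the same read-modify-write has to locate the key on storage first, which is the cost the benchmark is designed to expose.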
To build it and run it (it's been tested on Fedora 10):
$ make (Makes the binaries)
Run the insertion workload under TokuDB:
@@ -63,4 +69,17 @@ VmPeak: 244668 kB
VmHWM: 68096 kB
VmRSS: 1232 kB
The db-update-bdb program upserts 1B rows into a BDB database. When the database gets larger than memory, the throughput
should tank since every update needs to read a block from the storage system. The storage system becomes the performance
bottleneck. The program uses a 1GB cache in front of the kernel's file system buffer cache. The program should hit the wall
at about 300M rows on a machine with 16GB of memory since keys are 8 bytes and values are 8 bytes in size.
$ ./db-update-bdb
The db-update program upserts 1B rows into a TokuDB database. Throughput should not degrade significantly since the cost
of the storage system reads is amortized over thousands of update operations. One should expect TokuDB to be at least 50 times
faster than BDB.
$ ./db-update
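The row counts above can be put in terms of raw payload with a short calculation. This is only a back-of-the-envelope sketch: `raw_payload_bytes` is a hypothetical helper, and it counts key and value bytes alone, ignoring page headers, fill factor, and any other storage overhead, none of which the README quantifies.

```c
#include <stdint.h>

// Raw payload size in bytes for `rows` key-value pairs.
// Assumption: only key and value bytes are counted; all per-row and
// per-page overhead of the storage engine is ignored.
uint64_t raw_payload_bytes(uint64_t rows, uint64_t key_bytes, uint64_t value_bytes) {
    return rows * (key_bytes + value_bytes);
}
```

With 8-byte keys and 8-byte values, 300M rows come to 4.8e9 bytes of raw payload and the full 1B rows come to 1.6e10 bytes. The point at which the BDB run actually becomes I/O bound depends on overhead that this figure leaves out.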
There isn't much documentation for the Tokutek Fractal Tree index library, but most of the API is like Berkeley DB's.
@@ -3,6 +3,7 @@
#ident "$Id$"
#ident "Copyright (c) 2007-2012 Tokutek Inc. All rights reserved."
#ident "The technology is licensed by the Massachusetts Institute of Technology, Rutgers State University of New Jersey, and the Research Foundation of State University of New York at Stony Brook under United States of America Serial No. 11/760379 and to the patents and/or patent applications resulting from it."
// measure the performance of insertions into multiple dictionaries using ENV->put_multiple
// the table schema is t(a bigint, b bigint, c bigint, d bigint, primary key(a), key(b), key(c,d), clustering key(d))
// the primary key(a) is represented with key=a and value=b,c,d
......
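The primary-key representation named in the comment above (key=a, value=b,c,d) can be sketched as a packing routine. This is an illustration under stated assumptions, not the code from the commit: `struct row`, `put_be64`, and `pack_primary` are hypothetical names, the columns are treated as unsigned 64-bit integers, and big-endian byte order is an assumption chosen so that bytewise key comparison matches numeric order. The layouts of the secondary and clustering keys are elided in the diff and are not reproduced here.

```c
#include <stdint.h>

// One row of the table t(a, b, c, d); columns treated as unsigned for this sketch.
struct row {
    uint64_t a, b, c, d;
};

// Store v at dst in big-endian order (an assumption of this sketch, so that
// comparing keys byte by byte gives the same order as comparing the numbers).
static void put_be64(unsigned char *dst, uint64_t v) {
    for (int i = 7; i >= 0; i--) {
        dst[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

// Primary index entry: key = a, value = b,c,d.
void pack_primary(const struct row *r, unsigned char key[8], unsigned char val[24]) {
    put_be64(key, r->a);
    put_be64(val, r->b);
    put_be64(val + 8, r->c);
    put_be64(val + 16, r->d);
}
```

The resulting 8-byte key and 24-byte value are the kind of buffers a key-value store would accept for the primary dictionary; how the program derives the other index entries from them is not shown in this hunk.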
@@ -3,6 +3,7 @@
#ident "$Id$"
#ident "Copyright (c) 2007-2012 Tokutek Inc. All rights reserved."
#ident "The technology is licensed by the Massachusetts Institute of Technology, Rutgers State University of New Jersey, and the Research Foundation of State University of New York at Stony Brook under United States of America Serial No. 11/760379 and to the patents and/or patent applications resulting from it."
// measure the performance of a simulated "insert on duplicate key update" operation
// the table schema is t(a int, b int, c int, d int, primary key(a, b))
// a and b are random
......