1. 11 Jul, 2018 8 commits
    • Kirill Smelkov's avatar
      go/neo/t/neotest: bench-cpu += unzlib for wczblk1 and prod1 objects · 91a8afa8
      Kirill Smelkov authored
      NEO uses zlib compression for data, and this way client has to spend
      time decompressing it. Benchmark how much time zlib decompression takes.
      With stdlib zlib decompressor out of the box it looks like:
      
      	name                 time/op
      	unzlib/py/wczdata    20.7µs ± 2%
      	unzlib/go/wczdata    70.6µs ± 0%
      	unzlib/py/prod1-avg  4.02µs ± 1%
      	unzlib/go/prod1-avg  15.2µs ± 0%
      
      i.e. much not in favour of Go.
      
      We'll be fixing that in the following patches.
      91a8afa8
    • Kirill Smelkov's avatar
      go/neo/t/neotest: Switch to zwrk to simulate parallel load from multiple clients · 646a94b5
      Kirill Smelkov authored
      With zwrk for ZODB being similar to what wrk is for HTTP.
      
      Rationale: simulating multiple clients is:
      
      1. noisy - the timings from run to run are changing sometimes up to 50%
      2. with significant additional overhead - there are constant OS-level
         process switches in between client processes and this prevents to
         actually create the load.
      3. the above load from "2" actually takes resources from the server in
         localhost case.
      
      So let's switch to simulating many requests in lightweight way similarly
      to how it is done in wrk - in one process and not so many threads (it
      can be just 1) with many connections opened to server and epolly way to
      load it with Go providing epoll-goroutine matching.
      
      Example summarized zbench-local output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.txt
      	name                             time/object
      	cluster:rio dataset:wczblk1-8
      	fs1-zhash.py                             23.7µs ± 5%
      	fs1-zhash.go                             5.68µs ± 8%
      	fs1-zhash.go+prefetch128                 6.44µs ±16%
      	zeo/py/fs1-zhash.py                       376µs ± 4%
      	zeo/py/fs1-zhash.go                       130µs ± 3%
      	zeo/py/fs1-zhash.go+prefetch128          72.3µs ± 4%
      	neo/py(!log)/sqlite·P1-zhash.py           565µs ± 4%
      	neo/py(!log)/sql·P1-zhash.py              491µs ± 8%
      	cluster:rio dataset:prod1-1024
      	fs1-zhash.py                             19.5µs ± 2%
      	fs1-zhash.go                             3.92µs ±12%
      	fs1-zhash.go+prefetch128                 4.42µs ± 6%
      	zeo/py/fs1-zhash.py                       365µs ± 9%
      	zeo/py/fs1-zhash.go                       120µs ± 1%
      	zeo/py/fs1-zhash.go+prefetch128          68.4µs ± 3%
      	neo/py(!log)/sqlite·P1-zhash.py           560µs ± 5%
      	neo/py(!log)/sql·P1-zhash.py              482µs ± 8%
      
      	name                             req/s
      	cluster:rio dataset:wczblk1-8
      	fs1-zwrk.go·1                              380k ± 2%
      	fs1-zwrk.go·2                              666k ± 3%
      	fs1-zwrk.go·3                              948k ± 1%
      	fs1-zwrk.go·4                             1.24M ± 1%
      	fs1-zwrk.go·8                             1.62M ± 0%
      	fs1-zwrk.go·12                            1.70M ± 0%
      	fs1-zwrk.go·16                            1.71M ± 0%
      	zeo/py/fs1-zwrk.go·1                      8.29k ± 1%
      	zeo/py/fs1-zwrk.go·2                      10.4k ± 2%
      	zeo/py/fs1-zwrk.go·3                      11.2k ± 1%
      	zeo/py/fs1-zwrk.go·4                      11.7k ± 1%
      	zeo/py/fs1-zwrk.go·8                      12.1k ± 2%
      	zeo/py/fs1-zwrk.go·12                     12.3k ± 1%
      	zeo/py/fs1-zwrk.go·16                     12.3k ± 2%
      	cluster:rio dataset:prod1-1024
      	fs1-zwrk.go·1                              594k ± 7%
      	fs1-zwrk.go·2                             1.14M ± 4%
      	fs1-zwrk.go·3                             1.60M ± 2%
      	fs1-zwrk.go·4                             2.09M ± 1%
      	fs1-zwrk.go·8                             2.74M ± 1%
      	fs1-zwrk.go·12                            2.76M ± 0%
      	fs1-zwrk.go·16                            2.76M ± 1%
      	zeo/py/fs1-zwrk.go·1                      9.42k ± 9%
      	zeo/py/fs1-zwrk.go·2                      10.4k ± 1%
      	zeo/py/fs1-zwrk.go·3                      11.4k ± 1%
      	zeo/py/fs1-zwrk.go·4                      11.7k ± 2%
      	zeo/py/fs1-zwrk.go·8                      12.4k ± 1%
      	zeo/py/fs1-zwrk.go·12                     12.5k ± 1%
      	zeo/py/fs1-zwrk.go·16                     13.4k ±11%
      
      	name                             latency-time/object
      	cluster:rio dataset:wczblk1-8
      	fs1-zwrk.go·1                            2.63µs ± 2%
      	fs1-zwrk.go·2                            3.00µs ± 3%
      	fs1-zwrk.go·3                            3.16µs ± 1%
      	fs1-zwrk.go·4                            3.23µs ± 1%
      	fs1-zwrk.go·8                            4.94µs ± 0%
      	fs1-zwrk.go·12                           7.06µs ± 0%
      	fs1-zwrk.go·16                           9.36µs ± 0%
      	zeo/py/fs1-zwrk.go·1                      121µs ± 1%
      	zeo/py/fs1-zwrk.go·2                      192µs ± 2%
      	zeo/py/fs1-zwrk.go·3                      267µs ± 1%
      	zeo/py/fs1-zwrk.go·4                      343µs ± 1%
      	zeo/py/fs1-zwrk.go·8                      660µs ± 2%
      	zeo/py/fs1-zwrk.go·12                     977µs ± 1%
      	zeo/py/fs1-zwrk.go·16                    1.30ms ± 2%
      	cluster:rio dataset:prod1-1024
      	fs1-zwrk.go·1                            1.69µs ± 7%
      	fs1-zwrk.go·2                            1.76µs ± 4%
      	fs1-zwrk.go·3                            1.88µs ± 2%
      	fs1-zwrk.go·4                            1.91µs ± 1%
      	fs1-zwrk.go·8                            2.92µs ± 1%
      	fs1-zwrk.go·12                           4.34µs ± 0%
      	fs1-zwrk.go·16                           5.80µs ± 1%
      	zeo/py/fs1-zwrk.go·1                      107µs ± 9%
      	zeo/py/fs1-zwrk.go·2                      192µs ± 1%
      	zeo/py/fs1-zwrk.go·3                      263µs ± 1%
      	zeo/py/fs1-zwrk.go·4                      342µs ± 2%
      	zeo/py/fs1-zwrk.go·8                      648µs ± 1%
      	zeo/py/fs1-zwrk.go·12                     957µs ± 1%
      	zeo/py/fs1-zwrk.go·16                    1.20ms ±10%
      
      The scalability graphs in http://navytux.spb.ru/~kirr/neo.html were
      made with simulating client load by zwrk, not many client OS processes.
      http://navytux.spb.ru/~kirr/neo.html#performance-tests has some
      additional notes on zwrk.
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/ca0d828b	X neotest: Tzwrk1 - place to control running time of 1 zwrk iteration
      	lab.nexedi.com/kirr/neo/commit/bbfb5006	X zwrk: Make sure we warm up connections to all NEO storages when cluster is partitioned
      	lab.nexedi.com/kirr/neo/commit/7f22bba6	X zwrk: New tool to simulate paralell load from multiple clients
      646a94b5
    • Kirill Smelkov's avatar
      go/zodb: Allow to open a storage in "direct" mode - without local cache · 1f92a4e2
      Kirill Smelkov authored
      zodb/go provides generic cache (see 7233b4c0 "zodb/go: In-RAM client
      cache") primarily in order for prefetch to work. However if we need to
      benchmark a storage with loading some objects several times, this cache
      can hide the actual time it takes for an object to load.
      
      For such use cases add NoCache open option so that opening does not
      create a cache and always conveys load operations directly to storage
      driver. The option will be used by zwrk tool (see next patch).
      1f92a4e2
    • Kirill Smelkov's avatar
      go/neo/t/neotest: ZODB benchmarks · 3f578560
      Kirill Smelkov authored
      Add to neotest zbench-local and zbench-cluster commands that perform
      ZODB benchmarks on FileStorage, ZEO and NEO with Python and Go clients
      either locally, or with a server and client running on 2 different nodes.
      
      There are 2 client programs: tzodb.py and tzodb.go which for now compute
      hash of whole latest objects stream in a ZODB database. On server side
      neotest is taught to launch ZEO and various NEO clusters and to execute
      client load on them.
      
      Two test datasets are used: wczblk1-8 - the dataset with wendelin.core ZBlk1
      objects covering 8M array, and prod1-1024 - synthethic dataset that tries to
      represent regular ERP5 instance. Both datasets are very small and so we can
      assume they reside completely in server disk cache while running benchmarks.
      Benchmark timings will thus give pure storage software processing latency, as
      pagecache hit time is on par, or less, to 1µs.
      
      Example output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest zbench-local
      	dataset:	wczblk1-8
      	node:
      	cluster:	deco
      
      	*** generating fs1 data...
      	I: RAM:  7.47GB
      	I: WORK: 0.01GB
      	gen signal t=0...1.05e+06  float64  (= 0.01GB)
      	gen signal blk [0:1048576]  (100.0%)
      	VIRT: 297 MB	RSS: 48MB
      
      	*** generating sqlite data...
      	I: RAM:  7.47GB
      	I: WORK: 0.01GB
      	gen signal t=0...1.05e+06  float64  (= 0.01GB)
      	gen signal blk [0:1048576]  (100.0%)
      	VIRT: 386 MB	RSS: 58MB
      	2018-07-10 19:57:35.7065 ERROR     NEO        [           app: 91] primary master is down
      	Cluster state changed
      
      	*** generating sql data...
      	2018-07-10 19:57:35 140115116649600 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27574 ...
      	2018-07-10 19:57:39 140205509999744 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27603 ...
      	2018-07-10 19:57:42 139692109810816 [Note] /usr/sbin/mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27633 ...
      	2018-07-10 19:57:45 139759221546112 [Note] mysqld (mysqld 10.1.29-MariaDB-6+b1) starting as process 27662 ...
      	I: RAM:  7.47GB
      	I: WORK: 0.01GB
      	gen signal t=0...1.05e+06  float64  (= 0.01GB)
      	gen signal blk [0:1048576]  (100.0%)
      	VIRT: 387 MB	RSS: 59MB
      	2018-07-10 19:57:48.2565 ERROR     NEO        [           app: 91] primary master is down
      	Cluster state changed
      
      	*** FileStorage
      	Benchmarkfs1-zhash.py 2127 16.3 µs/object	# crc32:14640593  nread=8540363  t=0.035s	# POLL·2 C1·73 C1E·38 C3·12 C6·36 C7s·0 C8·112 C9·0 C10·62
      	Benchmarkfs1-zhash.py 2127 16.6 µs/object	# crc32:14640593  nread=8540363  t=0.035s	# POLL·0 C1·113 C1E·21 C3·16 C6·56 C7s·0 C8·136 C9·0 C10·41
      	Benchmarkfs1-zhash.py 2127 15.9 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·71 C1E·36 C3·22 C6·50 C7s·0 C8·167 C9·0 C10·47
      	Benchmarkfs1-zhash.py 2127 15.9 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·77 C1E·32 C3·11 C6·55 C7s·0 C8·184 C9·0 C10·31
      	Benchmarkfs1-zhash.py 2127 16.0 µs/object	# crc32:14640593  nread=8540363  t=0.034s	# POLL·0 C1·78 C1E·15 C3·12 C6·51 C7s·0 C8·140 C9·0 C10·44
      
      	# 16 clients in parallel
      	Benchmarkfs1-zhash.py·P16 2127 129.0 µs/object	# crc32:14640593  nread=8540363  t=0.274s
      	Benchmarkfs1-zhash.py·P16 2127 132.6 µs/object	# crc32:14640593  nread=8540363  t=0.282s
      	Benchmarkfs1-zhash.py·P16 2127 135.0 µs/object	# crc32:14640593  nread=8540363  t=0.287s
      	Benchmarkfs1-zhash.py·P16 2127 135.3 µs/object	# crc32:14640593  nread=8540363  t=0.288s
      	Benchmarkfs1-zhash.py·P16 2127 136.6 µs/object	# crc32:14640593  nread=8540363  t=0.291s
      	Benchmarkfs1-zhash.py·P16 2127 122.8 µs/object	# crc32:14640593  nread=8540363  t=0.261s
      	Benchmarkfs1-zhash.py·P16 2127 130.9 µs/object	# crc32:14640593  nread=8540363  t=0.279s
      	Benchmarkfs1-zhash.py·P16 2127 126.4 µs/object	# crc32:14640593  nread=8540363  t=0.269s
      	Benchmarkfs1-zhash.py·P16 2127 125.8 µs/object	# crc32:14640593  nread=8540363  t=0.268s
      	Benchmarkfs1-zhash.py·P16 2127 108.3 µs/object	# crc32:14640593  nread=8540363  t=0.230s
      	Benchmarkfs1-zhash.py·P16 2127 131.0 µs/object	# crc32:14640593  nread=8540363  t=0.279s
      	Benchmarkfs1-zhash.py·P16 2127 124.1 µs/object	# crc32:14640593  nread=8540363  t=0.264s
      	Benchmarkfs1-zhash.py·P16 2127 129.3 µs/object	# crc32:14640593  nread=8540363  t=0.275s
      	Benchmarkfs1-zhash.py·P16 2127 125.0 µs/object	# crc32:14640593  nread=8540363  t=0.266s
      	Benchmarkfs1-zhash.py·P16 2127 131.5 µs/object	# crc32:14640593  nread=8540363  t=0.280s
      	Benchmarkfs1-zhash.py·P16 2127 131.4 µs/object	# crc32:14640593  nread=8540363  t=0.280s
      	# POLL·0 C1·4 C1E·13 C3·11 C6·79 C7s·0 C8·14 C9·0 C10·0
      
      	...
      
      And its summary via benchstat:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ benchstat -split node,cluster,dataset x.log
      	name                                 time/object
      	cluster:deco dataset:wczblk1-8
      	fs1-zhash.py                         16.1µs ± 3%
      	fs1-zhash.py·P16                      130µs ± 5%
      	fs1-zhash.go                         3.00µs ±10%
      	fs1-zhash.go+prefetch128             3.40µs ±18%
      	fs1-zhash.go·P16                     10.2µs ±71%
      	zeo/py/fs1-zhash.py                   336µs ± 3%
      	zeo/py/fs1-zhash.py·P16              3.22ms ± 6%
      	zeo/py/fs1-zhash.go                   112µs ± 2%
      	zeo/py/fs1-zhash.go+prefetch128      60.9µs ± 1%
      	zeo/py/fs1-zhash.go·P16              1.07ms ± 5%
      	neo/py(!log)/sqlite·P1-zhash.py       291µs ± 2%
      	neo/py(!log)/sqlite·P1-zhash.py·P16  2.86ms ± 1%
      	neo/py(!log)/sql·P1-zhash.py          318µs ± 4%
      	neo/py(!log)/sql·P1-zhash.py·P16     3.99ms ± 0%
      	cluster:deco dataset:prod1-1024
      	fs1-zhash.py                         12.3µs ± 1%
      	fs1-zhash.py·P16                      106µs ±10%
      	fs1-zhash.go                         2.56µs ±10%
      	fs1-zhash.go+prefetch128             2.68µs ± 8%
      	fs1-zhash.go·P16                     9.48µs ±43%
      	zeo/py/fs1-zhash.py                   319µs ± 3%
      	zeo/py/fs1-zhash.py·P16              3.13ms ± 3%
      	zeo/py/fs1-zhash.go                   101µs ± 5%
      	zeo/py/fs1-zhash.go+prefetch128      56.9µs ± 1%
      	zeo/py/fs1-zhash.go·P16              1.19ms ± 4%
      	neo/py(!log)/sqlite·P1-zhash.py       281µs ± 3%
      	neo/py(!log)/sqlite·P1-zhash.py·P16  2.80ms ± 1%
      	neo/py(!log)/sql·P1-zhash.py          316µs ± 1%
      	neo/py(!log)/sql·P1-zhash.py·P16     3.91ms ± 1%
      
      Since there is no NEO/go support yet, corresponding neotest parts are merged,
      but commented-out with appropriate remark.
      
      Parallel access is simulated with spawning many OS processes for now.
      This will change in the nearby followup patch to zwrk.
      
      Results of ZODB benchmarking were discussed in
      
      	http://navytux.spb.ru/~kirr/neo.html#performance-tests		, and
      	http://navytux.spb.ru/~kirr/neo.html#results-and-discussion
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/e0d875bc	X neotest: Teach it to benchmark NEO with storage partitioned to several nodes
      	lab.nexedi.com/kirr/neo/commit/590f0a46	X neo/py uses n(replica) as n(real-replica) - 1
      	lab.nexedi.com/kirr/neo/commit/b655da26	X save time not benchmarking things we do not show
      	lab.nexedi.com/kirr/neo/commit/f834f40d	X zhash: Show N(obj) read, not 1, in place of N(iter)
      	lab.nexedi.com/kirr/neo/commit/a16e8d52	X teach golang to access ZEO
      	lab.nexedi.com/kirr/neo/commit/b9827725	X switch to using no compression, because this way it is more fair for comparing storage latencies
      	lab.nexedi.com/kirr/neo/commit/c0067335	X neotest: Don't depend on killall
      	lab.nexedi.com/kirr/neo/commit/2bcd6ebb	X neotest: add zbench-local & zbench-cluster subcomands
      	lab.nexedi.com/kirr/neo/commit/fb165ad9	X neotest: Also benchmark NEO/py with logging disabled
      	lab.nexedi.com/kirr/neo/commit/2118ba38	X neotest: Help mysqlk_install_db find its basedir under SlapOS
      	lab.nexedi.com/kirr/neo/commit/80eaa05e	X zgenprod1 tool
      	lab.nexedi.com/kirr/neo/commit/eb0e516f	X check hash result and error if mismatch (zhash.* part); neotest part pending
      	lab.nexedi.com/kirr/neo/commit/046370db	X benchify rest of bench-cluster
      	lab.nexedi.com/kirr/neo/commit/2d13818e	X bench-local + zhash: Add output in std bench format
      	lab.nexedi.com/kirr/neo/commit/1d692a3b	X add NEO/go with SHA1 disabled (both Sgo and Cgo to regular benchmarks)
      3f578560
    • Kirill Smelkov's avatar
      go/neo/t/neotest: Network information & benchmarks · 26006d7e
      Kirill Smelkov authored
      Add to neotest bench-net command that performs latency measurments at
      ping and TCP levels. Example output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9
      	node:
      	cluster:        deco-rio
      
      	*** link latency:
      
      	# deco ⇄ rio (ping 16B)
      	PING rio (192.168.0.8) 16(44) bytes of data.
      
      	--- rio ping statistics ---
      	25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms
      	rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms
      	Benchmarkpingrtt-/16B-min 1 0.080 ms/op
      	Benchmarkpingrtt-/16B-avg 1 0.097 ms/op
      	# POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141
      
      	...
      
      	*** TCP latency:
      
      	# deco ⇄ rio (lat_tcp.c 1B  -> lat_tcp.c -s)
      	Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op       # TCP latency using rio: 116.1743 microseconds  # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444
      	Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op       # TCP latency using rio: 117.2896 microseconds  # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209
      	Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op       # TCP latency using rio: 117.5331 microseconds  # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246
      	Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op       # TCP latency using rio: 117.6509 microseconds  # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187
      	Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op       # TCP latency using rio: 116.8125 microseconds  # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475
      
      	...
      
      And its summary via benchstat:
      
      	name                 time/op
      	pingrtt-/16B-min     80.0µs ± 0%
      	pingrtt-/16B-avg     97.0µs ± 0%
      	-pingrtt/16B-min     79.0µs ± 0%
      	-pingrtt/16B-avg      112µs ± 0%
      	pingrtt-/1452B-min    241µs ± 0%
      	pingrtt-/1452B-avg    303µs ± 0%
      	-pingrtt/1452B-min    266µs ± 0%
      	-pingrtt/1452B-avg    303µs ± 0%
      	tcprtt(c_c)-/1B       117µs ± 1%
      	tcprtt(c_go)-/1B      122µs ± 2%
      	-tcprtt(c_c)/1B       117µs ± 1%
      	-tcprtt(c_go)/1B      121µs ± 5%
      	tcprtt(c_c)-/1400B    392µs ± 4%
      	tcprtt(c_go)-/1400B   363µs ±18%
      	-tcprtt(c_c)/1400B    412µs ±21%
      	-tcprtt(c_go)/1400B   391µs ±38%
      	tcprtt(c_c)-/1500B    271µs ±18%
      	tcprtt(c_go)-/1500B   290µs ±21%
      	-tcprtt(c_c)/1500B    282µs ±16%
      	-tcprtt(c_go)/1500B   334µs ±24%
      	tcprtt(c_c)-/4096B    711µs ± 5%
      	tcprtt(c_go)-/4096B   737µs ± 5%
      	-tcprtt(c_c)/4096B    740µs ± 2%
      	-tcprtt(c_go)/4096B   711µs ± 7%
      
      Latencies here are not good because for this run on rio interrupt mitigation
      was not tuned (see below). By the way, analyzing ping RTT latencies on our
      shuttle machines (similar to rio) resulted in the following kernel patch
      
      	https://git.kernel.org/linus/509708310c (released with Linux 4.15)
      
      to fix/being able to adjust interrupt mitigation on Realtek NICs.
      
      While at networking topic, teach info/info-local to show related
      information about node's NICs. Example lines output for deco:
      
      	nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21
      	nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ...
      	nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq,  txc: 0μs/0f/0μs-irq/0f-irq
      	nic/eth0/status:   up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	nic/wlan0: Intel Corporation Wireless 8260 rev 3a
      	nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ...
      	nic/wlan0/coalesce: rxc: ?,  txc: ?
      	nic/wlan0/status:   down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor
      
      for rio:
      
      	nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06
      	nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ...
      	nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq,  txc: 200μs/4f/0μs-irq/0f-irq
      	nic/eth0/status:   up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor
      	WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency
      	nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06
      	nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ...
      	nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq,  txc: 0μs/1f/0μs-irq/0f-irq
      	nic/eth1/status:   down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs
      	WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor
      
      The warning about "RX coalesce latency is max 200μs ..." says that on
      receive path eth0 will be coalescing incoming frames for up to 200μs and
      this way this delay will be added to overal latency. (for small frames
      Realtek NICs do not coalesce interrupts - see details in the kernel patch).
      
      Networked performance (raw and NEO) was not discussed in
      http://navytux.spb.ru/~kirr/neo.html at all, but for the reference the
      importance of C-states for performance was first found via this
      networking latency benchmarks. Links on C-states topic:
      
      	http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states
      	http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/e8e395ae	X neotest: Move network benchmarking into separate function + add `neotest bench-net`
      	lab.nexedi.com/kirr/neo/commit/a971231c	X neotest/info: Handle USB NICs
      	lab.nexedi.com/kirr/neo/commit/5dd3d1ab	X neotest: sort NIC names
      	lab.nexedi.com/kirr/neo/commit/9888f047	X neotest: Do not crash if kernel is too old to support gro_flush_timeout
      	lab.nexedi.com/kirr/neo/commit/3a1bdf4a	X bench-remote / tcp : std benchmark output
      	lab.nexedi.com/kirr/neo/commit/9450b6db	X bench-remote / ping += std bench output
      	lab.nexedi.com/kirr/neo/commit/68d5b015	X show gro_flush_timeout + friends
      	lab.nexedi.com/kirr/neo/commit/4c815af9	X neotest: Show NIC features and emit warning if !TSO
      	lab.nexedi.com/kirr/neo/commit/659ce938	X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc...
      	lab.nexedi.com/kirr/neo/commit/ded384cb	X neotest += `lat_tcp.go -s`
      	lab.nexedi.com/kirr/neo/commit/59d46504	X neotest += lat_tcp
      	lab.nexedi.com/kirr/neo/commit/67fc3440	X show small (56B) and full-packet (1472B) ping link latencies
      26006d7e
    • Kirill Smelkov's avatar
      go/neo/t/neotest: Disk information & benchmarks · ee6c2796
      Kirill Smelkov authored
      Add to neotest bench-disk command that performs random-read disk
      benchmarks via ioping. Example output:
      
      	(venv) (8) neotest@rio:~/8/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-disk
      	node:   rio
      	cluster:
      
      	*** disk: random direct (no kernel cache) 4K-read latency
      
      	--- . (ext4 /dev/sda1) ioping statistics ---
      	29.1 k requests completed in 2.95 s, 113.6 MiB read, 9.85 k iops, 38.5 MiB/s
      	generated 29.1 k requests in 3.00 s, 113.6 MiB, 9.69 k iops, 37.9 MiB/s
      	min/avg/max/mdev = 43.2 us / 101.5 us / 250.0 us / 7.48 us
      	Benchmarkdisk/randread/direct/4K-min 1 43.2 us/op
      	Benchmarkdisk/randread/direct/4K-avg 1 101.5 us/op
      	< 59.2 us       458     |
      	< 63.0 us       0       |
      	< 66.7 us       0       |
      	< 70.5 us       0       |
      	< 74.2 us       1       |
      	< 78.0 us       1       |
      	< 81.7 us       0       |
      	< 85.5 us       1       |
      	< 89.2 us       0       |
      	< 93.0 us       0       |
      	< 96.7 us       0       |
      	< 100.5 us      333     |
      	< 104.2 us      27793   | ***********************************************
      	< 108.0 us      259     |
      	< 111.7 us      21      |
      	< 115.5 us      8       |
      	< 119.2 us      18      |
      	< 123.0 us      1       |
      	< 126.7 us      0       |
      	< 130.5 us      7       |
      	< 134.2 us      59      |
      	<       +∞      18      |
      	# POLL·186 C1·291360 C1E·290802 C3·31 C6·1218
      
      	...
      
      	*** disk: random cached 4K-read latency
      
      	--- . (ext4 /dev/sda1) ioping statistics ---
      	3.15 M requests completed in 2.82 s, 12.0 GiB read, 1.12 M iops, 4.26 GiB/s
      	generated 3.15 M requests in 3.00 s, 12.0 GiB, 1.05 M iops, 4.00 GiB/s
      	min/avg/max/mdev = 465 ns / 896 ns / 37.4 us / 183 ns
      	Benchmarkdisk/randread/pagecache/4K-min 1 465 ns/op
      	Benchmarkdisk/randread/pagecache/4K-avg 1 896 ns/op
      	< 839 ns        771375  | ************
      	< 872 ns        609361  | *********
      	< 905 ns        660635  | **********
      	< 938 ns        505305  | ********
      	< 971 ns        189182  | ***
      	< 1.00 us       93655   | *
      	< 1.04 us       70811   | *
      	< 1.07 us       57650   |
      	< 1.10 us       51587   |
      	< 1.14 us       44648   |
      	< 1.17 us       40868   |
      	< 1.20 us       27301   |
      	< 1.24 us       12503   |
      	< 1.27 us       5580    |
      	< 1.30 us       2517    |
      	< 1.34 us       1404    |
      	< 1.37 us       698     |
      	< 1.40 us       378     |
      	< 1.44 us       208     |
      	< 1.47 us       119     |
      	< 1.50 us       50      |
      	<       +∞      978     |
      	# POLL·1 C1·57 C1E·11 C3·2 C6·257
      
      The benchmarks are so done so that output conforms to Go benchmarking
      format. This way it is possible to process the output with benchstat to
      summarize / compare results. For above run summarization gives:
      
      	name                            time/op
      	disk/randread/direct/4K-min     39.0µs ±12%
      	disk/randread/direct/4K-avg      117µs ± 3%
      	disk/randread/pagecache/4K-min   461ns ± 3%
      	disk/randread/pagecache/4K-avg   880ns ± 0%
      
      While at disk topic, teach info/info-local to show related information
      about node's disk. Example line output for rio:
      
      	disk/sda: Samsung SSD 840   rev BB0Q 931.5G
      
      Please see
      
      	http://navytux.spb.ru/~kirr/neo.html#the-need-for-faster-storage
      	http://navytux.spb.ru/~kirr/neo.html#appendix-i-ssd-latency
      
      for some discussion about SSD performance.
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/d35a2fdf	X neotest/info-local: Fix disk display in presence of bind-mounts
      	lab.nexedi.com/kirr/neo/commit/44529dbf	X neotest/bench-disk: Deduplicate code; change 1M -> 2M
      	lab.nexedi.com/kirr/neo/commit/8bac3dba	X neotest/bench-disk: Also benchmark randomly reading 1M blocks
      	lab.nexedi.com/kirr/neo/commit/e795c6ed	X neotest: Fix disk display in case of DM
      	lab.nexedi.com/kirr/neo/commit/352cd100	X neotest: Fix disk display in case of MD
      	lab.nexedi.com/kirr/neo/commit/cd2cd093	X bench_disk: Add std bench format
      	lab.nexedi.com/kirr/neo/commit/9f86eb40	X bench += ioping
      ee6c2796
    • Kirill Smelkov's avatar
      go/neo/t/neotest: CPU information & benchmarks · a60c472c
      Kirill Smelkov authored
      Add to neotest bench-cpu command that performs basic CPU benchmarks:
      pystone and CRC32/SHA1 for now. While every benchmark is run
      additionally C-states profile is collected(*). Example output:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
      	node:   deco
      	cluster:
      	Benchmarkpystone 1 283297 pystone/s     # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
      	Benchmarkpystone 1 289788 pystone/s     # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
      	Benchmarkpystone 1 286329 pystone/s     # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
      	Benchmarkpystone 1 292087 pystone/s     # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
      	Benchmarkpystone 1 290119 pystone/s     # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
      	Benchmarkcrc32/py/4K 300000     3.415 µs/op     # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
      	Benchmarkcrc32/py/4K 300000     3.402 µs/op     # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
      	Benchmarkcrc32/py/4K 300000     3.396 µs/op     # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
      	Benchmarkcrc32/py/4K 300000     3.435 µs/op     # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
      	Benchmarkcrc32/py/4K 300000     3.434 µs/op     # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
      	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
      	Benchmarkcrc32/go/4K 10000000   0.216 µs/op     # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
      	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
      	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
      	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
      	Benchmarksha1/py/4K 300000      4.553 µs/op     # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
      	Benchmarksha1/py/4K 300000      4.459 µs/op     # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
      	Benchmarksha1/py/4K 300000      4.492 µs/op     # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
      	Benchmarksha1/py/4K 300000      4.550 µs/op     # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
      	Benchmarksha1/py/4K 300000      4.518 µs/op     # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
      	Benchmarksha1/go/4K 300000      4.312 µs/op     # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
      	Benchmarksha1/go/4K 300000      4.383 µs/op     # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
      	Benchmarksha1/go/4K 300000      4.387 µs/op     # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
      	Benchmarksha1/go/4K 300000      4.328 µs/op     # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
      	Benchmarksha1/go/4K 300000      4.337 µs/op     # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191
      
      Such raw output can be summarized with the help of benchstat - either
      with Go[1] or Python[2] implementations:
      
      	$ benchstat x.txt
      	name         pystone/s
      	pystone        288k ± 2%
      
      	name         time/op
      	crc32/py/4K  3.42µs ± 1%
      	crc32/go/4K   218ns ± 1%
      	sha1/py/4K   4.51µs ± 1%
      	sha1/go/4K   4.35µs ± 1%
      
      See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some
      discussion on SHA1 vs CRC32.
      
      While at CPU topic, teach info/info-local to show related information
      about node's CPU: available processors, frequency and idle governors.
      Example of lines added:
      
      	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6
      	...
      	cpu:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
      	cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
      	cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
      	WARNING: cpu: frequency not fixed - benchmark timings won't be stable
      	WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
      	WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)
      
      See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to
      understand why there are warnings in above example.
      
      Some draft history related to this patch:
      
      	lab.nexedi.com/kirr/neo/commit/cf1f7c24	X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
      	lab.nexedi.com/kirr/neo/commit/1e438610	fixup! X neotest: Also show target-latency for C-states
      	lab.nexedi.com/kirr/neo/commit/4af48245	X neotest: Also show target-latency for C-states
      	lab.nexedi.com/kirr/neo/commit/2910cf56	X neotest: Prefer first part of FQDN for hostname
      	lab.nexedi.com/kirr/neo/commit/c86ba1b0	X bench-cpu += crc32, adler32
      	lab.nexedi.com/kirr/neo/commit/4ac3a550	X neotest: Don't use bc
      	lab.nexedi.com/kirr/neo/commit/3918a997	X neotest: Don't assume we are invoked from the directory where neotest is
      	lab.nexedi.com/kirr/neo/commit/9a266d11	X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
      	lab.nexedi.com/kirr/neo/commit/b6a830d8	X switch cpu benchmarks to go format
      	lab.nexedi.com/kirr/neo/commit/4436b983	X neotest: Provide cpustat command so it is possible to cpustat something external
      	lab.nexedi.com/kirr/neo/commit/b062b349	X microbenchmark CPU first
      	lab.nexedi.com/kirr/neo/commit/a4a18b55	X first cut on C-state profiling
      	lab.nexedi.com/kirr/neo/commit/ea1e0835	X found that cpuidle can be affecting latency a lot!
      
      (*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and
          http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for
          why this is important.
      
          Since being able to profile C-states can be generally useful, we
          expose such profiling with externally-visible `neotest cpustat` utility.
      
      [1] https://godoc.org/golang.org/x/perf/cmd/benchstat
      [2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py
      a60c472c
    • Kirill Smelkov's avatar
      go/neo/t: Neotest start · c12f2991
      Kirill Smelkov authored
      For testing and benchmarking how NEO/py & NEO/go interact with each
      other we need corresponding test driver. Neotest will be that driver and
      present patch begins it:
      
      - neotest can deploy NEO/{go,py} checkout either locally or on remote node;
      - it can run NEO/{go,py} unit tests either locally or on remote node;
      - it can also show information about a system - either local or remote.
      
      Examples in action. Banner:
      
          x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest
          Neotest is a tool to test and benchmark NEO.
      
          Usage:
      
                  neotest command [arguments]
      
          The commands are:
      
                  test            run all tests on a remote host
                  test-local      run all tests locally
      
                  test-go         run NEO/go unit tests   (part of test-local)
                  test-py         run NEO/py unit tests   (part of test-local)
      
                  deploy          deploy NEO & needed software for tests to remote host
                  deploy-local    deploy NEO & needed software for tests locally
      
                  info            print information about a node
                  info-local      print information about local deployment
      
      Deploy to another node:
      
          x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest deploy neotest@rio.kirr.nexedi.com:3
          *** deploying to neotest@rio.kirr.nexedi.com:3 ...
          ...
          # deployed ok
      
      Print information about that node:
      
          x/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:3
          date:   Sun, 08 Jul 2018 21:30:43 +0300
          xnode:  neotest@rio.kirr.nexedi.com (2401:5180:0:2a::1 192.168.0.8)
          uname:  Linux rio 4.16.0-2-amd64 #1 SMP Debian 4.16.16-2 (2018-06-22) x86_64 GNU/Linux
          sw/python:          Python 2.7.15
          sw/go:              go version go1.10.3 linux/amd64
          sw/sqlite:          sqlite 3.24.0 (py mod 2.6.0)
          sw/mysqld:          mysqld  Ver 10.1.29-MariaDB-6+b1 for debian-linux-gnu on x86_64 (Debian buildd-unstable)
          sw/neo:             v1.9-42-g972ff5f9
          sw/zodb:            5.4.0
          sw/zeo:             5.2.0
          sw/mysqlclient:     1.3.13
      
      Run NEO/{py,go} unit tests there:
      
          x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest test neotest@rio.kirr.nexedi.com:4
          ?       lab.nexedi.com/kirr/neo/go/internal/packed      [no test files]
          ok      lab.nexedi.com/kirr/neo/go/neo/neonet   (cached)
          ok      lab.nexedi.com/kirr/neo/go/neo/proto    (cached)
          ok      lab.nexedi.com/kirr/neo/go/zodb (cached)
          ?       lab.nexedi.com/kirr/neo/go/zodb/cmd/zodb        [no test files]
          ?       lab.nexedi.com/kirr/neo/go/zodb/internal/pickletools    [no test files]
          ?       lab.nexedi.com/kirr/neo/go/zodb/storage [no test files]
          ok      lab.nexedi.com/kirr/neo/go/zodb/storage/fs1     (cached)
          ?       lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/cmd/fs1     [no test files]
          ok      lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fs1tools    (cached)
          ?       lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fsb [no test files]
          ?       lab.nexedi.com/kirr/neo/go/zodb/storage/zeo     [no test files]
          ?       lab.nexedi.com/kirr/neo/go/zodb/wks     [no test files]
          ok      lab.nexedi.com/kirr/neo/go/zodb/zodbtools       (cached)
          ...........................................................................E.EE....c....^C
      
      Neotest will be the driver that was used to prepare benchmarks in
      http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html .
      
      Some draft history related to this patch:
      
      lab.nexedi.com/kirr/neo/commit/d46afb3e		X neotest: Teach it to also run go & py unit tests; hook it into nxd/runTestSuite
      lab.nexedi.com/kirr/neo/commit/f694d643		X neotest: Don't fail silently if network address detection fails
      lab.nexedi.com/kirr/neo/commit/fa78290a		X neotest: FQDN host name might be not configured
      lab.nexedi.com/kirr/neo/commit/56faccad		X neotest/info-local: Don't crash if a prog could not be found
      lab.nexedi.com/kirr/neo/commit/f2932247		X neotest/info-local: Don't crash if an egg could not be found
      lab.nexedi.com/kirr/neo/commit/f06b7302		X neotest: Determine machine IP addresses via `ip ...` directly, not `getent ...`
      lab.nexedi.com/kirr/neo/commit/42e5fe71		X neotest info
      c12f2991
  2. 08 Jul, 2018 11 commits
    • Kirill Smelkov's avatar
      go/zodb/zeo: Add support for protocols preceding ZEO5 · 5f30b4c0
      Kirill Smelkov authored
      In particular try to support ZEO4:
      
      - during handshake we now first wait for remote server to announce its
        preferred protocol, and only then send the version we select to use.
        This is the procedure original ZEO server-client do.
      
      - teach rpc.call to decode exceptions not only for how ZEO5 encodes them
        (marking via 2 flag in "async" field), but also on how ZEO4 and
        earlier encode them: via replying with (exc_type, exc_inst) and
        expecting client to dynamically check exc_type is a subtype of
        Exception.
      
      - handle other protocol differences - e.g. ZEO5 returns last_tid on
        register(), while earlier versions return nothing there.
      
      Tests pending.
      5f30b4c0
    • Kirill Smelkov's avatar
      go/zodb: Teach ZODB/go to access ZEO (draft) · b65f6d0f
      Kirill Smelkov authored
      For the reference on deco (performance, frequency not fixed):
      
      	name                           time/object
      	deco/fs1/zhash.py              15.8µs ± 2%
      	deco/fs1/zhash.py-P16           116µs ±12%
      	deco/fs1/zhash.go              2.60µs ± 0%
      	deco/fs1/zhash.go+prefetch128  3.70µs ±11%
      	deco/fs1/zhash.go-P16          13.4µs ±43%
      	deco/zeo/zhash.py               316µs ± 7%
      	deco/zeo/zhash.py-P16          2.68ms ± 7%
      	deco/zeo/zhash.go               111µs ± 2%
      	deco/zeo/zhash.go+prefetch128  57.7µs ± 2%
      	deco/zeo/zhash.go-P16          1.23ms ± 5%
      
      and in particular it shows that with the same ZEO/py server, the latency
      to load an object via py client is ~ 3x worse compared to the latency to
      load the same object via hereby Go client.
      
      The performance was obtained via forthcoming neotest, and in particular
      ZEO/go client will be also used in forthcoming zwrk (no analog on python side).
      
      See http://navytux.spb.ru/~kirr/neo.html#performance-tests for details.
      
      Tests: pending.
      b65f6d0f
    • Kirill Smelkov's avatar
      go/zodb: Start putting pickle-related utilities into internal/pickletools package · 2ee495ce
      Kirill Smelkov authored
      As the first step factor-out int64 Xint64 checker from zodb/storagefs1/index.go
      into there.  We'll need the checker in the next patch.
      2ee495ce
    • Kirill Smelkov's avatar
      go/neo/neonet: Lightweight mode · ec4b3ce0
      Kirill Smelkov authored
      In situations when created connections are used to only send/receive 1
      request/response, the overhead to create/shutdown full connections could be
      too much. Unfortunately this is exactly the mode that is currently
      primarily used for compatibility with NEO/py. To help mitigate the overhead
      in such scenarios, lightweight connections mode is provided:
      
      At requester side, one message can be sent over node link with link.Send1 .
      Inside a connection will be created and then shut down, but since the
      code manages whole process internally and does not show the connection to
      user, it can optimize those operations significantly. Similarly link.Ask1
      sends 1 request, receives 1 response, and then puts the connection back into
      pool for later reuse.
      
      At receiver side, link.Recv1 accepts a connection with the first message
      remote peer sent us when establishing it, and wraps the result into Request
      object. The Request contains the message received and internally the
      connection. A response can be sent back via Request.Reply. Then once
      Request.Close is called the connection object that was accepted is
      immediately put back into pool for later reuse.
      
      Some history of lightweight mode:
      
      lab.nexedi.com/kirr/neo/commit/0fa96338	X Clarified Request.Close semantics - tests working again
      lab.nexedi.com/kirr/neo/commit/a5ac1652	X Ask1: switch to sending directly over link
      lab.nexedi.com/kirr/neo/commit/755e3654	X Request.Reply: switch to replying directly over link
      lab.nexedi.com/kirr/neo/commit/c643ba53	X Send1: switch to sending directly over link
      lab.nexedi.com/kirr/neo/commit/7dcbc9c5	X Send1: switch to lightClose
      lab.nexedi.com/kirr/neo/commit/851864a9	X chan RTT benchmark which simulates Recv1 = Accept + Recv
      lab.nexedi.com/kirr/neo/commit/099bfc29	X freelist for PktBuf
      lab.nexedi.com/kirr/neo/commit/58c2e39a	X Benchmark for link Ask1/Recv1 over TCP loopback
      ec4b3ce0
    • Kirill Smelkov's avatar
      go/neo/neonet: User-visible Send/Recv + Ask/Expect · 39982595
      Kirill Smelkov authored
      Provide Conn.Send and Conn.Recv which work on NEO messages and
      automatically encode/decode them into packets on the fly.
      
      Similarly to NEO/py also provide Ask to send a request and receive
      expected reply and Expect which does only the latter half of Ask.
      39982595
    • Kirill Smelkov's avatar
      go/neo/neonet: Link establishment · e8396003
      Kirill Smelkov authored
      Implement NEO protocol handshaking and use it in newly provided DialLink
      and ListenLink which correspondingly first do regular network dial or
      listen and than perform the handshake on just established TCP
      connection. If handshake goes ok, the result is wrapped into NodeLink.
      
      Some history:
      
      	lab.nexedi.com/kirr/neo/commit/8d0a1469	X Handshake draftly done
      
      See also http://navytux.spb.ru/~kirr/neo.html#development-overview
      (starting from "The neonet module also provides DialLink and ListenLink
      ...")
      e8396003
    • Kirill Smelkov's avatar
      go/neo/neonet: Start (first draft) · 64513925
      Kirill Smelkov authored
      Continue NEO/go with neonet - the layer to exchange messages in between
      NEO nodes.
      
      NEO/go shifts from thinking about NEO protocol logic as RPC to thinking
      of it as more general network protocol and so settles to provide general
      connection-oriented message exchange service. This way neonet provides
      generic connection multiplexing on top of a single TCP node-node link.
      
      Neonet compatibility with NEO/py depends on the following small NEO/py patch:
      
          kirr/neo@dd3bb8b4
      
      which adjusts message ID a bit so it behaves like stream_id in HTTP/2:
      
          - always even for server initiated streams
          - always odd  for client initiated streams
      
      and is incremented by += 2, instead of += 1 to maintain above invariant.
      
      See http://navytux.spb.ru/~kirr/neo.html#development-overview (starting from
      "Then comes the link layer which provides service to exchange messages over
      network...") for the rationale.
      
      Unfortunately current NEO/py maintainer is very much against merging that patch.
      
      This patch brings in the core of neonet. Next patches will add initial
      handshaking, user-level Send/Recv + Ask/Expect and "lightweight mode".
      
      Some neonet core history:
      
      lab.nexedi.com/kirr/neo/commit/6b9ed46d	X neonet: Avoid integer overflow on max packet length check
      lab.nexedi.com/kirr/neo/commit/8eac771c	X neo/connection: Fix race between link.shutdown() and conn.lightClose()
      lab.nexedi.com/kirr/neo/commit/8021a1d5	X rxghandoff
      lab.nexedi.com/kirr/neo/commit/68738036	X ... but negative impact on separate client / server processes, strange ...
      lab.nexedi.com/kirr/neo/commit/b0dda9d2	X serveRecv: help Go scheduler to switch to receiving G sooner
      lab.nexedi.com/kirr/neo/commit/4989918a	X remove defer from rx/tx hot paths
      lab.nexedi.com/kirr/neo/commit/e055406a	X no select for acceptq - similarly for rxq path
      lab.nexedi.com/kirr/neo/commit/c28ad4d0	X Conn.Recv: receive without select
      lab.nexedi.com/kirr/neo/commit/496bd425	X add benchmark RTT over plain net.Conn with serveRecv-style RX handler
      lab.nexedi.com/kirr/neo/commit/9fa79958	X draft how to mark RX down without reallocating .rxdown
      lab.nexedi.com/kirr/neo/commit/4324c812	X restore all Conn functionality
      lab.nexedi.com/kirr/neo/commit/a8e61d2f	X serveSend is not needed
      lab.nexedi.com/kirr/neo/commit/9d047b36	X recvPkt via only 1 syscall
      lab.nexedi.com/kirr/neo/commit/b555a507	X baseline net RTT benchmark
      lab.nexedi.com/kirr/neo/commit/91be5cdd	X everyone is listening from start; CloseAccept to disable listening - works
      lab.nexedi.com/kirr/neo/commit/c2a1b63a	X naming: Packet = raw data; Message = meaningful object
      lab.nexedi.com/kirr/neo/commit/6fd0c9be	X connection: Adding context to errors from NodeLink and Conn operations
      lab.nexedi.com/kirr/neo/commit/65b17bdc	X rework Conn acceptance to be explicit via NodeLink.Accept
      64513925
    • Kirill Smelkov's avatar
      go/neo/proto: Test that message codes are the same in between Go and Python NEO versions · 5beab048
      Kirill Smelkov authored
      This brings some go/py compatibility checks that verify go and python
      treat a message code equally. Although messages encoding are tested in
      the previous patch there is no explicit tests for go/py compatibility on
      messages encoding.
      5beab048
    • Kirill Smelkov's avatar
      go/neo/proto: Serialization support · ea5f7d61
      Kirill Smelkov authored
      Provide a way for every message to be encoded/decoded to/from NEO wire
      encoding. For this introduce Msg interface with wire coding methods and
      provide such methods for all message types.
      
      For selected types the methods are implemented manually.
      For most of the types the methods are generated automatically by protogen.go program.
      
      protogen.go was mentioned in http://navytux.spb.ru/~kirr/neo.html#development-overview
      in "On server-side NEO/go work started by first implementing messages
      serialization in exactly the same wire format as NEO/py does ..." paragraph.
      
      A bit of late protogen fixups history:
      
      	lab.nexedi.com/kirr/neo/commit/c884bfd5
      	lab.nexedi.com/kirr/neo/commit/385d813a
      	lab.nexedi.com/kirr/neo/commit/0f7e0b00
      	lab.nexedi.com/kirr/neo/commit/de3ef2c0
      
      Also a message type can be reverse-looked up by message code via MsgType().
      This will be later used in network receive code path.
      ea5f7d61
    • Kirill Smelkov's avatar
      go/neo/proto: String/Error/Address/UUID conversion · fcd6f9f6
      Kirill Smelkov authored
      Provide routines to convert selected types to string and also for
      UUID and Address <-> string  encoding/decoding.
      fcd6f9f6
    • Kirill Smelkov's avatar
      go/neo/proto: Start (first draft) · 6301c23f
      Kirill Smelkov authored
      Start NEO/go code with protocol package that defines message that NEO
      nodes exchange in between each other. The definition is based on
      neo/lib/protocol.py and is kept in sync with that file.
      
      This commit brings only messages definition. Messages serialization will
      come in the follow-up patch.
      6301c23f
  3. 05 Jul, 2018 1 commit
  4. 04 Jul, 2018 1 commit
  5. 03 Jul, 2018 1 commit
  6. 09 Apr, 2018 1 commit
  7. 13 Mar, 2018 1 commit
  8. 02 Mar, 2018 3 commits
    • Julien Muchembled's avatar
      master: fix resumption of backup replication (internal or not) · 27229793
      Julien Muchembled authored
      Before, it waited for upstream activity until all partitions are touched.
      However, when upstream is idle the backup cluster could remain stuck forever
      if it was interrupted whereas some cells were still late.
      27229793
    • Julien Muchembled's avatar
      master: fix/simplify generation of TID · 7b2e6752
      Julien Muchembled authored
      The 'min_tid < new_tid' assertion failed when jumping to the past.
      7b2e6752
    • Julien Muchembled's avatar
      master: fix possible failure when reading data in a backup cluster with replicas · ca2f7061
      Julien Muchembled authored
      Given that:
      - read locks are only taken by transactions (not replication)
      - in backup mode, storage nodes stay in UP_TO_DATE state, even if partitions
        are synchronized up to different tids
      
      there was a race condition with the master node replying to LastTransaction
      with a TID that may not be replicated yet by all replicas, potentially causing
      such replicas to reply OidDoesNotExist or OidNotFound if a client asks it data
      too early.
      
      IOW, even if the cluster does contain the data up to `getBackupTid(max)`,
      it is only readable by NEO clients up to `getBackupTid(min)` as long as the
      cluster is in BACKINGUP state.
      ca2f7061
  9. 17 Jan, 2018 1 commit
  10. 15 Jan, 2018 12 commits