- 11 Jul, 2018 4 commits
-
-
Kirill Smelkov authored
Add to neotest bench-net command that performs latency measurments at ping and TCP levels. Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-net neotest@rio:9 node: cluster: deco-rio *** link latency: # deco ⇄ rio (ping 16B) PING rio (192.168.0.8) 16(44) bytes of data. --- rio ping statistics --- 25705 packets transmitted, 25705 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 0.080/0.097/0.220/0.011 ms, ipg/ewma 0.116/0.095 ms Benchmarkpingrtt-/16B-min 1 0.080 ms/op Benchmarkpingrtt-/16B-avg 1 0.097 ms/op # POLL·3 C1·476 C1E·60917 C3·53 C6·132 C7s·0 C8·203 C9·0 C10·141 ... *** TCP latency: # deco ⇄ rio (lat_tcp.c 1B -> lat_tcp.c -s) Benchmarktcprtt(c_c)-/1B 1 116.1743 µs/op # TCP latency using rio: 116.1743 microseconds # POLL·6 C1·892 C1E·65748 C3·80 C6·165 C7s·0 C8·339 C9·0 C10·444 Benchmarktcprtt(c_c)-/1B 1 117.2896 µs/op # TCP latency using rio: 117.2896 microseconds # POLL·4 C1·1063 C1E·67647 C3·64 C6·77 C7s·0 C8·144 C9·0 C10·209 Benchmarktcprtt(c_c)-/1B 1 117.5331 µs/op # TCP latency using rio: 117.5331 microseconds # POLL·1 C1·954 C1E·76866 C3·96 C6·88 C7s·0 C8·206 C9·0 C10·246 Benchmarktcprtt(c_c)-/1B 1 117.6509 µs/op # TCP latency using rio: 117.6509 microseconds # POLL·4 C1·731 C1E·84210 C3·103 C6·93 C7s·0 C8·180 C9·0 C10·187 Benchmarktcprtt(c_c)-/1B 1 116.8125 µs/op # TCP latency using rio: 116.8125 microseconds # POLL·9 C1·550 C1E·79544 C3·110 C6·213 C7s·0 C8·508 C9·0 C10·475 ... And its summary via benchstat: name time/op pingrtt-/16B-min 80.0µs ± 0% pingrtt-/16B-avg 97.0µs ± 0% -pingrtt/16B-min 79.0µs ± 0% -pingrtt/16B-avg 112µs ± 0% pingrtt-/1452B-min 241µs ± 0% pingrtt-/1452B-avg 303µs ± 0% -pingrtt/1452B-min 266µs ± 0% -pingrtt/1452B-avg 303µs ± 0% tcprtt(c_c)-/1B 117µs ± 1% tcprtt(c_go)-/1B 122µs ± 2% -tcprtt(c_c)/1B 117µs ± 1% -tcprtt(c_go)/1B 121µs ± 5% tcprtt(c_c)-/1400B 392µs ± 4% tcprtt(c_go)-/1400B 363µs ±18% -tcprtt(c_c)/1400B 412µs ±21% -tcprtt(c_go)/1400B 391µs ±38% tcprtt(c_c)-/1500B 271µs ±18% tcprtt(c_go)-/1500B 290µs ±21% -tcprtt(c_c)/1500B 282µs ±16% -tcprtt(c_go)/1500B 334µs ±24% tcprtt(c_c)-/4096B 711µs ± 5% tcprtt(c_go)-/4096B 737µs ± 5% -tcprtt(c_c)/4096B 740µs ± 2% -tcprtt(c_go)/4096B 711µs ± 7% Latencies here are not good because for this run on rio interrupt mitigation was not tuned (see below). By the way, analyzing ping RTT latencies on our shuttle machines (similar to rio) resulted in the following kernel patch https://git.kernel.org/linus/509708310c (released with Linux 4.15) to fix/being able to adjust interrupt mitigation on Realtek NICs. While at networking topic, teach info/info-local to show related information about node's NICs. Example lines output for deco: nic/eth0: Intel Corporation Ethernet Connection I219-LM rev 21 nic/eth0/features: rx tx sg tso !ufo gso gro !lro rxvlan txvlan !ntuple rxhash ... nic/eth0/coalesce: rxc: 3μs/0f/0μs-irq/0f-irq, txc: 0μs/0f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs nic/wlan0: Intel Corporation Wireless 8260 rev 3a nic/wlan0/features: !rx !tx sg !tso !ufo gso gro !lro !rxvlan !txvlan !ntuple !rxhash ... nic/wlan0/coalesce: rxc: ?, txc: ? nic/wlan0/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/wlan0: TSO not enabled - TCP latency with packets > MSS will be poor for rio: nic/eth0: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth0/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth0/coalesce: rxc: 200μs/4f/0μs-irq/0f-irq, txc: 200μs/4f/0μs-irq/0f-irq nic/eth0/status: up, speed=1000, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth0: TSO not enabled - TCP latency with packets > MSS will be poor WARNING: nic/eth0: RX coalesce latency is max 200μs - that will add to networked request-reply latency nic/eth1: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller rev 06 nic/eth1/features: rx !tx !sg !tso !ufo !gso gro !lro rxvlan txvlan !ntuple !rxhash ... nic/eth1/coalesce: rxc: 0μs/1f/0μs-irq/0f-irq, txc: 0μs/1f/0μs-irq/0f-irq nic/eth1/status: down, speed=?, mtu=1500, txqlen=1000, gro_flush_timeout=0.000µs WARNING: nic/eth1: TSO not enabled - TCP latency with packets > MSS will be poor The warning about "RX coalesce latency is max 200μs ..." says that on receive path eth0 will be coalescing incoming frames for up to 200μs and this way this delay will be added to overal latency. (for small frames Realtek NICs do not coalesce interrupts - see details in the kernel patch). Networked performance (raw and NEO) was not discussed in http://navytux.spb.ru/~kirr/neo.html at all, but for the reference the importance of C-states for performance was first found via this networking latency benchmarks. Links on C-states topic: http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/e8e395ae X neotest: Move network benchmarking into separate function + add `neotest bench-net` lab.nexedi.com/kirr/neo/commit/a971231c X neotest/info: Handle USB NICs lab.nexedi.com/kirr/neo/commit/5dd3d1ab X neotest: sort NIC names lab.nexedi.com/kirr/neo/commit/9888f047 X neotest: Do not crash if kernel is too old to support gro_flush_timeout lab.nexedi.com/kirr/neo/commit/3a1bdf4a X bench-remote / tcp : std benchmark output lab.nexedi.com/kirr/neo/commit/9450b6db X bench-remote / ping += std bench output lab.nexedi.com/kirr/neo/commit/68d5b015 X show gro_flush_timeout + friends lab.nexedi.com/kirr/neo/commit/4c815af9 X neotest: Show NIC features and emit warning if !TSO lab.nexedi.com/kirr/neo/commit/659ce938 X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc... lab.nexedi.com/kirr/neo/commit/ded384cb X neotest += `lat_tcp.go -s` lab.nexedi.com/kirr/neo/commit/59d46504 X neotest += lat_tcp lab.nexedi.com/kirr/neo/commit/67fc3440 X show small (56B) and full-packet (1472B) ping link latencies
-
Kirill Smelkov authored
Add to neotest bench-disk command that performs random-read disk benchmarks via ioping. Example output: (venv) (8) neotest@rio:~/8/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-disk node: rio cluster: *** disk: random direct (no kernel cache) 4K-read latency --- . (ext4 /dev/sda1) ioping statistics --- 29.1 k requests completed in 2.95 s, 113.6 MiB read, 9.85 k iops, 38.5 MiB/s generated 29.1 k requests in 3.00 s, 113.6 MiB, 9.69 k iops, 37.9 MiB/s min/avg/max/mdev = 43.2 us / 101.5 us / 250.0 us / 7.48 us Benchmarkdisk/randread/direct/4K-min 1 43.2 us/op Benchmarkdisk/randread/direct/4K-avg 1 101.5 us/op < 59.2 us 458 | < 63.0 us 0 | < 66.7 us 0 | < 70.5 us 0 | < 74.2 us 1 | < 78.0 us 1 | < 81.7 us 0 | < 85.5 us 1 | < 89.2 us 0 | < 93.0 us 0 | < 96.7 us 0 | < 100.5 us 333 | < 104.2 us 27793 | *********************************************** < 108.0 us 259 | < 111.7 us 21 | < 115.5 us 8 | < 119.2 us 18 | < 123.0 us 1 | < 126.7 us 0 | < 130.5 us 7 | < 134.2 us 59 | < +∞ 18 | # POLL·186 C1·291360 C1E·290802 C3·31 C6·1218 ... *** disk: random cached 4K-read latency --- . (ext4 /dev/sda1) ioping statistics --- 3.15 M requests completed in 2.82 s, 12.0 GiB read, 1.12 M iops, 4.26 GiB/s generated 3.15 M requests in 3.00 s, 12.0 GiB, 1.05 M iops, 4.00 GiB/s min/avg/max/mdev = 465 ns / 896 ns / 37.4 us / 183 ns Benchmarkdisk/randread/pagecache/4K-min 1 465 ns/op Benchmarkdisk/randread/pagecache/4K-avg 1 896 ns/op < 839 ns 771375 | ************ < 872 ns 609361 | ********* < 905 ns 660635 | ********** < 938 ns 505305 | ******** < 971 ns 189182 | *** < 1.00 us 93655 | * < 1.04 us 70811 | * < 1.07 us 57650 | < 1.10 us 51587 | < 1.14 us 44648 | < 1.17 us 40868 | < 1.20 us 27301 | < 1.24 us 12503 | < 1.27 us 5580 | < 1.30 us 2517 | < 1.34 us 1404 | < 1.37 us 698 | < 1.40 us 378 | < 1.44 us 208 | < 1.47 us 119 | < 1.50 us 50 | < +∞ 978 | # POLL·1 C1·57 C1E·11 C3·2 C6·257 The benchmarks are so done so that output conforms to Go benchmarking format. This way it is possible to process the output with benchstat to summarize / compare results. For above run summarization gives: name time/op disk/randread/direct/4K-min 39.0µs ±12% disk/randread/direct/4K-avg 117µs ± 3% disk/randread/pagecache/4K-min 461ns ± 3% disk/randread/pagecache/4K-avg 880ns ± 0% While at disk topic, teach info/info-local to show related information about node's disk. Example line output for rio: disk/sda: Samsung SSD 840 rev BB0Q 931.5G Please see http://navytux.spb.ru/~kirr/neo.html#the-need-for-faster-storage http://navytux.spb.ru/~kirr/neo.html#appendix-i-ssd-latency for some discussion about SSD performance. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/d35a2fdf X neotest/info-local: Fix disk display in presence of bind-mounts lab.nexedi.com/kirr/neo/commit/44529dbf X neotest/bench-disk: Deduplicate code; change 1M -> 2M lab.nexedi.com/kirr/neo/commit/8bac3dba X neotest/bench-disk: Also benchmark randomly reading 1M blocks lab.nexedi.com/kirr/neo/commit/e795c6ed X neotest: Fix disk display in case of DM lab.nexedi.com/kirr/neo/commit/352cd100 X neotest: Fix disk display in case of MD lab.nexedi.com/kirr/neo/commit/cd2cd093 X bench_disk: Add std bench format lab.nexedi.com/kirr/neo/commit/9f86eb40 X bench += ioping
-
Kirill Smelkov authored
Add to neotest bench-cpu command that performs basic CPU benchmarks: pystone and CRC32/SHA1 for now. While every benchmark is run additionally C-states profile is collected(*). Example output: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu node: deco cluster: Benchmarkpystone 1 283297 pystone/s # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6 Benchmarkpystone 1 289788 pystone/s # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7 Benchmarkpystone 1 286329 pystone/s # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6 Benchmarkpystone 1 292087 pystone/s # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3 Benchmarkpystone 1 290119 pystone/s # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5 Benchmarkcrc32/py/4K 300000 3.415 µs/op # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71 Benchmarkcrc32/py/4K 300000 3.402 µs/op # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77 Benchmarkcrc32/py/4K 300000 3.396 µs/op # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36 Benchmarkcrc32/py/4K 300000 3.435 µs/op # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79 Benchmarkcrc32/py/4K 300000 3.434 µs/op # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55 Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295 Benchmarkcrc32/go/4K 10000000 0.216 µs/op # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330 Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301 Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309 Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328 Benchmarksha1/py/4K 300000 4.553 µs/op # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94 Benchmarksha1/py/4K 300000 4.459 µs/op # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92 Benchmarksha1/py/4K 300000 4.492 µs/op # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62 Benchmarksha1/py/4K 300000 4.550 µs/op # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93 Benchmarksha1/py/4K 300000 4.518 µs/op # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78 Benchmarksha1/go/4K 300000 4.312 µs/op # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190 Benchmarksha1/go/4K 300000 4.383 µs/op # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182 Benchmarksha1/go/4K 300000 4.387 µs/op # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186 Benchmarksha1/go/4K 300000 4.328 µs/op # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179 Benchmarksha1/go/4K 300000 4.337 µs/op # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191 Such raw output can be summarized with the help of benchstat - either with Go[1] or Python[2] implementations: $ benchstat x.txt name pystone/s pystone 288k ± 2% name time/op crc32/py/4K 3.42µs ± 1% crc32/go/4K 218ns ± 1% sha1/py/4K 4.51µs ± 1% sha1/go/4K 4.35µs ± 1% See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some discussion on SHA1 vs CRC32. While at CPU topic, teach info/info-local to show related information about node's CPU: available processors, frequency and idle governors. Example of lines added: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6 ... cpu: Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz] cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs WARNING: cpu: frequency not fixed - benchmark timings won't be stable WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency) See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to understand why there are warnings in above example. Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/cf1f7c24 X tcpu: Don't depend on running tests with cwd = .../go/neo/t/ lab.nexedi.com/kirr/neo/commit/1e438610 fixup! X neotest: Also show target-latency for C-states lab.nexedi.com/kirr/neo/commit/4af48245 X neotest: Also show target-latency for C-states lab.nexedi.com/kirr/neo/commit/2910cf56 X neotest: Prefer first part of FQDN for hostname lab.nexedi.com/kirr/neo/commit/c86ba1b0 X bench-cpu += crc32, adler32 lab.nexedi.com/kirr/neo/commit/4ac3a550 X neotest: Don't use bc lab.nexedi.com/kirr/neo/commit/3918a997 X neotest: Don't assume we are invoked from the directory where neotest is lab.nexedi.com/kirr/neo/commit/9a266d11 X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B lab.nexedi.com/kirr/neo/commit/b6a830d8 X switch cpu benchmarks to go format lab.nexedi.com/kirr/neo/commit/4436b983 X neotest: Provide cpustat command so it is possible to cpustat something external lab.nexedi.com/kirr/neo/commit/b062b349 X microbenchmark CPU first lab.nexedi.com/kirr/neo/commit/a4a18b55 X first cut on C-state profiling lab.nexedi.com/kirr/neo/commit/ea1e0835 X found that cpuidle can be affecting latency a lot! (*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for why this is important. Since being able to profile C-states can be generally useful, we expose such profiling with externally-visible `neotest cpustat` utility. [1] https://godoc.org/golang.org/x/perf/cmd/benchstat [2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py
-
Kirill Smelkov authored
For testing and benchmarking how NEO/py & NEO/go interact with each other we need corresponding test driver. Neotest will be that driver and present patch begins it: - neotest can deploy NEO/{go,py} checkout either locally or on remote node; - it can run NEO/{go,py} unit tests either locally or on remote node; - it can also show information about a system - either local or remote. Examples in action. Banner: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest Neotest is a tool to test and benchmark NEO. Usage: neotest command [arguments] The commands are: test run all tests on a remote host test-local run all tests locally test-go run NEO/go unit tests (part of test-local) test-py run NEO/py unit tests (part of test-local) deploy deploy NEO & needed software for tests to remote host deploy-local deploy NEO & needed software for tests locally info print information about a node info-local print information about local deployment Deploy to another node: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest deploy neotest@rio.kirr.nexedi.com:3 *** deploying to neotest@rio.kirr.nexedi.com:3 ... ... # deployed ok Print information about that node: x/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:3 date: Sun, 08 Jul 2018 21:30:43 +0300 xnode: neotest@rio.kirr.nexedi.com (2401:5180:0:2a::1 192.168.0.8) uname: Linux rio 4.16.0-2-amd64 #1 SMP Debian 4.16.16-2 (2018-06-22) x86_64 GNU/Linux sw/python: Python 2.7.15 sw/go: go version go1.10.3 linux/amd64 sw/sqlite: sqlite 3.24.0 (py mod 2.6.0) sw/mysqld: mysqld Ver 10.1.29-MariaDB-6+b1 for debian-linux-gnu on x86_64 (Debian buildd-unstable) sw/neo: v1.9-42-g972ff5f9 sw/zodb: 5.4.0 sw/zeo: 5.2.0 sw/mysqlclient: 1.3.13 Run NEO/{py,go} unit tests there: x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest test neotest@rio.kirr.nexedi.com:4 ? lab.nexedi.com/kirr/neo/go/internal/packed [no test files] ok lab.nexedi.com/kirr/neo/go/neo/neonet (cached) ok lab.nexedi.com/kirr/neo/go/neo/proto (cached) ok lab.nexedi.com/kirr/neo/go/zodb (cached) ? lab.nexedi.com/kirr/neo/go/zodb/cmd/zodb [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/internal/pickletools [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/storage [no test files] ok lab.nexedi.com/kirr/neo/go/zodb/storage/fs1 (cached) ? lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/cmd/fs1 [no test files] ok lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fs1tools (cached) ? lab.nexedi.com/kirr/neo/go/zodb/storage/fs1/fsb [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/storage/zeo [no test files] ? lab.nexedi.com/kirr/neo/go/zodb/wks [no test files] ok lab.nexedi.com/kirr/neo/go/zodb/zodbtools (cached) ...........................................................................E.EE....c....^C Neotest will be the driver that was used to prepare benchmarks in http://navytux.spb.ru/~kirr/neo.html and http://navytux.spb.ru/~kirr/misc/neo·P4.html . Some draft history related to this patch: lab.nexedi.com/kirr/neo/commit/d46afb3e X neotest: Teach it to also run go & py unit tests; hook it into nxd/runTestSuite lab.nexedi.com/kirr/neo/commit/f694d643 X neotest: Don't fail silently if network address detection fails lab.nexedi.com/kirr/neo/commit/fa78290a X neotest: FQDN host name might be not configured lab.nexedi.com/kirr/neo/commit/56faccad X neotest/info-local: Don't crash if a prog could not be found lab.nexedi.com/kirr/neo/commit/f2932247 X neotest/info-local: Don't crash if an egg could not be found lab.nexedi.com/kirr/neo/commit/f06b7302 X neotest: Determine machine IP addresses via `ip ...` directly, not `getent ...` lab.nexedi.com/kirr/neo/commit/42e5fe71 X neotest info
-
- 08 Jul, 2018 11 commits
-
-
Kirill Smelkov authored
In particular try to support ZEO4: - during handshake we now first wait for remote server to announce its preferred protocol, and only then send the version we select to use. This is the procedure original ZEO server-client do. - teach rpc.call to decode exceptions not only for how ZEO5 encodes them (marking via 2 flag in "async" field), but also on how ZEO4 and earlier encode them: via replying with (exc_type, exc_inst) and expecting client to dynamically check exc_type is a subtype of Exception. - handle other protocol differences - e.g. ZEO5 returns last_tid on register(), while earlier versions return nothing there. Tests pending.
-
Kirill Smelkov authored
For the reference on deco (performance, frequency not fixed): name time/object deco/fs1/zhash.py 15.8µs ± 2% deco/fs1/zhash.py-P16 116µs ±12% deco/fs1/zhash.go 2.60µs ± 0% deco/fs1/zhash.go+prefetch128 3.70µs ±11% deco/fs1/zhash.go-P16 13.4µs ±43% deco/zeo/zhash.py 316µs ± 7% deco/zeo/zhash.py-P16 2.68ms ± 7% deco/zeo/zhash.go 111µs ± 2% deco/zeo/zhash.go+prefetch128 57.7µs ± 2% deco/zeo/zhash.go-P16 1.23ms ± 5% and in particular it shows that with the same ZEO/py server, the latency to load an object via py client is ~ 3x worse compared to the latency to load the same object via hereby Go client. The performance was obtained via forthcoming neotest, and in particular ZEO/go client will be also used in forthcoming zwrk (no analog on python side). See http://navytux.spb.ru/~kirr/neo.html#performance-tests for details. Tests: pending.
-
Kirill Smelkov authored
As the first step factor-out int64 Xint64 checker from zodb/storagefs1/index.go into there. We'll need the checker in the next patch.
-
Kirill Smelkov authored
In situations when created connections are used to only send/receive 1 request/response, the overhead to create/shutdown full connections could be too much. Unfortunately this is exactly the mode that is currently primarily used for compatibility with NEO/py. To help mitigate the overhead in such scenarios, lightweight connections mode is provided: At requester side, one message can be sent over node link with link.Send1 . Inside a connection will be created and then shut down, but since the code manages whole process internally and does not show the connection to user, it can optimize those operations significantly. Similarly link.Ask1 sends 1 request, receives 1 response, and then puts the connection back into pool for later reuse. At receiver side, link.Recv1 accepts a connection with the first message remote peer sent us when establishing it, and wraps the result into Request object. The Request contains the message received and internally the connection. A response can be sent back via Request.Reply. Then once Request.Close is called the connection object that was accepted is immediately put back into pool for later reuse. Some history of lightweight mode: lab.nexedi.com/kirr/neo/commit/0fa96338 X Clarified Request.Close semantics - tests working again lab.nexedi.com/kirr/neo/commit/a5ac1652 X Ask1: switch to sending directly over link lab.nexedi.com/kirr/neo/commit/755e3654 X Request.Reply: switch to replying directly over link lab.nexedi.com/kirr/neo/commit/c643ba53 X Send1: switch to sending directly over link lab.nexedi.com/kirr/neo/commit/7dcbc9c5 X Send1: switch to lightClose lab.nexedi.com/kirr/neo/commit/851864a9 X chan RTT benchmark which simulates Recv1 = Accept + Recv lab.nexedi.com/kirr/neo/commit/099bfc29 X freelist for PktBuf lab.nexedi.com/kirr/neo/commit/58c2e39a X Benchmark for link Ask1/Recv1 over TCP loopback
-
Kirill Smelkov authored
Provide Conn.Send and Conn.Recv which work on NEO messages and automatically encode/decode them into packets on the fly. Similarly to NEO/py also provide Ask to send a request and receive expected reply and Expect which does only the latter half of Ask.
-
Kirill Smelkov authored
Implement NEO protocol handshaking and use it in newly provided DialLink and ListenLink which correspondingly first do regular network dial or listen and than perform the handshake on just established TCP connection. If handshake goes ok, the result is wrapped into NodeLink. Some history: lab.nexedi.com/kirr/neo/commit/8d0a1469 X Handshake draftly done See also http://navytux.spb.ru/~kirr/neo.html#development-overview (starting from "The neonet module also provides DialLink and ListenLink ...")
-
Kirill Smelkov authored
Continue NEO/go with neonet - the layer to exchange messages in between NEO nodes. NEO/go shifts from thinking about NEO protocol logic as RPC to thinking of it as more general network protocol and so settles to provide general connection-oriented message exchange service. This way neonet provides generic connection multiplexing on top of a single TCP node-node link. Neonet compatibility with NEO/py depends on the following small NEO/py patch: kirr/neo@dd3bb8b4 which adjusts message ID a bit so it behaves like stream_id in HTTP/2: - always even for server initiated streams - always odd for client initiated streams and is incremented by += 2, instead of += 1 to maintain above invariant. See http://navytux.spb.ru/~kirr/neo.html#development-overview (starting from "Then comes the link layer which provides service to exchange messages over network...") for the rationale. Unfortunately current NEO/py maintainer is very much against merging that patch. This patch brings in the core of neonet. Next patches will add initial handshaking, user-level Send/Recv + Ask/Expect and "lightweight mode". Some neonet core history: lab.nexedi.com/kirr/neo/commit/6b9ed46d X neonet: Avoid integer overflow on max packet length check lab.nexedi.com/kirr/neo/commit/8eac771c X neo/connection: Fix race between link.shutdown() and conn.lightClose() lab.nexedi.com/kirr/neo/commit/8021a1d5 X rxghandoff lab.nexedi.com/kirr/neo/commit/68738036 X ... but negative impact on separate client / server processes, strange ... lab.nexedi.com/kirr/neo/commit/b0dda9d2 X serveRecv: help Go scheduler to switch to receiving G sooner lab.nexedi.com/kirr/neo/commit/4989918a X remove defer from rx/tx hot paths lab.nexedi.com/kirr/neo/commit/e055406a X no select for acceptq - similarly for rxq path lab.nexedi.com/kirr/neo/commit/c28ad4d0 X Conn.Recv: receive without select lab.nexedi.com/kirr/neo/commit/496bd425 X add benchmark RTT over plain net.Conn with serveRecv-style RX handler lab.nexedi.com/kirr/neo/commit/9fa79958 X draft how to mark RX down without reallocating .rxdown lab.nexedi.com/kirr/neo/commit/4324c812 X restore all Conn functionality lab.nexedi.com/kirr/neo/commit/a8e61d2f X serveSend is not needed lab.nexedi.com/kirr/neo/commit/9d047b36 X recvPkt via only 1 syscall lab.nexedi.com/kirr/neo/commit/b555a507 X baseline net RTT benchmark lab.nexedi.com/kirr/neo/commit/91be5cdd X everyone is listening from start; CloseAccept to disable listening - works lab.nexedi.com/kirr/neo/commit/c2a1b63a X naming: Packet = raw data; Message = meaningful object lab.nexedi.com/kirr/neo/commit/6fd0c9be X connection: Adding context to errors from NodeLink and Conn operations lab.nexedi.com/kirr/neo/commit/65b17bdc X rework Conn acceptance to be explicit via NodeLink.Accept
-
Kirill Smelkov authored
This brings some go/py compatibility checks that verify go and python treat a message code equally. Although messages encoding are tested in the previous patch there is no explicit tests for go/py compatibility on messages encoding.
-
Kirill Smelkov authored
Provide a way for every message to be encoded/decoded to/from NEO wire encoding. For this introduce Msg interface with wire coding methods and provide such methods for all message types. For selected types the methods are implemented manually. For most of the types the methods are generated automatically by protogen.go program. protogen.go was mentioned in http://navytux.spb.ru/~kirr/neo.html#development-overview in "On server-side NEO/go work started by first implementing messages serialization in exactly the same wire format as NEO/py does ..." paragraph. A bit of late protogen fixups history: lab.nexedi.com/kirr/neo/commit/c884bfd5 lab.nexedi.com/kirr/neo/commit/385d813a lab.nexedi.com/kirr/neo/commit/0f7e0b00 lab.nexedi.com/kirr/neo/commit/de3ef2c0 Also a message type can be reverse-looked up by message code via MsgType(). This will be later used in network receive code path.
-
Kirill Smelkov authored
Provide routines to convert selected types to string and also for UUID and Address <-> string encoding/decoding.
-
Kirill Smelkov authored
Start NEO/go code with protocol package that defines message that NEO nodes exchange in between each other. The definition is based on neo/lib/protocol.py and is kept in sync with that file. This commit brings only messages definition. Messages serialization will come in the follow-up patch.
-
- 05 Jul, 2018 1 commit
-
-
Kirill Smelkov authored
We will need to use BE16 and BE32 in the next patch.
-
- 04 Jul, 2018 1 commit
-
-
Kirill Smelkov authored
See nexedi/zodbtools@b1163449 for details.
-
- 03 Jul, 2018 1 commit
-
-
Kirill Smelkov authored
-
- 09 Apr, 2018 1 commit
-
-
Kirill Smelkov authored
Tomáš Peterka noticed that gotos in dump.go are not actually needed because the same functionality could be achieved with defer in more clean and structured way. Do it. This brings ~ 5% performance hit name old time/op new time/op delta ZodbDump-4 148µs ± 1% 155µs ± 2% +4.69% (p=0.000 n=9+10) because defer implementation is currently not great (https://github.com/golang/go/issues/14939) If we absolutely need those 5% back it could be worked around similar to e.g. FileStorage.Load: https://lab.nexedi.com/kirr/neo/blob/6faed528/go/zodb/storage/fs1/filestorage.go#L133 https://lab.nexedi.com/kirr/neo/blob/6faed528/go/zodb/storage/fs1/filestorage.go#L141 /suggested-by @katomaso
-
- 13 Mar, 2018 1 commit
-
-
Julien Muchembled authored
-
- 02 Mar, 2018 3 commits
-
-
Julien Muchembled authored
Before, it waited for upstream activity until all partitions are touched. However, when upstream is idle the backup cluster could remain stuck forever if it was interrupted whereas some cells were still late.
-
Julien Muchembled authored
The 'min_tid < new_tid' assertion failed when jumping to the past.
-
Julien Muchembled authored
Given that: - read locks are only taken by transactions (not replication) - in backup mode, storage nodes stay in UP_TO_DATE state, even if partitions are synchronized up to different tids there was a race condition with the master node replying to LastTransaction with a TID that may not be replicated yet by all replicas, potentially causing such replicas to reply OidDoesNotExist or OidNotFound if a client asks it data too early. IOW, even if the cluster does contain the data up to `getBackupTid(max)`, it is only readable by NEO clients up to `getBackupTid(min)` as long as the cluster is in BACKINGUP state.
-
- 17 Jan, 2018 1 commit
-
-
Kirill Smelkov authored
Usage of supportsTransactionalUndo() was removed from ZODB in 2007 - see e.g. the following commits: https://github.com/zopefoundation/ZODB/commit/a06bfc03 https://github.com/zopefoundation/ZODB/commit/e667b022 https://github.com/zopefoundation/ZODB/commit/f595f7e7 ... /reviewed-by @vpelletier /reviewed-on nexedi/neoppod!8
-
- 15 Jan, 2018 16 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
`zodb catobj` command to dump content of an object - similarly to `git cat-file`. Two modes: raw and verbose with `zodb dump` like headers for the object present. There is no such command currently in zodbtools/py.
-
Kirill Smelkov authored
Command to print general information about a ZODB database. Same as `zodb info` in zodbtools/py.
-
Kirill Smelkov authored
Add `zodb dump` command to dump arbitrary ZODB database in generic format. The actual dump protocol being used here is the same as in zodbtools/py with https://lab.nexedi.com/zodbtools/merge_requests/3 applied. (the MR there is OK and is just waiting for upstream ZODB to negotiate a way to retrieve transaction extension data in raw form).
-
Kirill Smelkov authored
Add zodbtools which is generic (contrast to fs1tools) set of ZODB managing utilities. Only package and command infrastructure here - actual commands will follow up in the next patches.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Add commands for FileStorage index maintainance: manually rebuild the index and to performe index verification.
-
Kirill Smelkov authored
Add various FileStorage-specific dump commands with output being bit-to-bit exact with the following ZODB/py FileStorage tools: - fsdump.py - fsdump.py (verbose dumper) - fstail.py Please see the patch for links about this dump formats.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Build FileStorage ZODB driver out of format record loading/decoding and index routines we just added in previous patches. The driver supports only read-only mode so far. Promised tests for data format interoperability with ZODB/py are added.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Build index type on top of fsb.Tree introduced in the previous patch and add routines to save and load it to/from disk. We ensure ZODB/py compatibility via generating test FileStorage database + its index and checking we can load index from it and also that if we save an index ZODB/py can load it back. FileStorage index is hard to get bit-to-bit identical since this index uses python pickles which can encode the same objects in several different ways.
-
Kirill Smelkov authored
FileStorage index maps oid to file position storing latest data record for this oid. This index is naturally to implement via BTree as e.g. ZODB/py does. In Go world there is github.com/cznic/b BTree library but without specialization and working via interface{} it is slower than it could be and allocates a lot. So generate specialized version of that code with key and value types exactly suitable for FileStorage indexing. We use a bit patched b version with speed ups for bulk-loading data via regular point-ingestion BTree entry point: https://lab.nexedi.com/kirr/b x/refill The patches has not been upstreamed because it slows down general case a bit (only a bit, but still this is a "no" to me), and because with dedicated bulk-loading API it could be possible to still load data several times faster. Still current version is enough for not very-huge indices. Btw ZODB/py does the same (see fsBucket + friends).
-
Kirill Smelkov authored
Start implementing FileStorage support by adding code to load/decode FileStorage records and way to iterate a FileStorage. Tests will come in a later patch together with ZODB-level loading support.
-