Kirill Smelkov authored
Add a bench-cpu command to neotest that performs basic CPU benchmarks:
pystone and CRC32/SHA1 for now. While each benchmark runs, a C-states
profile is additionally collected (*). Example output:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
    node: deco
    cluster:
    Benchmarkpystone 1 283297 pystone/s # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
    Benchmarkpystone 1 289788 pystone/s # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
    Benchmarkpystone 1 286329 pystone/s # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
    Benchmarkpystone 1 292087 pystone/s # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
    Benchmarkpystone 1 290119 pystone/s # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
    Benchmarkcrc32/py/4K 300000 3.415 µs/op # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
    Benchmarkcrc32/py/4K 300000 3.402 µs/op # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
    Benchmarkcrc32/py/4K 300000 3.396 µs/op # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
    Benchmarkcrc32/py/4K 300000 3.435 µs/op # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
    Benchmarkcrc32/py/4K 300000 3.434 µs/op # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
    Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
    Benchmarkcrc32/go/4K 10000000 0.216 µs/op # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
    Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
    Benchmarkcrc32/go/4K 10000000 0.218 µs/op # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
    Benchmarkcrc32/go/4K 10000000 0.219 µs/op # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
    Benchmarksha1/py/4K 300000 4.553 µs/op # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
    Benchmarksha1/py/4K 300000 4.459 µs/op # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
    Benchmarksha1/py/4K 300000 4.492 µs/op # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
    Benchmarksha1/py/4K 300000 4.550 µs/op # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
    Benchmarksha1/py/4K 300000 4.518 µs/op # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
    Benchmarksha1/go/4K 300000 4.312 µs/op # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
    Benchmarksha1/go/4K 300000 4.383 µs/op # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
    Benchmarksha1/go/4K 300000 4.387 µs/op # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
    Benchmarksha1/go/4K 300000 4.328 µs/op # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
    Benchmarksha1/go/4K 300000 4.337 µs/op # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191

Such raw output can be summarized with benchstat, using either the
Go[1] or the Python[2] implementation:

    $ benchstat x.txt
    name         pystone/s
    pystone      288k ± 2%

    name         time/op
    crc32/py/4K  3.42µs ± 1%
    crc32/go/4K   218ns ± 1%
    sha1/py/4K   4.51µs ± 1%
    sha1/go/4K   4.35µs ± 1%

See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some
discussion on SHA1 vs CRC32.

While on the CPU topic, teach info/info-local to show related
information about the node's CPU: available processors, and the
frequency and idle governors. Example of lines added:

    x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info
    neotest@rio.kirr.nexedi.com:6
    ...
    cpu: Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
    cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
    cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
    WARNING: cpu: frequency not fixed - benchmark timings won't be stable
    WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
    WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)

See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to
understand why there are warnings in the above example.
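
To illustrate how the raw bench-cpu lines relate to the summary, here is
a minimal Python sketch that parses them and computes, per benchmark,
the mean and the largest relative deviation from it. It is a simplified
stand-in and not benchstat's actual algorithm (benchstat, for instance,
also handles outliers). The names `parse_line` and `summarize` are
hypothetical, and the line layout is assumed from the example output
only.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the example output:
#   Benchmark<name> <niter> <value> <unit> # <state>·<count> ...
LINE_RE = re.compile(
    r'^Benchmark(\S+)\s+(\d+)\s+([\d.]+)\s+(\S+)(?:\s+#\s*(.*))?$')


def parse_line(line):
    """Parse one line -> (name, niter, value, unit, cstates), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not a benchmark line, e.g. "node: deco"
    name, niter, value, unit, comment = m.groups()
    cstates = {}
    for field in (comment or '').split():
        state, sep, count = field.partition('·')
        if sep and count.isdigit():
            cstates[state] = int(count)
    return name, int(niter), float(value), unit, cstates


def summarize(lines):
    """Return {(name, unit): (mean, max deviation from mean in %)}."""
    samples = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            name, _, value, unit, _ = parsed
            samples[(name, unit)].append(value)
    summary = {}
    for key, values in samples.items():
        mean = sum(values) / len(values)
        dev = max(abs(v - mean) for v in values) / mean * 100
        summary[key] = (mean, dev)
    return summary
```

Fed with the five pystone lines from the example output, `summarize`
gives a mean of about 288k pystone/s with a maximum deviation just under
2%, which is in line with the benchstat summary.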

Some draft history related to this patch:

    lab.nexedi.com/kirr/neo/commit/cf1f7c24 X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
    lab.nexedi.com/kirr/neo/commit/1e438610 fixup! X neotest: Also show target-latency for C-states
    lab.nexedi.com/kirr/neo/commit/4af48245 X neotest: Also show target-latency for C-states
    lab.nexedi.com/kirr/neo/commit/2910cf56 X neotest: Prefer first part of FQDN for hostname
    lab.nexedi.com/kirr/neo/commit/c86ba1b0 X bench-cpu += crc32, adler32
    lab.nexedi.com/kirr/neo/commit/4ac3a550 X neotest: Don't use bc
    lab.nexedi.com/kirr/neo/commit/3918a997 X neotest: Don't assume we are invoked from the directory where neotest is
    lab.nexedi.com/kirr/neo/commit/9a266d11 X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
    lab.nexedi.com/kirr/neo/commit/b6a830d8 X switch cpu benchmarks to go format
    lab.nexedi.com/kirr/neo/commit/4436b983 X neotest: Provide cpustat command so it is possible to cpustat something external
    lab.nexedi.com/kirr/neo/commit/b062b349 X microbenchmark CPU first
    lab.nexedi.com/kirr/neo/commit/a4a18b55 X first cut on C-state profiling
    lab.nexedi.com/kirr/neo/commit/ea1e0835 X found that cpuidle can be affecting latency a lot!

(*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and
    http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for why
    this is important. Since being able to profile C-states can be
    generally useful, we expose such profiling via the externally
    visible `neotest cpustat` utility.

[1] https://godoc.org/golang.org/x/perf/cmd/benchstat
[2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py
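
For readers unfamiliar with C-state profiling, one plausible way to
collect such a profile on Linux is to read the per-state `usage`
counters that the kernel exposes under
/sys/devices/system/cpu/cpuN/cpuidle/stateM/ before and after a
workload, and report the difference. The Python sketch below does that.
It is an illustration under stated assumptions, not how `neotest
cpustat` is implemented: the function names are hypothetical, and it
assumes that the numbers in the bench-cpu output are state entry counts
summed over all CPUs.

```python
import glob
import os
import re


def read_usage(sysfs_cpu='/sys/devices/system/cpu'):
    """Return {state name: entry count summed over all CPUs}."""
    pattern = os.path.join(sysfs_cpu, 'cpu[0-9]*', 'cpuidle', 'state[0-9]*')
    # Sort by state index so that the result is ordered POLL, C1, ...
    statedirs = sorted(
        glob.glob(pattern),
        key=lambda d: int(re.search(r'state(\d+)$', d).group(1)))
    usage = {}
    for statedir in statedirs:
        with open(os.path.join(statedir, 'name')) as f:
            name = f.read().strip()
        with open(os.path.join(statedir, 'usage')) as f:
            count = int(f.read())
        usage[name] = usage.get(name, 0) + count
    return usage


def profile(workload, sysfs_cpu='/sys/devices/system/cpu'):
    """Run workload() and return (its result, per-state usage delta)."""
    before = read_usage(sysfs_cpu)
    result = workload()
    after = read_usage(sysfs_cpu)
    delta = {name: after[name] - before.get(name, 0) for name in after}
    return result, delta


def format_profile(delta):
    """Render a delta in the 'POLL·1 C1·16 ...' style (assumed format)."""
    return ' '.join('%s·%d' % (name, n) for name, n in delta.items())
```

Taking the sysfs root as a parameter keeps the sketch testable against a
fake directory tree; on a real machine the default path would be used.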
a60c472c