go/neo/t/tcpu.py · a60c472c76da315f918afeb06f994c8610cf7c84 · nexedi / neoppod

go/neo/t/neotest: CPU information & benchmarks · a60c472c
Kirill Smelkov authored Jul 09, 2018
Add a bench-cpu command to neotest that performs basic CPU benchmarks:
pystone and CRC32/SHA1 for now. While each benchmark runs, a C-states
profile is additionally collected(*). Example output:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
	node:   deco
	cluster:
	Benchmarkpystone 1 283297 pystone/s     # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
	Benchmarkpystone 1 289788 pystone/s     # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
	Benchmarkpystone 1 286329 pystone/s     # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
	Benchmarkpystone 1 292087 pystone/s     # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
	Benchmarkpystone 1 290119 pystone/s     # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
	Benchmarkcrc32/py/4K 300000     3.415 µs/op     # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
	Benchmarkcrc32/py/4K 300000     3.402 µs/op     # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
	Benchmarkcrc32/py/4K 300000     3.396 µs/op     # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
	Benchmarkcrc32/py/4K 300000     3.435 µs/op     # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
	Benchmarkcrc32/py/4K 300000     3.434 µs/op     # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
	Benchmarkcrc32/go/4K 10000000   0.216 µs/op     # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
	Benchmarksha1/py/4K 300000      4.553 µs/op     # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
	Benchmarksha1/py/4K 300000      4.459 µs/op     # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
	Benchmarksha1/py/4K 300000      4.492 µs/op     # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
	Benchmarksha1/py/4K 300000      4.550 µs/op     # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
	Benchmarksha1/py/4K 300000      4.518 µs/op     # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
	Benchmarksha1/go/4K 300000      4.312 µs/op     # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
	Benchmarksha1/go/4K 300000      4.383 µs/op     # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
	Benchmarksha1/go/4K 300000      4.387 µs/op     # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
	Benchmarksha1/go/4K 300000      4.328 µs/op     # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
	Benchmarksha1/go/4K 300000      4.337 µs/op     # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191
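
As an illustration of how one such line can be produced, here is a
minimal Python 3 sketch. It is not the code from tcpu.py: the names
bench_crc32 and format_line are hypothetical, and the all-zero 4K input
is an assumption. It times zlib.crc32 on a 4K block and prints the
result in the Go benchmark text format (name, iterations, value, unit):

```python
import time
import zlib


def bench_crc32(data, niter):
    """Return the average time of one zlib.crc32(data) call, in microseconds."""
    start = time.perf_counter()
    for _ in range(niter):
        zlib.crc32(data)
    elapsed = time.perf_counter() - start
    return elapsed / niter * 1e6


def format_line(name, niter, value, unit):
    """Format one result as: Benchmark<name> <iterations> <value> <unit>."""
    return "Benchmark%s %d\t%.3f %s" % (name, niter, value, unit)


if __name__ == "__main__":
    data = b"\0" * 4096  # assumed input: a 4K block of zero bytes
    niter = 300000       # same iteration count as in the example output
    print(format_line("crc32/py/4K", niter, bench_crc32(data, niter), "µs/op"))
```

The C-states comment after `#` is left out here; it comes from a
separate measurement taken around the benchmark run.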

Such raw output can be summarized with benchstat, using either the
Go[1] or the Python[2] implementation:

	$ benchstat x.txt
	name         pystone/s
	pystone        288k ± 2%

	name         time/op
	crc32/py/4K  3.42µs ± 1%
	crc32/go/4K   218ns ± 1%
	sha1/py/4K   4.51µs ± 1%
	sha1/go/4K   4.35µs ± 1%
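
To make the summary step concrete, here is a simplified Python 3
sketch. It is not benchstat itself: the helper name summarize is
hypothetical, and the real tools do more (for example outlier handling
and unit scaling). It groups samples by benchmark name and reports the
mean together with the largest relative deviation from it:

```python
import re
from collections import defaultdict

# Benchmark<name> <iterations> <value> <unit>; a trailing "# ..." comment is ignored
LINE_RE = re.compile(r"^Benchmark(\S+)\s+(\d+)\s+([0-9.]+)\s+(\S+)")


def summarize(text):
    """Map (name, unit) -> (mean, max relative deviation from the mean in %)."""
    samples = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip "node:", "cluster:" and other non-benchmark lines
        name, _niter, value, unit = m.groups()
        samples[(name, unit)].append(float(value))

    summary = {}
    for key, values in samples.items():
        mean = sum(values) / len(values)
        spread = max(abs(v - mean) for v in values) / mean * 100
        summary[key] = (mean, spread)
    return summary
```

For the five pystone samples in the raw output above this gives a mean
of about 288k and a spread of about 1.7%, which is consistent with the
`288k ± 2%` line.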

See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some
discussion on SHA1 vs CRC32.

While on the CPU topic, teach info/info-local to show related information
about the node's CPU: available processors, and frequency and idle
governors. Example of the lines added:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6
	...
	cpu:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
	cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
	cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
	WARNING: cpu: frequency not fixed - benchmark timings won't be stable
	WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
	WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)
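
For illustration, the elat/tres pairs on the idle line can be read from
the Linux cpuidle sysfs files (name, latency, residency under
cpuN/cpuidle/stateM). The Python 3 sketch below assumes that layout;
the function names are hypothetical and this is not how neotest itself
is implemented:

```python
import glob
import os


def read_text(path):
    with open(path) as f:
        return f.read().strip()


def idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return [(name, exit_latency_us, target_residency_us)] for one CPU."""
    state_dirs = glob.glob(os.path.join(cpu_dir, "cpuidle", "state[0-9]*"))
    # sort numerically so that state10 comes after state9
    state_dirs.sort(key=lambda d: int(os.path.basename(d)[len("state"):]))
    states = []
    for d in state_dirs:
        states.append((
            read_text(os.path.join(d, "name")),
            int(read_text(os.path.join(d, "latency"))),
            int(read_text(os.path.join(d, "residency"))),
        ))
    return states


def format_idle(states):
    """Render states as 'NAME·elat/tres' pairs, as in the info output above."""
    return " ".join("%s·%d/%d" % s for s in states)
```

The cpu_dir parameter exists only so the sketch can be pointed at a
different CPU or at a test directory.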

See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to
understand why there are warnings in the above example.

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/cf1f7c24	X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
	lab.nexedi.com/kirr/neo/commit/1e438610	fixup! X neotest: Also show target-latency for C-states
	lab.nexedi.com/kirr/neo/commit/4af48245	X neotest: Also show target-latency for C-states
	lab.nexedi.com/kirr/neo/commit/2910cf56	X neotest: Prefer first part of FQDN for hostname
	lab.nexedi.com/kirr/neo/commit/c86ba1b0	X bench-cpu += crc32, adler32
	lab.nexedi.com/kirr/neo/commit/4ac3a550	X neotest: Don't use bc
	lab.nexedi.com/kirr/neo/commit/3918a997	X neotest: Don't assume we are invoked from the directory where neotest is
	lab.nexedi.com/kirr/neo/commit/9a266d11	X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
	lab.nexedi.com/kirr/neo/commit/b6a830d8	X switch cpu benchmarks to go format
	lab.nexedi.com/kirr/neo/commit/4436b983	X neotest: Provide cpustat command so it is possible to cpustat something external
	lab.nexedi.com/kirr/neo/commit/b062b349	X microbenchmark CPU first
	lab.nexedi.com/kirr/neo/commit/a4a18b55	X first cut on C-state profiling
	lab.nexedi.com/kirr/neo/commit/ea1e0835	X found that cpuidle can be affecting latency a lot!

(*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and
    http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for
    why this is important.

    Since being able to profile C-states can be generally useful, we
    expose such profiling as the externally visible `neotest cpustat` utility.
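
    One way to collect such a profile around an external command is
    sketched below in Python 3. It assumes the per-state numbers are
    differences of the cpuidle `usage` counters (how many times a state
    was entered) summed over all CPUs; that reading, and the names
    snapshot and profile, are assumptions and not taken from neotest:

```python
import glob
import os
import subprocess


def snapshot(cpu_root="/sys/devices/system/cpu"):
    """Sum the per-state `usage` counters over all CPUs: {state name: count}."""
    total = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpuidle", "state[0-9]*")
    for d in glob.glob(pattern):
        with open(os.path.join(d, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(d, "usage")) as f:
            total[name] = total.get(name, 0) + int(f.read())
    return total


def profile(argv, cpu_root="/sys/devices/system/cpu"):
    """Run argv and return (exit code, {state name: usage delta during the run})."""
    before = snapshot(cpu_root)
    ret = subprocess.call(argv)
    after = snapshot(cpu_root)
    return ret, {name: after[name] - before.get(name, 0) for name in after}
```

    Calling profile(["sleep", "1"]) would then return the exit code of
    sleep together with the counter deltas observed while it ran.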

[1] https://godoc.org/golang.org/x/perf/cmd/benchstat
[2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py