Commit a60c472c authored by Kirill Smelkov

go/neo/t/neotest: CPU information & benchmarks

Add a bench-cpu command to neotest that performs basic CPU benchmarks:
pystone and CRC32/SHA1 for now. While each benchmark runs, a C-states
profile is additionally collected(*). Example output:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest bench-cpu
	node:   deco
	cluster:
	Benchmarkpystone 1 283297 pystone/s     # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
	Benchmarkpystone 1 289788 pystone/s     # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
	Benchmarkpystone 1 286329 pystone/s     # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
	Benchmarkpystone 1 292087 pystone/s     # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
	Benchmarkpystone 1 290119 pystone/s     # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
	Benchmarkcrc32/py/4K 300000     3.415 µs/op     # POLL·2 C1·52 C1E·27 C3·9 C6·37 C7s·0 C8·78 C9·0 C10·71
	Benchmarkcrc32/py/4K 300000     3.402 µs/op     # POLL·0 C1·35 C1E·24 C3·18 C6·38 C7s·0 C8·88 C9·0 C10·77
	Benchmarkcrc32/py/4K 300000     3.396 µs/op     # POLL·0 C1·28 C1E·26 C3·12 C6·57 C7s·0 C8·86 C9·0 C10·36
	Benchmarkcrc32/py/4K 300000     3.435 µs/op     # POLL·0 C1·48 C1E·24 C3·8 C6·46 C7s·0 C8·64 C9·0 C10·79
	Benchmarkcrc32/py/4K 300000     3.434 µs/op     # POLL·1 C1·37 C1E·25 C3·11 C6·42 C7s·0 C8·72 C9·0 C10·55
	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
	Benchmarkcrc32/go/4K 10000000   0.216 µs/op     # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
	Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
	Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
	Benchmarksha1/py/4K 300000      4.553 µs/op     # POLL·1 C1·35 C1E·41 C3·14 C6·49 C7s·0 C8·95 C9·0 C10·94
	Benchmarksha1/py/4K 300000      4.459 µs/op     # POLL·2 C1·39 C1E·36 C3·19 C6·53 C7s·0 C8·127 C9·0 C10·92
	Benchmarksha1/py/4K 300000      4.492 µs/op     # POLL·2 C1·66 C1E·30 C3·15 C6·47 C7s·0 C8·96 C9·0 C10·62
	Benchmarksha1/py/4K 300000      4.550 µs/op     # POLL·1 C1·51 C1E·44 C3·10 C6·46 C7s·0 C8·92 C9·0 C10·93
	Benchmarksha1/py/4K 300000      4.518 µs/op     # POLL·3 C1·41 C1E·29 C3·18 C6·35 C7s·0 C8·81 C9·0 C10·78
	Benchmarksha1/go/4K 300000      4.312 µs/op     # POLL·0 C1·122 C1E·67 C3·24 C6·67 C7s·0 C8·131 C9·0 C10·190
	Benchmarksha1/go/4K 300000      4.383 µs/op     # POLL·2 C1·126 C1E·74 C3·17 C6·80 C7s·0 C8·123 C9·0 C10·182
	Benchmarksha1/go/4K 300000      4.387 µs/op     # POLL·2 C1·100 C1E·65 C3·27 C6·56 C7s·0 C8·127 C9·0 C10·186
	Benchmarksha1/go/4K 300000      4.328 µs/op     # POLL·1 C1·136 C1E·80 C3·14 C6·76 C7s·0 C8·113 C9·0 C10·179
	Benchmarksha1/go/4K 300000      4.337 µs/op     # POLL·1 C1·96 C1E·81 C3·21 C6·68 C7s·0 C8·132 C9·0 C10·191

Such raw output can be summarized with the help of benchstat, using
either the Go[1] or the Python[2] implementation:

	$ benchstat x.txt
	name         pystone/s
	pystone        288k ± 2%

	name         time/op
	crc32/py/4K  3.42µs ± 1%
	crc32/go/4K   218ns ± 1%
	sha1/py/4K   4.51µs ± 1%
	sha1/go/4K   4.35µs ± 1%
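
As a rough illustration of what such a summary involves, the sketch below
groups raw `Benchmark...` lines by name and reports the mean together with
the largest relative deviation from it. It is a simplification under stated
assumptions, not the algorithm of either benchstat implementation (which may,
for example, also filter outliers); the names `parse_line` and `summarize`
are hypothetical.

```python
import re
from collections import OrderedDict

# "Benchmark<name> <niter> <value> <unit>", optionally followed by "# ..." text
_line_re = re.compile(r'^Benchmark(\S+)\s+(\d+)\s+([0-9.]+)\s+(\S+)')

def parse_line(line):
    """Return (name, value, unit) for a benchmark line, or None otherwise."""
    m = _line_re.match(line.strip())
    if m is None:
        return None
    name, _niter, value, unit = m.groups()
    return name, float(value), unit

def summarize(text):
    """Return {name: (mean, max relative deviation in %, unit)}."""
    samples = OrderedDict()     # name -> (unit, [values])
    for line in text.splitlines():
        r = parse_line(line)
        if r is None:
            continue            # e.g. "node:" / "cluster:" lines
        name, value, unit = r
        samples.setdefault(name, (unit, []))[1].append(value)

    summary = OrderedDict()
    for name, (unit, valuev) in samples.items():
        mean = sum(valuev) / len(valuev)
        dev = max(abs(v - mean) for v in valuev) / mean * 100
        summary[name] = (mean, dev, unit)
    return summary

# a few raw lines taken from the example output above
raw = """\
node:   deco
cluster:
Benchmarkpystone 1 283297 pystone/s     # POLL·1 C1·16 C1E·9 C3·25 C6·32 C7s·0 C8·69 C9·0 C10·6
Benchmarkpystone 1 289788 pystone/s     # POLL·0 C1·0 C1E·7 C3·10 C6·49 C7s·0 C8·45 C9·0 C10·7
Benchmarkpystone 1 286329 pystone/s     # POLL·0 C1·0 C1E·18 C3·16 C6·37 C7s·0 C8·63 C9·0 C10·6
Benchmarkpystone 1 292087 pystone/s     # POLL·0 C1·0 C1E·4 C3·17 C6·40 C7s·0 C8·56 C9·0 C10·3
Benchmarkpystone 1 290119 pystone/s     # POLL·0 C1·0 C1E·6 C3·13 C6·46 C7s·0 C8·68 C9·0 C10·5
Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·0 C1·171 C1E·108 C3·17 C6·62 C7s·0 C8·164 C9·0 C10·295
Benchmarkcrc32/go/4K 10000000   0.216 µs/op     # POLL·3 C1·131 C1E·128 C3·22 C6·82 C7s·0 C8·179 C9·0 C10·330
Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·157 C1E·96 C3·22 C6·72 C7s·0 C8·141 C9·0 C10·301
Benchmarkcrc32/go/4K 10000000   0.218 µs/op     # POLL·3 C1·154 C1E·104 C3·14 C6·63 C7s·0 C8·153 C9·0 C10·309
Benchmarkcrc32/go/4K 10000000   0.219 µs/op     # POLL·1 C1·170 C1E·103 C3·25 C6·80 C7s·0 C8·177 C9·0 C10·328
"""

for name, (mean, dev, unit) in summarize(raw).items():
    print('%-12s %g %s ± %.0f%%' % (name, mean, unit, dev))
```

For real use the benchstat tools referenced above remain the better choice;
the sketch only shows where the per-benchmark averages come from.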

See http://navytux.spb.ru/~kirr/neo.html#results-and-discussion for some
discussion on SHA1 vs CRC32.

While on the CPU topic, teach info/info-local to show related information
about the node's CPU: available processors, frequency and idle governors.
Example of the lines added:

	x/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info neotest@rio.kirr.nexedi.com:6
	...
	cpu:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
	cpu/[0-7]/freq: intel_pstate/powersave [1.60GHz - 3.90GHz]
	cpu/[0-7]/idle: intel_idle/menu: POLL·0/0 C1·1/1 C1E·10/20 C3·59/156 C6·80/300 # elat/tres µs
	WARNING: cpu: frequency not fixed - benchmark timings won't be stable
	WARNING: cpu: C-state exit-latency is max 80μs - benchmark timings won't be stable
	WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)

See http://navytux.spb.ru/~kirr/neo.html#measurements-stability to
understand why there are warnings in the above example.
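
The rule behind the two warnings can be sketched as follows. This is a
minimal illustration in Python, assuming the same 10µs threshold that the
script in this patch uses; the function name `cpu_warnings` and its argument
layout are hypothetical and are not part of neotest.

```python
def cpu_warnings(freqv, statev, latency_ok_us=10):
    """Return the list of stability warnings for a node.

    freqv:  [(scaling_min_freq, scaling_max_freq)] per cpu, in kHz
    statev: [(name, exit_latency_us, disabled)] for idle states
    """
    warnv = []

    # frequency is considered fixed only if min == max on every cpu
    if any(fmin != fmax for fmin, fmax in freqv):
        warnv.append("cpu: frequency not fixed - benchmark timings won't be stable")

    # only enabled C-states can add wakeup latency
    latmax = max([lat for _, lat, disabled in statev if not disabled] or [0])
    if latmax > latency_ok_us:
        warnv.append("cpu: C-state exit-latency is max %dµs - "
                     "benchmark timings won't be stable" % latmax)
    return warnv

# values taken from the example above: 1.60GHz - 3.90GHz, C6 with 80µs exit latency
freqv = [(1600000, 3900000)] * 8
statev = [('POLL', 0, False), ('C1', 1, False), ('C1E', 10, False),
          ('C3', 59, False), ('C6', 80, False)]
for w in cpu_warnings(freqv, statev):
    print('WARNING: ' + w)
```

With the frequency pinned (min == max) and the C3 and C6 states disabled,
the same function returns no warnings, since the largest remaining exit
latency (C1E, 10µs) does not exceed the threshold.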

Some draft history related to this patch:

	lab.nexedi.com/kirr/neo/commit/cf1f7c24	X tcpu: Don't depend on running tests with cwd = .../go/neo/t/
	lab.nexedi.com/kirr/neo/commit/1e438610	fixup! X neotest: Also show target-latency for C-states
	lab.nexedi.com/kirr/neo/commit/4af48245	X neotest: Also show target-latency for C-states
	lab.nexedi.com/kirr/neo/commit/2910cf56	X neotest: Prefer first part of FQDN for hostname
	lab.nexedi.com/kirr/neo/commit/c86ba1b0	X bench-cpu += crc32, adler32
	lab.nexedi.com/kirr/neo/commit/4ac3a550	X neotest: Don't use bc
	lab.nexedi.com/kirr/neo/commit/3918a997	X neotest: Don't assume we are invoked from the directory where neotest is
	lab.nexedi.com/kirr/neo/commit/9a266d11	X neotest/bench-cpu: Also benchmark sha1 for 2M; report size units as e.g. 4K not 4096B
	lab.nexedi.com/kirr/neo/commit/b6a830d8	X switch cpu benchmarks to go format
	lab.nexedi.com/kirr/neo/commit/4436b983	X neotest: Provide cpustat command so it is possible to cpustat something external
	lab.nexedi.com/kirr/neo/commit/b062b349	X microbenchmark CPU first
	lab.nexedi.com/kirr/neo/commit/a4a18b55	X first cut on C-state profiling
	lab.nexedi.com/kirr/neo/commit/ea1e0835	X found that cpuidle can be affecting latency a lot!

(*) see http://navytux.spb.ru/~kirr/neo.html#cpu-idle-c-states and
    http://navytux.spb.ru/~kirr/neo.html#appendix-ii-cpu-c-states for
    why this is important.

    Since being able to profile C-states can be generally useful, we
    expose such profiling via the externally visible `neotest cpustat` utility.
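
The idea behind the C-states profile is to read the per-state `usage`
counters from sysfs before and after running a command and to report the
difference. A minimal Python sketch of that idea follows; it assumes the
Linux cpuidle sysfs layout (`cpuN/cpuidle/stateM/{name,usage}`), and the
names `read_usage` and `cstate_profile` are hypothetical, not the actual
`cpustat` implementation, which is written in shell.

```python
import glob
import os
import re

def _natural(path):
    # sort key so that state10 comes after state2 (similar to `ls -v`)
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', path)]

def read_usage(syscpu='/sys/devices/system/cpu'):
    """Return {state name: usage counter summed across all cpus}."""
    usage = {}
    pattern = os.path.join(syscpu, 'cpu[0-9]*', 'cpuidle', 'state[0-9]*')
    for statedir in sorted(glob.glob(pattern), key=_natural):
        with open(os.path.join(statedir, 'name')) as f:
            name = f.read().strip()
        with open(os.path.join(statedir, 'usage')) as f:
            usage[name] = usage.get(name, 0) + int(f.read())
    return usage

def cstate_profile(f, syscpu='/sys/devices/system/cpu'):
    """Run f() and return (f's result, {state name: usage delta while f ran})."""
    start = read_usage(syscpu)
    ret = f()
    end = read_usage(syscpu)
    return ret, {name: end[name] - start.get(name, 0) for name in end}

if __name__ == '__main__':
    import time
    _, profile = cstate_profile(lambda: time.sleep(0.1))
    # same "NAME·count" rendering as in the benchmark output
    print('# ' + ' '.join('%s·%d' % kv for kv in profile.items()))
```

Like the shell version, this counts state entries on all CPUs, so activity
from unrelated processes is included in the numbers.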

[1] https://godoc.org/golang.org/x/perf/cmd/benchstat
[2] https://lab.nexedi.com/kirr/pygolang/blob/master/golang/x/perf/benchlib.py
parent c12f2991
/tcpu
/tcpu_go
......@@ -72,6 +72,8 @@ EOF
. env.sh
pip install pygolang # for tcpu.py
mkdir -p src/lab.nexedi.com/kirr
pushd src/lab.nexedi.com/kirr
test -d neo || git clone -o kirr https://lab.nexedi.com/kirr/neo.git neo
......@@ -204,6 +206,23 @@ proginfo() {
which $prog >/dev/null 2>&1 && $prog "$@" || printf "%-16s: ø\n" "$prog"
}
# fkghz file - extract value from file (in KHz) and render it as GHz
fkghz() {
	python -c "print '%.2fGHz' % (`cat $1` / 1E6)"
}

# xhostname - show short system host name
xhostname() {
	# prefer first part of FQDN for misconfigured systems like
	# fqdn=z6001.ivan.nexedi.com, hostname=z6001-COMP-2784
	fqdn=`hostname --fqdn 2>/dev/null || :`
	if test -n "$fqdn"; then
		echo "$fqdn" |sed -e 's/\./ /' |awk '{print $1}'
	else
		hostname
	fi
}

# show information about local system (os, hardware, versions, ...)
system_info() {
echo -ne "date:\t"; date --rfc-2822
......@@ -215,6 +234,93 @@ system_info() {
echo ")"
echo -ne "uname:\t"; uname -a
# cpu
echo -ne "cpu:\t"; grep "^model name" /proc/cpuinfo |head -1 |sed -e 's/model name\s*: //'
syscpu=/sys/devices/system/cpu
sysidle=$syscpu/cpuidle
cpuvabbrev() {	# cpuvabbrev cpu0 cpu1 cpu2 ... cpuN -> cpu/[0-N]
	test $# -le 1 && echo "$@" && return
	min=""
	max=""
	while [ $# -ne 0 ]; do
		v=$1
		shift
		n=${v#cpu}
		# first cpu starts the range
		test -z "$min" && min=$n && max=$n && continue
		if (( $n != $max + 1 )); then
			die "cpuvabbrev: assert: nonconsecutive $max $n"
		fi
		max=$n
	done
	echo "cpu/[$min-$max]"
}

freqcpuv=()	# [] of cpu
freqstr=""	# text about cpufreq for cpus in ^^^
freqdump() {
	test "${#freqcpuv[@]}" = 0 && return
	echo "`cpuvabbrev ${freqcpuv[*]}`/freq: $freqstr"
	freqcpuv=()
	freqstr=""
}

idlecpuv=()	# ----//---- for cpuidle
idlestr=""
idledump() {
	test "${#idlecpuv[@]}" = 0 && return
	echo "`cpuvabbrev ${idlecpuv[*]}`/idle: $idlestr"
	idlecpuv=()
	idlestr=""
}

freqstable=y
while read cpu; do
	f="$cpu/cpufreq"
	fmin=`fkghz $f/scaling_min_freq`
	fmax=`fkghz $f/scaling_max_freq`
	fs="`cat $f/scaling_driver`/`cat $f/scaling_governor` [$fmin - $fmax]"
	if [ "$fs" != "$freqstr" ]; then
		freqdump
		freqstr="$fs"
	fi
	freqcpuv+=(`basename $cpu`)
	test "$fmin" != "$fmax" && freqstable=n
done \
< <(ls -vd $syscpu/cpu[0-9]*)
freqdump

latmax=0
while read cpu; do
	is="`cat $sysidle/current_driver`/`cat $sysidle/current_governor_ro`:"
	while read state; do
		is+=" "
		lat=`cat $state/latency`
		res=`cat $state/residency 2>/dev/null` || res="?"	# added in linux 3.15
		test "`cat $state/disable`" = "1" && is+="!" || latmax=$(($lat>$latmax?$lat:$latmax))
		is+="`cat $state/name`·${lat}/${res}"
	done \
	< <(ls -vd $cpu/cpuidle/state[0-9]*)

	is+=" # elat/tres µs"
	if [ "$is" != "$idlestr" ]; then
		idledump
		idlestr="$is"
	fi
	idlecpuv+=(`basename $cpu`)
done \
< <(ls -vd $syscpu/cpu[0-9]*)
idledump

test "$freqstable" = y || echo "WARNING: cpu: frequency not fixed - benchmark timings won't be stable"
test "$latmax" -le 10 || {
	echo "WARNING: cpu: C-state exit-latency is max ${latmax}μs - benchmark timings won't be stable"
	echo "WARNING: cpu: (up to that might be adding to networked and IPC request-reply latency)"
}

printf "%-20s" "sw/python:"; proginfo python --version 2>&1 # https://bugs.python.org/issue18338
printf "%-20s" "sw/go:"; proginfo go version
printf "%-20s" "sw/sqlite:"; proginfo python -c \
......@@ -228,6 +334,97 @@ system_info() {
}
# ---- benchmarking ----

# cpustat ... - run ... and print CPU C-states statistic
cpustat() {
	# XXX +cpufreq transition statistics (CPU_FREQ_STAT) ?

	syscpu=/sys/devices/system/cpu
	cpuv=( `ls -vd $syscpu/cpu[0-9]*` )
	# XXX we assume cpuidle states are the same for all cpus and get list of them from cpu0
	statev=( `ls -vd ${cpuv[0]}/cpuidle/state[0-9]* |xargs -n 1 basename` )

	# get current [state]usage. usage for a state is summed across all cpus
	statev_usage() {
		usagev=()
		for s in ${statev[*]}; do
			#echo >&2 $s
			susage=0
			for u in `cat $syscpu/cpu[0-9]*/cpuidle/$s/usage`; do
				#echo -e >&2 "\t$u"
				((susage+=$u))
			done
			usagev+=($susage)
		done
		echo ${usagev[*]}
	}

	ustartv=( `statev_usage` )
	#echo >&2 "--------"
	#sleep 1
	ret=0
	out="$("$@" 2>&1)" || ret=$?
	uendv=( `statev_usage` )

	stat="#"
	for ((i=0;i<${#statev[*]};i++)); do
		s=${statev[$i]}
		sname=`cat ${cpuv[0]}/cpuidle/$s/name`
		du=$((${uendv[$i]} - ${ustartv[$i]}))
		#stat+=" $sname(+$du)"
		stat+=" $sname·$du"
		#stat+=" $du·$sname"
	done

	if [ `echo "$out" | wc -l` -gt 1 ]; then
		# multiline out - add another line
		echo "$out"
		echo "$stat"
	else
		# 1-line out - add stats at line tail
		echo -n "$out"
		echo -e "\t$stat"
	fi

	return $ret
}

Nrun=5		# repeat benchmarks N times

#profile=
profile=cpustat

# nrun ... - run ... $Nrun times serially
nrun() {
	for i in `seq $Nrun`; do
		$profile "$@"
	done
}

# bench_cpu - microbenchmark CPU
bench_cpu() {
	echo -ne "node:\t"; xhostname
	echo "cluster:"
	nrun sh -c "python -m test.pystone |tail -1 |sed -e \
		\"s|^This machine benchmarks at \([0-9.]\+\) pystones/second$|Benchmarkpystone 1 \1 pystone/s|\""

	sizev="4096"		# 1024 $((2*1024*1024))
	benchv="crc32 sha1"	# adler32

	for bench in $benchv; do
		for size in $sizev; do
			nrun tcpu.py $bench $size
			nrun tcpu_go $bench $size
		done
	done
}

# command: benchmark local cpu
cmd_bench-cpu() {
	bench_cpu
}

# command: print information about local node
cmd_info-local() {
......@@ -241,6 +438,11 @@ cmd_info() {
on $url ./neotest info-local
}

# utility: cpustat on running arbitrary command
cmd_cpustat() {
	cpustat "$@"
}

# ---- main driver ----
usage() {
......@@ -260,11 +462,18 @@ The commands are:
    test-py         run NEO/py unit tests (part of test-local)

    bench-cpu       benchmark local cpu

    deploy          deploy NEO & needed software for tests to remote host
    deploy-local    deploy NEO & needed software for tests locally

    info            print information about a node
    info-local      print information about local deployment

Additional utility commands:

    cpustat         run a command and print CPU-related statistics
EOF
}
......@@ -278,9 +487,13 @@ test-local) f=(build );;
test-go) f=(build );;
test-py) f=( );;
bench-cpu) f=(build );;
info) f=( );;
info-local) f=( net );;
cpustat) f=( );;
-h)
usage
exit 0
......@@ -295,9 +508,14 @@ esac
for flag in ${f[*]}; do
	case "$flag" in
	build)
		# make sure tcpu* is on PATH (because we could be invoked from another dir)
		X=$(cd `dirname $0` && pwd)
		export PATH=$X:$PATH

		# rebuild go bits
		# neo/py, wendelin.core, ... - must be pip install'ed - `neotest deploy` cares about that
		go install -v lab.nexedi.com/kirr/neo/go/...
		go build -o $X/tcpu_go $X/tcpu.go
		;;
	net)
......
// Copyright (C) 2017 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.

// +build ignore

// tcpu - cpu-related benchmarks
package main

import (
	"crypto/sha1"
	"flag"
	"fmt"
	"hash"
	"hash/adler32"
	"hash/crc32"
	"log"
	"os"
	"strconv"
	"testing"
	"time"
)

func dieusage() {
	fmt.Fprintf(os.Stderr, "Usage: tcpu <benchmark> <block-size>\n")
	os.Exit(1)
}

const unitv = "BKMGT" // (2^10)^i is represented by the corresponding char suffix

// fmtsize formats size in human readable form
func fmtsize(size int) string {
	const order = 1 << 10
	norder := 0
	for size != 0 && (size%order) == 0 && (norder+1 < len(unitv)) {
		size /= order
		norder += 1
	}

	return fmt.Sprintf("%d%c", size, unitv[norder])
}

func prettyarg(arg string) string {
	size, err := strconv.Atoi(arg)
	if err != nil {
		return arg
	}
	return fmtsize(size)
}

// benchit runs the benchmark for benchf
func benchit(benchname string, bencharg string, benchf func(*testing.B, string)) {
	// FIXME testing.Benchmark does not allow to detect whether benchmark failed.
	// (use log.Fatal, not {t,b}.Fatal as workaround)
	r := testing.Benchmark(func(b *testing.B) {
		benchf(b, bencharg)
	})

	fmt.Printf("Benchmark%s/go/%s %d\t%.3f µs/op\n", benchname, prettyarg(bencharg), r.N, float64(r.T)/float64(r.N)/float64(time.Microsecond))
}

func benchHash(b *testing.B, h hash.Hash, arg string) {
	blksize, err := strconv.Atoi(arg)
	if err != nil {
		log.Fatal(err)
	}

	data := make([]byte, blksize)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		h.Write(data)
	}
}

func BenchmarkAdler32(b *testing.B, arg string) { benchHash(b, adler32.New(), arg) }
func BenchmarkCrc32(b *testing.B, arg string)   { benchHash(b, crc32.NewIEEE(), arg) }
func BenchmarkSha1(b *testing.B, arg string)    { benchHash(b, sha1.New(), arg) }

var benchv = map[string]func(*testing.B, string){
	"adler32": BenchmarkAdler32,
	"crc32":   BenchmarkCrc32,
	"sha1":    BenchmarkSha1,
}

func main() {
	flag.Parse() // so that test.* flags could be processed
	argv := flag.Args()
	if len(argv) != 2 {
		dieusage()
	}
	benchname := argv[0]
	bencharg := argv[1]

	benchf, ok := benchv[benchname]
	if !ok {
		log.Fatalf("Unknown benchmark %q", benchname)
	}

	benchit(benchname, bencharg, benchf)
}
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (C) 2017-2018 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""tcpu - cpu-related benchmarks"""

from __future__ import print_function

import sys
import hashlib
from zlib import crc32, adler32

from golang import testing


# adler32 in hashlib interface
class Adler32Hasher:
    name = "adler32"

    def __init__(self):
        self.h = adler32('')

    def update(self, data):
        self.h = adler32(data, self.h)

    def hexdigest(self):
        return '%08x' % (self.h & 0xffffffff)

# crc32 in hashlib interface
class CRC32Hasher:
    name = "crc32"

    def __init__(self):
        self.h = crc32('')

    def update(self, data):
        self.h = crc32(data, self.h)

    def hexdigest(self):
        return '%08x' % (self.h & 0xffffffff)

# fmtsize formats size in human readable form
_unitv = "BKMGT" # (2^10)^i is represented by the corresponding char suffix
def fmtsize(size):
    order = 1<<10
    norder = 0
    while size and (size % order) == 0 and (norder + 1 < len(_unitv)):
        size //= order
        norder += 1

    return "%d%s" % (size, _unitv[norder])

def prettyarg(arg):
    try:
        arg = int(arg)
    except ValueError:
        return arg      # return as it is - e.g. "null-4K"
    else:
        return fmtsize(arg)

# benchit benchmarks benchf(bencharg)
def benchit(benchf, bencharg):
    def _(b):
        benchf(b, bencharg)
    r = testing.benchmark(_)

    benchname = benchf.__name__
    if benchname.startswith('bench_'):
        benchname = benchname[len('bench_'):]

    print('Benchmark%s/py/%s %d\t%.3f µs/op' %
          (benchname, prettyarg(bencharg), r.N, r.T * 1E6 / r.N))

def _bench_hasher(b, h, blksize):
    blksize = int(blksize)
    data = '\0'*blksize

    b.reset_timer()

    n = b.N
    i = 0
    while i < n:
        h.update(data)
        i += 1

def bench_adler32(b, blksize):  _bench_hasher(b, Adler32Hasher(), blksize)
def bench_crc32(b, blksize):    _bench_hasher(b, CRC32Hasher(), blksize)
def bench_sha1(b, blksize):     _bench_hasher(b, hashlib.sha1(), blksize)


def main():
    bench    = sys.argv[1]
    bencharg = sys.argv[2]

    benchf = globals()['bench_%s' % bench]
    benchit(benchf, bencharg)

if __name__ == '__main__':
    main()