py.bench: Rework output to match Go benchmarking format

Rework py.bench output to match output of Go benchmarking[1] so that go tools like benchstat & friends could be used to analyze and compare the timings. Before patch: ============================= test session starts ============================== platform linux2 -- Python 2.7.14, pytest-3.3.2.dev2+g88f2cc9b.d20171206, py-1.5.2, pluggy-0.6.0 rootdir: /home/kirr/src/wendelin/wendelin.core, inifile: collected 11 items pymod: bigfile/tests/bench_0virtmem.py bench_file_mmapread_hole 0.21 (0.39 0.22 0.21) bench_file_read_hole 0.24 (0.24 0.24 0.24) bench_file_readbig_hole 0.30 (0.30 0.31 0.31) bench_bigf_read_hole 0.44 (0.44 0.45 0.44) bench_file_mmapwrite0 0.13 (0.13 0.13 0.13) bench_file_write55 0.08 (0.08 0.08 0.08) bench_bigf_writeff 0.47 (0.47 0.48 0.47) bench_file_mmapread 0.22 (0.22 0.22 0.22) bench_file_read 0.25 (0.25 0.25 0.26) bench_file_readbig 0.31 (0.31 0.31 0.31) bench_bigf_read 0.44 (0.45 0.44 0.44) ========================== 11 passed in 12.92 seconds ========================== After patch: ============================= test session starts ============================== platform linux2 -- Python 2.7.14, pytest-3.3.2.dev2+g88f2cc9b.d20171206, py-1.5.2, pluggy-0.6.0 rootdir: /home/kirr/src/wendelin/wendelin.core, inifile: collected 11 items pymod: bigfile/tests/bench_0virtmem.py Benchmarkfile_mmapread_hole 1 385839.939 µs/op Benchmarkfile_mmapread_hole 1 219214.916 µs/op Benchmarkfile_mmapread_hole 1 210209.131 µs/op Benchmarkfile_read_hole 1 238974.094 µs/op Benchmarkfile_read_hole 1 237294.197 µs/op Benchmarkfile_read_hole 1 238043.070 µs/op Benchmarkfile_readbig_hole 1 301330.090 µs/op Benchmarkfile_readbig_hole 1 301767.111 µs/op Benchmarkfile_readbig_hole 1 301135.063 µs/op Benchmarkbigf_read_hole 1 434718.132 µs/op Benchmarkbigf_read_hole 1 435019.970 µs/op Benchmarkbigf_read_hole 1 434729.099 µs/op Benchmarkfile_mmapwrite0 1 126471.996 µs/op Benchmarkfile_mmapwrite0 1 125886.917 µs/op Benchmarkfile_mmapwrite0 1 125730.038 µs/op Benchmarkfile_write55 1 86760.044 µs/op Benchmarkfile_write55 1 87507.010 µs/op Benchmarkfile_write55 1 87735.891 µs/op Benchmarkbigf_writeff 1 448369.980 µs/op Benchmarkbigf_writeff 1 448238.850 µs/op Benchmarkbigf_writeff 1 447322.845 µs/op Benchmarkfile_mmapread 1 207049.131 µs/op Benchmarkfile_mmapread 1 207813.978 µs/op Benchmarkfile_mmapread 1 210857.868 µs/op Benchmarkfile_read 1 238364.935 µs/op Benchmarkfile_read 1 236908.913 µs/op Benchmarkfile_read 1 238602.161 µs/op Benchmarkfile_readbig 1 303429.842 µs/op Benchmarkfile_readbig 1 302191.973 µs/op Benchmarkfile_readbig 1 304115.057 µs/op Benchmarkbigf_read 1 435079.098 µs/op Benchmarkbigf_read 1 434193.850 µs/op Benchmarkbigf_read 1 434813.976 µs/op ========================== 11 passed in 12.54 seconds ========================== benchstat of new output: name µs/op file_mmapread_hole 272k ±42% file_read_hole 238k ± 0% file_readbig_hole 301k ± 0% bigf_read_hole 435k ± 0% file_mmapwrite0 126k ± 0% file_write55 87.3k ± 1% bigf_writeff 448k ± 0% file_mmapread 209k ± 1% file_read 238k ± 0% file_readbig 303k ± 0% bigf_read 435k ± 0% -------- Not only the output format is reworked here, but also b fixture is added to BenchPlugin so that functions which accept it as arg, can run the benchmark b.N times, can stop/restart timer etc - similar to https://golang.org/pkg/testing/#B . If a bench_* func does not accept b, B is still created, but b.N is assumed to be always 1. The first benchmark which actually uses b will come in the next patch. [1] https://github.com/golang/proposal/blob/master/design/14313-benchmark-format.md

py.bench: Rework output to match Go benchmarking format
Rework py.bench output to match output of Go benchmarking[1] so that go tools like benchstat & friends could be used to analyze and compare the timings. Before patch: ============================= test session starts ============================== platform linux2 -- Python 2.7.14, pytest-3.3.2.dev2+g88f2cc9b.d20171206, py-1.5.2, pluggy-0.6.0 rootdir: /home/kirr/src/wendelin/wendelin.core, inifile: collected 11 items pymod: bigfile/tests/bench_0virtmem.py bench_file_mmapread_hole 0.21 (0.39 0.22 0.21) bench_file_read_hole 0.24 (0.24 0.24 0.24) bench_file_readbig_hole 0.30 (0.30 0.31 0.31) bench_bigf_read_hole 0.44 (0.44 0.45 0.44) bench_file_mmapwrite0 0.13 (0.13 0.13 0.13) bench_file_write55 0.08 (0.08 0.08 0.08) bench_bigf_writeff 0.47 (0.47 0.48 0.47) bench_file_mmapread 0.22 (0.22 0.22 0.22) bench_file_read 0.25 (0.25 0.25 0.26) bench_file_readbig 0.31 (0.31 0.31 0.31) bench_bigf_read 0.44 (0.45 0.44 0.44) ========================== 11 passed in 12.92 seconds ========================== After patch: ============================= test session starts ============================== platform linux2 -- Python 2.7.14, pytest-3.3.2.dev2+g88f2cc9b.d20171206, py-1.5.2, pluggy-0.6.0 rootdir: /home/kirr/src/wendelin/wendelin.core, inifile: collected 11 items pymod: bigfile/tests/bench_0virtmem.py Benchmarkfile_mmapread_hole 1 385839.939 µs/op Benchmarkfile_mmapread_hole 1 219214.916 µs/op Benchmarkfile_mmapread_hole 1 210209.131 µs/op Benchmarkfile_read_hole 1 238974.094 µs/op Benchmarkfile_read_hole 1 237294.197 µs/op Benchmarkfile_read_hole 1 238043.070 µs/op Benchmarkfile_readbig_hole 1 301330.090 µs/op Benchmarkfile_readbig_hole 1 301767.111 µs/op Benchmarkfile_readbig_hole 1 301135.063 µs/op Benchmarkbigf_read_hole 1 434718.132 µs/op Benchmarkbigf_read_hole 1 435019.970 µs/op Benchmarkbigf_read_hole 1 434729.099 µs/op Benchmarkfile_mmapwrite0 1 126471.996 µs/op Benchmarkfile_mmapwrite0 1 125886.917 µs/op Benchmarkfile_mmapwrite0 1 125730.038 µs/op Benchmarkfile_write55 1 86760.044 µs/op Benchmarkfile_write55 1 87507.010 µs/op Benchmarkfile_write55 1 87735.891 µs/op Benchmarkbigf_writeff 1 448369.980 µs/op Benchmarkbigf_writeff 1 448238.850 µs/op Benchmarkbigf_writeff 1 447322.845 µs/op Benchmarkfile_mmapread 1 207049.131 µs/op Benchmarkfile_mmapread 1 207813.978 µs/op Benchmarkfile_mmapread 1 210857.868 µs/op Benchmarkfile_read 1 238364.935 µs/op Benchmarkfile_read 1 236908.913 µs/op Benchmarkfile_read 1 238602.161 µs/op Benchmarkfile_readbig 1 303429.842 µs/op Benchmarkfile_readbig 1 302191.973 µs/op Benchmarkfile_readbig 1 304115.057 µs/op Benchmarkbigf_read 1 435079.098 µs/op Benchmarkbigf_read 1 434193.850 µs/op Benchmarkbigf_read 1 434813.976 µs/op ========================== 11 passed in 12.54 seconds ========================== benchstat of new output: name µs/op file_mmapread_hole 272k ±42% file_read_hole 238k ± 0% file_readbig_hole 301k ± 0% bigf_read_hole 435k ± 0% file_mmapwrite0 126k ± 0% file_write55 87.3k ± 1% bigf_writeff 448k ± 0% file_mmapread 209k ± 1% file_read 238k ± 0% file_readbig 303k ± 0% bigf_read 435k ± 0% -------- Not only the output format is reworked here, but also b fixture is added to BenchPlugin so that functions which accept it as arg, can run the benchmark b.N times, can stop/restart timer etc - similar to https://golang.org/pkg/testing/#B . If a bench_* func does not accept b, B is still created, but b.N is assumed to be always 1. The first benchmark which actually uses b will come in the next patch. [1] https://github.com/golang/proposal/blob/master/design/14313-benchmark-format.md
51f252d4 · Kirill Smelkov · db1f1455 · 51f252d4
Commit 51f252d4 authored Dec 06, 2017 by Kirill Smelkov
Hide whitespace changes
Inline Side-by-side

Showing with 99 additions and 21 deletions

t/py.bench t/py.bench +99 -21

No files found.
--- a/t/py.bench
+++ b/t/py.bench
 #!/usr/bin/env python
+# -*- coding: utf-8 -*-
 # benchmarking via py.test
 # Copyright (C) 2014-2015  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>
@@ -26,6 +27,7 @@ import pytest
 import _pytest.runner
 from _pytest.terminal import TerminalReporter as _pytest_TerminalReporter
 from time import time
+from math import ceil, log10
 from py.process import ForkedFunc

 import sys
@@ -52,6 +54,36 @@ class XForkedFunc(ForkedFunc):



+# benchmarking timer/request passed to benchmarks as fixture
+# similar to https://golang.org/pkg/testing/#B
+class B:
+
+    def __init__(self):
+        self.N = 1              # default when func does not accept `b` arg
+        self._t_start = None    # t of timer started; None if timer is currently stopped
+        self.reset_timer()
+
+    def reset_timer(self):
+        self._t_total = 0.
+
+    def start_timer(self):
+        if self._t_start is not None:
+            return
+
+        self._t_start = time()
+
+    def stop_timer(self):
+        if self._t_start is None:
+            return
+
+        t = time()
+        self._t_total += t - self._t_start
+        self._t_start = None
+
+    def total_time(self):
+        return self._t_total
+
+
 # plugin to collect & run benchmarks
 class BenchPlugin:

@@ -72,20 +104,57 @@ class BenchPlugin:
        g.addoption('--dontfork',  action='store_true',      dest='dontfork', default=False,
                    help="run all benchmarks in the same process")

+    # create b for every item to run and make it available to b fixture via item.request
+    def pytest_runtest_setup(self, item):
+        b = B()
+        b._used_by_func = False
+        item._request._bench_b = b
+
+    # make created b be available as `b` func arg
+    @pytest.fixture(scope="function")
+    def b(self, request):
+        """Provides access to benchmarking timer"""
+        # NOTE here request is subrequest of item._request in pytest_runtest_setup
+        b = request._parent_request._bench_b
+        b._used_by_func = True
+        return b

    # run benchmark several times, and get best timing
    # each benchmark run is executed in separate process (if not told otherwise)
    def pytest_runtest_call(self, item):
+        b = item._request._bench_b
        def run():
-            tstart = time()
-            item.runtest()
-            tend = time()
-            return tend-tstart
-
-        tv = []
+            b.N = 0
+            t = 0.
+            ttarget = 1.
+            while t < (ttarget * 0.9):
+                if b.N == 0:
+                    b.N = 1
+                else:
+                    n = b.N * (ttarget / t)     # exact how to adjust b.N to reach ttarget
+                    order = int(log10(n))       # n = k·10^order, k ∈ [1,10)
+                    k = float(n) / (10**order)
+                    k = ceil(k)                 # lift up k to nearest int
+                    b.N = int(k * 10**order)    # b.N = int([1,10))·10^order
+
+                b.reset_timer()
+                b.start_timer()
+                item.runtest()
+                b.stop_timer()
+                t = b.total_time()
+
+                #print((b.N, t))
+
+                # break if func does not accept b as arg
+                if not b._used_by_func:
+                    break
+
+            return (b.N, t)
+
+        rv = []
        for i in xrange(item.config.option.benchruns):
            if item.config.option.dontfork:
-                t = run()
+                r = run()

            else:
                runf = XForkedFunc(run)
@@ -97,10 +166,11 @@ class BenchPlugin:
                    print(result.err, file=sys.stderr)  # XXX vs runf doesn't capture stderr
                    1/0 # TODO handle properly

-                t = result.retval
+                r = result.retval

-            tv.append(t)
-        return tv
+            rv.append(r)
+        #print ('RET', rv)
+        return rv


    # set benchmarking time in report, if run ok
@@ -110,7 +180,7 @@ class BenchPlugin:
            # in pytest3 there is no way to mark pytest_runtest_call as 'firstresult'
            # let's emulate firstresult logic here
            assert len(call.result) == 1
-            report.bench_times = call.result[0]
+            report.bench_resv = call.result[0]
        return report


@@ -136,7 +206,12 @@ def max0(seq):

 # benchname(nodeid) returns name of a benchmark from a function nodeid
 def benchname(nodeid):
-    return nodeid.split("::", 1)[1] # everything after fspath
+    pyname = nodeid.split("::", 1)[1] # everything after fspath
+    # replace 'bench_' with 'Benchmark' prefix so that benchmark output matches
+    # golang format
+    if pyname.startswith('bench_'):
+        pyname = pyname[len('bench_'):]
+    return 'Benchmark' + pyname

 # Adjujsted terminal reporting to benchmarking needs
 class XTerminalReporter(_pytest_TerminalReporter):
@@ -170,16 +245,19 @@ class XTerminalReporter(_pytest_TerminalReporter):
            # passed setup/teardown
            return

-        # TODO ralign timing
-        self._tw.line()
-        self._tw.write('%-*s\t' % (self.benchname_maxlen, benchname(report.nodeid)))
-        if report.passed:
-            self._tw.write('%.2f' % min(report.bench_times))
-            if not self.config.option.quiet:
-                self._tw.write('  (%s)' %
-                        ' '.join('%.2f' % _ for _ in report.bench_times))
-        else:
+        def printname():
+            self._tw.line()
+            self._tw.write('%-*s\t' % (self.benchname_maxlen, benchname(report.nodeid)))
+
+        if not report.passed:
+            printname()
            self._tw.write('[%s]' % word, **report_markup(report))
+            return
+
+        # TODO ralign timing
+        for niter, t in report.bench_resv:
+            printname()
+            self._tw.write('%d\t%.3f µs/op' % (niter, t * 1E6 / niter))


 # there is no way to override it otherwise - it is hardcoded in