Commits · 2928fd63f4d0637d353585f196a26780ba431139 · Levin Zimmermann / neoppod

13 Oct, 2017 13 commits
- X time neo1* · 2928fd63
  Test authored Oct 13, 2017
  
  2928fd63
- X z600* timings · ee09f30c
  Kirill Smelkov authored Oct 13, 2017
  
  ee09f30c
- . · 8c7b203b
  Test authored Oct 13, 2017
  
  8c7b203b
- . · c07103bb
  Kirill Smelkov authored Oct 13, 2017
  
  c07103bb
- X benchify rest of bench-cluster · 046370db
  Test authored Oct 13, 2017
  
  046370db
- X bench-remote / tcp : std benchmark output · 3a1bdf4a
  Test authored Oct 13, 2017
  
  3a1bdf4a
- X bench-remote / ping += std bench output · 9450b6db
  Test authored Oct 13, 2017
  
  9450b6db
- . · eb2ce7b7
  Test authored Oct 13, 2017
  
  eb2ce7b7
- X bench-local + zhash: Add output in std bench format · 2d13818e
  Kirill Smelkov authored Oct 13, 2017
  
  2d13818e
- X bench_disk: Add std bench format · cd2cd093
  Kirill Smelkov authored Oct 13, 2017
  
  cd2cd093
- . · 98c8aaac
  Kirill Smelkov authored Oct 13, 2017
  
  98c8aaac
- X switch cpu benchmarks to go format · b6a830d8
  Kirill Smelkov authored Oct 13, 2017
  
  b6a830d8
- X neo/server: benchmark parallel client requests over same link · 94d79c29
  Kirill Smelkov authored Oct 13, 2017
  
  94d79c29
12 Oct, 2017 4 commits

X neo1 - neo2 timings; delta tgroX, txc · 9cab8f1f

Test authored Oct 13, 2017

compared to tgro5:

tgro0   py -> noise (ZEO, NEO/pylite, NEO/pysql, partly neo/go)
tgro10  py=no-noise, tpy~=tgro5; tgo serial~same, +prefetch more noise (?) 40|65
tgro15  lat_tcp 1B 49->45 sometime (consistently) otherwise more-or-less same
tgro20  lat_tcp 4K 170µs->167µs (c)
tgro25  lat_tcp 4K 170µs->167µs (c), lat_tcp 4K 175 -> 171µs (go)
        ZEOv a bit
        go-go + prefetch !sha1 -> noise + higher (~65µs -> 70 - 110µs)

txc40µs4f seems to be worse than txc200µs4f

9cab8f1f

X time neo1 - neo2; δ: gro_flush_timeout=5µs, txc=20μs/0f/0μs-irq/0f-irq · 16f249f1

Test authored Oct 12, 2017

Link and TCP* latencies are ~ same and stable.

Compared to b288e62f py timings stabilize:

ZEO		~580-1045μs	-> 585µs
NEO/pylite	~600-700μs	-> ~550-570µs	(Cpy)
NEO/pylite	~525-580μs	-> ~450µs	(Cgo)
NEO/pysql	~820-930μs	-> ~840µs	(Cpy)
NEO/pysql	~740-800μs	-> ~740µs	(Cgo)

Go timings get a bit worse:

NEO/go		~360µs		-> ~380µs	(Cpy)
NEO/go		~160µs		-> ~165-170µs	(Cgo)
NEO/go-nosha1	~140µs		-> ~150µs

Compared to 15a9ccef go+pretech128 timings stabilize:

go-go+prefetch128		~65-160µs	-> ~40-45µs, 60µs(x1)
go-go+prefetch128 (!sha1)	~60-150µs	-> ~60µs,    40µs(x1)

16f249f1

X show gro_flush_timeout + friends · 68d5b015
Kirill Smelkov authored Oct 12, 2017

68d5b015
. · 86ab45eb
Kirill Smelkov authored Oct 12, 2017

86ab45eb

11 Oct, 2017 1 commit

X reran neo1 - neo2 after reloading r8169 driver on both machines · 15a9ccef

Test authored Oct 11, 2017

Compared to run from Oct10 txc settings were not changed and they are now
200μs/4f/0μs-irq/0f-irq on both machines instead of 200μs/4f/0μs-irq/0f-irq on
neo1 and 200μs/0f/0μs-irq/0f-irq on neo2.

The noise about py runs remains.

Go-go serial time remains the same ~155μs without much noise, but Go-go time
with +prefetch128 gets worse:

NEO/go Cgo +prefetch128

        ~45μs   -> 65 - 160μs

Probably it makes sense to tune txc too...

15a9ccef

10 Oct, 2017 8 commits

X neo1 - neo2 timings with turned on TSO, SG, TX · b288e62f

Test authored Oct 10, 2017

See previous commit for details about why latency is bad for TCP payload > MSS
without TSO.

Compared to latest neo1-neo2 timings from Oct05 (C-states disabled, no
rx-delay) it improves:

TCP1             ~45μs  ->   ~45μs
TCP1472         ~430μs			# TCP lat. anaomaly
TCP1400                 ->  ~120μs	# finally
TCP1500                 ->  ~130μs	# fixed
TCP4096         ~285μs  ->  ~170μs	# !

ZEO             ~670μs  ->  ~580-1045μs (?)
NEO/pylite      ~605μs  ->  ~600-700μs  (?)     (Cpy)
NEO/pylite      ~505μs  ->  ~525-580μs  (?)     (Cgo)
NEO/pysql       ~900μs  ->  ~820-930μs  (?)     (Cpy)
NEO/pysql       ~780μs  ->  ~740-800μs  (?)     (Cgo)
NEO/go          ~430μs  ->  ~360μs              (Cpy)	# <-- NOTE
NEO/go          ~210μs  ->  ~160μs              (Cgo)	# <-- NOTE
NEO/go-nosha1   ~190μs  ->  ~140μs			# <-- NOTE

not sure about noise in pure py runs but given raw tcp latency absolutely
improves this should be a good change to make.

b288e62f

X neotest: Show NIC features and emit warning if !TSO · 4c815af9

Kirill Smelkov authored Oct 10, 2017

On neo1 - neo2 without TSO latency becomes very poor in lat_tcp when
payload size becomes greater TCP MSS (lat_tcp -m 1448 ~ 130μs; lat_tcp
-m 1449 ~ 500μs and more)

4c815af9

X neotest: Provide cpustat command so it is possible to cpustat something external · 4436b983
Kirill Smelkov authored Oct 10, 2017
```
e.g.

	neotest cpustat ls
```
4436b983
. · 474c17bc
Kirill Smelkov authored Oct 10, 2017

474c17bc

X time for z6001 (localhost) and z6001-z600 · b6d6344d

Kirill Smelkov authored Oct 10, 2017

Same a before with added information.
In particular z6001-z600 shows there is no TCP RR step-wise 400μs increase
while going 1400B -> 1500B as it is currently the case on RTL.

b6d6344d

. · 50b0d130
Kirill Smelkov authored Oct 10, 2017

50b0d130

X neo1 - neo2 timings · c5df0f47

Test authored Oct 10, 2017

Same as before but with more information and adjusted ping and TCP RR.

In particular TCP RR shows:

TCPRR1400	~ 120μs
TCPRR1500	~ 430μs

The time when this increases step-wise is when TCP packet crosses fitting 1 ethernet frame.
It does not happen on z600* with their current settings.

c5df0f47

X neotest: Adjust ping and TCP RR sizes to fit 1 Ethernet frame, etc... · 659ce938
Kirill Smelkov authored Oct 10, 2017

659ce938

09 Oct, 2017 6 commits

X neo1 localhost timings Oct09 · af41382e

Test authored Oct 09, 2017

Same as Oct05 5μs C-latency with added:

- pystones
- t(sha1) py & go
- zhash.go working in prefetch mode
- 16 clients loading 1 server at the same time

af41382e

X deco localhost timings Oct09 · 1e9f23cb

Kirill Smelkov authored Oct 09, 2017

Same as Oct05 5μs C-latency with added:

- pystones
- t(sha1) py & go
- zhash.go working in prefetch mode
- 16 clients loading 1 server at the same time

1e9f23cb

. · 0d1206b9
Kirill Smelkov authored Oct 09, 2017

0d1206b9

X neo/connection: Fix race between link.shutdown() and conn.lightClose() · 8eac771c

Kirill Smelkov authored Oct 09, 2017

The scenario could be: nl.shutdown sets nl.connTab=nil and
then iterates connTab snapshot taken under nl.connMu lock. If so
this activity (which calls e.g. Conn.shutdown) could be running with
conn.lightClose -> conn.release() in parallel.

Bug triggers under neotest in zhash.go +prefetch128:

	crc32:bf9deea9   ; oid=0..8499  nread=34134938  t=9.010696705s (1.060081ms / object)  x=zhash.go +prefetch128
	==================
	WARNING: DATA RACE
	Write at 0x00c424b55ed0 by goroutine 147:
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).reinit()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:258 +0x102
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).release()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:240 +0x38
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).lightClose()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1633 +0xa4
	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).Ask1()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:1719 +0xde
	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client)._Load()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:466 +0x5d3
	  lab.nexedi.com/kirr/neo/go/neo/client.(*Client).Load()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/client/client.go:392 +0xa8
	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).loadRCE()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:297 +0xfb

	Previous write at 0x00c424b55ed0 by goroutine 75:
	  sync/atomic.AddInt32()
	      /home/kirr/src/tools/go/go/src/runtime/race_amd64.s:269 +0xb
	  sync.(*Mutex).Unlock()
	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
	  sync.(*Once).Do()
	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f

	Goroutine 147 (running) created at:
	  lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).Prefetch()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:198 +0xbb
	  main.zhash.func2()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:134 +0x86
	  main.zhash.func4()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:157 +0x8f
	  main.zhash()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:193 +0xc7b
	  main.main()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/zhash.go:107 +0x48d

	Goroutine 75 (finished) created at:
	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:391 +0x116
	  sync.(*Once).Do()
	      /home/kirr/src/tools/go/go/src/sync/once.go:44 +0xe1
	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:383 +0x99
	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).serveRecv()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:748 +0x651
	==================

and without -race zhash.go instrumentation it was showing itself as
"panic: unlock of unlocked mutex" in

	  sync.(*Mutex).Unlock()
	      /home/kirr/src/tools/go/go/src/sync/mutex.go:182 +0x54
	  sync.(*Once).Do()
	      /home/kirr/src/tools/go/go/src/sync/once.go:46 +0x95
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdownRX()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:471 +0xb1
	  lab.nexedi.com/kirr/neo/go/neo.(*Conn).shutdown()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:442 +0x62
	  lab.nexedi.com/kirr/neo/go/neo.(*NodeLink).shutdown.func1.1()
	      /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/connection.go:408 +0x18f

8eac771c

X zodb: Disable Buf tests under race · e4a100de

Kirill Smelkov authored Oct 09, 2017

It was failing in a strange way, e.g.

panic: Buf: refcnt < 0
fatal error: panic holding locks

goroutine 19 [running]:
runtime.throw(0x5f9d65, 0x13)
        /home/kirr/src/tools/go/go/src/runtime/panic.go:605 +0x95 fp=0xc42003cc20 sp=0xc42003cc00 pc=0x452935
panic(0x5c21c0, 0x611f60)
        /home/kirr/src/tools/go/go/src/runtime/panic.go:451 +0x585 fp=0xc42003ccc8 sp=0xc42003cc20 pc=0x4524c5
lab.nexedi.com/kirr/neo/go/zodb.(*Buf).Release(0xc4200900e0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer.go:112 +0x161 fp=0xc42003ccf8 sp=0xc42003ccc8 pc=0x59ebc1
lab.nexedi.com/kirr/neo/go/zodb.TestBufAllocFree(0xc4200a80f0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/buffer_test.go:111 +0x6ae fp=0xc42003cf78 sp=0xc42003ccf8 pc=0x5a137e
testing.tRunner(0xc4200a80f0, 0x600930)
        /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d fp=0xc42003cfd0 sp=0xc42003cf78 pc=0x51688d
runtime.goexit()
        /home/kirr/src/tools/go/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc42003cfd8 sp=0xc42003cfd0 pc=0x4835f1
created by testing.(*T).Run
        /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x569

goroutine 1 [chan receive]:
testing.(*T).Run(0xc4200a8000, 0x5f9180, 0x10, 0x600930, 0xc420051c50)
        /home/kirr/src/tools/go/go/src/testing/testing.go:790 +0x59b
testing.runTests.func1(0xc4200a8000)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa8
testing.tRunner(0xc4200a8000, 0xc420051d90)
        /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16d
testing.runTests(0xc420090040, 0x6ed4a0, 0x7, 0x7, 0x4159b0)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x522
testing.(*M).Run(0xc420051f20, 0x1d7ab28)
        /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x207
main.main()
        lab.nexedi.com/kirr/neo/go/zodb/_test/_testmain.go:56 +0x1d4
exit status 2
FAIL    lab.nexedi.com/kirr/neo/go/zodb 0.006s

becuase upon release object was not put back into pool, then we were
getting another object from pool, complainig but still doing
buf.Release.

e4a100de

X tracing: Silence race-detector about probe.Detach · bab79665

Kirill Smelkov authored Oct 09, 2017

Race-detector does not know Probe.Detach works under world stopped and
that this way it cannot break consistency of probes list attached to a
trace event - on event signalling either a probe will be run or not run
at all.

And we do not mind that e.g. while Detach was in progress a probe was
was read from traceevent list and deceded to be run and the probe
function was actually called just after Detach finished.

For this reason tell race-detector to not take into account all memory
read/write that are performed while the world is stopped.

If we do not it complains e.g. this way:

    ==================
    WARNING: DATA RACE
    Read at 0x00c42000d760 by goroutine 7:
      lab.nexedi.com/kirr/neo/go/zodb/storage._traceCacheGCFinish_run()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:265 +0x81
      lab.nexedi.com/kirr/neo/go/zodb/storage.traceCacheGCFinish()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/ztrace.go:22 +0x63
      lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gc()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:497 +0x62c
      lab.nexedi.com/kirr/neo/go/zodb/storage.(*Cache).gcmain()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:478 +0x4c

    Previous write at 0x00c42000d760 by goroutine 6:
      lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*Probe).Detach()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:319 +0x103
      lab.nexedi.com/kirr/neo/go/xcommon/tracing.(*ProbeGroup).Done()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/xcommon/tracing/tracing.go:344 +0xa5
      lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:576 +0x7f94
      testing.tRunner()
          /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c

    Goroutine 7 (running) created at:
      lab.nexedi.com/kirr/neo/go/zodb/storage.NewCache()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache.go:129 +0x227
      lab.nexedi.com/kirr/neo/go/zodb/storage.TestCache()
          /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/cache_test.go:165 +0x7b1
      testing.tRunner()
          /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c

    Goroutine 6 (finished) created at:
      testing.(*T).Run()
          /home/kirr/src/tools/go/go/src/testing/testing.go:789 +0x568
      testing.runTests.func1()
          /home/kirr/src/tools/go/go/src/testing/testing.go:1004 +0xa7
      testing.tRunner()
          /home/kirr/src/tools/go/go/src/testing/testing.go:746 +0x16c
      testing.runTests()
          /home/kirr/src/tools/go/go/src/testing/testing.go:1002 +0x521
      testing.(*M).Run()
          /home/kirr/src/tools/go/go/src/testing/testing.go:921 +0x206
      main.main()
          lab.nexedi.com/kirr/neo/go/zodb/storage/_test/_testmain.go:44 +0x1d3
    ==================

bab79665

06 Oct, 2017 5 commits
- . · 29adac06
  Kirill Smelkov authored Oct 06, 2017
  
  29adac06
- . · 0fb48b65
  Kirill Smelkov authored Oct 06, 2017
  
  0fb48b65
- X microbenchmark CPU first · b062b349
  Kirill Smelkov authored Oct 06, 2017
  
  b062b349
- . · f7e23ed8
  Kirill Smelkov authored Oct 06, 2017
  
  f7e23ed8
- . · 9ccda1cc
  Kirill Smelkov authored Oct 06, 2017
  
  9ccda1cc
05 Oct, 2017 3 commits
- X time z6001 - z600 + prefetch · b0cb5454
  Kirill Smelkov authored Oct 05, 2017
```
Max < 3x (nprefetch=8) in go-go case.
```
  b0cb5454
- X time z6001 (localhost) +prefetch · 6be37f4e
  Kirill Smelkov authored Oct 05, 2017
```
improves max 2x becase probably of go scheduler.
```
  6be37f4e
- . · 41d2b5b4
  Kirill Smelkov authored Oct 05, 2017
  
  41d2b5b4