  1. 25 Mar, 2014 4 commits
  2. 14 Mar, 2014 3 commits
  3. 12 Mar, 2014 1 commit
    • Russ Cox's avatar
      runtime: fix empty string handling in garbage collector · 54c901cd
      Russ Cox authored
      The garbage collector uses type information to guide the
      traversal of the heap. If it sees a field that should be a string,
      it marks the object pointed at by the string data pointer as
      visited but does not bother to look at the data, because
      strings contain bytes, not pointers.
      
      If you save s[len(s):] somewhere, though, the string data pointer
      actually points just beyond the string data; if the string data
      were exactly the size of an allocated block, the string data
      pointer would actually point at the next block. It is incorrect
      to mark that next block as visited and not bother to look at
      the data, because the next block may be some other type
      entirely.
      
      The fix is to ignore strings with zero length during collection:
      they are empty and can never become non-empty: the base
      pointer will never be used again. The handling of slices already
      does this (but using cap instead of len).
      
      This was not a bug in Go 1.2, because until January all string
      allocations included a trailing NUL byte not included in the
      length, so s[len(s):] still pointed inside the string allocation
      (at the NUL).
      
      This bug was causing the crashes in test/run.go. Specifically,
      the parsing of a regexp in package regexp/syntax allocated a
      []syntax.Inst with rounded size 1152 bytes. In fact it
      allocated many such slices, because during the processing of
      test/index2.go it creates thousands of regexps that are all
      approximately the same complexity. That takes a long time, and
      test/run works on other tests in other goroutines. One such
      other test is chan/perm.go, which uses an 1152-byte source
      file. test/run reads that file into a []byte and then calls
      strings.Split(string(src), "\n"). The string(src) creates an
      1152-byte string - and there's a very good chance of it
      landing next to one of the many many regexp slices already
      allocated - and then because the file ends in a \n,
      strings.Split records the tail empty string as the final
      element in the slice. A garbage collection happens at this
      point, the collection finds that string before encountering
      the []syntax.Inst data it now inadvertently points to, and the
      []syntax.Inst data is not scanned for the pointers that it
      contains. Each syntax.Inst contains a []rune, those are
      missed, and the backing rune arrays are freed for reuse. When
      the regexp is later executed, the runes being searched for are
      no longer runes at all, and there is no match, even on text
      that should match.
      
      On 64-bit machines the pointer in the []rune inside the
      syntax.Inst is larger (along with a few other pointers),
      pushing the []syntax.Inst backing array into a larger size
      class, avoiding the collision with chan/perm.go's
      inadvertently sized file.
      
      I expect this was more prevalent on OS X than on Linux or
      Windows because those managed to run faster or slower and
      didn't overlap index2.go with chan/perm.go as often. On the
      ARM systems, we only run one errorcheck test at a time, so
      index2 and chan/perm would never overlap.
      
      It is possible that this bug is the root cause of other crashes
      as well. For now we only know it is the cause of the test/run crash.
      
      Many thanks to Dmitriy for help debugging.
      
      Fixes #7344.
      Fixes #7455.
      
      LGTM=r, dvyukov, dave, iant
      R=golang-codereviews, dave, r, dvyukov, delpontej, iant
      CC=golang-codereviews, khr
      https://golang.org/cl/74250043
      54c901cd
  4. 11 Mar, 2014 2 commits
  5. 10 Mar, 2014 1 commit
  6. 07 Mar, 2014 1 commit
    • Russ Cox's avatar
      runtime: fix memory leak in runfinq · b08156cd
      Russ Cox authored
      One reason the sync.Pool finalizer test can fail is that
      this function's ef1 contains uninitialized data that just
      happens to point at some of the old pool. I've seen this cause
      retention of a single pool cache line (32 elements) on arm.
      
      Really we need liveness information for C functions, but
      for now we can be more careful about data in long-lived
      C functions that block.
      
      LGTM=bradfitz, dvyukov
      R=golang-codereviews, bradfitz, dvyukov
      CC=golang-codereviews, iant, khr
      https://golang.org/cl/72490043
      b08156cd
  7. 06 Mar, 2014 3 commits
  8. 27 Feb, 2014 2 commits
    • Keith Randall's avatar
      runtime: move stack shrinking until after sweepgen is incremented. · e9445547
      Keith Randall authored
      Before GC, we flush all the per-P allocation caches.  Doing
      stack shrinking mid-GC causes these caches to fill up.  At the
      end of gc, the sweepgen is incremented which causes all of the
      data in these caches to be in a bad state (cached but not yet
      swept).
      
      Move the stack shrinking until after sweepgen is incremented,
      so any caching that happens as part of shrinking is done with
      already-swept data.
      
      Reenable stack copying.
      
      LGTM=bradfitz
      R=golang-codereviews, bradfitz
      CC=golang-codereviews
      https://golang.org/cl/69620043
      e9445547
    • Keith Randall's avatar
      runtime: grow stack by copying · 1665b006
      Keith Randall authored
      On stack overflow, if all frames on the stack are
      copyable, we copy the frames to a new stack twice
      as large as the old one.  During GC, if a G is using
      less than 1/4 of its stack, copy the stack to a stack
      half its size.
      
      TODO
      - Do something about C frames.  When a C frame is in the
        stack segment, it isn't copyable.  We allocate a new segment
        in this case.
        - For idempotent C code, we can abort it, copy the stack,
          then retry.  I'm working on a separate CL for this.
        - For other C code, we can raise the stackguard
          to the lowest Go frame so the next call that Go frame
          makes triggers a copy, which will then succeed.
      - Pick a starting stack size?
      
      The plan is that eventually we reach a point where the
      stack contains only copyable frames.
      
      LGTM=rsc
      R=dvyukov, rsc
      CC=golang-codereviews
      https://golang.org/cl/54650044
      1665b006
  9. 26 Feb, 2014 1 commit
    • Keith Randall's avatar
      runtime: get rid of the settype buffer and lock. · 3b5278fc
      Keith Randall authored
      MCaches now hold an MSpan for each sizeclass which they have
      exclusive access to allocate from, so no lock is needed.
      
      Modifying the heap bitmaps also no longer requires a cas.
      
      runtime.free gets more expensive.  But we don't use it
      much any more.
      
      It's not much faster on 1 processor, but it's a lot
      faster on multiple processors.
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkSetTypeNoPtr1           24           23   -0.42%
      BenchmarkSetTypeNoPtr2           33           34   +0.89%
      BenchmarkSetTypePtr1             51           49   -3.72%
      BenchmarkSetTypePtr2             55           54   -1.98%
      
      benchmark                old ns/op    new ns/op    delta
      BenchmarkAllocation          52739        50770   -3.73%
      BenchmarkAllocation-2        33957        34141   +0.54%
      BenchmarkAllocation-3        33326        29015  -12.94%
      BenchmarkAllocation-4        38105        25795  -32.31%
      BenchmarkAllocation-5        68055        24409  -64.13%
      BenchmarkAllocation-6        71544        23488  -67.17%
      BenchmarkAllocation-7        68374        23041  -66.30%
      BenchmarkAllocation-8        70117        20758  -70.40%
      
      LGTM=rsc, dvyukov
      R=dvyukov, bradfitz, khr, rsc
      CC=golang-codereviews
      https://golang.org/cl/46810043
      3b5278fc
  10. 25 Feb, 2014 1 commit
    • Dave Cheney's avatar
      all: merge NaCl branch (part 1) · 7c8280c9
      Dave Cheney authored
      See golang.org/s/go13nacl for design overview.
      
      This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support; there are probably two or three more large merges to come.
      
      CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.
      
      The exact change lists included are
      
      15050047: syscall: support for Native Client
      15360044: syscall: unzip implementation for Native Client
      15370044: syscall: Native Client SRPC implementation
      15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
      15410048: runtime: support for Native Client
      15410049: syscall: file descriptor table for Native Client
      15410050: syscall: in-memory file system for Native Client
      15440048: all: update +build lines for Native Client port
      15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
      15570045: os: support for Native Client
      15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
      15690044: net: support for Native Client
      15690048: runtime: support for fake time like on Go Playground
      15690051: build: disable various tests on Native Client
      
      LGTM=rsc
      R=rsc
      CC=golang-codereviews
      https://golang.org/cl/68150047
      7c8280c9
  11. 24 Feb, 2014 3 commits
  12. 20 Feb, 2014 1 commit
    • Russ Cox's avatar
      runtime: use goc2c as much as possible · 67c83db6
      Russ Cox authored
      Package runtime's C functions written to be called from Go
      started out written in C using carefully constructed argument
      lists and the FLUSH macro to write a result back to memory.
      
      For some functions, the appropriate parameter list ended up
      being architecture-dependent due to differences in alignment,
      so we added 'goc2c', which takes a .goc file containing Go func
      declarations but C bodies, rewrites the Go func declaration to
      equivalent C declarations for the target architecture, adds the
      needed FLUSH statements, and writes out an equivalent C file.
      That C file is compiled as part of package runtime.
      
      Native Client's x86-64 support introduces the most complex
      alignment rules yet, breaking many functions that could until
      now be portably written in C. Using goc2c for those avoids the
      breakage.
      
      Separately, Keith's work on emitting stack information from
      the C compiler would require the hand-written functions
      to add #pragmas specifying how many arguments are result
      parameters. Using goc2c for those avoids maintaining #pragmas.
      
      For both reasons, use goc2c for as many Go-called C functions
      as possible.
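For readers unfamiliar with the format, a .goc file looks roughly like the following schematic sketch (the function and names here are hypothetical, not taken from the tree): a Go-style func declaration with a C body, which goc2c expands into architecture-correct C parameter declarations plus the needed FLUSH calls.

```
package runtime
#include "runtime.h"

func add(x int32, y int32) (sum int32) {
	// C statements; goc2c emits the arch-specific argument layout
	// and flushes 'sum' back to the caller's result slot.
	sum = x + y;
}
```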
      
      This CL is a replay of the bulk of CL 15400047 and CL 15790043,
      both of which were reviewed as part of the NaCl port and are
      checked in to the NaCl branch. This CL is part of bringing the
      NaCl code into the main tree.
      
      No new code here, just reformatting and occasional movement
      into .h files.
      
      LGTM=r
      R=dave, alex.brainman, r
      CC=golang-codereviews
      https://golang.org/cl/65220044
      67c83db6
  13. 13 Feb, 2014 2 commits
  14. 12 Feb, 2014 3 commits
    • Russ Cox's avatar
      runtime: fix non-concurrent sweep · 73a30435
      Russ Cox authored
      State of the world:
      
      CL 46430043 introduced a new concurrent sweep but is broken.
      
      CL 62360043 made the new sweep non-concurrent
      to try to fix the world while we understand what's wrong with
      the concurrent version.
      
      This CL fixes the non-concurrent form to run finalizers.
      This CL is just a band-aid to get the build green again.
      
      Dmitriy is working on understanding and then fixing what's
      wrong with the concurrent sweep.
      
      TBR=dvyukov
      CC=golang-codereviews
      https://golang.org/cl/62370043
      73a30435
    • Dmitriy Vyukov's avatar
      runtime: temporary disable concurrent GC sweep · 3cac829f
      Dmitriy Vyukov authored
      We see failures on builders, e.g.:
      http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4
      
      LGTM=rsc, dave
      R=rsc, dave
      CC=golang-codereviews
      https://golang.org/cl/62360043
      3cac829f
    • Dmitriy Vyukov's avatar
      runtime: concurrent GC sweep · 3c3be622
      Dmitriy Vyukov authored
      Moves sweep phase out of stoptheworld by adding
      background sweeper goroutine and lazy on-demand sweeping.
      
      It turned out to be somewhat trickier than I expected,
      because there is no point in time when we know the size of the live heap
      or a consistent number of mallocs and frees.
      So everything related to next_gc, mprof, memstats, etc. becomes trickier.
      
      At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
      which is much larger than real value. But after every sweep
      next_gc is decremented by freed*GOGC. So when everything is swept
      next_gc becomes what it should be.
      
      For mprof I had to introduce a 3-generation scheme (allocs, recent_allocs, prev_allocs),
      because by the end of GC we know the number of frees only for the *previous* GC.
      
      Significant caution is required not to cross the yet-unknown real value of next_gc.
      This is achieved by two means:
      1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
      2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
      returned to the heap.
      This provides quite strong guarantees that the heap does not grow when it should not.
      
      http-1
      allocated                    7036         7033      -0.04%
      allocs                         60           60      +0.00%
      cputime                     51050        46700      -8.52%
      gc-pause-one             34060569      1777993     -94.78%
      gc-pause-total               2554          133     -94.79%
      latency-50                 178448       170926      -4.22%
      latency-95                 284350       198294     -30.26%
      latency-99                 345191       220652     -36.08%
      rss                     101564416    101007360      -0.55%
      sys-gc                    6606832      6541296      -0.99%
      sys-heap                 88801280     87752704      -1.18%
      sys-other                 7334208      7405928      +0.98%
      sys-stack                  524288       524288      +0.00%
      sys-total               103266608    102224216      -1.01%
      time                        50339        46533      -7.56%
      virtual-mem             292990976    293728256      +0.25%
      
      garbage-1
      allocated                 2983818      2990889      +0.24%
      allocs                      62880        62902      +0.03%
      cputime                  16480000     16190000      -1.76%
      gc-pause-one            828462467    487875135     -41.11%
      gc-pause-total            4142312      2439375     -41.11%
      rss                    1151709184   1153712128      +0.17%
      sys-gc                   66068352     66068352      +0.00%
      sys-heap               1039728640   1039728640      +0.00%
      sys-other                37776064     40770176      +7.93%
      sys-stack                 8781824      8781824      +0.00%
      sys-total              1152354880   1155348992      +0.26%
      time                     16496998     16199876      -1.80%
      virtual-mem            1409564672   1402281984      -0.52%
      
      LGTM=rsc
      R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
      CC=golang-codereviews, khr
      https://golang.org/cl/46430043
      3c3be622
  15. 30 Jan, 2014 1 commit
    • Dmitriy Vyukov's avatar
      runtime: increase page size to 8K · e48751e2
      Dmitriy Vyukov authored
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      This is the exact reincarnation of already LGTMed:
      https://golang.org/cl/45770044
      which must not break darwin/freebsd after:
      https://golang.org/cl/56630043
      TBR=iant
      
      LGTM=khr, iant
      R=iant, khr
      CC=golang-codereviews
      https://golang.org/cl/58230043
      e48751e2
  16. 27 Jan, 2014 1 commit
    • Dmitriy Vyukov's avatar
      runtime: fix windows build · 86a3a542
      Dmitriy Vyukov authored
      Currently Windows crashes because early allocs in schedinit
      try to allocate tiny memory blocks, but m->p is not yet set up.
      I've considered calling procresize(1) earlier in schedinit,
      but this refactoring is better and should fix the issue as well.
      Fixes #7218.
      
      R=golang-codereviews, r
      CC=golang-codereviews
      https://golang.org/cl/54570045
      86a3a542
  17. 24 Jan, 2014 3 commits
    • Dmitriy Vyukov's avatar
      runtime: combine small NoScan allocations · 1fa70294
      Dmitriy Vyukov authored
      Combine NoScan allocations < 16 bytes into a single memory block.
      Reduces number of allocations on json/garbage benchmarks by 10+%.
      
      json-1
      allocated                 8039872      7949194      -1.13%
      allocs                     105774        93776     -11.34%
      cputime                 156200000    100700000     -35.53%
      gc-pause-one              4908873      3814853     -22.29%
      gc-pause-total            2748969      2899288      +5.47%
      rss                      52674560     43560960     -17.30%
      sys-gc                    3796976      3256304     -14.24%
      sys-heap                 43843584     35192832     -19.73%
      sys-other                 5589312      5310784      -4.98%
      sys-stack                  393216       393216      +0.00%
      sys-total                53623088     44153136     -17.66%
      time                    156193436    100886714     -35.41%
      virtual-mem             256548864    256540672      -0.00%
      
      garbage-1
      allocated                 2996885      2932982      -2.13%
      allocs                      62904        55200     -12.25%
      cputime                  17470000     17400000      -0.40%
      gc-pause-one            932757485    925806143      -0.75%
      gc-pause-total            4663787      4629030      -0.75%
      rss                    1151074304   1133670400      -1.51%
      sys-gc                   66068352     65085312      -1.49%
      sys-heap               1039728640   1024065536      -1.51%
      sys-other                38038208     37485248      -1.45%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1152485952   1135417920      -1.48%
      time                     17478088     17418005      -0.34%
      virtual-mem            1343709184   1324204032      -1.45%
      
      LGTM=iant, bradfitz
      R=golang-codereviews, dave, iant, rsc, bradfitz
      CC=golang-codereviews, khr
      https://golang.org/cl/38750047
      1fa70294
    • Dmitriy Vyukov's avatar
      sync: scalable Pool · f8e0057b
      Dmitriy Vyukov authored
      Introduce fixed-size P-local caches.
      When a local cache overflows or underflows, a batch of items
      is transferred to or from a global mutex-protected cache.
      
      benchmark                    old ns/op    new ns/op    delta
      BenchmarkPool                    50554        22423  -55.65%
      BenchmarkPool-4                 400359         5904  -98.53%
      BenchmarkPool-16                403311         1598  -99.60%
      BenchmarkPool-32                367310         1526  -99.58%
      
      BenchmarkPoolOverlflow            5214         3633  -30.32%
      BenchmarkPoolOverlflow-4         42663         9539  -77.64%
      BenchmarkPoolOverlflow-8         46919        11385  -75.73%
      BenchmarkPoolOverlflow-16        39454        13048  -66.93%
      
      BenchmarkSprintfEmpty                    84           63  -25.68%
      BenchmarkSprintfEmpty-2                 371           32  -91.13%
      BenchmarkSprintfEmpty-4                 465           22  -95.25%
      BenchmarkSprintfEmpty-8                 565           12  -97.77%
      BenchmarkSprintfEmpty-16                498            5  -98.87%
      BenchmarkSprintfEmpty-32                492            4  -99.04%
      
      BenchmarkSprintfString                  259          229  -11.58%
      BenchmarkSprintfString-2                574          144  -74.91%
      BenchmarkSprintfString-4                651           77  -88.05%
      BenchmarkSprintfString-8                868           47  -94.48%
      BenchmarkSprintfString-16               825           33  -95.96%
      BenchmarkSprintfString-32               825           30  -96.28%
      
      BenchmarkSprintfInt                     213          188  -11.74%
      BenchmarkSprintfInt-2                   448          138  -69.20%
      BenchmarkSprintfInt-4                   624           52  -91.63%
      BenchmarkSprintfInt-8                   691           31  -95.43%
      BenchmarkSprintfInt-16                  724           18  -97.46%
      BenchmarkSprintfInt-32                  718           16  -97.70%
      
      BenchmarkSprintfIntInt                  311          282   -9.32%
      BenchmarkSprintfIntInt-2                333          145  -56.46%
      BenchmarkSprintfIntInt-4                642          110  -82.87%
      BenchmarkSprintfIntInt-8                832           42  -94.90%
      BenchmarkSprintfIntInt-16               817           24  -97.00%
      BenchmarkSprintfIntInt-32               805           22  -97.17%
      
      BenchmarkSprintfPrefixedInt             309          269  -12.94%
      BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
      BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
      BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
      BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
      BenchmarkSprintfPrefixedInt-32          824           50  -93.83%
      
      BenchmarkSprintfFloat                   418          398   -4.78%
      BenchmarkSprintfFloat-2                 295          203  -31.19%
      BenchmarkSprintfFloat-4                 585          128  -78.12%
      BenchmarkSprintfFloat-8                 873           60  -93.13%
      BenchmarkSprintfFloat-16                884           33  -96.24%
      BenchmarkSprintfFloat-32                881           29  -96.62%
      
      BenchmarkManyArgs                      1097         1069   -2.55%
      BenchmarkManyArgs-2                     705          567  -19.57%
      BenchmarkManyArgs-4                     792          319  -59.72%
      BenchmarkManyArgs-8                     963          172  -82.14%
      BenchmarkManyArgs-16                   1115          103  -90.76%
      BenchmarkManyArgs-32                   1133           90  -92.03%
      
      LGTM=rsc
      R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
      CC=golang-codereviews
      https://golang.org/cl/46010043
      f8e0057b
    • Russ Cox's avatar
      cmd/gc: add zeroing to enable precise stack accounting · a81692e2
      Russ Cox authored
      There is more zeroing than I would like right now -
      temporaries used for the new map and channel runtime
      calls need to be eliminated - but it will do for now.
      
      This CL only has an effect if you are building with
      
              GOEXPERIMENT=precisestack ./all.bash
      
      (or make.bash). It costs about 5% in the overall time
      spent in all.bash. That number will come down before
      we make it on by default, but this should be enough for
      Keith to try using the precise maps for copying stacks.
      
      amd64 only (and it's not really great generated code).
      
      TBR=khr, iant
      CC=golang-codereviews
      https://golang.org/cl/56430043
      a81692e2
  18. 23 Jan, 2014 2 commits
    • Dmitriy Vyukov's avatar
      undo CL 45770044 / d795425bfa18 · 8371b014
      Dmitriy Vyukov authored
      Breaks darwin and freebsd.
      
      ««« original CL description
      runtime: increase page size to 8K
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      R=golang-codereviews, dave, khr, rsc, khr
      CC=golang-codereviews
      https://golang.org/cl/45770044
      »»»
      
      R=golang-codereviews
      CC=golang-codereviews
      https://golang.org/cl/56060043
      8371b014
    • Dmitriy Vyukov's avatar
      runtime: increase page size to 8K · 6d603af6
      Dmitriy Vyukov authored
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      R=golang-codereviews, dave, khr, rsc, khr
      CC=golang-codereviews
      https://golang.org/cl/45770044
      6d603af6
  19. 22 Jan, 2014 1 commit
    • Dmitriy Vyukov's avatar
      runtime: remove locks from netpoll hotpaths · 9cbd2fb1
      Dmitriy Vyukov authored
      Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park.
      This mechanism does not require a backing mutex to protect the wait predicate.
      Use it in netpoll. See the comment in netpoll.goc for details.
      This slightly reduces contention between reader, writer and read/write io notifications,
      and eliminates a bunch of mutex operations from hotpaths, thus making them faster.
      
      benchmark                             old ns/op    new ns/op    delta
      BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
      BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
      BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
      BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
      BenchmarkTCP4Persistent                    9411         9240   -1.82%
      BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
      BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
      BenchmarkTCP4Persistent-8                  3943         3857   -2.18%
      
      R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
      CC=golang-codereviews, khr
      https://golang.org/cl/45700043
      9cbd2fb1
  20. 21 Jan, 2014 3 commits
    • Dmitriy Vyukov's avatar
      runtime: do not collect GC roots explicitly · cb133c66
      Dmitriy Vyukov authored
      Currently we collect (add) all roots into a global array in a single-threaded GC phase.
      This hinders parallelism.
      With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
      Then the parallel for callback decides whether it needs to scan the stack of a goroutine,
      scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
      This requires storing all goroutines in an array instead of a linked list
      (to allow direct indexing).
      This CL also removes the DebugScan functionality. It is broken because it uses
      an unbounded stack, so it cannot run on g0. When it was working, I found
      it unhelpful for debugging because the two algorithms are too different now.
      This change would require updating DebugScan, so it's simpler to just delete it.
      
      With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.
      
      garbage-8
      allocated                 2987886      2989221      +0.04%
      allocs                      62885        62887      +0.00%
      cputime                  21286000     21272000      -0.07%
      gc-pause-one             26633247     24885421      -6.56%
      gc-pause-total             873570       811264      -7.13%
      rss                     242089984    242515968      +0.18%
      sys-gc                   13934336     13869056      -0.47%
      sys-heap                205062144    205062144      +0.00%
      sys-other                12628288     12628288      +0.00%
      sys-stack                11534336     11927552      +3.41%
      sys-total               243159104    243487040      +0.13%
      time                      2809477      2740795      -2.44%
      
      R=golang-codereviews, rsc
      CC=cshapiro, golang-codereviews, khr
      https://golang.org/cl/46860043
      cb133c66
    • Dmitriy Vyukov's avatar
      runtime: per-P defer pool · 1ba04c17
      Dmitriy Vyukov authored
      Instead of a per-goroutine stack of defers for all sizes,
      introduce per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.
      
      For a program that starts 1e6 goroutines and then joins them:
      old: rss=6.6g virtmem=10.2g time=4.85s
      new: rss=4.5g virtmem= 8.2g time=3.48s
      
      R=golang-codereviews, rsc
      CC=golang-codereviews
      https://golang.org/cl/42750044
      1ba04c17
    • Dmitriy Vyukov's avatar
      runtime: zero 2-word memory blocks in-place · d5a36cd6
      Dmitriy Vyukov authored
      Currently for 2-word blocks we set the flag to clear the flag. Makes no sense.
      In particular on 32-bits we call memclr always.
      
      R=golang-codereviews, dave, iant
      CC=golang-codereviews, khr, rsc
      https://golang.org/cl/41170044
      d5a36cd6
  21. 16 Jan, 2014 1 commit