Commits · 1bd39e79dbc146ae953284f82febf7d1fb461f4e · Kirill Smelkov / go

28 Oct, 2016 40 commits

runtime: fix SP adjustment on amd64p32 · 1bd39e79

Austin Clements authored Oct 28, 2016

On amd64p32, rt0_go attempts to reserve 128 bytes of scratch space on
the stack, but due to a register mixup this ends up being a no-op. Fix
this so we actually reserve the stack space.

Change-Id: I04dbfbeb44f3109528c8ec74e1136bc00d7e1faa
Reviewed-on: https://go-review.googlesource.com/32331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

1bd39e79

runtime: disable stack rescanning by default · bd640c88

Austin Clements authored Oct 23, 2016

With the hybrid barrier in place, we can now disable stack rescanning
by default. This commit adds a "gcrescanstacks" GODEBUG variable that
is off by default but can be set to re-enable STW stack rescanning.
The plan is to leave this off but available in Go 1.8 for debugging
and as a fallback.

With this change, worst-case mark termination time at GOMAXPROCS=12
*not* including time spent stopping the world (which is still
unbounded) is reliably under 100 µs, with a 95%ile around 50 µs in
every benchmark I tried (the go1 benchmarks, the x/benchmarks garbage
benchmark, and the gcbench activegs and rpc benchmarks). Including
time spent stopping the world usually adds about 20 µs to total STW
time at GOMAXPROCS=12, but I've seen it add around 150 µs in these
benchmarks when a goroutine takes time to reach a safe point (see
issue #10958) or when stopping the world races with goroutine
switches. At GOMAXPROCS=1, where this isn't an issue, worst case STW
is typically 30 µs.

The go-gcbench activegs benchmark is designed to stress large numbers
of dirty stacks. This commit reduces 95%ile STW time for 500k dirty
stacks by nearly three orders of magnitude, from 150ms to 195µs.

This has little effect on the throughput of the go1 benchmarks or the
x/benchmarks benchmarks.

name old time/op new time/op delta
XGarbage-12 2.31ms ± 0% 2.32ms ± 1% +0.28% (p=0.001 n=17+16)
XJSON-12 12.4ms ± 0% 12.4ms ± 0% +0.41% (p=0.000 n=18+18)
XHTTP-12 11.8µs ± 0% 11.8µs ± 1% ~ (p=0.492 n=20+18)

It reduces the tail latency of the x/benchmarks HTTP benchmark:

name old p50-time new p50-time delta
XHTTP-12 489µs ± 0% 491µs ± 1% +0.54% (p=0.000 n=20+18)

name old p95-time new p95-time delta
XHTTP-12 957µs ± 1% 960µs ± 1% +0.28% (p=0.002 n=20+17)

name old p99-time new p99-time delta
XHTTP-12 1.76ms ± 1% 1.64ms ± 1% -7.20% (p=0.000 n=20+18)

Comparing to the beginning of the hybrid barrier implementation
("runtime: parallelize STW mcache flushing") shows that the hybrid
barrier trades a small performance impact for much better STW latency,
as expected. The magnitude of the performance impact is generally
small:

name old time/op new time/op delta
BinaryTree17-12 2.37s ± 1% 2.42s ± 1% +2.04% (p=0.000 n=19+18)
Fannkuch11-12 2.84s ± 0% 2.72s ± 0% -4.00% (p=0.000 n=19+19)
FmtFprintfEmpty-12 44.2ns ± 1% 45.2ns ± 1% +2.20% (p=0.000 n=17+19)
FmtFprintfString-12 130ns ± 1% 134ns ± 0% +2.94% (p=0.000 n=18+16)
FmtFprintfInt-12 114ns ± 1% 117ns ± 0% +3.01% (p=0.000 n=19+15)
FmtFprintfIntInt-12 176ns ± 1% 182ns ± 0% +3.17% (p=0.000 n=20+15)
FmtFprintfPrefixedInt-12 186ns ± 1% 187ns ± 1% +1.04% (p=0.000 n=20+19)
FmtFprintfFloat-12 251ns ± 1% 250ns ± 1% -0.74% (p=0.000 n=17+18)
FmtManyArgs-12 746ns ± 1% 761ns ± 0% +2.08% (p=0.000 n=19+20)
GobDecode-12 6.57ms ± 1% 6.65ms ± 1% +1.11% (p=0.000 n=19+20)
GobEncode-12 5.59ms ± 1% 5.65ms ± 0% +1.08% (p=0.000 n=17+17)
Gzip-12 223ms ± 1% 223ms ± 1% -0.31% (p=0.006 n=20+20)
Gunzip-12 38.0ms ± 0% 37.9ms ± 1% -0.25% (p=0.009 n=19+20)
HTTPClientServer-12 77.5µs ± 1% 78.9µs ± 2% +1.89% (p=0.000 n=20+20)
JSONEncode-12 14.7ms ± 1% 14.9ms ± 0% +0.75% (p=0.000 n=20+20)
JSONDecode-12 53.0ms ± 1% 55.9ms ± 1% +5.54% (p=0.000 n=19+19)
Mandelbrot200-12 3.81ms ± 0% 3.81ms ± 1% +0.20% (p=0.023 n=17+19)
GoParse-12 3.17ms ± 1% 3.18ms ± 1% ~ (p=0.057 n=20+19)
RegexpMatchEasy0_32-12 71.7ns ± 1% 70.4ns ± 1% -1.77% (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12 946ns ± 0% 946ns ± 0% ~ (p=0.405 n=18+18)
RegexpMatchEasy1_32-12 67.2ns ± 2% 67.3ns ± 2% ~ (p=0.732 n=20+20)
RegexpMatchEasy1_1K-12 374ns ± 1% 378ns ± 1% +1.14% (p=0.000 n=18+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 1% ~ (p=0.259 n=18+20)
RegexpMatchMedium_1K-12 34.2µs ± 1% 34.5µs ± 1% +1.03% (p=0.000 n=18+18)
RegexpMatchHard_32-12 1.77µs ± 1% 1.79µs ± 1% +0.73% (p=0.000 n=19+18)
RegexpMatchHard_1K-12 53.6µs ± 1% 54.2µs ± 1% +1.10% (p=0.000 n=19+19)
Template-12 61.5ms ± 1% 63.9ms ± 0% +3.96% (p=0.000 n=18+18)
TimeParse-12 303ns ± 1% 300ns ± 1% -1.08% (p=0.000 n=19+20)
TimeFormat-12 318ns ± 1% 320ns ± 0% +0.79% (p=0.000 n=19+19)
Revcomp-12 (*) 509ms ± 3% 504ms ± 0% ~ (p=0.967 n=7+12)
[Geo mean] 54.3µs 54.8µs +0.88%

(*) Revcomp is highly non-linear, so I only took samples with 2
iterations.

name old time/op new time/op delta
XGarbage-12 2.25ms ± 0% 2.32ms ± 1% +2.74% (p=0.000 n=16+16)
XJSON-12 11.6ms ± 0% 12.4ms ± 0% +6.81% (p=0.000 n=18+18)
XHTTP-12 11.6µs ± 1% 11.8µs ± 1% +1.62% (p=0.000 n=17+18)

Updates #17503.

Updates #17099, since you can't have a rescan list bug if there's no
rescan list. I'm not marking it as fixed, since gcrescanstacks can
still be set to re-enable the rescan lists.

Change-Id: I6e926b4c2dbd4cd56721869d4f817bdbb330b851
Reviewed-on: https://go-review.googlesource.com/31766Reviewed-by: Rick Hudson <rlh@golang.org>

bd640c88

runtime: implement unconditional hybrid barrier · 5380b229

Austin Clements authored Oct 23, 2016

This implements the unconditional version of the hybrid deletion write
barrier, which always shades both the old and new pointer. It's
unconditional for now because barriers on channel operations require
checking both the source and destination stacks and we don't have a
way to funnel this information into the write barrier at the moment.

As part of this change, we modify the typed memclr operations
introduced earlier to invoke the write barrier.

This has basically no overall effect on benchmark performance. This is
good, since it indicates that neither the extra shade nor the new bulk
clear barriers have much effect. It also has little effect on latency.
This is expected, since we haven't yet modified mark termination to
take advantage of the hybrid barrier.

Updates #17503.

Change-Id: Iebedf84af2f0e857bd5d3a2d525f760b5cf7224b
Reviewed-on: https://go-review.googlesource.com/31765Reviewed-by: Rick Hudson <rlh@golang.org>

5380b229

runtime: avoid getfull() barrier most of the time · ee3d2012

Austin Clements authored Oct 26, 2016

With the hybrid barrier, unless we're doing a STW GC or hit a very
rare race (~once per all.bash) that can start mark termination before
all of the work is drained, we don't need to drain the work queue at
all. Even draining an empty work queue is rather expensive since we
have to enter the getfull() barrier, so it's worth avoiding this.

Conveniently, it's quite easy to detect whether or not we actually
need the getufull() barrier: since the world is stopped when we enter
mark termination, everything must have flushed its work to the work
queue, so we can just check the queue. If the queue is empty and we
haven't queued up any jobs that may create more work (which should
always be the case with the hybrid barrier), we can simply have all GC
workers perform non-blocking drains.

Also conveniently, this solution is quite safe. If we do somehow screw
something up and there's work on the work queue, some worker will
still process it, it just may not happen in parallel.

This is not the "right" solution, but it's simple, expedient,
low-risk, and maintains compatibility with debug.gcrescanstacks. When
we remove the gcrescanstacks fallback in Go 1.9, we should also fix
the race that starts mark termination early, and then we can eliminate
work draining from mark termination.

Updates #17503.

Change-Id: I7b3cd5de6a248ab29d78c2b42aed8b7443641361
Reviewed-on: https://go-review.googlesource.com/32186Reviewed-by: Rick Hudson <rlh@golang.org>

ee3d2012

runtime: remove unnecessary step from bulkBarrierPreWrite · d8256824

Austin Clements authored Oct 24, 2016

Currently bulkBarrierPreWrite calls writebarrierptr_prewrite, but this
means that we check writeBarrier.needed twice and perform cgo checks
twice.

Change bulkBarrierPreWrite to call writebarrierptr_prewrite1 to skip
over these duplicate checks.

This may speed up bulkBarrierPreWrite slightly, but mostly this will
save us from running out of nosplit stack space on ppc64x in the near
future.

Updates #17503.

Change-Id: I1cea1a2207e884ab1a279c6a5e378dcdc048b63e
Reviewed-on: https://go-review.googlesource.com/31890Reviewed-by: Rick Hudson <rlh@golang.org>

d8256824

runtime: add deletion barriers on gobuf.ctxt · 70c107c6

Austin Clements authored Oct 19, 2016

gobuf.ctxt is set to nil from many places in assembly code and these
assignments require write barriers with the hybrid barrier.

Conveniently, in most of these places ctxt should already be nil, in
which case we don't need the barrier. This commit changes these places
to assert that ctxt is already nil.

gogo is more complicated, since ctxt may not already be nil. For gogo,
we manually perform the write barrier if ctxt is not nil.

Updates #17503.

Change-Id: I9d75e27c75a1b7f8b715ad112fc5d45ffa856d30
Reviewed-on: https://go-review.googlesource.com/31764Reviewed-by: Cherry Zhang <cherryyz@google.com>

70c107c6

runtime: perform write barrier before pointer write · 8f81dfe8

Austin Clements authored Aug 22, 2016

Currently, we perform write barriers after performing pointer writes.
At the moment, it simply doesn't matter what order this happens in, as
long as they appear atomic to GC. But both the hybrid barrier and ROC
are going to require a pre-write write barrier.

For the hybrid barrier, this is important because the barrier needs to
observe both the current value of the slot and the value that will be
written to it. (Alternatively, the caller could do the write and pass
in the old value, but it seems easier and more useful to just swap the
order of the barrier and the write.)

For ROC, this is necessary because, if the pointer write is going to
make the pointer reachable to some goroutine that it currently is not
visible to, the garbage collector must take some special action before
that pointer becomes more broadly visible.

This commits swaps pointer writes around so the write barrier occurs
before the pointer write.

The main subtlety here is bulk memory writes. Currently, these copy to
the destination first and then use the pointer bitmap of the
destination to find the copied pointers and invoke the write barrier.
This is necessary because the source may not have a pointer bitmap. To
handle these, we pass both the source and the destination to the bulk
memory barrier, which uses the pointer bitmap of the destination, but
reads the pointer values from the source.

Updates #17503.

Change-Id: I78ecc0c5c94ee81c29019c305b3d232069294a55
Reviewed-on: https://go-review.googlesource.com/31763Reviewed-by: Rick Hudson <rlh@golang.org>

8f81dfe8

cmd/go: apply import restrictions to test code too · 0f06d0a0

Russ Cox authored Oct 21, 2016

We reject import of main packages, but we missed tests.
Reject in all tests except test of that main package.

We reject local (relative) imports from code with a
non-local import path, but again we missed tests.
Reject those too.

Fixes #14811.
Fixes #15795.
Fixes #17475.

Change-Id: I535ff26889520276a891904f54f1a85b2c40207d
Reviewed-on: https://go-review.googlesource.com/31821
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>

0f06d0a0

cmd/compile: add Param to Sizeof test · 91c1cdfb

Josh Bleecher Snyder authored Oct 28, 2016

Change-Id: I2a710f0e9b484b3dfc581d3a9a23aa13321ec267
Reviewed-on: https://go-review.googlesource.com/32316
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

91c1cdfb

cmd/internal/obj/arm64: materialize float constant 0 from zero register · a866df26

Cherry Zhang authored Oct 26, 2016

Materialize float constant 0 from integer zero register, instead
of loading from constant pool.

Also fix assembling FMOV from zero register to FP register.

Change-Id: Ie413dd342cedebdb95ba8cfc220e23ed2a39e885
Reviewed-on: https://go-review.googlesource.com/32250
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

a866df26

cmd/link: put text at address 0x1000000 on darwin/amd64 · 9d1efba2

Cherry Zhang authored Oct 27, 2016

Apparently on macOS Sierra LLDB thinks /usr/lib/dyld is mapped
at address 0, even if Go code starts at 0x1000, and it looks up
addresses from dyld which shadows Go symbols. Move Go binary at
a higher address to avoid clash.

Fixes #17463. Re-enable TestLldbPython.

Change-Id: I89ca6f3ee48aa6da9862bfa0c2da91477cc93255
Reviewed-on: https://go-review.googlesource.com/32185
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>

9d1efba2

cmd/compile: disable various write barrier optimizations · c39918a0

Austin Clements authored Oct 18, 2016

Several of our current write barrier elision optimizations are invalid
with the hybrid barrier. Eliding the hybrid barrier requires that
*both* the current and new pointer be already shaded and, since we
don't have the flow analysis to figure out anything about the slot's
current value, for now we have to just disable several of these
optimizations.

This has a slight impact on binary size. On linux/amd64, the go tool
binary increases by 0.7% and the compile binary increases by 1.5%.

It also has a slight impact on performance, as one would expect. We'll
win some of this back in subsequent commits.

name old time/op new time/op delta
BinaryTree17-12 2.38s ± 1% 2.40s ± 1% +0.82% (p=0.000 n=18+20)
Fannkuch11-12 2.84s ± 1% 2.70s ± 0% -4.97% (p=0.000 n=18+18)
FmtFprintfEmpty-12 44.2ns ± 1% 46.4ns ± 2% +4.89% (p=0.000 n=16+18)
FmtFprintfString-12 131ns ± 0% 134ns ± 1% +2.05% (p=0.000 n=12+19)
FmtFprintfInt-12 114ns ± 1% 117ns ± 1% +3.26% (p=0.000 n=19+20)
FmtFprintfIntInt-12 176ns ± 1% 181ns ± 1% +3.25% (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12 185ns ± 1% 190ns ± 1% +2.77% (p=0.000 n=19+18)
FmtFprintfFloat-12 249ns ± 1% 254ns ± 1% +1.71% (p=0.000 n=18+20)
FmtManyArgs-12 747ns ± 1% 743ns ± 1% -0.58% (p=0.000 n=19+18)
GobDecode-12 6.57ms ± 1% 6.61ms ± 0% +0.73% (p=0.000 n=19+20)
GobEncode-12 5.58ms ± 1% 5.60ms ± 0% +0.27% (p=0.001 n=18+18)
Gzip-12 223ms ± 1% 223ms ± 1% ~ (p=0.351 n=19+20)
Gunzip-12 37.9ms ± 0% 37.9ms ± 1% ~ (p=0.095 n=16+20)
HTTPClientServer-12 77.8µs ± 1% 78.5µs ± 1% +0.97% (p=0.000 n=19+20)
JSONEncode-12 14.8ms ± 1% 14.8ms ± 1% ~ (p=0.079 n=20+19)
JSONDecode-12 53.7ms ± 1% 54.2ms ± 1% +0.92% (p=0.000 n=20+19)
Mandelbrot200-12 3.81ms ± 1% 3.81ms ± 0% ~ (p=0.916 n=19+18)
GoParse-12 3.19ms ± 1% 3.19ms ± 1% ~ (p=0.175 n=20+19)
RegexpMatchEasy0_32-12 71.9ns ± 1% 70.6ns ± 1% -1.87% (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12 946ns ± 0% 944ns ± 0% -0.22% (p=0.000 n=19+16)
RegexpMatchEasy1_32-12 67.3ns ± 2% 66.8ns ± 1% -0.72% (p=0.008 n=20+20)
RegexpMatchEasy1_1K-12 374ns ± 1% 384ns ± 1% +2.69% (p=0.000 n=18+20)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 1% ~ (p=1.000 n=20+20)
RegexpMatchMedium_1K-12 34.3µs ± 1% 34.6µs ± 1% +0.90% (p=0.000 n=20+20)
RegexpMatchHard_32-12 1.78µs ± 1% 1.80µs ± 1% +1.45% (p=0.000 n=20+19)
RegexpMatchHard_1K-12 53.6µs ± 0% 54.5µs ± 1% +1.52% (p=0.000 n=19+18)
Revcomp-12 417ms ± 5% 391ms ± 1% -6.42% (p=0.000 n=16+19)
Template-12 61.1ms ± 1% 64.2ms ± 0% +5.07% (p=0.000 n=19+20)
TimeParse-12 302ns ± 1% 305ns ± 1% +0.90% (p=0.000 n=18+18)
TimeFormat-12 319ns ± 1% 315ns ± 1% -1.25% (p=0.000 n=18+18)
[Geo mean] 54.0µs 54.3µs +0.58%

name old time/op new time/op delta
XGarbage-12 2.24ms ± 2% 2.28ms ± 1% +1.68% (p=0.000 n=18+17)
XHTTP-12 11.4µs ± 1% 11.6µs ± 2% +1.63% (p=0.000 n=18+18)
XJSON-12 11.6ms ± 0% 12.5ms ± 0% +7.84% (p=0.000 n=18+17)

Updates #17503.

Change-Id: I1899f8e35662971e24bf692b517dfbe2b533c00c
Reviewed-on: https://go-review.googlesource.com/31572Reviewed-by: Keith Randall <khr@golang.org>

c39918a0

runtime: eliminate write barriers from save · c3163d23

Austin Clements authored Oct 19, 2016

As for dropg, save is writing a nil pointer that will generate a write
barrier with the hybrid barrier. However, in this case, ctxt always
should already be nil, so replace the write with an assertion that
this is the case.

At this point, we're ready to disable the write barrier elision
optimizations that interfere with the hybrid barrier.

Updates #17503.

Change-Id: I83208e65aa33403d442401f355b2e013ab9a50e9
Reviewed-on: https://go-review.googlesource.com/31571Reviewed-by: Rick Hudson <rlh@golang.org>

c3163d23

runtime: eliminate write barriers from dropg · 8044b77a

Austin Clements authored Oct 19, 2016

Currently this contains no write barriers because it's writing nil
pointers, but with the hybrid barrier, even these will produce write
barriers. However, since these are *gs and *ms, they don't need write
barriers, so we can simply eliminate them.

Updates #17503.

Change-Id: Ib188a60492c5cfb352814bf9b2bcb2941fb7d6c0
Reviewed-on: https://go-review.googlesource.com/31570Reviewed-by: Rick Hudson <rlh@golang.org>

8044b77a

runtime: mark tiny blocks at GC start · 85c22bc3

Austin Clements authored Sep 09, 2016

The hybrid barrier requires allocate-black, but there's one case where
we don't currently allocate black: the tiny allocator. If we allocate
a *new* tiny alloc block during GC, it will be allocated black, but if
we allocated the current block before GC, it won't be black, and the
further allocations from it won't mark it, which means we may free a
reachable tiny block during sweeping.

Fix this by passing over all mcaches at the beginning of mark, while
the world is still stopped, and greying their tiny blocks.

Updates #17503.

Change-Id: I04d4df7cc2f553f8f7b1e4cb0b52e2946588111a
Reviewed-on: https://go-review.googlesource.com/31456Reviewed-by: Rick Hudson <rlh@golang.org>

85c22bc3

runtime: shade stack-to-stack copy when starting a goroutine · ee785f03

Austin Clements authored Oct 13, 2016

The hybrid barrier requires barriers on stack-to-stack copies if
either stack is grey. There are only two instances of this in the
runtime: channel sends and starting a goroutine. Channel sends already
use typedmemmove and hence have the necessary barriers. This commits
adds barriers for the stack-to-stack copy when starting a goroutine.

Updates #17503.

Change-Id: Ibb55e08127ca4d021ac54be61cb96732efa5df5b
Reviewed-on: https://go-review.googlesource.com/31455Reviewed-by: Rick Hudson <rlh@golang.org>

ee785f03

runtime/pprof/internal/profile: add copyright notice to profile_memmap.go · 0c0960a2

Michael Matloob authored Oct 28, 2016

Change-Id: Ia511b0aadc87eb53e084d14cdb90ba4be958a43e
Reviewed-on: https://go-review.googlesource.com/32259Reviewed-by: Austin Clements <austin@google.com>

0c0960a2

runtime/pprof: write profiles in protobuf format. · b33030a7

Michael Matloob authored Oct 28, 2016

Original Change by Daria Kolistratova <daria.kolistratova@intel.com>

Added functions with suffix proto and stuff from pprof tool to translate
to protobuf. Done as the profile proto is more extensible than the legacy
pprof format and is pprof's preferred profile format. Large part was taken
from https://github.com/google/pprof tool. Tested by hand and compared the
result with translated by pprof tool, profiles are identical.
Fixes #16093

Change-Id: I2751345b09a66ee2b6aa64be76cba4cd1c326aa6
Reviewed-on: https://go-review.googlesource.com/32257
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>

b33030a7

encoding/csv: document Read error behavior · 30651b3b

Russ Cox authored Oct 26, 2016

Fixes #17342.

Change-Id: I76af756d7aff464554c5564d444962a468d0eccc
Reviewed-on: https://go-review.googlesource.com/32172
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>

30651b3b

cmd/go, go/build: document form of import paths · e2bcae78

Russ Cox authored Oct 26, 2016

Fixes #16164.

Change-Id: Ic8f51ebd8235640143913a07b70f5b41ee061fe4
Reviewed-on: https://go-review.googlesource.com/32114Reviewed-by: Quentin Smith <quentin@golang.org>

e2bcae78

runtime/trace: deflake TestTraceSymbolize · 2a7272b4

Russ Cox authored Oct 26, 2016

Waiting 2ms for all the kicked-off goroutines to run and block
seems a little optimistic. No harm done by waiting for 200ms instead.

Fixes #17238.

Change-Id: I827532ea2f5f1f3ed04179f8957dd2c563946ed0
Reviewed-on: https://go-review.googlesource.com/32103
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

2a7272b4

os: adjust (*File).Read comment · 3366d6a3

Russ Cox authored Oct 26, 2016

Fixes #6639.

Change-Id: Iefce87c5521504fd41843df8462cfd840c24410f
Reviewed-on: https://go-review.googlesource.com/32102
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>

3366d6a3

os/exec: document how Command fills in Cmd.Args · bd8103d5

Quentin Smith authored Oct 21, 2016

Fixes #17536

Change-Id: Ica8c3d696848822ac65b7931455b1fd94809bfe8
Reviewed-on: https://go-review.googlesource.com/31710Reviewed-by: Russ Cox <rsc@golang.org>

bd8103d5

runtime: zero-initialize LR on new stacks · 88518e7d

Austin Clements authored Oct 26, 2016

Currently we initialize LR on a new stack by writing nil to it. But
this is an initializing write since the newly allocated stack is not
zeroed, so this is unsafe with the hybrid barrier. Change this is a
uintptr write to avoid a bad write barrier.

Updates #17503.

Change-Id: I062ac352e35df7da4644c1f2a5aaab87049d1f60
Reviewed-on: https://go-review.googlesource.com/32093Reviewed-by: Rick Hudson <rlh@golang.org>

88518e7d

runtime: ensure finalizers are zero-initialized before reuse · d3836aba

Austin Clements authored Oct 14, 2016

We reuse finalizers in finblocks, which are allocated off-heap. This
means they have to be zero-initialized before becoming visible to the
garbage collector. We actually already do this by clearing the
finalizer before returning it to the pool, but we're not careful to
enforce correct memory ordering. Fix this by manipulating the
finalizer count atomically so these writes synchronize properly with
the garbage collector.

Updates #17503.

Change-Id: I7797d31df3c656c9fe654bc6da287f66a9e2037d
Reviewed-on: https://go-review.googlesource.com/31454Reviewed-by: Rick Hudson <rlh@golang.org>

d3836aba

runtime: avoid write barriers to uninitialized finalizer frame memory · db56a635

Austin Clements authored Oct 03, 2016

runfinq allocates a stack frame on the heap for constructing the
finalizer function calls and reuses it for each call. However, because
the type of this frame is constantly shifting, it tells mallocgc there
are no pointers in it and it acts essentially like uninitialized
memory between uses. But runfinq uses pointer writes with write
barriers to "initialize" this memory, which is not going to be safe
with the hybrid barrier, since the hybrid barrier may see a stale
pointer left in the "uninitialized" frame.

Fix this by zero-initializing the argument values in the frame before
writing the argument pointers.

Updates #17503.

Change-Id: I951c0a2be427eb9082a32d65c4410e6fdef041be
Reviewed-on: https://go-review.googlesource.com/31453Reviewed-by: Rick Hudson <rlh@golang.org>

db56a635

runtime: document rules about unmanaged memory · c5e70065

Austin Clements authored Oct 14, 2016

Updates #17503.

Change-Id: I109d8742358ae983fdff3f3dbb7136973e81f4c3
Reviewed-on: https://go-review.googlesource.com/31452Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

c5e70065

cmd/compile: use typedmemclr for zeroing if there are pointers · 8a7f0ad0

Austin Clements authored Oct 18, 2016

Currently, zeroing generates an ssa.OpZero, which never has write
barriers, even if the assignment is an OASWB. The hybrid barrier
requires write barriers on zeroing, so change OASWB to generate an
ssa.OpZeroWB when assigning the zero value, which turns into a
typedmemclr.

Updates #17503.

Change-Id: Ib37ac5e39f578447dbd6b36a6a54117d5624784d
Reviewed-on: https://go-review.googlesource.com/31451Reviewed-by: Cherry Zhang <cherryyz@google.com>

8a7f0ad0

cmd/compile: lower slice clears to memclrHasPointers · 58e2edaf

Austin Clements authored Oct 18, 2016

If a slice's backing store has pointers, we need to lower clears of
that slice to memclrHasPointers instead of memclrNoHeapPointers.

Updates #17503.

Change-Id: I20750e4bf57f7b8862f3d898bfb32d964b91d07b
Reviewed-on: https://go-review.googlesource.com/31450Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

58e2edaf

mime/multipart: simplify Part.Read · ef3df189

Russ Cox authored Oct 25, 2016

The basic structure of Part.Read should be simple:
do what you can with the current buffered data,
reading more as you need it. Make it that way.

Working entirely in the bufio.Reader's buffer eliminates
the need for an additional bytes.Buffer.

This structure should be easier to extend in the future as
more special cases arise.

Change-Id: I83cb24a755a1767c4c037f9ece6716460c3ecd01
Reviewed-on: https://go-review.googlesource.com/32092
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

ef3df189

Revert "runtime/pprof: write profiles in protobuf format." · 14f3284d

Austin Clements authored Oct 28, 2016

This reverts commit 7d14401b.

Reason for revert: Doesn't build.

Change-Id: I766179ab9225109d9232f783326e4d3843254980
Reviewed-on: https://go-review.googlesource.com/32256Reviewed-by: Russ Cox <rsc@golang.org>

14f3284d

net: add (*UnixListener).SetUnlinkOnClose · eb88b3ee

Russ Cox authored Oct 26, 2016

Let users control whether unix listener socket file is unlinked on close.

Fixes #13877.

Change-Id: I9d1cb47e31418d655f164d15c67e188656a67d1c
Reviewed-on: https://go-review.googlesource.com/32099
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

eb88b3ee

net: only remove Unix domain socket file on the first call to Close · 13558c41

Russ Cox authored Oct 26, 2016

Fixes #17131.

Change-Id: I60b381687746fadce12ef18a190cbe3f435172f2
Reviewed-on: https://go-review.googlesource.com/32098
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>

13558c41

runtime, cmd/compile: rename memclr -> memclrNoHeapPointers · 87e48c5a

Austin Clements authored Oct 17, 2016

Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

87e48c5a

runtime: make fixalloc zero allocations on reuse · ae3bb4a5

Austin Clements authored Sep 25, 2016

Currently fixalloc does not zero memory it reuses. This is dangerous
with the hybrid barrier if the type may contain heap pointers, since
it may cause us to observe a dead heap pointer on reuse. It's also
error-prone since it's the only allocator that doesn't zero on
allocation (mallocgc of course zeroes, but so do persistentalloc and
sysAlloc). It's also largely pointless: for mcache, the caller
immediately memclrs the allocation; and the two specials types are
tiny so there's no real cost to zeroing them.

Change fixalloc to zero allocations by default.

The only type we don't zero by default is mspan. This actually
requires that the spsn's sweepgen survive across freeing and
reallocating a span. If we were to zero it, the following race would
be possible:

1. The current sweepgen is 2. Span s is on the unswept list.

2. Direct sweeping sweeps span s, finds it's all free, and releases s
to the fixalloc.

3. Thread 1 allocates s from fixalloc. Suppose this zeros s, including
s.sweepgen.

4. Thread 1 calls s.init, which sets s.state to _MSpanDead.

5. On thread 2, background sweeping comes across span s in allspans
and cas's s.sweepgen from 0 (sg-2) to 1 (sg-1). Now it thinks it
owns it for sweeping. 6. Thread 1 continues initializing s.
Everything breaks.

I would like to fix this because it's obviously confusing, but it's a
subtle enough problem that I'm leaving it alone for now. The solution
may be to skip sweepgen 0, but then we have to think about wrap-around
much more carefully.

Updates #17503.

Change-Id: Ie08691feed3abbb06a31381b94beb0a2e36a0613
Reviewed-on: https://go-review.googlesource.com/31368Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

ae3bb4a5

runtime: make _MSpanDead be the zero state · f4dcc9b2

Austin Clements authored Oct 18, 2016

Currently the zero value of an mspan is in the "in use" state. This
seems like a bad idea in general. But it's going to wreak havoc when
we make fixalloc zero allocations: even "freed" mspan objects are
still on the allspans list and still get looked at by the garbage
collector. Hence, if we leave the mspan states the way they are,
allocating a span that reuses old memory will temporarily pass that
span (which is visible to GC!) through the "in use" state, which can
cause "unswept span" panics.

Fix all of this by making the zero state "dead".

Updates #17503.

Change-Id: I77c7ac06e297af4b9e6258bc091c37abe102acc3
Reviewed-on: https://go-review.googlesource.com/31367Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

f4dcc9b2

runtime: use typedmemclr for typed memory · aa581f51

Austin Clements authored Oct 17, 2016

The hybrid barrier requires distinguishing typed and untyped memory
even when zeroing because the *current* contents of the memory matters
even when overwriting.

This commit introduces runtime.typedmemclr and runtime.memclrHasPointers
as a typed memory clearing functions parallel to runtime.typedmemmove.
Currently these simply call memclr, but with the hybrid barrier we'll
need to shade any pointers we're overwriting. These will provide us
with the necessary hooks to do so.

Updates #17503.

Change-Id: I74478619f8907825898092aaa204d6e4690f27e6
Reviewed-on: https://go-review.googlesource.com/31366Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

aa581f51

runtime: parallelize STW mcache flushing · a475a38a

Austin Clements authored Oct 25, 2016

Currently all mcaches are flushed in a single STW root job. This takes
about 5 µs per P, but since it's done sequentially it adds about
5*GOMAXPROCS µs to the STW.

Fix this by parallelizing the job. Since there are exactly GOMAXPROCS
mcaches to flush, this parallelizes quite nicely and brings the STW
latency cost down to a constant 5 µs (assuming GOMAXPROCS actually
reflects the number of CPUs).

Updates #17503.

Change-Id: Ibefeb1c2229975d5137c6e67fac3b6c92103742d
Reviewed-on: https://go-review.googlesource.com/32033Reviewed-by: Rick Hudson <rlh@golang.org>

a475a38a

cmd/compile: don't alloc Name/Param for unresolved syms · 20edeabc

Josh Bleecher Snyder authored Oct 27, 2016

ONONAME nodes generated from unresolved symbols don't need Params.
They only need Names to store Iota; move Iota to Node.Xoffset.
While we're here, change iota to int64 to reduce casting.

Passes toolstash -cmp.

name old alloc/op new alloc/op delta
Template 39.9MB ± 0% 39.7MB ± 0% -0.39% (p=0.000 n=19+20)
Unicode 30.9MB ± 0% 30.7MB ± 0% -0.35% (p=0.000 n=20+20)
GoTypes 119MB ± 0% 118MB ± 0% -0.42% (p=0.000 n=20+20)
Compiler 464MB ± 0% 461MB ± 0% -0.54% (p=0.000 n=19+20)

name old allocs/op new allocs/op delta
Template 386k ± 0% 383k ± 0% -0.62% (p=0.000 n=20+20)
Unicode 323k ± 0% 321k ± 0% -0.49% (p=0.000 n=20+20)
GoTypes 1.16M ± 0% 1.15M ± 0% -0.67% (p=0.000 n=20+20)
Compiler 4.09M ± 0% 4.05M ± 0% -0.95% (p=0.000 n=20+20)

Change-Id: Ib27219a0d0405def1b4dadacf64935ba12d10a94
Reviewed-on: https://go-review.googlesource.com/32237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

20edeabc

runtime/pprof: write profiles in protobuf format. · 7d14401b

unknown authored Oct 07, 2016

Added functions with suffix proto and stuff from pprof tool to translate
to protobuf. Done as the profile proto is more extensible than the legacy
pprof format and is pprof's preferred profile format. Large part was taken
from https://github.com/google/pprof tool. Tested by hand and compared the
result with translated by pprof tool, profiles are identical.
Fixes #16093
Change-Id: I5acdb2809cab0d16ed4694fdaa7b8ddfd68df11e
Reviewed-on: https://go-review.googlesource.com/30556
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>

7d14401b