  1. 12 Nov, 2015 1 commit
  2. 11 Nov, 2015 3 commits
  3. 10 Nov, 2015 1 commit
    • runtime: break atomics out into package runtime/internal/atomic · 67faca7d
      Michael Matloob authored
      This change breaks out most of the atomics functions in the runtime
      into package runtime/internal/atomic. It adds some basic support
      in the toolchain for runtime packages, and also modifies linux/arm
      atomics to remove the dependency on the runtime's mutex. The mutexes
      have been replaced with spinlocks.
      
      All trybots are happy!
      In addition to the trybots, I've tested on the darwin/arm64 builder,
      on the darwin/arm builder, and on a ppc64le machine.
      
      Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
      Reviewed-on: https://go-review.googlesource.com/14204
      Run-TryBot: Michael Matloob <matloob@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
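As an illustration of the mutex-to-spinlock swap this commit describes, here is a minimal sketch of a test-and-set spinlock written against the public sync/atomic API rather than the new runtime-internal package; the spinLock type and its methods are hypothetical, not the runtime's code.

```go
package sketch

import (
	"runtime"
	"sync/atomic"
)

// spinLock is a hypothetical test-and-set spinlock of the kind that
// replaced the runtime mutex here: a single word flipped with an
// atomic compare-and-swap, with no dependency on heavier lock machinery.
type spinLock uint32

func (l *spinLock) lock() {
	for !atomic.CompareAndSwapUint32((*uint32)(l), 0, 1) {
		runtime.Gosched() // back off so the current holder can make progress
	}
}

func (l *spinLock) unlock() {
	atomic.StoreUint32((*uint32)(l), 0)
}
```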
  4. 05 Nov, 2015 11 commits
    • runtime: remove background GC goroutine and mark barriers · d5ba5821
      Austin Clements authored
      These are now unused.
      
      Updates #11970.
      
      Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f
      Reviewed-on: https://go-review.googlesource.com/16393
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: decentralize mark done and mark termination · c99d7f7f
      Austin Clements authored
      This moves all of the mark 1 to mark 2 transition and mark termination
      to the mark done transition function. This means these transitions are
      now handled on the goroutine that detected mark completion. This also
      means that the GC coordinator and the background completion barriers
      are no longer used and various workarounds to yield to the coordinator
      are no longer necessary. These will be removed in follow-up commits.
      
      One consequence of this is that mark workers now need to be
      preemptible when performing the mark done transition. This allows them
      to stop the world and to perform the final clean-up steps of GC after
      restarting the world. They are only made preemptible while performing
      this transition, so if the worker that findRunnableGCWorker would
      schedule isn't available, we didn't want to schedule it anyway.
      
      Fixes #11970.
      
      Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
      Reviewed-on: https://go-review.googlesource.com/16391
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: account mark worker time before gcMarkDone · d986bf27
      Austin Clements authored
      Currently gcMarkDone takes basically no time, so it's okay to account
      the worker time after calling it. However, gcMarkDone is about to take
      potentially *much* longer because it may perform all of mark
      termination. Prepare for this by swapping the order so we account the
      time before calling gcMarkDone.
      
      Change-Id: I90c7df68192acfc4fd02a7254dae739dda4e2fcb
      Reviewed-on: https://go-review.googlesource.com/16390
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: factor mark done transition · 171204b5
      Austin Clements authored
      Currently the code for completion of mark 1/mark 2 is duplicated in
      background workers and assists. Factor this into a single function
      that will serve as the transition function for concurrent mark.
      
      Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
      Reviewed-on: https://go-review.googlesource.com/16359
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: eliminate mark completion in scheduler · 12e23f05
      Austin Clements authored
      Currently, findRunnableGCWorker will perform mark completion if there
      is no remaining work and no running workers. This used to be necessary
      to resolve a race in the transition from mark 1 to mark 2 where we
      would enter mark 2 with no mark work (and no dedicated workers), so no
      workers would run, so no worker would signal mark completion.
      
      However, we're about to make mark completion also perform the entire
      follow-on process, which includes mark termination. We really don't
      want to do that in the scheduler if it happens to detect completion.
      
      Conveniently, this hack is no longer necessary because we always
      enqueue root scanning work at the beginning of both mark 1 and mark 2,
      so a mark worker will always run. Hence, we can simply eliminate it.
      
      Change-Id: I3fc8f27c8da632f0fb732c9f6425e1f457f5652e
      Reviewed-on: https://go-review.googlesource.com/16358
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: decentralize sweep termination and mark transition · a51905fa
      Austin Clements authored
      This moves all of GC initialization, sweep termination, and the
      transition to concurrent marking into the off->mark transition
      function. This means it's now handled on the goroutine that detected
      the state exit condition.
      
      As a result, malloc no longer needs to Gosched() at the beginning of
      the GC cycle to prevent over-allocation while the GC is starting up
      because it will now *help* the GC to start up. The Gosched hack is
      still necessary during GC shutdown (this is easy to test by enabling
      gctrace and hitting Ctrl-S to block the gctrace output).
      
      At this point, the GC coordinator still handles later phases. This
      requires a small tweak to how we start the GC coordinator. Currently,
      starting the GC coordinator is best-effort and may fail if the
      coordinator is about to park from the previous cycle but hasn't yet.
      We fix this by replacing the park/ready to wake up the coordinator
      with a semaphore. This is temporary since the coordinator will be
      going away in a few commits.
      
      Updates #11970.
      
      Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
      Reviewed-on: https://go-review.googlesource.com/16357
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
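The park/ready race and the semaphore fix described above lend themselves to a user-space analogy: a buffered channel of capacity 1 behaves like a binary semaphore, so a wakeup sent before the receiver blocks is never lost. A minimal sketch, with all names hypothetical:

```go
package sketch

// A buffered channel of capacity 1 acts as a binary semaphore: a
// wakeup sent before the coordinator blocks is not lost, which is
// exactly the property park/ready lacked (readying could fail if the
// coordinator from the previous cycle had not parked yet).
var gcSema = make(chan struct{}, 1)

// wakeCoordinator requests a GC cycle; the non-blocking send
// coalesces duplicate wakeups instead of failing.
func wakeCoordinator() {
	select {
	case gcSema <- struct{}{}:
	default: // a wakeup is already pending
	}
}

// coordinatorLoop runs one cycle per wakeup; acquiring the semaphore
// is safe even if the wakeup arrived before this goroutine blocked.
func coordinatorLoop(runCycle func()) {
	for {
		<-gcSema
		runCycle()
	}
}
```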
    • runtime: decentralize concurrent sweep termination · 9630c47e
      Austin Clements authored
      This moves concurrent sweep termination from the coordinator to the
      off->mark transition. This allows it to be performed by all Gs
      attempting to start the GC.
      
      Updates #11970.
      
      Change-Id: I24428e8599a759398c2ef7ec996ba755a448f947
      Reviewed-on: https://go-review.googlesource.com/16356
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: beginning of decentralized off->mark transition · f54bcedc
      Austin Clements authored
      This begins the conversion of the centralized GC coordinator to a
      decentralized state machine by introducing the internal API that
      triggers the first state transition from _GCoff to _GCmark (or
      _GCmarktermination).
      
      This change introduces the transition lock, the off->mark transition
      condition (which is very similar to shouldtriggergc()), and the
      general structure of a state transition. Since we're doing this
      conversion in stages, it then falls back to the GC coordinator to
      actually execute the cycle. We'll start moving logic out of the GC
      coordinator and into transition functions next.
      
      This fixes a minor bug in gcstoptheworld debug mode where passing the
      heap trigger once could trigger multiple STW GCs.
      
      Updates #11970.
      
      Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
      Reviewed-on: https://go-review.googlesource.com/16355
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
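The "general structure of a state transition" this commit introduces can be sketched as a double-checked transition guarded by a lock; the names and the string-valued phase below are illustrative only, not the runtime's:

```go
package sketch

import "sync"

var (
	transitionLock sync.Mutex
	gcPhase        = "off" // illustrative stand-in for the GC phase
)

// maybeStartGC sketches a decentralized transition: any goroutine
// that observes the trigger condition may perform the off->mark
// transition, serialized by the transition lock.
func maybeStartGC(triggered func() bool, doTransition func()) {
	if !triggered() {
		return // cheap unlocked check first
	}
	transitionLock.Lock()
	defer transitionLock.Unlock()
	// Re-check under the lock: another goroutine may have performed
	// the transition already. This re-check is also what prevents one
	// trigger crossing from starting multiple STW GCs in the
	// gcstoptheworld debug mode mentioned above.
	if gcPhase != "off" || !triggered() {
		return
	}
	doTransition()
	gcPhase = "mark"
}
```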
    • runtime: move concurrent mark setup off system stack · 38425962
      Austin Clements authored
      For historical reasons we currently do a lot of the concurrent mark
      setup on the system stack. In fact, at this point the one and only
      thing that needs to happen on the system stack is the start-the-world.
      
      Clean up this code by lifting everything other than the
      start-the-world off the system stack.
      
      The diff for this change looks large, but the only code change is to
      narrow the systemstack call. Everything else is re-indentation.
      
      Change-Id: I1e03b8afc759fad726f2397b05a17d183c2713ce
      Reviewed-on: https://go-review.googlesource.com/16354
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: lift state variables from func gc to var work · 19596215
      Austin Clements authored
      We're about to split func gc across several functions, so lift the
      local variables it uses for tracking statistics and state across the
      cycle into the global "work" variable.
      
      Change-Id: Ie955f2f1758c7f5a5543ea1f3f33b222bc4b1d37
      Reviewed-on: https://go-review.googlesource.com/16353
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: note a minor issue with GODEBUG=gcstoptheworld · 16980189
      Austin Clements authored
      Change-Id: I91cda8d88b0852cd0f868d33c594206bcca0c386
      Reviewed-on: https://go-review.googlesource.com/16352
      Reviewed-by: Rick Hudson <rlh@golang.org>
  5. 04 Nov, 2015 2 commits
    • runtime: make putfull start mark workers · dcd9e5bc
      Austin Clements authored
      Currently we depend on the good graces and timing of the scheduler to
      get opportunities to start dedicated mark workers. In the worst case,
      it may take 10ms to get dedicated mark workers going at the beginning
      of mark 1 and mark 2 or after the amount of available work has dropped
      and gone back up.
      
      Instead of waiting for the regular preemption logic to get around to
      us, make putfull enlist a random P if we're not already running enough
      dedicated workers. This should improve performance stability of the
      garbage collector and is likely to improve the overall performance
      somewhat.
      
      No overall effect on the go1 benchmarks. It speeds up the garbage
      benchmark by 12%, which more than counters the performance loss from
      the previous commit.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  6.32ms ± 4%  5.58ms ± 2%  -11.68%  (p=0.000 n=20+16)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.18s ± 5%     3.12s ± 4%  -1.83%  (p=0.021 n=20+20)
      Fannkuch11-12                2.50s ± 2%     2.46s ± 2%  -1.57%  (p=0.000 n=18+19)
      FmtFprintfEmpty-12          50.8ns ± 3%    50.4ns ± 3%    ~     (p=0.184 n=20+20)
      FmtFprintfString-12          167ns ± 2%     171ns ± 1%  +2.46%  (p=0.000 n=20+19)
      FmtFprintfInt-12             161ns ± 2%     163ns ± 2%  +1.81%  (p=0.000 n=20+20)
      FmtFprintfIntInt-12          269ns ± 1%     266ns ± 1%  -0.81%  (p=0.002 n=19+20)
      FmtFprintfPrefixedInt-12     237ns ± 2%     231ns ± 2%  -2.86%  (p=0.000 n=20+20)
      FmtFprintfFloat-12           313ns ± 2%     313ns ± 1%    ~     (p=0.681 n=20+20)
      FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 1%  -2.26%  (p=0.000 n=20+20)
      GobDecode-12                8.66ms ± 1%    8.67ms ± 1%    ~     (p=0.380 n=19+20)
      GobEncode-12                6.56ms ± 1%    6.56ms ± 2%    ~     (p=0.607 n=19+20)
      Gzip-12                      317ms ± 1%     314ms ± 2%  -1.10%  (p=0.000 n=20+19)
      Gunzip-12                   42.1ms ± 1%    42.2ms ± 1%  +0.27%  (p=0.044 n=20+19)
      HTTPClientServer-12         62.7µs ± 1%    62.0µs ± 1%  -1.04%  (p=0.000 n=19+18)
      JSONEncode-12               16.7ms ± 1%    16.8ms ± 2%  +0.59%  (p=0.021 n=20+20)
      JSONDecode-12               58.2ms ± 1%    61.4ms ± 2%  +5.43%  (p=0.000 n=18+19)
      Mandelbrot200-12            3.84ms ± 1%    3.87ms ± 2%  +0.79%  (p=0.008 n=18+20)
      GoParse-12                  3.86ms ± 2%    3.76ms ± 2%  -2.60%  (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12       100ns ± 2%     100ns ± 1%  -0.68%  (p=0.005 n=18+15)
      RegexpMatchEasy0_1K-12       332ns ± 1%     342ns ± 1%  +3.16%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12      82.9ns ± 3%    83.0ns ± 2%    ~     (p=0.906 n=19+20)
      RegexpMatchEasy1_1K-12       487ns ± 1%     494ns ± 1%  +1.50%  (p=0.000 n=17+20)
      RegexpMatchMedium_32-12      131ns ± 2%     130ns ± 1%    ~     (p=0.686 n=19+20)
      RegexpMatchMedium_1K-12     39.6µs ± 1%    39.2µs ± 1%  -1.09%  (p=0.000 n=18+19)
      RegexpMatchHard_32-12       2.04µs ± 1%    2.04µs ± 2%    ~     (p=0.804 n=20+20)
      RegexpMatchHard_1K-12       61.7µs ± 2%    61.3µs ± 2%    ~     (p=0.052 n=18+20)
      Revcomp-12                   529ms ± 2%     533ms ± 1%  +0.83%  (p=0.003 n=20+19)
      Template-12                 70.7ms ± 2%    71.0ms ± 2%    ~     (p=0.065 n=20+19)
      TimeParse-12                 351ns ± 2%     355ns ± 1%  +1.25%  (p=0.000 n=19+20)
      TimeFormat-12                362ns ± 2%     373ns ± 1%  +2.83%  (p=0.000 n=18+20)
      [Geo mean]                  62.2µs         62.3µs       +0.13%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.6MB/s ± 1%  88.5MB/s ± 1%    ~     (p=0.392 n=19+20)
      GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.622 n=19+20)
      Gzip-12                   61.1MB/s ± 1%  61.8MB/s ± 2%  +1.11%  (p=0.000 n=20+19)
      Gunzip-12                  461MB/s ± 1%   460MB/s ± 1%  -0.27%  (p=0.044 n=20+19)
      JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%  -0.58%  (p=0.022 n=20+20)
      JSONDecode-12             33.3MB/s ± 1%  31.6MB/s ± 2%  -5.15%  (p=0.000 n=18+19)
      GoParse-12                15.0MB/s ± 2%  15.4MB/s ± 2%  +2.66%  (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12     317MB/s ± 2%   319MB/s ± 2%    ~     (p=0.052 n=20+20)
      RegexpMatchEasy0_1K-12    3.08GB/s ± 1%  2.99GB/s ± 1%  -3.07%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12     386MB/s ± 3%   386MB/s ± 2%    ~     (p=0.939 n=19+20)
      RegexpMatchEasy1_1K-12    2.10GB/s ± 1%  2.07GB/s ± 1%  -1.46%  (p=0.000 n=17+20)
      RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.64MB/s ± 1%    ~     (p=0.702 n=19+20)
      RegexpMatchMedium_1K-12   25.9MB/s ± 1%  26.1MB/s ± 2%  +0.99%  (p=0.000 n=18+20)
      RegexpMatchHard_32-12     15.7MB/s ± 1%  15.7MB/s ± 2%    ~     (p=0.723 n=20+20)
      RegexpMatchHard_1K-12     16.6MB/s ± 2%  16.7MB/s ± 2%    ~     (p=0.052 n=18+20)
      Revcomp-12                 481MB/s ± 2%   477MB/s ± 1%  -0.83%  (p=0.003 n=20+19)
      Template-12               27.5MB/s ± 2%  27.3MB/s ± 2%    ~     (p=0.062 n=20+19)
      [Geo mean]                99.4MB/s       99.1MB/s       -0.35%
      
      Change-Id: I914d8cadded5a230509d118164a4c201601afc06
      Reviewed-on: https://go-review.googlesource.com/16298
      Reviewed-by: Rick Hudson <rlh@golang.org>
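A sketch of the putfull idea under stated assumptions: the push and enlist hooks stand in for the global work list and the scheduler, and the counters are illustrative, not the runtime's.

```go
package sketch

import "sync/atomic"

type workbuf struct{ obj []uintptr } // illustrative, not the runtime's type

var (
	workersRunning int32     // dedicated mark workers currently running
	workersWanted  int32 = 2 // target implied by the CPU goal
)

// putfull publishes a full work buffer and, per this commit, enlists
// another worker when we are running below target, instead of waiting
// for the scheduler's regular preemption logic to get around to us.
func putfull(b *workbuf, push func(*workbuf), enlist func()) {
	push(b)
	if atomic.LoadInt32(&workersRunning) < workersWanted {
		enlist() // e.g., preempt a random P so it can run a mark worker
	}
}
```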
    • runtime: eliminate getfull barrier from concurrent mark · 62ba520b
      Austin Clements authored
      Currently dedicated mark workers participate in the getfull barrier
      during concurrent mark. However, the getfull barrier wasn't designed
      for concurrent work and this causes no end of headaches.
      
      In the concurrent setting, participants come and go. This makes mark
      completion susceptible to live-lock: since dedicated workers are only
      periodically polling for completion, it's possible for the program to
      be running some transient worker each time one of the dedicated workers
      wakes up to check if it can exit the getfull barrier. It also
      complicates reasoning about the system because dedicated workers
      participate directly in the getfull barrier, but transient workers
      must instead use trygetfull because they have exit conditions that
      aren't captured by getfull (e.g., fractional workers exit when
      preempted). The complexity of implementing these exit conditions
      contributed to #11677. Furthermore, the getfull barrier is inefficient
      because we could be running user code instead of spinning on a P. In
      effect, we're dedicating 25% of the CPU to marking even if that means
      we have to spin to make that 25%. It also causes issues on Windows
      because we can't actually sleep for 100µs (#8687).
      
      Fix this by making dedicated workers no longer participate in the
      getfull barrier. Instead, dedicated workers simply return to the
      scheduler when they fail to get more work, regardless of what other
      workers are doing, and the scheduler only starts new dedicated workers
      if there's work available. Everything that needs to be handled by this
      barrier is already handled by detection of mark completion.
      
      This makes the system much more symmetric because all workers and
      assists now use trygetfull during concurrent mark. It also loosens the
      25% CPU target so that we can give some of that 25% back to user code
      if there isn't enough work to keep the mark worker busy. And it
      eliminates the problematic 100µs sleep on Windows during concurrent
      mark (though not during mark termination).
      
      The downside of this is that if we hit a bottleneck in the heap graph
      that then expands back out, the system may shut down dedicated workers
      and take a while to start them back up. We'll address this in the next
      commit.
      
      Updates #12041 and #8687.
      
      No effect on the go1 benchmarks. This slows down the garbage benchmark
      by 9%, but we'll more than make it up in the next commit.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)
      
      Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
      Reviewed-on: https://go-review.googlesource.com/16297
      Reviewed-by: Rick Hudson <rlh@golang.org>
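The worker behavior after this change can be sketched as a simple drain loop, assuming hypothetical tryGetFull and scan hooks:

```go
package sketch

type workbuf struct{ obj []uintptr } // illustrative

// dedicatedWorkerLoop sketches the worker after this change: use a
// non-blocking trygetfull and return to the scheduler when no work is
// available, rather than spinning in the getfull barrier. Completion
// is detected elsewhere, and the scheduler starts new dedicated
// workers if work reappears.
func dedicatedWorkerLoop(tryGetFull func() *workbuf, scan func(*workbuf)) {
	for {
		b := tryGetFull()
		if b == nil {
			return // no work right now: yield the P back to user code
		}
		scan(b)
	}
}
```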
  6. 03 Nov, 2015 2 commits
    • runtime: cache two workbufs to reduce contention · b6c0934a
      Austin Clements authored
      Currently the gcWork abstraction caches a single work buffer. As a
      result, if a worker is putting and getting pointers right at the
      boundary of a work buffer, it can flap between work buffers and
      (potentially significantly) increase contention on the global work
      buffer lists.
      
      This change modifies gcWork to instead cache two work buffers and
      switch off between them. This introduces one buffer's worth of
      hysteresis and eliminates the above performance worst case by
      amortizing the cost of getting or putting a work buffer over at least
      one buffer's worth of work.
      
      In practice, it's difficult to trigger this worst case with reasonably
      large work buffers. On the garbage benchmark, this reduces the max
      writes/sec to the global work list from 32K to 25K and the median from
      6K to 5K. However, if a workload were to trigger this worst case
      behavior, it could significantly drive up this contention.
      
      This has negligible effects on the go1 benchmarks and slightly speeds
      up the garbage benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.90ms ± 3%  5.83ms ± 4%  -1.18%  (p=0.011 n=18+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.22s ± 4%     3.17s ± 3%  -1.57%  (p=0.009 n=19+20)
      Fannkuch11-12                2.44s ± 1%     2.53s ± 4%  +3.78%  (p=0.000 n=18+19)
      FmtFprintfEmpty-12          50.2ns ± 2%    50.5ns ± 5%    ~     (p=0.631 n=19+20)
      FmtFprintfString-12          167ns ± 1%     166ns ± 1%    ~     (p=0.141 n=20+20)
      FmtFprintfInt-12             162ns ± 1%     159ns ± 1%  -1.80%  (p=0.000 n=20+20)
      FmtFprintfIntInt-12          277ns ± 2%     263ns ± 1%  -4.78%  (p=0.000 n=20+18)
      FmtFprintfPrefixedInt-12     240ns ± 1%     232ns ± 2%  -3.25%  (p=0.000 n=20+20)
      FmtFprintfFloat-12           311ns ± 1%     315ns ± 2%  +1.17%  (p=0.000 n=20+20)
      FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 2%  -1.72%  (p=0.000 n=20+20)
      GobDecode-12                8.65ms ± 1%    8.71ms ± 2%  +0.68%  (p=0.001 n=19+20)
      GobEncode-12                6.51ms ± 1%    6.54ms ± 1%  +0.42%  (p=0.047 n=20+19)
      Gzip-12                      318ms ± 2%     315ms ± 2%  -1.20%  (p=0.000 n=19+19)
      Gunzip-12                   42.2ms ± 2%    42.1ms ± 1%    ~     (p=0.667 n=20+19)
      HTTPClientServer-12         62.5µs ± 1%    62.4µs ± 1%    ~     (p=0.110 n=20+18)
      JSONEncode-12               16.8ms ± 1%    16.8ms ± 2%    ~     (p=0.569 n=19+20)
      JSONDecode-12               60.8ms ± 2%    59.8ms ± 1%  -1.69%  (p=0.000 n=19+19)
      Mandelbrot200-12            3.87ms ± 1%    3.85ms ± 0%  -0.61%  (p=0.001 n=20+17)
      GoParse-12                  3.76ms ± 2%    3.76ms ± 1%    ~     (p=0.698 n=20+20)
      RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%    ~     (p=0.065 n=19+20)
      RegexpMatchEasy0_1K-12       342ns ± 2%     333ns ± 1%  -2.82%  (p=0.000 n=20+19)
      RegexpMatchEasy1_32-12      83.3ns ± 2%    83.2ns ± 2%    ~     (p=0.692 n=20+19)
      RegexpMatchEasy1_1K-12       498ns ± 2%     490ns ± 1%  -1.52%  (p=0.000 n=18+20)
      RegexpMatchMedium_32-12      131ns ± 2%     131ns ± 2%    ~     (p=0.464 n=20+18)
      RegexpMatchMedium_1K-12     39.3µs ± 2%    39.6µs ± 1%  +0.77%  (p=0.000 n=18+19)
      RegexpMatchHard_32-12       2.04µs ± 2%    2.06µs ± 1%  +0.69%  (p=0.009 n=19+20)
      RegexpMatchHard_1K-12       61.4µs ± 2%    62.1µs ± 1%  +1.21%  (p=0.000 n=19+20)
      Revcomp-12                   534ms ± 1%     529ms ± 1%  -0.97%  (p=0.000 n=19+16)
      Template-12                 70.4ms ± 2%    70.0ms ± 1%    ~     (p=0.070 n=19+19)
      TimeParse-12                 359ns ± 3%     344ns ± 1%  -4.15%  (p=0.000 n=19+19)
      TimeFormat-12                357ns ± 1%     361ns ± 2%  +1.05%  (p=0.002 n=20+20)
      [Geo mean]                  62.4µs         62.0µs       -0.56%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.7MB/s ± 1%  88.1MB/s ± 2%  -0.68%  (p=0.001 n=19+20)
      GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.42%  (p=0.046 n=20+19)
      Gzip-12                   60.9MB/s ± 2%  61.7MB/s ± 2%  +1.21%  (p=0.000 n=19+19)
      Gunzip-12                  460MB/s ± 2%   461MB/s ± 1%    ~     (p=0.661 n=20+19)
      JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%    ~     (p=0.555 n=19+20)
      JSONDecode-12             31.9MB/s ± 2%  32.5MB/s ± 1%  +1.72%  (p=0.000 n=19+19)
      GoParse-12                15.4MB/s ± 2%  15.4MB/s ± 1%    ~     (p=0.653 n=20+20)
      RegexpMatchEasy0_32-12     317MB/s ± 2%   315MB/s ± 2%    ~     (p=0.141 n=19+20)
      RegexpMatchEasy0_1K-12    2.99GB/s ± 2%  3.07GB/s ± 1%  +2.86%  (p=0.000 n=20+19)
      RegexpMatchEasy1_32-12     384MB/s ± 2%   385MB/s ± 2%    ~     (p=0.672 n=20+19)
      RegexpMatchEasy1_1K-12    2.06GB/s ± 2%  2.09GB/s ± 1%  +1.54%  (p=0.000 n=18+20)
      RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.63MB/s ± 2%    ~     (p=0.800 n=20+18)
      RegexpMatchMedium_1K-12   26.0MB/s ± 1%  25.8MB/s ± 1%  -0.77%  (p=0.000 n=18+19)
      RegexpMatchHard_32-12     15.7MB/s ± 2%  15.6MB/s ± 1%  -0.69%  (p=0.010 n=19+20)
      RegexpMatchHard_1K-12     16.7MB/s ± 2%  16.5MB/s ± 1%  -1.19%  (p=0.000 n=19+20)
      Revcomp-12                 476MB/s ± 1%   481MB/s ± 1%  +0.97%  (p=0.000 n=19+16)
      Template-12               27.6MB/s ± 2%  27.7MB/s ± 1%    ~     (p=0.071 n=19+19)
      [Geo mean]                99.1MB/s       99.3MB/s       +0.27%
      
      Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6
      Reviewed-on: https://go-review.googlesource.com/16042
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
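A sketch of the two-buffer cache, assuming illustrative workbuf and gcWork types (the real gcWork differs in detail): with two cached buffers, flapping at a buffer boundary no longer touches the global lists on every put or get.

```go
package sketch

const bufSize = 512 // illustrative capacity

type workbuf struct {
	obj  [bufSize]uintptr
	nobj int
}

// gcWork caches two work buffers, giving one buffer's worth of
// hysteresis before falling back to the global lists.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
}

func (w *gcWork) put(p uintptr, getEmpty func() *workbuf, putFull func(*workbuf)) {
	if w.wbuf1.nobj == len(w.wbuf1.obj) {
		// Primary is full: switch to the secondary first.
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if w.wbuf1.nobj == len(w.wbuf1.obj) {
			// Both full: only now pay the global-list cost.
			putFull(w.wbuf1)
			w.wbuf1 = getEmpty()
		}
	}
	w.wbuf1.obj[w.wbuf1.nobj] = p
	w.wbuf1.nobj++
}
```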
    • runtime: replace assist sleep loop with park/ready · 15aa6bbd
      Austin Clements authored
      GC assists must block until the assist can be satisfied (either
      through stealing credit or doing work) or the GC cycle ends.
      Currently, this is implemented as a retry loop with a 100 µs delay.
      This obviously isn't ideal, as it wastes CPU and delays mutator
      execution. It also has the somewhat peculiar downside that sleeping a
      G requires allocation, and this requires working around recursive
      allocation.
      
      Replace this timed delay with a proper scheduling queue. When an
      assist can't be satisfied immediately, it adds the allocating G to a
      queue and parks it. Any time background scan credit is flushed, it
      consults this queue, directly satisfies the debt of queued assists,
      and wakes up satisfied assists before flushing any remaining credit to
      the background credit pool.
      
      No effect on the go1 benchmarks. Slightly speeds up the garbage
      benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.81ms ± 1%  5.72ms ± 4%  -1.65%  (p=0.011 n=20+20)
      
      Updates #12041.
      
      Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
      Reviewed-on: https://go-review.googlesource.com/15890
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
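A user-space analogy of the assist queue, with channels standing in for park/ready and all names hypothetical: assists that cannot be satisfied record their debt and park; whoever flushes background scan credit pays off queued assists first and banks only the remainder.

```go
package sketch

import "sync"

type assist struct {
	debt int64
	done chan struct{} // closed when the debt has been paid
}

var (
	mu       sync.Mutex
	queue    []*assist
	bgCredit int64 // background credit pool
)

// flushScanCredit adds newly earned scan credit, directly satisfying
// queued assists and waking them before flushing any remaining credit
// to the background pool, as the commit describes.
func flushScanCredit(credit int64) {
	mu.Lock()
	defer mu.Unlock()
	for len(queue) > 0 && credit >= queue[0].debt {
		a := queue[0]
		queue = queue[1:]
		credit -= a.debt
		close(a.done) // "ready" the parked assist
	}
	bgCredit += credit
}
```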
  7. 30 Oct, 2015 3 commits
    • runtime: perform mark 2 root re-scanning in GC workers · fbf27325
      Austin Clements authored
      This moves another root scanning task out of the GC coordinator and
      parallelizes it on the GC workers.
      
      This has negligible effect on the go1 benchmarks and the garbage
      benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.24ms ± 1%  5.26ms ± 1%  +0.30%  (p=0.007 n=18+17)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.20s ± 5%     3.21s ± 5%    ~     (p=0.264 n=20+18)
      Fannkuch11-12                2.46s ± 1%     2.54s ± 2%  +3.09%  (p=0.000 n=18+20)
      FmtFprintfEmpty-12          49.9ns ± 4%    50.0ns ± 5%    ~     (p=0.356 n=20+20)
      FmtFprintfString-12          170ns ± 1%     170ns ± 2%    ~     (p=0.815 n=19+20)
      FmtFprintfInt-12             160ns ± 1%     159ns ± 1%  -0.63%  (p=0.003 n=18+19)
      FmtFprintfIntInt-12          270ns ± 1%     267ns ± 1%  -1.00%  (p=0.000 n=19+18)
      FmtFprintfPrefixedInt-12     238ns ± 1%     232ns ± 1%  -2.28%  (p=0.000 n=19+19)
      FmtFprintfFloat-12           310ns ± 2%     313ns ± 2%  +0.93%  (p=0.000 n=19+19)
      FmtManyArgs-12              1.06µs ± 1%    1.04µs ± 1%  -1.93%  (p=0.000 n=20+19)
      GobDecode-12                8.63ms ± 1%    8.70ms ± 1%  +0.81%  (p=0.001 n=20+19)
      GobEncode-12                6.52ms ± 1%    6.56ms ± 1%  +0.66%  (p=0.000 n=20+19)
      Gzip-12                      318ms ± 1%     319ms ± 1%    ~     (p=0.405 n=17+18)
      Gunzip-12                   42.1ms ± 2%    42.0ms ± 1%    ~     (p=0.771 n=20+19)
      HTTPClientServer-12         62.6µs ± 1%    62.9µs ± 1%  +0.41%  (p=0.038 n=20+20)
      JSONEncode-12               16.9ms ± 1%    16.9ms ± 1%    ~     (p=0.077 n=18+20)
      JSONDecode-12               60.7ms ± 1%    62.3ms ± 1%  +2.73%  (p=0.000 n=20+20)
      Mandelbrot200-12            3.86ms ± 1%    3.85ms ± 1%    ~     (p=0.084 n=19+20)
      GoParse-12                  3.75ms ± 2%    3.73ms ± 1%    ~     (p=0.107 n=20+19)
      RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%  +0.97%  (p=0.001 n=20+19)
      RegexpMatchEasy0_1K-12       342ns ± 2%     332ns ± 2%  -2.86%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12      83.2ns ± 2%    82.8ns ± 2%    ~     (p=0.108 n=19+20)
      RegexpMatchEasy1_1K-12       495ns ± 2%     490ns ± 2%  -1.04%  (p=0.000 n=18+19)
      RegexpMatchMedium_32-12      130ns ± 2%     131ns ± 2%    ~     (p=0.291 n=20+20)
      RegexpMatchMedium_1K-12     39.3µs ± 1%    39.9µs ± 1%  +1.54%  (p=0.000 n=18+20)
      RegexpMatchHard_32-12       2.02µs ± 1%    2.05µs ± 2%  +1.19%  (p=0.000 n=19+19)
      RegexpMatchHard_1K-12       60.9µs ± 1%    61.5µs ± 1%  +0.99%  (p=0.000 n=18+18)
      Revcomp-12                   535ms ± 1%     531ms ± 1%  -0.82%  (p=0.000 n=17+17)
      Template-12                 73.0ms ± 1%    74.1ms ± 1%  +1.47%  (p=0.000 n=20+20)
      TimeParse-12                 356ns ± 2%     348ns ± 1%  -2.30%  (p=0.000 n=20+20)
      TimeFormat-12                347ns ± 1%     353ns ± 1%  +1.68%  (p=0.000 n=19+20)
      [Geo mean]                  62.3µs         62.4µs       +0.12%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.9MB/s ± 1%  88.2MB/s ± 1%  -0.81%  (p=0.001 n=20+19)
      GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.66%  (p=0.000 n=20+19)
      Gzip-12                   60.9MB/s ± 1%  60.8MB/s ± 1%    ~     (p=0.409 n=17+18)
      Gunzip-12                  461MB/s ± 2%   462MB/s ± 1%    ~     (p=0.765 n=20+19)
      JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.078 n=18+20)
      JSONDecode-12             32.0MB/s ± 1%  31.1MB/s ± 1%  -2.65%  (p=0.000 n=20+20)
      GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 1%    ~     (p=0.111 n=20+19)
      RegexpMatchEasy0_32-12     318MB/s ± 2%   314MB/s ± 2%  -1.27%  (p=0.000 n=20+19)
      RegexpMatchEasy0_1K-12    2.99GB/s ± 1%  3.08GB/s ± 2%  +2.94%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12     385MB/s ± 2%   386MB/s ± 2%    ~     (p=0.105 n=19+20)
      RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.09GB/s ± 2%  +1.06%  (p=0.000 n=18+19)
      RegexpMatchMedium_32-12   7.64MB/s ± 2%  7.61MB/s ± 1%    ~     (p=0.179 n=20+20)
      RegexpMatchMedium_1K-12   26.1MB/s ± 1%  25.7MB/s ± 1%  -1.52%  (p=0.000 n=18+20)
      RegexpMatchHard_32-12     15.8MB/s ± 1%  15.6MB/s ± 2%  -1.18%  (p=0.000 n=19+19)
      RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.6MB/s ± 1%  -0.90%  (p=0.000 n=19+18)
      Revcomp-12                 475MB/s ± 1%   479MB/s ± 1%  +0.83%  (p=0.000 n=17+17)
      Template-12               26.6MB/s ± 1%  26.2MB/s ± 1%  -1.45%  (p=0.000 n=20+20)
      [Geo mean]                99.0MB/s       98.7MB/s       -0.32%
      
      Change-Id: I6ea44d7a59aaa6851c64695277ab65645ff9d32e
      Reviewed-on: https://go-review.googlesource.com/16070
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • runtime: perform concurrent scan in GC workers · 82d14d77
      Austin Clements authored
      Currently the concurrent root scan is performed in its entirety by the
      GC coordinator before entering concurrent mark (which enables GC
      workers). This scan is done sequentially, which can prolong the scan
      phase, delay the mark phase, and means that the scan phase does not
      obey the 25% CPU goal. Furthermore, there's no need to complete the
      root scan before starting marking (in fact, we already allow GC
      assists to happen during the scan phase), so this acts as an
      unnecessary barrier between root scanning and marking.
      
      This change shifts the root scan work out of the GC coordinator into
      the GC workers. The coordinator simply sets up the scan state and
      enqueues the right number of root scan jobs. The GC workers then drain
      the root scan jobs prior to draining heap scan jobs.
      
      This parallelizes the root scan process, makes it obey the 25% CPU
      goal, and effectively eliminates root scanning as an isolated phase,
      allowing the system to smoothly transition from root scanning to heap
      marking. This also eliminates a major non-STW responsibility of the GC
      coordinator, which will make it easier to switch to a decentralized
      state machine. Finally, it puts us in a good position to perform root
      scanning in assists as well, which will help satisfy assists at the
      beginning of the GC cycle.
      
      This is mostly straightforward. One tricky aspect is that we have to
      deal with preemption deadlock, where two non-preemptible goroutines
      are trying to preempt each other to perform a stack scan. Given the
      context where this happens, the only instance of this is two
      background workers trying to scan each other. We avoid this by simply
      not scanning the stacks of background workers during the concurrent
      phase; this is safe because we'll scan them during mark termination
      (and their stacks are *very* small and should not contain any new
      pointers).
      
      This change also switches the root marking during mark termination to
      use the same gcDrain-based code path as concurrent mark. This
      shouldn't affect performance because STW root marking was already
      parallel and tasks switched to heap marking immediately when no more
      root marking tasks were available. However, it simplifies the code and
      unifies these code paths.
      
      This has negligible effect on the go1 benchmarks. It slightly slows
      down the garbage benchmark, possibly by making GC run slightly more
      frequently.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
      Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
      FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
      FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
      FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
      FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
      FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
      FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
      FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
      GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
      Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
      Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
      HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
      JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
      JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
      Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
      GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
      RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
      RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
      RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
      RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
      RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
      RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
      RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
      RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
      Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
      Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
      TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
      TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
      [Geo mean]                  62.1µs         62.3µs       +0.31%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
      Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
      Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
      JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
      JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
      GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
      RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
      RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
      RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
      RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
      RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
      RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
      RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
      RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
      Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
      Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
      [Geo mean]                 100MB/s       99.0MB/s       -0.82%
      
      Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
      Reviewed-on: https://go-review.googlesource.com/16059
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
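The enqueue-and-drain scheme above can be sketched with an atomic job counter; the variable names below are modeled on the idea, not copied from the source:

```go
package sketch

import "sync/atomic"

var (
	markrootNext uint32 // next root job to claim (illustrative)
	markrootJobs uint32 // total root jobs enqueued for this cycle
)

// drainRoots sketches how workers drain root scan jobs before heap
// scan jobs: each worker claims jobs with an atomic counter, which is
// what parallelizes the root scan across GC workers.
func drainRoots(markroot func(i uint32)) {
	for {
		i := atomic.AddUint32(&markrootNext, 1) - 1
		if i >= atomic.LoadUint32(&markrootJobs) {
			return // all root jobs claimed; fall through to heap marking
		}
		markroot(i)
	}
}
```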
    • runtime: consolidate "out of GC work" checks · 4cca1cc0
      Austin Clements authored
      We already have gcMarkWorkAvailable, but the check for GC mark work is
      open-coded in several places. Generalize gcMarkWorkAvailable slightly
      and replace these open-coded checks with calls to gcMarkWorkAvailable.
      
      In addition to cleaning up the code, this puts us in a better position
      to make this check slightly more complicated.
      
      Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
      Reviewed-on: https://go-review.googlesource.com/16058
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
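A plausible shape for the consolidated predicate, with simplified parameters standing in for the runtime-internal state the real function consults:

```go
package sketch

// gcMarkWorkAvailable (sketch): work is available if this P has
// cached work, the global full-buffer list is non-empty, or root scan
// jobs remain unclaimed.
func gcMarkWorkAvailable(localWork, globalFull int, rootJobsLeft bool) bool {
	return localWork > 0 || globalFull > 0 || rootJobsLeft
}
```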
  8. 26 Oct, 2015 1 commit
    • runtime: partition data and BSS root marking · d3df04cd
      Austin Clements authored
      Currently data and BSS root marking are each a single markroot job.
      This makes them difficult to load balance, which can draw out mark
      termination time if they are large.
      
      Fix this by splitting both into 256K chunks. While we're putting in
      the infrastructure for dynamic roots, we also replace the fixed
      sharding of the span roots with sharding into fixed sizes. In
      addition to helping balance root marking, this also paves the way to
      parallelizing concurrent scan and to letting assists help with root
      marking.
      
      Updates #10345. This fixes the data and BSS aspects of that bug; it
      does not partition scanning of large heap objects.
      
      This has negligible effect on either the go1 benchmarks or the garbage
      benchmark:
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  4.90ms ± 1%  4.91ms ± 2%   ~     (p=0.058 n=17+16)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.11s ± 4%     3.12s ± 4%    ~     (p=0.512 n=20+20)
      Fannkuch11-12                2.53s ± 2%     2.47s ± 2%  -2.28%  (p=0.000 n=20+18)
      FmtFprintfEmpty-12          49.1ns ± 1%    50.0ns ± 4%  +1.68%  (p=0.008 n=18+20)
      FmtFprintfString-12          170ns ± 0%     172ns ± 1%  +1.05%  (p=0.000 n=14+19)
      FmtFprintfInt-12             174ns ± 1%     162ns ± 1%  -6.81%  (p=0.000 n=18+17)
      FmtFprintfIntInt-12          284ns ± 1%     277ns ± 1%  -2.42%  (p=0.000 n=20+19)
      FmtFprintfPrefixedInt-12     252ns ± 1%     244ns ± 1%  -2.84%  (p=0.000 n=18+20)
      FmtFprintfFloat-12           317ns ± 0%     311ns ± 0%  -1.95%  (p=0.000 n=19+18)
      FmtManyArgs-12              1.08µs ± 1%    1.11µs ± 1%  +3.43%  (p=0.000 n=18+19)
      GobDecode-12                8.56ms ± 1%    8.61ms ± 1%  +0.50%  (p=0.020 n=20+20)
      GobEncode-12                6.58ms ± 1%    6.57ms ± 1%    ~     (p=0.792 n=20+19)
      Gzip-12                      317ms ± 3%     317ms ± 2%    ~     (p=0.840 n=19+19)
      Gunzip-12                   41.6ms ± 0%    41.6ms ± 0%  +0.07%  (p=0.027 n=18+15)
      HTTPClientServer-12         62.2µs ± 1%    62.3µs ± 1%    ~     (p=0.283 n=19+20)
      JSONEncode-12               16.5ms ± 2%    16.5ms ± 1%    ~     (p=0.857 n=20+19)
      JSONDecode-12               58.5ms ± 1%    61.3ms ± 1%  +4.67%  (p=0.000 n=18+17)
      Mandelbrot200-12            3.84ms ± 0%    3.84ms ± 0%    ~     (p=0.259 n=17+17)
      GoParse-12                  3.70ms ± 2%    3.74ms ± 2%  +0.96%  (p=0.009 n=19+20)
      RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 0%  +0.31%  (p=0.040 n=19+15)
      RegexpMatchEasy0_1K-12       340ns ± 1%     340ns ± 1%    ~     (p=0.411 n=17+19)
      RegexpMatchEasy1_32-12      82.7ns ± 2%    82.3ns ± 1%    ~     (p=0.456 n=20+19)
      RegexpMatchEasy1_1K-12       498ns ± 2%     495ns ± 0%    ~     (p=0.108 n=19+17)
      RegexpMatchMedium_32-12      130ns ± 1%     130ns ± 2%    ~     (p=0.405 n=18+19)
      RegexpMatchMedium_1K-12     39.4µs ± 2%    39.1µs ± 1%  -0.64%  (p=0.002 n=20+19)
      RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 0%    ~     (p=0.561 n=20+17)
      RegexpMatchHard_1K-12       61.1µs ± 2%    60.8µs ± 1%    ~     (p=0.615 n=19+18)
      Revcomp-12                   532ms ± 2%     531ms ± 1%    ~     (p=0.470 n=19+19)
      Template-12                 68.5ms ± 1%    69.1ms ± 1%  +0.87%  (p=0.000 n=17+17)
      TimeParse-12                 344ns ± 2%     344ns ± 1%  +0.25%  (p=0.032 n=19+18)
      TimeFormat-12                347ns ± 1%     362ns ± 1%  +4.27%  (p=0.000 n=17+19)
      [Geo mean]                  62.3µs         62.3µs       -0.04%
      
      name                      old speed      new speed      delta
      GobDecode-12              89.6MB/s ± 1%  89.2MB/s ± 1%  -0.50%  (p=0.019 n=20+20)
      GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.797 n=20+19)
      Gzip-12                   61.3MB/s ± 3%  61.2MB/s ± 2%    ~     (p=0.834 n=19+19)
      Gunzip-12                  467MB/s ± 0%   466MB/s ± 0%  -0.07%  (p=0.027 n=18+15)
      JSONEncode-12              117MB/s ± 2%   117MB/s ± 1%    ~     (p=0.851 n=20+19)
      JSONDecode-12             33.2MB/s ± 1%  31.7MB/s ± 1%  -4.47%  (p=0.000 n=18+17)
      GoParse-12                15.6MB/s ± 2%  15.5MB/s ± 2%  -0.95%  (p=0.008 n=19+20)
      RegexpMatchEasy0_32-12     321MB/s ± 2%   320MB/s ± 1%  -0.57%  (p=0.002 n=17+17)
      RegexpMatchEasy0_1K-12    3.01GB/s ± 1%  3.01GB/s ± 1%    ~     (p=0.132 n=17+18)
      RegexpMatchEasy1_32-12     387MB/s ± 2%   389MB/s ± 1%    ~     (p=0.423 n=20+19)
      RegexpMatchEasy1_1K-12    2.05GB/s ± 2%  2.06GB/s ± 0%    ~     (p=0.129 n=19+17)
      RegexpMatchMedium_32-12   7.64MB/s ± 1%  7.66MB/s ± 1%    ~     (p=0.258 n=18+19)
      RegexpMatchMedium_1K-12   26.0MB/s ± 2%  26.2MB/s ± 1%  +0.64%  (p=0.002 n=20+19)
      RegexpMatchHard_32-12     15.7MB/s ± 2%  15.8MB/s ± 1%    ~     (p=0.510 n=20+17)
      RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.8MB/s ± 1%    ~     (p=0.603 n=19+18)
      Revcomp-12                 477MB/s ± 2%   479MB/s ± 1%    ~     (p=0.470 n=19+19)
      Template-12               28.3MB/s ± 1%  28.1MB/s ± 1%  -0.85%  (p=0.000 n=17+17)
      [Geo mean]                 100MB/s        100MB/s       -0.26%
      
      Change-Id: Ib0bfe0145675ce88c5a8791752f7486ac98805b4
      Reviewed-on: https://go-review.googlesource.com/16043
      Reviewed-by: Rick Hudson <rlh@golang.org>
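The chunking arithmetic is worth spelling out; a sketch, assuming the 256K chunk size named above (the function names are illustrative):

```go
package sketch

// rootBlockBytes is the 256K chunk size named in the commit.
const rootBlockBytes = 256 << 10

// rootBlocks is the chunking arithmetic: a root of n bytes becomes
// ceil(n/rootBlockBytes) markroot jobs.
func rootBlocks(n uintptr) uintptr {
	return (n + rootBlockBytes - 1) / rootBlockBytes
}

// blockRange computes the byte range for job i of a root spanning
// [base, base+n); only the last chunk may be short.
func blockRange(base, n, i uintptr) (start, size uintptr) {
	start = base + i*rootBlockBytes
	size = rootBlockBytes
	if remain := base + n - start; remain < size {
		size = remain
	}
	return
}
```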
  9. 21 Oct, 2015 2 commits
  10. 19 Oct, 2015 3 commits
  11. 16 Oct, 2015 1 commit
  12. 09 Oct, 2015 8 commits
    • runtime: assist before allocating · 65aa2da6
      Austin Clements authored
      Currently, when the mutator allocates, the runtime first allocates the
      memory and then, if that G has done "enough" allocation, the runtime
      checks whether the G has assist debt to pay off and, if so, pays it
      off. This approach leads to under-assisting, where a G can allocate a
      large region (or many small regions) before paying for it, or can even
      exit with outstanding debt.
      
      This commit flips this around so that a G always acquires enough
      credit for an allocation before it can perform that allocation. We
      continue to amortize the cost of assists by requiring that they
      over-assist when triggered to build up credit for many allocations.
      
      Fixes #11967.
      
      Change-Id: Idac9f11133b328535667674d837be72c23ebd899
      Reviewed-on: https://go-review.googlesource.com/15409
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
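The flipped ordering can be sketched as follows; the assist and alloc hooks and the mallocSketch name are hypothetical, and gcAssistBytes follows the credit scheme described in the next entry:

```go
package sketch

type g struct {
	gcAssistBytes int64 // allocation credit in bytes; negative means debt
}

// mallocSketch shows the flipped ordering: acquire enough credit
// before the allocation, rather than allocating first and settling up
// later. assist pays off a debt (by stealing credit or doing scan
// work) and returns the credit gained.
func mallocSketch(gp *g, size int64, assist func(debt int64) int64, alloc func(int64) []byte) []byte {
	gp.gcAssistBytes -= size
	if gp.gcAssistBytes < 0 {
		// In debt: assist now. Assists over-assist when triggered,
		// building up credit so many allocations amortize one assist.
		gp.gcAssistBytes += assist(-gp.gcAssistBytes)
	}
	return alloc(size)
}
```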
    • runtime: directly track GC assist balance · 89c341c5
      Austin Clements authored
      Currently we track the per-G GC assist balance as two monotonically
      increasing values: the bytes allocated by the G this cycle (gcalloc)
      and the scan work performed by the G this cycle (gcscanwork). The
      assist balance is hence assistRatio*gcalloc - gcscanwork.
      
      This works, but has two important downsides:
      
      1) It requires floating-point math to figure out if a G is in debt or
         not. This makes it inappropriate to check for assist debt in the
         hot path of mallocgc, so we only do this when a G allocates a new
         span. As a result, Gs can operate "in the red", leading to
         under-assist and extended GC cycle length.
      
      2) Revising the assist ratio during a GC cycle can lead to an "assist
         burst". If you think of plotting the scan work performed versus
         heaps size, the assist ratio controls the slope of this line.
         However, in the current system, the target line always passes
         through 0 at the heap size that triggered GC, so if the runtime
         increases the assist ratio, there has to be a potentially large
         assist to jump from the current amount of scan work up to the new
         target scan work for the current heap size.
      
      This commit replaces this approach with directly tracking the GC
      assist balance in terms of allocation credit bytes. Allocating N bytes
      simply decreases this by N and assisting raises it by the amount of
      scan work performed divided by the assist ratio (to get back to
      bytes).
      
      This will make it cheap to figure out if a G is in debt, which will
      let us efficiently check if an assist is necessary *before* performing
      an allocation and hence keep Gs "in the black".
      
      This also fixes assist bursts because the assist ratio is now in terms
      of *remaining* work, rather than work from the beginning of the GC
      cycle. Hence, the plot of scan work versus heap size becomes
      continuous: we can revise the slope, but this slope always starts from
      where we are right now, rather than where we were at the beginning of
      the cycle.
      
      Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c
      Reviewed-on: https://go-review.googlesource.com/15408
      Reviewed-by: Rick Hudson <rlh@golang.org>
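The balance arithmetic described above, as a sketch with illustrative names: allocation debits bytes directly, and assist work is converted back to bytes by dividing by the assist ratio, so the hot-path debt check is a cheap integer comparison.

```go
package sketch

type gAssist struct {
	gcAssistBytes int64 // allocation credit in bytes; negative means debt
}

// noteAllocation debits the balance directly in bytes; no floating
// point is needed on the malloc path.
func (g *gAssist) noteAllocation(bytes int64) {
	g.gcAssistBytes -= bytes
}

// noteScanWork converts scan work back to bytes by dividing by the
// assist ratio (scan work per allocated byte), as described above.
func (g *gAssist) noteScanWork(scanWork int64, assistRatio float64) {
	g.gcAssistBytes += int64(float64(scanWork) / assistRatio)
}
```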
    • runtime: ensure minimum heap distance via heap goal · 9e77c898
      Austin Clements authored
      Currently we ensure a minimum heap distance of 1MB when computing the
      assist ratio. Rather than enforcing this minimum on the heap distance,
      it makes more sense to enforce that the heap goal itself is at least
      1MB over the live heap size at the beginning of GC. Currently the two
      approaches are semantically equivalent, but this will let us switch to
      basing the assist ratio on current heap distance rather than the
      initial heap distance, since we can't enforce this minimum on the
      current heap distance (the GC may never finish because the goal posts
      will always be 1MB away).
      
      Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903
      Reviewed-on: https://go-review.googlesource.com/15604
      Reviewed-by: Rick Hudson <rlh@golang.org>
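As arithmetic, the constraint is a clamp on the goal rather than on the distance; a sketch with illustrative names:

```go
package sketch

const minHeapDistance = 1 << 20 // the 1MB minimum named above

// heapGoal clamps the goal itself: it must sit at least 1MB above the
// live heap size at the beginning of GC.
func heapGoal(liveHeap, proposedGoal uint64) uint64 {
	if floor := liveHeap + minHeapDistance; proposedGoal < floor {
		return floor
	}
	return proposedGoal
}
```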
    • runtime: update gcController.scanWork regularly · 8e8219de
      Austin Clements authored
      Currently, gcController.scanWork is updated as lazily as possible
      since it is only read at the end of the GC cycle. We're about to read
      it during the GC cycle to improve the assist ratio revisions, so
      modify gcDrain* to regularly flush to gcController.scanWork in much
      the same way as we regularly flush to gcController.bgScanCredit.
      
      One consequence of this is that it's difficult to keep gcw.scanWork
      monotonic, so we give up on that and simply return the amount of scan
      work done by gcDrainN rather than calculating it in the caller.
      
      Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d
      Reviewed-on: https://go-review.googlesource.com/15407
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: control background scan credit flushing with flag · c18b163c
      Austin Clements authored
      Currently callers of gcDrain control whether it flushes scan work
      credit to gcController.bgScanCredit by passing a value other than -1
      for the flush threshold. Shortly we're going to make this always flush
      scan work to gcController.scanWork and optionally also flush scan work
      to gcController.bgScanCredit. This will be much easier if the flush
      threshold is simply a constant (which it is in practice) and callers
      merely control whether or not the flush includes the background
      credit. Hence, replace the flush threshold argument with a flag.
      
      Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e
      Reviewed-on: https://go-review.googlesource.com/15406
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: consolidate gcDrain and gcDrainUntilPreempt · 9b3cdaf0
      Austin Clements authored
      These functions were nearly identical. Consolidate them by adding a
      flags argument. In addition to cleaning up this code, this makes
      further changes that affect both functions easier.
      
      Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f
      Reviewed-on: https://go-review.googlesource.com/15405
      Reviewed-by: Rick Hudson <rlh@golang.org>
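The consolidation pattern, sketched with hypothetical flag and hook names: one drain function with a bitmask of behaviors replaces two near-identical functions.

```go
package sketch

type gcDrainFlags int

const (
	gcDrainUntilPreempt gcDrainFlags = 1 << iota
	gcDrainFlushBgCredit
)

// gcDrainSketch drains work one unit at a time; the flags select the
// behaviors that previously distinguished the two functions.
func gcDrainSketch(flags gcDrainFlags, preempted, scanOneUnit func() bool, flushCredit func()) {
	for scanOneUnit() {
		if flags&gcDrainUntilPreempt != 0 && preempted() {
			break // only the until-preempt variant honors preemption
		}
	}
	if flags&gcDrainFlushBgCredit != 0 {
		flushCredit() // optionally flush to the background credit pool
	}
}
```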
    • runtime: explain why continuous assist revising is necessary · 39ed6822
      Austin Clements authored
      Change-Id: I950af8d80433b3ae8a1da0aa7a8d2d0b295dd313
      Reviewed-on: https://go-review.googlesource.com/15404
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: fix comment for assistRatio · 3e57b17d
      Austin Clements authored
      The comment for assistRatio claimed it to be the reciprocal of what it
      actually is.
      
      Change-Id: If7f9bb853d75d0097facff3aa6704b224d9108b8
      Reviewed-on: https://go-review.googlesource.com/15402
      Reviewed-by: Russ Cox <rsc@golang.org>
  13. 02 Oct, 2015 2 commits
    • runtime: remove sweep wait loop in finishsweep_m · 9a31d38f
      Austin Clements authored
      In general, finishsweep_m must block until any spans that are
      concurrently being swept have been swept. It accomplishes this by
      looping over all spans, which, as in the previous commit, takes
      ~1ms/heap GB. Unfortunately, we do this during the STW sweep
      termination phase, so multi-gigabyte heaps can push our STW time past
      10ms.
      
      However, there's no need to do this wait if the world is stopped
      because, in effect, stopping the world already had to wait for
      anything that was sweeping (and if it didn't, the wait in
      finishsweep_m would deadlock). Hence, we can simply skip this loop if
      the world is stopped, such as during sweep termination. In fact,
      currently all calls to finishsweep_m are STW, but this hasn't always
      been the case and may not be the case in the future, so we keep the
      logic around.
      
      For 24GB heaps, this reduces max pause time by 75% relative to tip and
      by 90% relative to Go 1.5. Notably, all pauses are now well under
      10ms. Here are the results for the garbage benchmark:
      
                     ------------- max pause ------------
      Heap   Procs   after change   before change   1.5.1
      24GB     12        3.8ms          16ms         37ms
      24GB      4        3.7ms          16ms         37ms
       4GB      4        3.7ms           3ms        6.9ms
      
      In the 4GB/4P case, it seems the "before change" run got lucky: the
      max went up, but the 99%ile pause time went down from 3ms to 2.04ms.
      
      Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31
      Reviewed-on: https://go-review.googlesource.com/15071
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: remove in-use page count loop from STW · dac220b0
      Austin Clements authored
      In order to compute the sweep ratio, the runtime needs to know how
      many pages belong to spans in state _MSpanInUse. Currently it finds
      this out by looping over all spans during mark termination. However,
      this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our
      STW time past 10ms.
      
      Replace the loop with an actively maintained count of in-use pages.
      
      For multi-gigabyte heaps, this reduces max mark termination pause time
      by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This
      shifts the longest pause time for large heaps to the sweep termination
      phase, so it only slightly decreases max pause time, though it roughly
      halves mean pause time. Here are the results for the garbage
      benchmark:
      
                     ---- max mark termination pause ----
      Heap   Procs   after change   before change   1.5.1
      24GB     12        1.9ms          18ms         37ms
      24GB      4        3.7ms          18ms         37ms
       4GB      4        920µs         3.8ms        6.9ms
      
      Fixes #11484.
      
      Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac
      Reviewed-on: https://go-review.googlesource.com/15070
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
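A sketch of the actively maintained count, using sync/atomic and illustrative names: span allocation and freeing keep the counter up to date, so the sweep ratio can read it in O(1) during mark termination instead of looping over all spans.

```go
package sketch

import "sync/atomic"

// pagesInUse replaces the all-spans loop (~1ms per heap GB) with a
// counter maintained at span state changes.
var pagesInUse uint64

func noteSpanAllocated(npages uint64) { atomic.AddUint64(&pagesInUse, npages) }

// noteSpanFreed subtracts using two's complement, the documented way
// to decrement with atomic.AddUint64.
func noteSpanFreed(npages uint64) { atomic.AddUint64(&pagesInUse, ^(npages - 1)) }

func currentPagesInUse() uint64 { return atomic.LoadUint64(&pagesInUse) }
```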