- 12 Nov, 2015 1 commit
-
-
Michael Matloob authored
runtime/internal/sys will hold system-, architecture- and config-specific constants. Updates #11647 Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54 Reviewed-on: https://go-review.googlesource.com/16817 Reviewed-by:
Russ Cox <rsc@golang.org>
-
- 11 Nov, 2015 3 commits
-
-
Austin Clements authored
The GC now handles the root marking jobs as part of general marking, so work.markfor is no longer used. Change-Id: I6c3b23fed27e4e7ea6430d6ca7ba25ae4d04ed14 Reviewed-on: https://go-review.googlesource.com/16811 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:
Brad Fitzpatrick <bradfitz@golang.org>
-
Austin Clements authored
This changes "mark worker (idle)" to "GC worker (idle)" so it's more clear to users that these goroutines are GC-related. It changes "GC assist" to "GC assist wait" to make it clear that the assist is blocked. Change-Id: Iafbc0903c84f9250ff6bee14baac6fcd4ed5ef76 Reviewed-on: https://go-review.googlesource.com/16511Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
We couldn't do this before this point because it must be done before the next GC cycle starts. Hence, if it delayed the start of the next cycle, that would widen the window between reaching the heap trigger of the next cycle and starting the next GC cycle, during which the mutator could over-allocate. With the decentralized GC, any mutators that reach the heap trigger will block on the GC starting, so it's safe to widen the time between starting the world and being able to start the next GC cycle. Fixes #11465. Change-Id: Ic7ea7e9eba5b66fc050299f843a9c9001ad814aa Reviewed-on: https://go-review.googlesource.com/16394 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 10 Nov, 2015 1 commit
-
-
Michael Matloob authored
This change breaks out most of the atomics functions in the runtime into package runtime/internal/atomic. It adds some basic support in the toolchain for runtime packages, and also modifies linux/arm atomics to remove the dependency on the runtime's mutex. The mutexes have been replaced with spinlocks. all trybots are happy! In addition to the trybots, I've tested on the darwin/arm64 builder, on the darwin/arm builder, and on a ppc64le machine. Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f Reviewed-on: https://go-review.googlesource.com/14204 Run-TryBot: Michael Matloob <matloob@golang.org> Reviewed-by:
Russ Cox <rsc@golang.org>
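The mutex-to-spinlock change itself lives in the runtime's internal and assembly code; as a rough user-level illustration of the idea (with hypothetical helper names), the sketch below guards a 64-bit add with a spinlock built from a natively supported 32-bit compare-and-swap:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinlock built from the 32-bit CAS that linux/arm supports natively.
type spinlock struct{ state uint32 }

func (l *spinlock) lock() {
	for !atomic.CompareAndSwapUint32(&l.state, 0, 1) {
		runtime.Gosched() // back off instead of burning the CPU
	}
}

func (l *spinlock) unlock() { atomic.StoreUint32(&l.state, 0) }

var lock64 spinlock

// xadd64 emulates a 64-bit atomic add using the spinlock.
// Hypothetical illustration; the real runtime/internal/atomic code is assembly.
func xadd64(addr *uint64, delta uint64) uint64 {
	lock64.lock()
	*addr += delta
	newVal := *addr
	lock64.unlock()
	return newVal
}

func main() {
	var counter uint64
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				xadd64(&counter, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 8000
}
```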
-
- 05 Nov, 2015 11 commits
-
-
Austin Clements authored
These are now unused. Updates #11970. Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f Reviewed-on: https://go-review.googlesource.com/16393 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
This moves all of the mark 1 to mark 2 transition and mark termination to the mark done transition function. This means these transitions are now handled on the goroutine that detected mark completion. This also means that the GC coordinator and the background completion barriers are no longer used and various workarounds to yield to the coordinator are no longer necessary. These will be removed in follow-up commits. One consequence of this is that mark workers now need to be preemptible when performing the mark done transition. This allows them to stop the world and to perform the final clean-up steps of GC after restarting the world. They are only made preemptible while performing this transition, so if the worker that findRunnableGCWorker would schedule isn't available, we didn't want to schedule it anyway. Fixes #11970. Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837 Reviewed-on: https://go-review.googlesource.com/16391 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
Currently gcMarkDone takes basically no time, so it's okay to account the worker time after calling it. However, gcMarkDone is about to take potentially *much* longer because it may perform all of mark termination. Prepare for this by swapping the order so we account the time before calling gcMarkDone. Change-Id: I90c7df68192acfc4fd02a7254dae739dda4e2fcb Reviewed-on: https://go-review.googlesource.com/16390 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
Currently the code for completion of mark 1/mark 2 is duplicated in background workers and assists. Factor this in to a single function that will serve as the transition function for concurrent mark. Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b Reviewed-on: https://go-review.googlesource.com/16359 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
Currently, findRunnableGCWorker will perform mark completion if there is no remaining work and no running workers. This used to be necessary to resolve a race in the transition from mark 1 to mark 2 where we would enter mark 2 with no mark work (and no dedicated workers), so no workers would run, so no worker would signal mark completion. However, we're about to make mark completion also perform the entire follow-on process, which includes mark termination. We really don't want to do that in the scheduler if it happens to detect completion. Conveniently, this hack is no longer necessary because we always enqueue root scanning work at the beginning of both mark 1 and mark 2, so a mark worker will always run. Hence, we can simply eliminate it. Change-Id: I3fc8f27c8da632f0fb732c9f6425e1f457f5652e Reviewed-on: https://go-review.googlesource.com/16358 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
This moves all of GC initialization, sweep termination, and the transition to concurrent marking in to the off->mark transition function. This means it's now handled on the goroutine that detected the state exit condition. As a result, malloc no longer needs to Gosched() at the beginning of the GC cycle to prevent over-allocation while the GC is starting up because it will now *help* the GC to start up. The Gosched hack is still necessary during GC shutdown (this is easy to test by enabling gctrace and hitting Ctrl-S to block the gctrace output). At this point, the GC coordinator still handles later phases. This requires a small tweak to how we start the GC coordinator. Currently, starting the GC coordinator is best-effort and may fail if the coordinator is about to park from the previous cycle but hasn't yet. We fix this by replacing the park/ready to wake up the coordinator with a semaphore. This is temporary since the coordinator will be going away in a few commits. Updates #11970. Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe Reviewed-on: https://go-review.googlesource.com/16357 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
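The park/ready-to-semaphore swap is easy to picture with a tiny sketch. This is not the runtime's code (which uses its internal semacquire/semrelease primitives, not channels); it only shows why a semaphore, unlike a best-effort wake-up, cannot lose a request issued before the coordinator has actually parked:

```go
package main

import "fmt"

// gcSema acts as a binary semaphore: a signal sent before the coordinator
// waits is remembered, unlike a best-effort goready on a not-yet-parked G.
var gcSema = make(chan struct{}, 1)

// startCycle is called by whichever goroutine detects the GC trigger.
func startCycle() {
	select {
	case gcSema <- struct{}{}: // hand the cycle to the coordinator
	default: // a cycle is already pending; the request coalesces with it
	}
}

func main() {
	done := make(chan struct{})
	go func() {
		<-gcSema // coordinator: blocks until a cycle has been requested
		fmt.Println("coordinator: running GC cycle")
		close(done)
	}()
	// The trigger may fire before the coordinator is even waiting; with a
	// semaphore the request is remembered rather than lost.
	startCycle()
	<-done
}
```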
-
Austin Clements authored
This moves concurrent sweep termination from the coordinator to the off->mark transition. This allows it to be performed by all Gs attempting to start the GC. Updates #11970. Change-Id: I24428e8599a759398c2ef7ec996ba755a448f947 Reviewed-on: https://go-review.googlesource.com/16356 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
This begins the conversion of the centralized GC coordinator to a decentralized state machine by introducing the internal API that triggers the first state transition from _GCoff to _GCmark (or _GCmarktermination). This change introduces the transition lock, the off->mark transition condition (which is very similar to shouldtriggergc()), and the general structure of a state transition. Since we're doing this conversion in stages, it then falls back to the GC coordinator to actually execute the cycle. We'll start moving logic out of the GC coordinator and in to transition functions next. This fixes a minor bug in gcstoptheworld debug mode where passing the heap trigger once could trigger multiple STW GCs. Updates #11970. Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f Reviewed-on: https://go-review.googlesource.com/16355 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>
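The general structure of such a transition is roughly the following sketch (illustrative names, not the runtime's): any goroutine that observes the trigger condition takes the transition lock, re-checks the condition, and either performs the transition or discovers that another goroutine already has.

```go
package main

import (
	"fmt"
	"sync"
)

// gcControl sketches the decentralized transition state: any number of
// goroutines may observe the trigger, but only one performs the transition.
type gcControl struct {
	transitionLock sync.Mutex
	phase          string // "off" or "mark"
	heapLive       uint64 // bytes allocated this cycle
	trigger        uint64 // heap size at which the next cycle should start
}

// shouldStart is the off->mark transition condition, in the spirit of
// shouldtriggergc(): GC is idle and the heap has reached the trigger.
func (c *gcControl) shouldStart() bool {
	return c.phase == "off" && c.heapLive >= c.trigger
}

// start is called by any goroutine that notices the trigger (for example
// from the allocation slow path). The condition is re-checked under the
// transition lock so the transition runs at most once per cycle; the real
// runtime also does a cheap unsynchronized check before taking the lock.
func (c *gcControl) start() {
	c.transitionLock.Lock()
	defer c.transitionLock.Unlock()
	if !c.shouldStart() {
		return // another goroutine already performed the transition
	}
	fmt.Println("performing the off->mark transition")
	c.phase = "mark"
}

func main() {
	c := &gcControl{phase: "off", heapLive: 4 << 20, trigger: 4 << 20}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); c.start() }()
	}
	wg.Wait()
	fmt.Println("phase:", c.phase) // the transition ran exactly once
}
```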
-
Austin Clements authored
For historical reasons we currently do a lot of the concurrent mark setup on the system stack. In fact, at this point the one and only thing that needs to happen on the system stack is the start-the-world. Clean up this code by lifting everything other than the start-the-world off the system stack. The diff for this change looks large, but the only code change is to narrow the systemstack call. Everything else is re-indentation. Change-Id: I1e03b8afc759fad726f2397b05a17d183c2713ce Reviewed-on: https://go-review.googlesource.com/16354 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
We're about to split func gc across several functions, so lift the local variables it uses for tracking statistics and state across the cycle into the global "work" variable. Change-Id: Ie955f2f1758c7f5a5543ea1f3f33b222bc4b1d37 Reviewed-on: https://go-review.googlesource.com/16353 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
Change-Id: I91cda8d88b0852cd0f868d33c594206bcca0c386 Reviewed-on: https://go-review.googlesource.com/16352 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
- 04 Nov, 2015 2 commits
-
-
Austin Clements authored
Currently we depend on the good graces and timing of the scheduler to get opportunities to start dedicated mark workers. In the worst case, it may take 10ms to get dedicated mark workers going at the beginning of mark 1 and mark 2 or after the amount of available work has dropped and gone back up. Instead of waiting for the regular preemption logic to get around to us, make putfull enlist a random P if we're not already running enough dedicated workers. This should improve performance stability of the garbage collector and is likely to improve the overall performance somewhat. No overall effect on the go1 benchmarks. It speeds up the garbage benchmark by 12%, which more than counters the performance loss from the previous commit. name old time/op new time/op delta XBenchGarbage-12 6.32ms ± 4% 5.58ms ± 2% -11.68% (p=0.000 n=20+16) name old time/op new time/op delta BinaryTree17-12 3.18s ± 5% 3.12s ± 4% -1.83% (p=0.021 n=20+20) Fannkuch11-12 2.50s ± 2% 2.46s ± 2% -1.57% (p=0.000 n=18+19) FmtFprintfEmpty-12 50.8ns ± 3% 50.4ns ± 3% ~ (p=0.184 n=20+20) FmtFprintfString-12 167ns ± 2% 171ns ± 1% +2.46% (p=0.000 n=20+19) FmtFprintfInt-12 161ns ± 2% 163ns ± 2% +1.81% (p=0.000 n=20+20) FmtFprintfIntInt-12 269ns ± 1% 266ns ± 1% -0.81% (p=0.002 n=19+20) FmtFprintfPrefixedInt-12 237ns ± 2% 231ns ± 2% -2.86% (p=0.000 n=20+20) FmtFprintfFloat-12 313ns ± 2% 313ns ± 1% ~ (p=0.681 n=20+20) FmtManyArgs-12 1.05µs ± 2% 1.03µs ± 1% -2.26% (p=0.000 n=20+20) GobDecode-12 8.66ms ± 1% 8.67ms ± 1% ~ (p=0.380 n=19+20) GobEncode-12 6.56ms ± 1% 6.56ms ± 2% ~ (p=0.607 n=19+20) Gzip-12 317ms ± 1% 314ms ± 2% -1.10% (p=0.000 n=20+19) Gunzip-12 42.1ms ± 1% 42.2ms ± 1% +0.27% (p=0.044 n=20+19) HTTPClientServer-12 62.7µs ± 1% 62.0µs ± 1% -1.04% (p=0.000 n=19+18) JSONEncode-12 16.7ms ± 1% 16.8ms ± 2% +0.59% (p=0.021 n=20+20) JSONDecode-12 58.2ms ± 1% 61.4ms ± 2% +5.43% (p=0.000 n=18+19) Mandelbrot200-12 3.84ms ± 1% 3.87ms ± 2% +0.79% (p=0.008 n=18+20) GoParse-12 3.86ms ± 2% 3.76ms ± 2% -2.60% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 100ns ± 2% 100ns ± 1% -0.68% (p=0.005 n=18+15) RegexpMatchEasy0_1K-12 332ns ± 1% 342ns ± 1% +3.16% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 82.9ns ± 3% 83.0ns ± 2% ~ (p=0.906 n=19+20) RegexpMatchEasy1_1K-12 487ns ± 1% 494ns ± 1% +1.50% (p=0.000 n=17+20) RegexpMatchMedium_32-12 131ns ± 2% 130ns ± 1% ~ (p=0.686 n=19+20) RegexpMatchMedium_1K-12 39.6µs ± 1% 39.2µs ± 1% -1.09% (p=0.000 n=18+19) RegexpMatchHard_32-12 2.04µs ± 1% 2.04µs ± 2% ~ (p=0.804 n=20+20) RegexpMatchHard_1K-12 61.7µs ± 2% 61.3µs ± 2% ~ (p=0.052 n=18+20) Revcomp-12 529ms ± 2% 533ms ± 1% +0.83% (p=0.003 n=20+19) Template-12 70.7ms ± 2% 71.0ms ± 2% ~ (p=0.065 n=20+19) TimeParse-12 351ns ± 2% 355ns ± 1% +1.25% (p=0.000 n=19+20) TimeFormat-12 362ns ± 2% 373ns ± 1% +2.83% (p=0.000 n=18+20) [Geo mean] 62.2µs 62.3µs +0.13% name old speed new speed delta GobDecode-12 88.6MB/s ± 1% 88.5MB/s ± 1% ~ (p=0.392 n=19+20) GobEncode-12 117MB/s ± 1% 117MB/s ± 1% ~ (p=0.622 n=19+20) Gzip-12 61.1MB/s ± 1% 61.8MB/s ± 2% +1.11% (p=0.000 n=20+19) Gunzip-12 461MB/s ± 1% 460MB/s ± 1% -0.27% (p=0.044 n=20+19) JSONEncode-12 116MB/s ± 1% 115MB/s ± 2% -0.58% (p=0.022 n=20+20) JSONDecode-12 33.3MB/s ± 1% 31.6MB/s ± 2% -5.15% (p=0.000 n=18+19) GoParse-12 15.0MB/s ± 2% 15.4MB/s ± 2% +2.66% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 317MB/s ± 2% 319MB/s ± 2% ~ (p=0.052 n=20+20) RegexpMatchEasy0_1K-12 3.08GB/s ± 1% 2.99GB/s ± 1% -3.07% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 386MB/s ± 3% 386MB/s ± 2% ~ (p=0.939 n=19+20) RegexpMatchEasy1_1K-12 2.10GB/s ± 1% 2.07GB/s ± 
1% -1.46% (p=0.000 n=17+20) RegexpMatchMedium_32-12 7.62MB/s ± 2% 7.64MB/s ± 1% ~ (p=0.702 n=19+20) RegexpMatchMedium_1K-12 25.9MB/s ± 1% 26.1MB/s ± 2% +0.99% (p=0.000 n=18+20) RegexpMatchHard_32-12 15.7MB/s ± 1% 15.7MB/s ± 2% ~ (p=0.723 n=20+20) RegexpMatchHard_1K-12 16.6MB/s ± 2% 16.7MB/s ± 2% ~ (p=0.052 n=18+20) Revcomp-12 481MB/s ± 2% 477MB/s ± 1% -0.83% (p=0.003 n=20+19) Template-12 27.5MB/s ± 2% 27.3MB/s ± 2% ~ (p=0.062 n=20+19) [Geo mean] 99.4MB/s 99.1MB/s -0.35% Change-Id: I914d8cadded5a230509d118164a4c201601afc06 Reviewed-on: https://go-review.googlesource.com/16298Reviewed-by:
Rick Hudson <rlh@golang.org>
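The enlistment itself is a small hook. The sketch below uses made-up names and stands in for the scheduler interaction; it only shows the idea of kicking an idle P from putfull when fewer dedicated workers are running than wanted, rather than waiting for ordinary preemption to get around to starting one.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Counts are stand-ins for the scheduler state the real putfull consults.
var (
	dedicatedWorkersWanted  int32 = 2 // e.g. roughly 25% of GOMAXPROCS
	dedicatedWorkersRunning int32
)

// wakeIdleP stands in for the runtime poking an idle P so that its
// scheduler loop will call findRunnableGCWorker and start a mark worker.
func wakeIdleP() { fmt.Println("kick an idle P to start a dedicated mark worker") }

func publish(buf []uintptr) { fmt.Println("work buffer queued:", len(buf), "pointers") }

// putfull sketches the change: after publishing a full work buffer, enlist
// a P right away if too few dedicated workers are running.
func putfull(buf []uintptr) {
	publish(buf)
	if atomic.LoadInt32(&dedicatedWorkersRunning) < dedicatedWorkersWanted {
		wakeIdleP()
	}
}

func main() {
	putfull(make([]uintptr, 512))
}
```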
-
Austin Clements authored
Currently dedicated mark workers participate in the getfull barrier during concurrent mark. However, the getfull barrier wasn't designed for concurrent work and this causes no end of headaches. In the concurrent setting, participants come and go. This makes mark completion susceptible to live-lock: since dedicated workers are only periodically polling for completion, it's possible for the program to be in some transient worker each time one of the dedicated workers wakes up to check if it can exit the getfull barrier. It also complicates reasoning about the system because dedicated workers participate directly in the getfull barrier, but transient workers must instead use trygetfull because they have exit conditions that aren't captured by getfull (e.g., fractional workers exit when preempted). The complexity of implementing these exit conditions contributed to #11677. Furthermore, the getfull barrier is inefficient because we could be running user code instead of spinning on a P. In effect, we're dedicating 25% of the CPU to marking even if that means we have to spin to make that 25%. It also causes issues on Windows because we can't actually sleep for 100µs (#8687). Fix this by making dedicated workers no longer participate in the getfull barrier. Instead, dedicated workers simply return to the scheduler when they fail to get more work, regardless of what other workers are doing, and the scheduler only starts new dedicated workers if there's work available. Everything that needs to be handled by this barrier is already handled by detection of mark completion. This makes the system much more symmetric because all workers and assists now use trygetfull during concurrent mark. It also loosens the 25% CPU target so that we can give some of that 25% back to user code if there isn't enough work to keep the mark worker busy. And it eliminates the problematic 100µs sleep on Windows during concurrent mark (though not during mark termination). The downside of this is that if we hit a bottleneck in the heap graph that then expands back out, the system may shut down dedicated workers and take a while to start them back up. We'll address this in the next commit. Updates #12041 and #8687. No effect on the go1 benchmarks. This slows down the garbage benchmark by 9%, but we'll more than make it up in the next commit. name old time/op new time/op delta XBenchGarbage-12 5.80ms ± 2% 6.32ms ± 4% +9.03% (p=0.000 n=20+20) Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b Reviewed-on: https://go-review.googlesource.com/16297 Reviewed-by:
Rick Hudson <rlh@golang.org>
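A dedicated worker's new loop is roughly the following sketch (hypothetical names; the real code lives in the background mark worker and gcDrain): drain work with a non-blocking trygetfull-style call and return to the scheduler as soon as it comes up empty, instead of waiting in a barrier for other workers.

```go
package main

import "fmt"

// tryGetWork stands in for trygetfull: it returns a work buffer if one is
// immediately available, and nil otherwise. It never spins in a barrier.
func tryGetWork(queue *[][]uintptr) []uintptr {
	q := *queue
	if len(q) == 0 {
		return nil
	}
	buf := q[len(q)-1]
	*queue = q[:len(q)-1]
	return buf
}

// dedicatedWorker sketches the new behaviour: drain work while it is
// available, then simply return to the scheduler instead of waiting in
// the getfull barrier for other workers to produce more.
func dedicatedWorker(queue *[][]uintptr) {
	for {
		buf := tryGetWork(queue)
		if buf == nil {
			fmt.Println("no work: return to the scheduler")
			return
		}
		fmt.Println("scan", len(buf), "pointers")
	}
}

func main() {
	queue := [][]uintptr{make([]uintptr, 256), make([]uintptr, 512)}
	dedicatedWorker(&queue)
}
```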
-
- 03 Nov, 2015 2 commits
-
-
Austin Clements authored
Currently the gcWork abstraction caches a single work buffer. As a result, if a worker is putting and getting pointers right at the boundary of a work buffer, it can flap between work buffers and (potentially significantly) increase contention on the global work buffer lists. This change modifies gcWork to instead cache two work buffers and switch off between them. This introduces one buffers' worth of hysteresis and eliminates the above performance worst case by amortizing the cost of getting or putting a work buffer over at least one buffers' worth of work. In practice, it's difficult to trigger this worst case with reasonably large work buffers. On the garbage benchmark, this reduces the max writes/sec to the global work list from 32K to 25K and the median from 6K to 5K. However, if a workload were to trigger this worst case behavior, it could significantly drive up this contention. This has negligible effects on the go1 benchmarks and slightly speeds up the garbage benchmark. name old time/op new time/op delta XBenchGarbage-12 5.90ms ± 3% 5.83ms ± 4% -1.18% (p=0.011 n=18+18) name old time/op new time/op delta BinaryTree17-12 3.22s ± 4% 3.17s ± 3% -1.57% (p=0.009 n=19+20) Fannkuch11-12 2.44s ± 1% 2.53s ± 4% +3.78% (p=0.000 n=18+19) FmtFprintfEmpty-12 50.2ns ± 2% 50.5ns ± 5% ~ (p=0.631 n=19+20) FmtFprintfString-12 167ns ± 1% 166ns ± 1% ~ (p=0.141 n=20+20) FmtFprintfInt-12 162ns ± 1% 159ns ± 1% -1.80% (p=0.000 n=20+20) FmtFprintfIntInt-12 277ns ± 2% 263ns ± 1% -4.78% (p=0.000 n=20+18) FmtFprintfPrefixedInt-12 240ns ± 1% 232ns ± 2% -3.25% (p=0.000 n=20+20) FmtFprintfFloat-12 311ns ± 1% 315ns ± 2% +1.17% (p=0.000 n=20+20) FmtManyArgs-12 1.05µs ± 2% 1.03µs ± 2% -1.72% (p=0.000 n=20+20) GobDecode-12 8.65ms ± 1% 8.71ms ± 2% +0.68% (p=0.001 n=19+20) GobEncode-12 6.51ms ± 1% 6.54ms ± 1% +0.42% (p=0.047 n=20+19) Gzip-12 318ms ± 2% 315ms ± 2% -1.20% (p=0.000 n=19+19) Gunzip-12 42.2ms ± 2% 42.1ms ± 1% ~ (p=0.667 n=20+19) HTTPClientServer-12 62.5µs ± 1% 62.4µs ± 1% ~ (p=0.110 n=20+18) JSONEncode-12 16.8ms ± 1% 16.8ms ± 2% ~ (p=0.569 n=19+20) JSONDecode-12 60.8ms ± 2% 59.8ms ± 1% -1.69% (p=0.000 n=19+19) Mandelbrot200-12 3.87ms ± 1% 3.85ms ± 0% -0.61% (p=0.001 n=20+17) GoParse-12 3.76ms ± 2% 3.76ms ± 1% ~ (p=0.698 n=20+20) RegexpMatchEasy0_32-12 100ns ± 2% 101ns ± 2% ~ (p=0.065 n=19+20) RegexpMatchEasy0_1K-12 342ns ± 2% 333ns ± 1% -2.82% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 83.3ns ± 2% 83.2ns ± 2% ~ (p=0.692 n=20+19) RegexpMatchEasy1_1K-12 498ns ± 2% 490ns ± 1% -1.52% (p=0.000 n=18+20) RegexpMatchMedium_32-12 131ns ± 2% 131ns ± 2% ~ (p=0.464 n=20+18) RegexpMatchMedium_1K-12 39.3µs ± 2% 39.6µs ± 1% +0.77% (p=0.000 n=18+19) RegexpMatchHard_32-12 2.04µs ± 2% 2.06µs ± 1% +0.69% (p=0.009 n=19+20) RegexpMatchHard_1K-12 61.4µs ± 2% 62.1µs ± 1% +1.21% (p=0.000 n=19+20) Revcomp-12 534ms ± 1% 529ms ± 1% -0.97% (p=0.000 n=19+16) Template-12 70.4ms ± 2% 70.0ms ± 1% ~ (p=0.070 n=19+19) TimeParse-12 359ns ± 3% 344ns ± 1% -4.15% (p=0.000 n=19+19) TimeFormat-12 357ns ± 1% 361ns ± 2% +1.05% (p=0.002 n=20+20) [Geo mean] 62.4µs 62.0µs -0.56% name old speed new speed delta GobDecode-12 88.7MB/s ± 1% 88.1MB/s ± 2% -0.68% (p=0.001 n=19+20) GobEncode-12 118MB/s ± 1% 117MB/s ± 1% -0.42% (p=0.046 n=20+19) Gzip-12 60.9MB/s ± 2% 61.7MB/s ± 2% +1.21% (p=0.000 n=19+19) Gunzip-12 460MB/s ± 2% 461MB/s ± 1% ~ (p=0.661 n=20+19) JSONEncode-12 116MB/s ± 1% 115MB/s ± 2% ~ (p=0.555 n=19+20) JSONDecode-12 31.9MB/s ± 2% 32.5MB/s ± 1% +1.72% (p=0.000 n=19+19) GoParse-12 15.4MB/s ± 2% 15.4MB/s ± 1% ~ (p=0.653 n=20+20) 
RegexpMatchEasy0_32-12 317MB/s ± 2% 315MB/s ± 2% ~ (p=0.141 n=19+20) RegexpMatchEasy0_1K-12 2.99GB/s ± 2% 3.07GB/s ± 1% +2.86% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 384MB/s ± 2% 385MB/s ± 2% ~ (p=0.672 n=20+19) RegexpMatchEasy1_1K-12 2.06GB/s ± 2% 2.09GB/s ± 1% +1.54% (p=0.000 n=18+20) RegexpMatchMedium_32-12 7.62MB/s ± 2% 7.63MB/s ± 2% ~ (p=0.800 n=20+18) RegexpMatchMedium_1K-12 26.0MB/s ± 1% 25.8MB/s ± 1% -0.77% (p=0.000 n=18+19) RegexpMatchHard_32-12 15.7MB/s ± 2% 15.6MB/s ± 1% -0.69% (p=0.010 n=19+20) RegexpMatchHard_1K-12 16.7MB/s ± 2% 16.5MB/s ± 1% -1.19% (p=0.000 n=19+20) Revcomp-12 476MB/s ± 1% 481MB/s ± 1% +0.97% (p=0.000 n=19+16) Template-12 27.6MB/s ± 2% 27.7MB/s ± 1% ~ (p=0.071 n=19+19) [Geo mean] 99.1MB/s 99.3MB/s +0.27% Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6 Reviewed-on: https://go-review.googlesource.com/16042Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
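A minimal sketch of the two-buffer idea, with made-up types and a tiny buffer size so the swapping is visible; the real gcWork manages fixed-size workbufs and refills from the global lists rather than dropping work:

```go
package main

import "fmt"

const bufSize = 4 // tiny for the example; real work buffers hold hundreds of pointers

// gcWorkCache sketches the two-buffer scheme: put fills wbuf1 and swaps in
// wbuf2 when it is full; get drains wbuf1 and swaps when it is empty. A
// worker oscillating across a buffer boundary now reuses its second buffer
// instead of hitting the global lists on every put/get.
type gcWorkCache struct {
	wbuf1, wbuf2 []uintptr
}

func flushToGlobal(buf []uintptr) { fmt.Println("flush", len(buf), "pointers to the global work list") }

func (w *gcWorkCache) put(p uintptr) {
	if len(w.wbuf1) == bufSize {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if len(w.wbuf1) == bufSize {
			flushToGlobal(w.wbuf1) // only now do we touch the global list
			w.wbuf1 = w.wbuf1[:0]
		}
	}
	w.wbuf1 = append(w.wbuf1, p)
}

func (w *gcWorkCache) get() (uintptr, bool) {
	if len(w.wbuf1) == 0 {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if len(w.wbuf1) == 0 {
			return 0, false // would refill from the global list here
		}
	}
	p := w.wbuf1[len(w.wbuf1)-1]
	w.wbuf1 = w.wbuf1[:len(w.wbuf1)-1]
	return p, true
}

func main() {
	w := &gcWorkCache{
		wbuf1: make([]uintptr, 0, bufSize),
		wbuf2: make([]uintptr, 0, bufSize),
	}
	for i := uintptr(1); i <= 10; i++ {
		w.put(i)
	}
	for {
		p, ok := w.get()
		if !ok {
			break
		}
		fmt.Println("got", p)
	}
}
```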
-
Austin Clements authored
GC assists must block until the assist can be satisfied (either through stealing credit or doing work) or the GC cycle ends. Currently, this is implemented as a retry loop with a 100 µs delay. This obviously isn't ideal, as it wastes CPU and delays mutator execution. It also has the somewhat peculiar downside that sleeping a G requires allocation, and this requires working around recursive allocation. Replace this timed delay with a proper scheduling queue. When an assist can't be satisfied immediately, it adds the allocating G to a queue and parks it. Any time background scan credit is flushed, it consults this queue, directly satisfies the debt of queued assists, and wakes up satisfied assists before flushing any remaining credit to the background credit pool. No effect on the go1 benchmarks. Slightly speeds up the garbage benchmark. name old time/op new time/op delta XBenchGarbage-12 5.81ms ± 1% 5.72ms ± 4% -1.65% (p=0.011 n=20+20) Updates #12041. Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7 Reviewed-on: https://go-review.googlesource.com/15890 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
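The queue-and-wake mechanism can be sketched as follows (simplified and single-threaded, with illustrative names; the runtime protects the queue with a lock and parks and readies real Gs):

```go
package main

import "fmt"

// assist represents a parked allocating goroutine waiting for its debt
// (in bytes of scan work) to be paid off.
type assist struct {
	debt  int64
	ready chan struct{} // closed when the assist is satisfied
}

var (
	assistQueue  []*assist
	bgScanCredit int64 // background credit pool (stand-in for gcController.bgScanCredit)
)

// flushBgCredit sketches the new path: freshly earned background credit
// first pays off parked assists in FIFO order and wakes them, and only the
// remainder is deposited in the shared credit pool.
func flushBgCredit(credit int64) {
	for len(assistQueue) > 0 && credit > 0 {
		a := assistQueue[0]
		pay := credit
		if pay > a.debt {
			pay = a.debt
		}
		a.debt -= pay
		credit -= pay
		if a.debt == 0 {
			assistQueue = assistQueue[1:]
			close(a.ready) // wake the parked goroutine
		}
	}
	bgScanCredit += credit
}

func main() {
	a := &assist{debt: 300, ready: make(chan struct{})}
	assistQueue = append(assistQueue, a)

	// In the runtime this flush happens on a background mark worker; the
	// parked assist would already be blocked on <-a.ready at that point.
	flushBgCredit(1000)

	<-a.ready // returns immediately: the debt was paid directly
	fmt.Println("assist satisfied; leftover pool credit:", bgScanCredit) // 700
}
```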
-
- 30 Oct, 2015 3 commits
-
-
Austin Clements authored
This moves another root scanning task out of the GC coordinator and parallelizes it on the GC workers. This has negligible effect on the go1 benchmarks and the garbage benchmark. name old time/op new time/op delta XBenchGarbage-12 5.24ms ± 1% 5.26ms ± 1% +0.30% (p=0.007 n=18+17) name old time/op new time/op delta BinaryTree17-12 3.20s ± 5% 3.21s ± 5% ~ (p=0.264 n=20+18) Fannkuch11-12 2.46s ± 1% 2.54s ± 2% +3.09% (p=0.000 n=18+20) FmtFprintfEmpty-12 49.9ns ± 4% 50.0ns ± 5% ~ (p=0.356 n=20+20) FmtFprintfString-12 170ns ± 1% 170ns ± 2% ~ (p=0.815 n=19+20) FmtFprintfInt-12 160ns ± 1% 159ns ± 1% -0.63% (p=0.003 n=18+19) FmtFprintfIntInt-12 270ns ± 1% 267ns ± 1% -1.00% (p=0.000 n=19+18) FmtFprintfPrefixedInt-12 238ns ± 1% 232ns ± 1% -2.28% (p=0.000 n=19+19) FmtFprintfFloat-12 310ns ± 2% 313ns ± 2% +0.93% (p=0.000 n=19+19) FmtManyArgs-12 1.06µs ± 1% 1.04µs ± 1% -1.93% (p=0.000 n=20+19) GobDecode-12 8.63ms ± 1% 8.70ms ± 1% +0.81% (p=0.001 n=20+19) GobEncode-12 6.52ms ± 1% 6.56ms ± 1% +0.66% (p=0.000 n=20+19) Gzip-12 318ms ± 1% 319ms ± 1% ~ (p=0.405 n=17+18) Gunzip-12 42.1ms ± 2% 42.0ms ± 1% ~ (p=0.771 n=20+19) HTTPClientServer-12 62.6µs ± 1% 62.9µs ± 1% +0.41% (p=0.038 n=20+20) JSONEncode-12 16.9ms ± 1% 16.9ms ± 1% ~ (p=0.077 n=18+20) JSONDecode-12 60.7ms ± 1% 62.3ms ± 1% +2.73% (p=0.000 n=20+20) Mandelbrot200-12 3.86ms ± 1% 3.85ms ± 1% ~ (p=0.084 n=19+20) GoParse-12 3.75ms ± 2% 3.73ms ± 1% ~ (p=0.107 n=20+19) RegexpMatchEasy0_32-12 100ns ± 2% 101ns ± 2% +0.97% (p=0.001 n=20+19) RegexpMatchEasy0_1K-12 342ns ± 2% 332ns ± 2% -2.86% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 83.2ns ± 2% 82.8ns ± 2% ~ (p=0.108 n=19+20) RegexpMatchEasy1_1K-12 495ns ± 2% 490ns ± 2% -1.04% (p=0.000 n=18+19) RegexpMatchMedium_32-12 130ns ± 2% 131ns ± 2% ~ (p=0.291 n=20+20) RegexpMatchMedium_1K-12 39.3µs ± 1% 39.9µs ± 1% +1.54% (p=0.000 n=18+20) RegexpMatchHard_32-12 2.02µs ± 1% 2.05µs ± 2% +1.19% (p=0.000 n=19+19) RegexpMatchHard_1K-12 60.9µs ± 1% 61.5µs ± 1% +0.99% (p=0.000 n=18+18) Revcomp-12 535ms ± 1% 531ms ± 1% -0.82% (p=0.000 n=17+17) Template-12 73.0ms ± 1% 74.1ms ± 1% +1.47% (p=0.000 n=20+20) TimeParse-12 356ns ± 2% 348ns ± 1% -2.30% (p=0.000 n=20+20) TimeFormat-12 347ns ± 1% 353ns ± 1% +1.68% (p=0.000 n=19+20) [Geo mean] 62.3µs 62.4µs +0.12% name old speed new speed delta GobDecode-12 88.9MB/s ± 1% 88.2MB/s ± 1% -0.81% (p=0.001 n=20+19) GobEncode-12 118MB/s ± 1% 117MB/s ± 1% -0.66% (p=0.000 n=20+19) Gzip-12 60.9MB/s ± 1% 60.8MB/s ± 1% ~ (p=0.409 n=17+18) Gunzip-12 461MB/s ± 2% 462MB/s ± 1% ~ (p=0.765 n=20+19) JSONEncode-12 115MB/s ± 1% 115MB/s ± 1% ~ (p=0.078 n=18+20) JSONDecode-12 32.0MB/s ± 1% 31.1MB/s ± 1% -2.65% (p=0.000 n=20+20) GoParse-12 15.5MB/s ± 2% 15.5MB/s ± 1% ~ (p=0.111 n=20+19) RegexpMatchEasy0_32-12 318MB/s ± 2% 314MB/s ± 2% -1.27% (p=0.000 n=20+19) RegexpMatchEasy0_1K-12 2.99GB/s ± 1% 3.08GB/s ± 2% +2.94% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 385MB/s ± 2% 386MB/s ± 2% ~ (p=0.105 n=19+20) RegexpMatchEasy1_1K-12 2.07GB/s ± 1% 2.09GB/s ± 2% +1.06% (p=0.000 n=18+19) RegexpMatchMedium_32-12 7.64MB/s ± 2% 7.61MB/s ± 1% ~ (p=0.179 n=20+20) RegexpMatchMedium_1K-12 26.1MB/s ± 1% 25.7MB/s ± 1% -1.52% (p=0.000 n=18+20) RegexpMatchHard_32-12 15.8MB/s ± 1% 15.6MB/s ± 2% -1.18% (p=0.000 n=19+19) RegexpMatchHard_1K-12 16.8MB/s ± 2% 16.6MB/s ± 1% -0.90% (p=0.000 n=19+18) Revcomp-12 475MB/s ± 1% 479MB/s ± 1% +0.83% (p=0.000 n=17+17) Template-12 26.6MB/s ± 1% 26.2MB/s ± 1% -1.45% (p=0.000 n=20+20) [Geo mean] 99.0MB/s 98.7MB/s -0.32% Change-Id: I6ea44d7a59aaa6851c64695277ab65645ff9d32e Reviewed-on: 
https://go-review.googlesource.com/16070 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>
-
Austin Clements authored
Currently the concurrent root scan is performed in its entirety by the GC coordinator before entering concurrent mark (which enables GC workers). This scan is done sequentially, which can prolong the scan phase, delay the mark phase, and means that the scan phase does not obey the 25% CPU goal. Furthermore, there's no need to complete the root scan before starting marking (in fact, we already allow GC assists to happen during the scan phase), so this acts as an unnecessary barrier between root scanning and marking. This change shifts the root scan work out of the GC coordinator and in to the GC workers. The coordinator simply sets up the scan state and enqueues the right number of root scan jobs. The GC workers then drain the root scan jobs prior to draining heap scan jobs. This parallelizes the root scan process, makes it obey the 25% CPU goal, and effectively eliminates root scanning as an isolated phase, allowing the system to smoothly transition from root scanning to heap marking. This also eliminates a major non-STW responsibility of the GC coordinator, which will make it easier to switch to a decentralized state machine. Finally, it puts us in a good position to perform root scanning in assists as well, which will help satisfy assists at the beginning of the GC cycle. This is mostly straightforward. One tricky aspect is that we have to deal with preemption deadlock: where two non-preemptible gorountines are trying to preempt each other to perform a stack scan. Given the context where this happens, the only instance of this is two background workers trying to scan each other. We avoid this by simply not scanning the stacks of background workers during the concurrent phase; this is safe because we'll scan them during mark termination (and their stacks are *very* small and should not contain any new pointers). This change also switches the root marking during mark termination to use the same gcDrain-based code path as concurrent mark. This shouldn't affect performance because STW root marking was already parallel and tasks switched to heap marking immediately when no more root marking tasks were available. However, it simplifies the code and unifies these code paths. This has negligible effect on the go1 benchmarks. It slightly slows down the garbage benchmark, possibly by making GC run slightly more frequently. 
name old time/op new time/op delta XBenchGarbage-12 5.10ms ± 1% 5.24ms ± 1% +2.87% (p=0.000 n=18+18) name old time/op new time/op delta BinaryTree17-12 3.25s ± 3% 3.20s ± 5% -1.57% (p=0.013 n=20+20) Fannkuch11-12 2.45s ± 1% 2.46s ± 1% +0.38% (p=0.019 n=20+18) FmtFprintfEmpty-12 49.7ns ± 3% 49.9ns ± 4% ~ (p=0.851 n=19+20) FmtFprintfString-12 170ns ± 2% 170ns ± 1% ~ (p=0.775 n=20+19) FmtFprintfInt-12 161ns ± 1% 160ns ± 1% -0.78% (p=0.000 n=19+18) FmtFprintfIntInt-12 267ns ± 1% 270ns ± 1% +1.04% (p=0.000 n=19+19) FmtFprintfPrefixedInt-12 238ns ± 2% 238ns ± 1% ~ (p=0.133 n=18+19) FmtFprintfFloat-12 311ns ± 1% 310ns ± 2% -0.35% (p=0.023 n=20+19) FmtManyArgs-12 1.08µs ± 1% 1.06µs ± 1% -2.31% (p=0.000 n=20+20) GobDecode-12 8.65ms ± 1% 8.63ms ± 1% ~ (p=0.377 n=18+20) GobEncode-12 6.49ms ± 1% 6.52ms ± 1% +0.37% (p=0.015 n=20+20) Gzip-12 319ms ± 3% 318ms ± 1% ~ (p=0.975 n=19+17) Gunzip-12 41.9ms ± 1% 42.1ms ± 2% +0.65% (p=0.004 n=19+20) HTTPClientServer-12 61.7µs ± 1% 62.6µs ± 1% +1.40% (p=0.000 n=18+20) JSONEncode-12 16.8ms ± 1% 16.9ms ± 1% ~ (p=0.239 n=20+18) JSONDecode-12 58.4ms ± 1% 60.7ms ± 1% +3.85% (p=0.000 n=19+20) Mandelbrot200-12 3.86ms ± 0% 3.86ms ± 1% ~ (p=0.092 n=18+19) GoParse-12 3.75ms ± 2% 3.75ms ± 2% ~ (p=0.708 n=19+20) RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 2% +0.60% (p=0.010 n=17+20) RegexpMatchEasy0_1K-12 341ns ± 1% 342ns ± 2% ~ (p=0.203 n=20+19) RegexpMatchEasy1_32-12 82.5ns ± 2% 83.2ns ± 2% +0.83% (p=0.007 n=19+19) RegexpMatchEasy1_1K-12 495ns ± 1% 495ns ± 2% ~ (p=0.970 n=19+18) RegexpMatchMedium_32-12 130ns ± 2% 130ns ± 2% +0.59% (p=0.039 n=19+20) RegexpMatchMedium_1K-12 39.2µs ± 1% 39.3µs ± 1% ~ (p=0.214 n=18+18) RegexpMatchHard_32-12 2.03µs ± 2% 2.02µs ± 1% ~ (p=0.166 n=18+19) RegexpMatchHard_1K-12 61.0µs ± 1% 60.9µs ± 1% ~ (p=0.169 n=20+18) Revcomp-12 533ms ± 1% 535ms ± 1% ~ (p=0.071 n=19+17) Template-12 68.1ms ± 2% 73.0ms ± 1% +7.26% (p=0.000 n=19+20) TimeParse-12 355ns ± 2% 356ns ± 2% ~ (p=0.530 n=19+20) TimeFormat-12 357ns ± 2% 347ns ± 1% -2.59% (p=0.000 n=20+19) [Geo mean] 62.1µs 62.3µs +0.31% name old speed new speed delta GobDecode-12 88.7MB/s ± 1% 88.9MB/s ± 1% ~ (p=0.377 n=18+20) GobEncode-12 118MB/s ± 1% 118MB/s ± 1% -0.37% (p=0.015 n=20+20) Gzip-12 60.9MB/s ± 3% 60.9MB/s ± 1% ~ (p=0.944 n=19+17) Gunzip-12 464MB/s ± 1% 461MB/s ± 2% -0.64% (p=0.004 n=19+20) JSONEncode-12 115MB/s ± 1% 115MB/s ± 1% ~ (p=0.236 n=20+18) JSONDecode-12 33.2MB/s ± 1% 32.0MB/s ± 1% -3.71% (p=0.000 n=19+20) GoParse-12 15.5MB/s ± 2% 15.5MB/s ± 2% ~ (p=0.702 n=19+20) RegexpMatchEasy0_32-12 320MB/s ± 1% 318MB/s ± 2% ~ (p=0.094 n=18+20) RegexpMatchEasy0_1K-12 3.00GB/s ± 1% 2.99GB/s ± 1% ~ (p=0.194 n=20+19) RegexpMatchEasy1_32-12 388MB/s ± 2% 385MB/s ± 2% -0.83% (p=0.008 n=19+19) RegexpMatchEasy1_1K-12 2.07GB/s ± 1% 2.07GB/s ± 1% ~ (p=0.964 n=19+18) RegexpMatchMedium_32-12 7.68MB/s ± 1% 7.64MB/s ± 2% -0.57% (p=0.020 n=19+20) RegexpMatchMedium_1K-12 26.1MB/s ± 1% 26.1MB/s ± 1% ~ (p=0.211 n=18+18) RegexpMatchHard_32-12 15.8MB/s ± 1% 15.8MB/s ± 1% ~ (p=0.180 n=18+19) RegexpMatchHard_1K-12 16.8MB/s ± 1% 16.8MB/s ± 2% ~ (p=0.236 n=20+19) Revcomp-12 477MB/s ± 1% 475MB/s ± 1% ~ (p=0.071 n=19+17) Template-12 28.5MB/s ± 2% 26.6MB/s ± 1% -6.77% (p=0.000 n=19+20) [Geo mean] 100MB/s 99.0MB/s -0.82% Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0 Reviewed-on: https://go-review.googlesource.com/16059Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
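The worker-side ordering is easy to sketch: claim root scan jobs from a shared counter until they run out, then fall through to heap marking. The job names and counter below are illustrative, not the runtime's markroot indices.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Root scan jobs are claimed with an atomic counter; heap marking only
// proceeds once a worker finds no more root jobs left to claim.
var (
	rootJobs = []string{"data", "bss", "finalizers", "span roots", "stack 1", "stack 2"}
	nextRoot uint32
)

func markWorker(id int, wg *sync.WaitGroup) {
	defer wg.Done()
	// Phase 1: grab root scan jobs until they are exhausted.
	for {
		i := atomic.AddUint32(&nextRoot, 1) - 1
		if int(i) >= len(rootJobs) {
			break
		}
		fmt.Printf("worker %d scans root job %q\n", id, rootJobs[i])
	}
	// Phase 2: fall through to ordinary heap marking (gcDrain in the runtime).
	fmt.Printf("worker %d drains heap scan work\n", id)
}

func main() {
	var wg sync.WaitGroup
	for id := 0; id < 3; id++ {
		wg.Add(1)
		go markWorker(id, &wg)
	}
	wg.Wait()
}
```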
-
Austin Clements authored
We already have gcMarkWorkAvailable, but the check for GC mark work is open-coded in several places. Generalize gcMarkWorkAvailable slightly and replace these open-coded checks with calls to gcMarkWorkAvailable. In addition to cleaning up the code, this puts us in a better position to make this check slightly more complicated. Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860 Reviewed-on: https://go-review.googlesource.com/16058 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 26 Oct, 2015 1 commit
-
-
Austin Clements authored
Currently data and BSS root marking are each a single markroot job. This makes them difficult to load balance, which can draw out mark termination time if they are large. Fix this by splitting both in to 256K chunks. While we're putting in the infrastructure for dynamic roots, we also replace the fixed sharding of the span roots with sharding in to fixed sizes. In addition to helping balance root marking, this also paves the way to parallelizing concurrent scan and to letting assists help with root marking. Updates #10345. This fixes the data and BSS aspects of that bug; it does not partition scanning of large heap objects. This has negligible effect on either the go1 benchmarks or the garbage benchmark: name old time/op new time/op delta XBenchGarbage-12 4.90ms ± 1% 4.91ms ± 2% ~ (p=0.058 n=17+16) name old time/op new time/op delta BinaryTree17-12 3.11s ± 4% 3.12s ± 4% ~ (p=0.512 n=20+20) Fannkuch11-12 2.53s ± 2% 2.47s ± 2% -2.28% (p=0.000 n=20+18) FmtFprintfEmpty-12 49.1ns ± 1% 50.0ns ± 4% +1.68% (p=0.008 n=18+20) FmtFprintfString-12 170ns ± 0% 172ns ± 1% +1.05% (p=0.000 n=14+19) FmtFprintfInt-12 174ns ± 1% 162ns ± 1% -6.81% (p=0.000 n=18+17) FmtFprintfIntInt-12 284ns ± 1% 277ns ± 1% -2.42% (p=0.000 n=20+19) FmtFprintfPrefixedInt-12 252ns ± 1% 244ns ± 1% -2.84% (p=0.000 n=18+20) FmtFprintfFloat-12 317ns ± 0% 311ns ± 0% -1.95% (p=0.000 n=19+18) FmtManyArgs-12 1.08µs ± 1% 1.11µs ± 1% +3.43% (p=0.000 n=18+19) GobDecode-12 8.56ms ± 1% 8.61ms ± 1% +0.50% (p=0.020 n=20+20) GobEncode-12 6.58ms ± 1% 6.57ms ± 1% ~ (p=0.792 n=20+19) Gzip-12 317ms ± 3% 317ms ± 2% ~ (p=0.840 n=19+19) Gunzip-12 41.6ms ± 0% 41.6ms ± 0% +0.07% (p=0.027 n=18+15) HTTPClientServer-12 62.2µs ± 1% 62.3µs ± 1% ~ (p=0.283 n=19+20) JSONEncode-12 16.5ms ± 2% 16.5ms ± 1% ~ (p=0.857 n=20+19) JSONDecode-12 58.5ms ± 1% 61.3ms ± 1% +4.67% (p=0.000 n=18+17) Mandelbrot200-12 3.84ms ± 0% 3.84ms ± 0% ~ (p=0.259 n=17+17) GoParse-12 3.70ms ± 2% 3.74ms ± 2% +0.96% (p=0.009 n=19+20) RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 0% +0.31% (p=0.040 n=19+15) RegexpMatchEasy0_1K-12 340ns ± 1% 340ns ± 1% ~ (p=0.411 n=17+19) RegexpMatchEasy1_32-12 82.7ns ± 2% 82.3ns ± 1% ~ (p=0.456 n=20+19) RegexpMatchEasy1_1K-12 498ns ± 2% 495ns ± 0% ~ (p=0.108 n=19+17) RegexpMatchMedium_32-12 130ns ± 1% 130ns ± 2% ~ (p=0.405 n=18+19) RegexpMatchMedium_1K-12 39.4µs ± 2% 39.1µs ± 1% -0.64% (p=0.002 n=20+19) RegexpMatchHard_32-12 2.03µs ± 2% 2.02µs ± 0% ~ (p=0.561 n=20+17) RegexpMatchHard_1K-12 61.1µs ± 2% 60.8µs ± 1% ~ (p=0.615 n=19+18) Revcomp-12 532ms ± 2% 531ms ± 1% ~ (p=0.470 n=19+19) Template-12 68.5ms ± 1% 69.1ms ± 1% +0.87% (p=0.000 n=17+17) TimeParse-12 344ns ± 2% 344ns ± 1% +0.25% (p=0.032 n=19+18) TimeFormat-12 347ns ± 1% 362ns ± 1% +4.27% (p=0.000 n=17+19) [Geo mean] 62.3µs 62.3µs -0.04% name old speed new speed delta GobDecode-12 89.6MB/s ± 1% 89.2MB/s ± 1% -0.50% (p=0.019 n=20+20) GobEncode-12 117MB/s ± 1% 117MB/s ± 1% ~ (p=0.797 n=20+19) Gzip-12 61.3MB/s ± 3% 61.2MB/s ± 2% ~ (p=0.834 n=19+19) Gunzip-12 467MB/s ± 0% 466MB/s ± 0% -0.07% (p=0.027 n=18+15) JSONEncode-12 117MB/s ± 2% 117MB/s ± 1% ~ (p=0.851 n=20+19) JSONDecode-12 33.2MB/s ± 1% 31.7MB/s ± 1% -4.47% (p=0.000 n=18+17) GoParse-12 15.6MB/s ± 2% 15.5MB/s ± 2% -0.95% (p=0.008 n=19+20) RegexpMatchEasy0_32-12 321MB/s ± 2% 320MB/s ± 1% -0.57% (p=0.002 n=17+17) RegexpMatchEasy0_1K-12 3.01GB/s ± 1% 3.01GB/s ± 1% ~ (p=0.132 n=17+18) RegexpMatchEasy1_32-12 387MB/s ± 2% 389MB/s ± 1% ~ (p=0.423 n=20+19) RegexpMatchEasy1_1K-12 2.05GB/s ± 2% 2.06GB/s ± 0% ~ (p=0.129 n=19+17) RegexpMatchMedium_32-12 7.64MB/s ± 
1% 7.66MB/s ± 1% ~ (p=0.258 n=18+19) RegexpMatchMedium_1K-12 26.0MB/s ± 2% 26.2MB/s ± 1% +0.64% (p=0.002 n=20+19) RegexpMatchHard_32-12 15.7MB/s ± 2% 15.8MB/s ± 1% ~ (p=0.510 n=20+17) RegexpMatchHard_1K-12 16.8MB/s ± 2% 16.8MB/s ± 1% ~ (p=0.603 n=19+18) Revcomp-12 477MB/s ± 2% 479MB/s ± 1% ~ (p=0.470 n=19+19) Template-12 28.3MB/s ± 1% 28.1MB/s ± 1% -0.85% (p=0.000 n=17+17) [Geo mean] 100MB/s 100MB/s -0.26% Change-Id: Ib0bfe0145675ce88c5a8791752f7486ac98805b4 Reviewed-on: https://go-review.googlesource.com/16043Reviewed-by:
Rick Hudson <rlh@golang.org>
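The chunking arithmetic is simple; the sketch below uses hypothetical helper names to show how a data or BSS segment of arbitrary size maps onto 256 KB markroot jobs, each of which can be claimed independently by a worker.

```go
package main

import "fmt"

const rootBlockBytes = 256 << 10 // 256 KB chunks, as in the change

// nRootBlocks reports how many markroot jobs a region of the given size is
// split into; scanRootBlock scans the i'th chunk. Hypothetical helper
// names, sketching only the chunking arithmetic.
func nRootBlocks(regionSize uintptr) int {
	return int((regionSize + rootBlockBytes - 1) / rootBlockBytes)
}

func scanRootBlock(regionBase, regionSize uintptr, i int) {
	off := uintptr(i) * rootBlockBytes
	n := regionSize - off
	if n > rootBlockBytes {
		n = rootBlockBytes
	}
	fmt.Printf("scan [%#x, %#x)\n", regionBase+off, regionBase+off+n)
}

func main() {
	const dataBase, dataSize = 0x400000, uintptr(1<<20 + 1234) // a ~1 MB data segment
	blocks := nRootBlocks(dataSize)
	fmt.Println("markroot jobs for the data segment:", blocks)
	for i := 0; i < blocks; i++ {
		scanRootBlock(dataBase, dataSize, i) // each chunk is an independent job
	}
}
```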
-
- 21 Oct, 2015 2 commits
-
-
Austin Clements authored
Change-Id: Ie94cd17e1975fdaaa418fa6a7b2d3b164fedc135 Reviewed-on: https://go-review.googlesource.com/16057 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
The ragged barrier after entering the concurrent mark phase is vestigial. This used to be the point where we enabled write barriers, so it was necessary to synchronize all Ps to ensure write barriers were enabled before any marking occurred. However, we've long since switched to enabling write barriers during the concurrent scan phase, so the start-the-world at the beginning of the concurrent scan phase ensures that all Ps have enabled the write barrier. Hence, we can eliminate the old "install write barrier" phase. Fixes #11971. Change-Id: I8cdcb84b5525cef19927d51ea11ba0a4db991ea8 Reviewed-on: https://go-review.googlesource.com/16044 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
- 19 Oct, 2015 3 commits
-
-
Austin Clements authored
These functions are always called together and perform logically related state resets, so combine them in to just gcResetMarkState. Fixes #11427. Change-Id: I06c17ef65f66186494887a767b3993126955b5fe Reviewed-on: https://go-review.googlesource.com/16041 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
Currently gcResetGState is called by func gcscan_m for concurrent GC and directly by func gc for STW GC. Simplify this by consolidating these two calls in to one call by func gc above where it splits for concurrent and STW GC. As a consequence, gcResetGState and gcResetMarkState are always called together, so the next commit will consolidate these. Change-Id: Ib62d404c7b32b28f7d3080d26ecf3966cbc4aca0 Reviewed-on: https://go-review.googlesource.com/16040 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
This work queue is no longer used (there are many reads of work.partial, but the only write is in putpartial, which is never called). Fixes #11922. Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a Reviewed-on: https://go-review.googlesource.com/15892 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
- 16 Oct, 2015 1 commit
-
-
Michael Hudson-Doyle authored
Change-Id: I88f80f5914d6e4c179f3d28aa59fc29b7ef0cc66 Reviewed-on: https://go-review.googlesource.com/15960 Reviewed-by:
Minux Ma <minux@golang.org> Run-TryBot: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
- 09 Oct, 2015 8 commits
-
-
Austin Clements authored
Currently, when the mutator allocates, the runtime first allocates the memory and then, if that G has done "enough" allocation, the runtime checks whether the G has assist debt to pay off and, if so, pays it off. This approach leads to under-assisting, where a G can allocate a large region (or many small regions) before paying for it, or can even exit with outstanding debt. This commit flips this around so that a G always acquires enough credit for an allocation before it can perform that allocation. We continue to amortize the cost of assists by requiring that they over-assist when triggered to build up credit for many allocations. Fixes #11967. Change-Id: Idac9f11133b328535667674d837be72c23ebd899 Reviewed-on: https://go-review.googlesource.com/15409 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>
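A sketch of the resulting ordering, with names loosely modeled on the runtime's per-G assist balance; the real assist steals background credit or performs scan work rather than simply granting credit, and the over-assist constant here is just an illustrative value:

```go
package main

import "fmt"

// Per-goroutine assist balance in bytes of allocation credit (negative
// means the goroutine is in debt). The check now happens before the
// allocation, not after it.
type g struct{ gcAssistBytes int64 }

const overAssistBytes = 64 << 10 // assist extra to amortize the cost over many allocations

// gcAssist stands in for steal-credit-or-scan; here it simply grants credit.
func gcAssist(gp *g, want int64) {
	fmt.Printf("assist: acquire %d bytes of credit before allocating\n", want)
	gp.gcAssistBytes += want
}

// mallocgcSketch models only the assist bookkeeping around an allocation.
func mallocgcSketch(gp *g, size int64) {
	gp.gcAssistBytes -= size
	if gp.gcAssistBytes < 0 {
		gcAssist(gp, -gp.gcAssistBytes+overAssistBytes) // pay the debt first, with some slack
	}
	fmt.Printf("allocate %d bytes (credit now %d)\n", size, gp.gcAssistBytes)
}

func main() {
	gp := &g{}
	mallocgcSketch(gp, 4096) // first allocation: goes into debt, assists first
	mallocgcSketch(gp, 4096) // paid for by the over-assist credit
}
```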
-
Austin Clements authored
Currently we track the per-G GC assist balance as two monotonically increasing values: the bytes allocated by the G this cycle (gcalloc) and the scan work performed by the G this cycle (gcscanwork). The assist balance is hence assistRatio*gcalloc - gcscanwork. This works, but has two important downsides: 1) It requires floating-point math to figure out if a G is in debt or not. This makes it inappropriate to check for assist debt in the hot path of mallocgc, so we only do this when a G allocates a new span. As a result, Gs can operate "in the red", leading to under-assist and extended GC cycle length. 2) Revising the assist ratio during a GC cycle can lead to an "assist burst". If you think of plotting the scan work performed versus heaps size, the assist ratio controls the slope of this line. However, in the current system, the target line always passes through 0 at the heap size that triggered GC, so if the runtime increases the assist ratio, there has to be a potentially large assist to jump from the current amount of scan work up to the new target scan work for the current heap size. This commit replaces this approach with directly tracking the GC assist balance in terms of allocation credit bytes. Allocating N bytes simply decreases this by N and assisting raises it by the amount of scan work performed divided by the assist ratio (to get back to bytes). This will make it cheap to figure out if a G is in debt, which will let us efficiently check if an assist is necessary *before* performing an allocation and hence keep Gs "in the black". This also fixes assist bursts because the assist ratio is now in terms of *remaining* work, rather than work from the beginning of the GC cycle. Hence, the plot of scan work versus heap size becomes continuous: we can revise the slope, but this slope always starts from where we are right now, rather than where we were at the beginning of the cycle. Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c Reviewed-on: https://go-review.googlesource.com/15408Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
Currently we ensure a minimum heap distance of 1MB when computing the assist ratio. Rather than enforcing this minimum on the heap distance, it makes more sense to enforce that the heap goal itself is at least 1MB over the live heap size at the beginning of GC. Currently the two approaches are semantically equivalent, but this will let us switch to basing the assist ratio on current heap distance rather than the initial heap distance, since we can't enforce this minimum on the current heap distance (the GC may never finish because the goal posts will always be 1MB away). Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903 Reviewed-on: https://go-review.googlesource.com/15604 Reviewed-by:
Rick Hudson <rlh@golang.org>
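As a formula, the adjusted rule is just a clamp on the goal. The sketch below assumes a plain GOGC-style growth factor and ignores the other inputs the real pacer uses:

```go
package main

import "fmt"

const minHeapDistance = 1 << 20 // 1 MB

// nextGoal sketches the adjusted rule: the goal is the live heap grown by
// GOGC, but never less than 1 MB above the live heap at the start of GC.
func nextGoal(liveHeap uint64, gogc int) uint64 {
	goal := liveHeap + liveHeap*uint64(gogc)/100
	if goal < liveHeap+minHeapDistance {
		goal = liveHeap + minHeapDistance
	}
	return goal
}

func main() {
	fmt.Println(nextGoal(512<<10, 100)) // small heap: clamped to live+1MB = 1572864
	fmt.Println(nextGoal(64<<20, 100))  // large heap: live*2 = 134217728
}
```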
-
Austin Clements authored
Currently, gcController.scanWork is updated as lazily as possible since it is only read at the end of the GC cycle. We're about to read it during the GC cycle to improve the assist ratio revisions, so modify gcDrain* to regularly flush to gcController.scanWork in much the same way as we regularly flush to gcController.bgScanCredit. One consequence of this is that it's difficult to keep gcw.scanWork monotonic, so we give up on that and simply return the amount of scan work done by gcDrainN rather than calculating it in the caller. Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d Reviewed-on: https://go-review.googlesource.com/15407 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
Currently callers of gcDrain control whether it flushes scan work credit to gcController.bgScanCredit by passing a value other than -1 for the flush threshold. Shortly we're going to make this always flush scan work to gcController.scanWork and optionally also flush scan work to gcController.bgScanCredit. This will be much easier if the flush threshold is simply a constant (which it is in practice) and callers merely control whether or not the flush includes the background credit. Hence, replace the flush threshold argument with a flag. Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e Reviewed-on: https://go-review.googlesource.com/15406 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
These functions were nearly identical. Consolidate them by adding a flags argument. In addition to cleaning up this code, this makes further changes that affect both functions easier. Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f Reviewed-on: https://go-review.googlesource.com/15405 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
Change-Id: I950af8d80433b3ae8a1da0aa7a8d2d0b295dd313 Reviewed-on: https://go-review.googlesource.com/15404 Reviewed-by:
Rick Hudson <rlh@golang.org>
-
Austin Clements authored
The comment for assistRatio claimed it to be the reciprocal of what it actually is. Change-Id: If7f9bb853d75d0097facff3aa6704b224d9108b8 Reviewed-on: https://go-review.googlesource.com/15402 Reviewed-by:
Russ Cox <rsc@golang.org>
-
- 02 Oct, 2015 2 commits
-
-
Austin Clements authored
In general, finishsweep_m must block until any spans that are concurrently being swept have been swept. It accomplishes this by looping over all spans, which, as in the previous commit, takes ~1ms/heap GB. Unfortunately, we do this during the STW sweep termination phase, so multi-gigabyte heaps can push our STW time past 10ms. However, there's no need to do this wait if the world is stopped because, in effect, stopping the world already had to wait for anything that was sweeping (and if it didn't, the wait in finishsweep_m would deadlock). Hence, we can simply skip this loop if the world is stopped, such as during sweep termination. In fact, currently all calls to finishsweep_m are STW, but this hasn't always been the case and may not be the case in the future, so we keep the logic around. For 24GB heaps, this reduces max pause time by 75% relative to tip and by 90% relative to Go 1.5. Notably, all pauses are now well under 10ms. Here are the results for the garbage benchmark:

             ------------- max pause ------------
Heap  Procs  after change  before change  1.5.1
24GB  12     3.8ms         16ms           37ms
24GB  4      3.7ms         16ms           37ms
4GB   4      3.7ms         3ms            6.9ms

In the 4GB/4P case, it seems the "before change" run got lucky: the max went up, but the 99%ile pause time went down from 3ms to 2.04ms. Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31 Reviewed-on: https://go-review.googlesource.com/15071 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
-
Austin Clements authored
In order to compute the sweep ratio, the runtime needs to know how many pages belong to spans in state _MSpanInUse. Currently it finds this out by looping over all spans during mark termination. However, this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our STW time past 10ms. Replace the loop with an actively maintained count of in-use pages. For multi-gigabyte heaps, this reduces max mark termination pause time by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This shifts the longest pause time for large heaps to the sweep termination phase, so it only slightly decreases max pause time, though it roughly halves mean pause time. Here are the results for the garbage benchmark:

             ---- max mark termination pause ----
Heap  Procs  after change  before change  1.5.1
24GB  12     1.9ms         18ms           37ms
24GB  4      3.7ms         18ms           37ms
4GB   4      920µs         3.8ms          6.9ms

Fixes #11484. Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac Reviewed-on: https://go-review.googlesource.com/15070 Reviewed-by:
Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
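Maintaining the count amounts to adjusting it on every span state change, as in this sketch (illustrative names; the runtime keeps this count on its heap structure and updates it where spans move in and out of the in-use state):

```go
package main

import "fmt"

// Span states, in the spirit of the runtime's mspan states.
type spanState int

const (
	spanFree spanState = iota
	spanInUse
)

type span struct {
	npages uint64
	state  spanState
}

// heapSketch keeps pagesInUse up to date on every span state change, so
// the sweep ratio can be computed without walking all spans at mark
// termination.
type heapSketch struct {
	pagesInUse uint64
}

func (h *heapSketch) setSpanState(s *span, st spanState) {
	if s.state == st {
		return
	}
	switch st {
	case spanInUse:
		h.pagesInUse += s.npages
	case spanFree:
		h.pagesInUse -= s.npages
	}
	s.state = st
}

func main() {
	h := &heapSketch{}
	a := &span{npages: 8}
	b := &span{npages: 32}
	h.setSpanState(a, spanInUse)
	h.setSpanState(b, spanInUse)
	h.setSpanState(a, spanFree)
	// Sweep pacing can now read the count directly instead of scanning spans.
	fmt.Println("pages in use:", h.pagesInUse) // 32
}
```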
-