runtime: use reachable heap estimate to set trigger/goal
Austin Clements authored
Currently, we set the heap goal for the next GC cycle using the size
of the marked heap at the end of the current cycle. This can lead to a
bad feedback loop if the mutator is rapidly allocating and releasing
pointers, which can significantly bloat the heap.

If the GC were STW, the marked heap size would be exactly the
reachable heap size (call it stwLive). However, in concurrent GC,
marked=stwLive+floatLive, where floatLive is the amount of "floating
garbage": objects that were reachable at some point during the cycle
and were marked, but which are no longer reachable by the end of the
cycle. If the GC cycle is short, then the mutator doesn't have much
time to create floating garbage, so marked≈stwLive. However, if the GC
cycle is long and the mutator is allocating and creating floating
garbage very rapidly, then it's possible that marked≫stwLive. Since
the runtime currently sets the heap goal based on marked, this will
cause it to set a high heap goal. This means that 1) the next GC cycle
will take longer because of the larger heap and 2) the assist ratio
will be low because of the large distance between the trigger and the
goal. The combination of these lets the mutator produce even more
floating garbage in the next cycle, which further exacerbates the
problem.
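
As a rough illustration of how floating garbage inflates the goal, the
sketch below computes a goal from marked and from stwLive. It assumes
the usual GOGC-style rule that the goal is the basis plus GOGC percent
of it. The helper heapGoal and the sizes are hypothetical and chosen
only to make the arithmetic visible; this is not the runtime's code.

```go
package main

import "fmt"

const mb = 1 << 20

// heapGoal is a hypothetical helper: it returns the heap size at which
// the next cycle should finish, given the heap size used as the basis
// and a GOGC-style percentage.
func heapGoal(basis, gogc uint64) uint64 {
	return basis + basis*gogc/100
}

func main() {
	stwLive := uint64(32 * mb)    // truly reachable heap (illustrative)
	floatLive := uint64(268 * mb) // floating garbage marked in a long cycle (illustrative)
	marked := stwLive + floatLive // what a concurrent cycle actually reports

	fmt.Println(heapGoal(marked, 100) / mb)  // prints 600
	fmt.Println(heapGoal(stwLive, 100) / mb) // prints 64
}
```

With these inputs the goal derived from marked is almost ten times the
goal derived from stwLive, which is the gap the feedback loop widens.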

For example, on the garbage benchmark with GOMAXPROCS=1, this causes
the heap to grow to ~500MB and the garbage collector to retain ~300MB
or more of heap, while the true reachable heap size is ~32MB. This, in
turn, causes the GC cycle to take ~3 seconds or more.

Fix this bad feedback loop by estimating the true reachable heap size
(stwLive) and using this rather than the marked heap size
(stwLive+floatLive) as the basis for the GC trigger and heap goal.
This breaks the bad feedback loop and causes the mutator to assist
more, which decreases the rate at which it can create floating
garbage. On the same garbage benchmark, this reduces the maximum heap
size to ~73MB, the retained heap to ~40MB, and the duration of the GC
cycle to ~200ms.
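
This message does not say how stwLive is estimated. One plausible
approach, shown here purely as an assumption, is to subtract the bytes
allocated during the cycle from the marked heap, on the theory that
most floating garbage was allocated while marking was in progress. The
function name and the inputs below are hypothetical.

```go
package main

import "fmt"

const mb = 1 << 20

// estimateReachable is a hypothetical estimator for stwLive. It assumes
// that floating garbage is approximately the memory allocated during
// the cycle, so it subtracts that amount from the marked heap. The
// result is clamped at zero because the mutator may allocate more
// during the cycle than was ever marked.
func estimateReachable(marked, allocatedDuringCycle uint64) uint64 {
	if marked < allocatedDuringCycle {
		return 0
	}
	return marked - allocatedDuringCycle
}

func main() {
	// Long cycle: most of the marked heap was allocated during marking.
	fmt.Println(estimateReachable(300*mb, 260*mb) / mb) // prints 40

	// Allocation during the cycle exceeds the marked heap: clamp to 0.
	fmt.Println(estimateReachable(50*mb, 80*mb) / mb) // prints 0
}
```

An estimate like this can undershoot, so a real implementation would
likely need a lower bound on the resulting trigger and goal.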

Change-Id: I7712244c94240743b266f9eb720c03802799cdd1
Reviewed-on: https://go-review.googlesource.com/9177

Reviewed-by: Rick Hudson <rlh@golang.org>
4655aadd
Name
cgo
debug
pprof
race
Makefile
alg.go
append_test.go
arch1_386.go
arch1_amd64.go
arch1_amd64p32.go
arch1_arm.go
arch1_arm64.go
arch1_ppc64.go
arch1_ppc64le.go
arch_386.go
arch_amd64.go
arch_amd64p32.go
arch_arm.go
arch_arm64.go
arch_ppc64.go
arch_ppc64le.go
asm.s
asm_386.s
asm_amd64.s
asm_amd64p32.s
asm_arm.s
asm_arm64.s
asm_ppc64x.s
atomic_386.go
atomic_amd64x.go
atomic_arm.go
atomic_arm64.go
atomic_arm64.s
atomic_pointer.go
atomic_ppc64x.go
atomic_ppc64x.s
cgo.go
cgocall.go
cgocallback.go
chan.go
chan_test.go
closure_test.go
compiler.go
complex.go
complex_test.go
cpuprof.go
cputicks.go
crash_cgo_test.go
crash_test.go
debug.go
defs1_linux.go
defs1_netbsd_386.go
defs1_netbsd_amd64.go
defs1_netbsd_arm.go
defs1_solaris_amd64.go
defs2_linux.go
defs3_linux.go
defs_arm_linux.go
defs_darwin.go
defs_darwin_386.go
defs_darwin_amd64.go
defs_darwin_arm.go
defs_darwin_arm64.go
defs_dragonfly.go
defs_dragonfly_amd64.go
defs_freebsd.go
defs_freebsd_386.go
defs_freebsd_amd64.go
defs_freebsd_arm.go
defs_linux.go
defs_linux_386.go
defs_linux_amd64.go
defs_linux_arm.go
defs_linux_arm64.go
defs_linux_ppc64.go
defs_linux_ppc64le.go
defs_nacl_386.go
defs_nacl_amd64p32.go
defs_nacl_arm.go
defs_netbsd.go
defs_netbsd_386.go
defs_netbsd_amd64.go
defs_netbsd_arm.go
defs_openbsd.go
defs_openbsd_386.go
defs_openbsd_amd64.go
defs_openbsd_arm.go
defs_plan9_386.go
defs_plan9_amd64.go
defs_solaris.go
defs_solaris_amd64.go
defs_windows.go
defs_windows_386.go
defs_windows_amd64.go
duff_386.s
duff_amd64.s
duff_arm.s
duff_arm64.s
duff_ppc64x.s
env_plan9.go
env_posix.go
env_test.go
error.go
export_futex_test.go