  1. 12 Nov, 2015 1 commit
  2. 11 Nov, 2015 3 commits
  3. 10 Nov, 2015 1 commit
    • runtime: break atomics out into package runtime/internal/atomic · 67faca7d
      Michael Matloob authored
      This change breaks out most of the atomics functions in the runtime
      into package runtime/internal/atomic. It adds some basic support
      in the toolchain for runtime packages, and also modifies linux/arm
      atomics to remove the dependency on the runtime's mutex. The mutexes
      have been replaced with spinlocks.
      
      All trybots are happy!
      In addition to the trybots, I've tested on the darwin/arm64 builder,
      on the darwin/arm builder, and on a ppc64le machine.
      
      Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
      Reviewed-on: https://go-review.googlesource.com/14204
      Run-TryBot: Michael Matloob <matloob@golang.org>
      Reviewed-by: Russ Cox <rsc@golang.org>
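As an illustration of the mutex-to-spinlock swap this commit describes, here is a minimal sketch of a test-and-set spinlock written against the public sync/atomic API rather than the new runtime-internal package; the spinLock type and its methods are hypothetical, not the runtime's code.

```go
package sketch

import (
	"runtime"
	"sync/atomic"
)

// spinLock is a hypothetical test-and-set spinlock of the kind that
// replaced the runtime mutex here: a single word flipped with an
// atomic compare-and-swap, with no dependency on heavier lock machinery.
type spinLock uint32

func (l *spinLock) lock() {
	for !atomic.CompareAndSwapUint32((*uint32)(l), 0, 1) {
		runtime.Gosched() // back off so the current holder can make progress
	}
}

func (l *spinLock) unlock() {
	atomic.StoreUint32((*uint32)(l), 0)
}
```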
  4. 05 Nov, 2015 11 commits
    • runtime: remove background GC goroutine and mark barriers · d5ba5821
      Austin Clements authored
      These are now unused.
      
      Updates #11970.
      
      Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f
      Reviewed-on: https://go-review.googlesource.com/16393
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: decentralize mark done and mark termination · c99d7f7f
      Austin Clements authored
      This moves all of the mark 1 to mark 2 transition and mark termination
      to the mark done transition function. This means these transitions are
      now handled on the goroutine that detected mark completion. This also
      means that the GC coordinator and the background completion barriers
      are no longer used and various workarounds to yield to the coordinator
      are no longer necessary. These will be removed in follow-up commits.
      
      One consequence of this is that mark workers now need to be
      preemptible when performing the mark done transition. This allows them
      to stop the world and to perform the final clean-up steps of GC after
      restarting the world. They are only made preemptible while performing
      this transition, so if the worker that findRunnableGCWorker would
      schedule isn't available, we didn't want to schedule it anyway.
      
      Fixes #11970.
      
      Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
      Reviewed-on: https://go-review.googlesource.com/16391
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: account mark worker time before gcMarkDone · d986bf27
      Austin Clements authored
      Currently gcMarkDone takes basically no time, so it's okay to account
      the worker time after calling it. However, gcMarkDone is about to take
      potentially *much* longer because it may perform all of mark
      termination. Prepare for this by swapping the order so we account the
      time before calling gcMarkDone.
      
      Change-Id: I90c7df68192acfc4fd02a7254dae739dda4e2fcb
      Reviewed-on: https://go-review.googlesource.com/16390
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: factor mark done transition · 171204b5
      Austin Clements authored
      Currently the code for completion of mark 1/mark 2 is duplicated in
      background workers and assists. Factor this into a single function
      that will serve as the transition function for concurrent mark.
      
      Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
      Reviewed-on: https://go-review.googlesource.com/16359
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: eliminate mark completion in scheduler · 12e23f05
      Austin Clements authored
      Currently, findRunnableGCWorker will perform mark completion if there
      is no remaining work and no running workers. This used to be necessary
      to resolve a race in the transition from mark 1 to mark 2 where we
      would enter mark 2 with no mark work (and no dedicated workers), so no
      workers would run, so no worker would signal mark completion.
      
      However, we're about to make mark completion also perform the entire
      follow-on process, which includes mark termination. We really don't
      want to do that in the scheduler if it happens to detect completion.
      
      Conveniently, this hack is no longer necessary because we always
      enqueue root scanning work at the beginning of both mark 1 and mark 2,
      so a mark worker will always run. Hence, we can simply eliminate it.
      
      Change-Id: I3fc8f27c8da632f0fb732c9f6425e1f457f5652e
      Reviewed-on: https://go-review.googlesource.com/16358
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: decentralize sweep termination and mark transition · a51905fa
      Austin Clements authored
      This moves all of GC initialization, sweep termination, and the
      transition to concurrent marking into the off->mark transition
      function. This means it's now handled on the goroutine that detected
      the state exit condition.
      
      As a result, malloc no longer needs to Gosched() at the beginning of
      the GC cycle to prevent over-allocation while the GC is starting up
      because it will now *help* the GC to start up. The Gosched hack is
      still necessary during GC shutdown (this is easy to test by enabling
      gctrace and hitting Ctrl-S to block the gctrace output).
      
      At this point, the GC coordinator still handles later phases. This
      requires a small tweak to how we start the GC coordinator. Currently,
      starting the GC coordinator is best-effort and may fail if the
      coordinator is about to park from the previous cycle but hasn't yet.
      We fix this by replacing the park/ready to wake up the coordinator
      with a semaphore. This is temporary since the coordinator will be
      going away in a few commits.
      
      Updates #11970.
      
      Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
      Reviewed-on: https://go-review.googlesource.com/16357
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
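The park/ready race and the semaphore fix described above lend themselves to a user-space analogy: a buffered channel of capacity 1 behaves like a binary semaphore, so a wakeup sent before the receiver blocks is never lost. A minimal sketch, with all names hypothetical:

```go
package sketch

// A buffered channel of capacity 1 acts as a binary semaphore: a
// wakeup sent before the coordinator blocks is not lost, which is
// exactly the property park/ready lacked (readying could fail if the
// coordinator from the previous cycle had not parked yet).
var gcSema = make(chan struct{}, 1)

// wakeCoordinator requests a GC cycle; the non-blocking send
// coalesces duplicate wakeups instead of failing.
func wakeCoordinator() {
	select {
	case gcSema <- struct{}{}:
	default: // a wakeup is already pending
	}
}

// coordinatorLoop runs one cycle per wakeup; acquiring the semaphore
// is safe even if the wakeup arrived before this goroutine blocked.
func coordinatorLoop(runCycle func()) {
	for {
		<-gcSema
		runCycle()
	}
}
```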
    • runtime: decentralize concurrent sweep termination · 9630c47e
      Austin Clements authored
      This moves concurrent sweep termination from the coordinator to the
      off->mark transition. This allows it to be performed by all Gs
      attempting to start the GC.
      
      Updates #11970.
      
      Change-Id: I24428e8599a759398c2ef7ec996ba755a448f947
      Reviewed-on: https://go-review.googlesource.com/16356
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: beginning of decentralized off->mark transition · f54bcedc
      Austin Clements authored
      This begins the conversion of the centralized GC coordinator to a
      decentralized state machine by introducing the internal API that
      triggers the first state transition from _GCoff to _GCmark (or
      _GCmarktermination).
      
      This change introduces the transition lock, the off->mark transition
      condition (which is very similar to shouldtriggergc()), and the
      general structure of a state transition. Since we're doing this
      conversion in stages, it then falls back to the GC coordinator to
      actually execute the cycle. We'll start moving logic out of the GC
      coordinator and into transition functions next.
      
      This fixes a minor bug in gcstoptheworld debug mode where passing the
      heap trigger once could trigger multiple STW GCs.
      
      Updates #11970.
      
      Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
      Reviewed-on: https://go-review.googlesource.com/16355
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
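The "general structure of a state transition" this commit introduces can be sketched as a double-checked transition guarded by a lock; the names and the string-valued phase below are illustrative only, not the runtime's:

```go
package sketch

import "sync"

var (
	transitionLock sync.Mutex
	gcPhase        = "off" // illustrative stand-in for the GC phase
)

// maybeStartGC sketches a decentralized transition: any goroutine
// that observes the trigger condition may perform the off->mark
// transition, serialized by the transition lock.
func maybeStartGC(triggered func() bool, doTransition func()) {
	if !triggered() {
		return // cheap unlocked check first
	}
	transitionLock.Lock()
	defer transitionLock.Unlock()
	// Re-check under the lock: another goroutine may have performed
	// the transition already. This re-check is also what prevents one
	// trigger crossing from starting multiple STW GCs in the
	// gcstoptheworld debug mode mentioned above.
	if gcPhase != "off" || !triggered() {
		return
	}
	doTransition()
	gcPhase = "mark"
}
```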
    • runtime: move concurrent mark setup off system stack · 38425962
      Austin Clements authored
      For historical reasons we currently do a lot of the concurrent mark
      setup on the system stack. In fact, at this point the one and only
      thing that needs to happen on the system stack is the start-the-world.
      
      Clean up this code by lifting everything other than the
      start-the-world off the system stack.
      
      The diff for this change looks large, but the only code change is to
      narrow the systemstack call. Everything else is re-indentation.
      
      Change-Id: I1e03b8afc759fad726f2397b05a17d183c2713ce
      Reviewed-on: https://go-review.googlesource.com/16354
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: lift state variables from func gc to var work · 19596215
      Austin Clements authored
      We're about to split func gc across several functions, so lift the
      local variables it uses for tracking statistics and state across the
      cycle into the global "work" variable.
      
      Change-Id: Ie955f2f1758c7f5a5543ea1f3f33b222bc4b1d37
      Reviewed-on: https://go-review.googlesource.com/16353
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: note a minor issue with GODEBUG=gcstoptheworld · 16980189
      Austin Clements authored
      Change-Id: I91cda8d88b0852cd0f868d33c594206bcca0c386
      Reviewed-on: https://go-review.googlesource.com/16352
      Reviewed-by: Rick Hudson <rlh@golang.org>
  5. 04 Nov, 2015 2 commits
    • runtime: make putfull start mark workers · dcd9e5bc
      Austin Clements authored
      Currently we depend on the good graces and timing of the scheduler to
      get opportunities to start dedicated mark workers. In the worst case,
      it may take 10ms to get dedicated mark workers going at the beginning
      of mark 1 and mark 2 or after the amount of available work has dropped
      and gone back up.
      
      Instead of waiting for the regular preemption logic to get around to
      us, make putfull enlist a random P if we're not already running enough
      dedicated workers. This should improve performance stability of the
      garbage collector and is likely to improve the overall performance
      somewhat.
      
      No overall effect on the go1 benchmarks. It speeds up the garbage
      benchmark by 12%, which more than counters the performance loss from
      the previous commit.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  6.32ms ± 4%  5.58ms ± 2%  -11.68%  (p=0.000 n=20+16)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.18s ± 5%     3.12s ± 4%  -1.83%  (p=0.021 n=20+20)
      Fannkuch11-12                2.50s ± 2%     2.46s ± 2%  -1.57%  (p=0.000 n=18+19)
      FmtFprintfEmpty-12          50.8ns ± 3%    50.4ns ± 3%    ~     (p=0.184 n=20+20)
      FmtFprintfString-12          167ns ± 2%     171ns ± 1%  +2.46%  (p=0.000 n=20+19)
      FmtFprintfInt-12             161ns ± 2%     163ns ± 2%  +1.81%  (p=0.000 n=20+20)
      FmtFprintfIntInt-12          269ns ± 1%     266ns ± 1%  -0.81%  (p=0.002 n=19+20)
      FmtFprintfPrefixedInt-12     237ns ± 2%     231ns ± 2%  -2.86%  (p=0.000 n=20+20)
      FmtFprintfFloat-12           313ns ± 2%     313ns ± 1%    ~     (p=0.681 n=20+20)
      FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 1%  -2.26%  (p=0.000 n=20+20)
      GobDecode-12                8.66ms ± 1%    8.67ms ± 1%    ~     (p=0.380 n=19+20)
      GobEncode-12                6.56ms ± 1%    6.56ms ± 2%    ~     (p=0.607 n=19+20)
      Gzip-12                      317ms ± 1%     314ms ± 2%  -1.10%  (p=0.000 n=20+19)
      Gunzip-12                   42.1ms ± 1%    42.2ms ± 1%  +0.27%  (p=0.044 n=20+19)
      HTTPClientServer-12         62.7µs ± 1%    62.0µs ± 1%  -1.04%  (p=0.000 n=19+18)
      JSONEncode-12               16.7ms ± 1%    16.8ms ± 2%  +0.59%  (p=0.021 n=20+20)
      JSONDecode-12               58.2ms ± 1%    61.4ms ± 2%  +5.43%  (p=0.000 n=18+19)
      Mandelbrot200-12            3.84ms ± 1%    3.87ms ± 2%  +0.79%  (p=0.008 n=18+20)
      GoParse-12                  3.86ms ± 2%    3.76ms ± 2%  -2.60%  (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12       100ns ± 2%     100ns ± 1%  -0.68%  (p=0.005 n=18+15)
      RegexpMatchEasy0_1K-12       332ns ± 1%     342ns ± 1%  +3.16%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12      82.9ns ± 3%    83.0ns ± 2%    ~     (p=0.906 n=19+20)
      RegexpMatchEasy1_1K-12       487ns ± 1%     494ns ± 1%  +1.50%  (p=0.000 n=17+20)
      RegexpMatchMedium_32-12      131ns ± 2%     130ns ± 1%    ~     (p=0.686 n=19+20)
      RegexpMatchMedium_1K-12     39.6µs ± 1%    39.2µs ± 1%  -1.09%  (p=0.000 n=18+19)
      RegexpMatchHard_32-12       2.04µs ± 1%    2.04µs ± 2%    ~     (p=0.804 n=20+20)
      RegexpMatchHard_1K-12       61.7µs ± 2%    61.3µs ± 2%    ~     (p=0.052 n=18+20)
      Revcomp-12                   529ms ± 2%     533ms ± 1%  +0.83%  (p=0.003 n=20+19)
      Template-12                 70.7ms ± 2%    71.0ms ± 2%    ~     (p=0.065 n=20+19)
      TimeParse-12                 351ns ± 2%     355ns ± 1%  +1.25%  (p=0.000 n=19+20)
      TimeFormat-12                362ns ± 2%     373ns ± 1%  +2.83%  (p=0.000 n=18+20)
      [Geo mean]                  62.2µs         62.3µs       +0.13%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.6MB/s ± 1%  88.5MB/s ± 1%    ~     (p=0.392 n=19+20)
      GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.622 n=19+20)
      Gzip-12                   61.1MB/s ± 1%  61.8MB/s ± 2%  +1.11%  (p=0.000 n=20+19)
      Gunzip-12                  461MB/s ± 1%   460MB/s ± 1%  -0.27%  (p=0.044 n=20+19)
      JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%  -0.58%  (p=0.022 n=20+20)
      JSONDecode-12             33.3MB/s ± 1%  31.6MB/s ± 2%  -5.15%  (p=0.000 n=18+19)
      GoParse-12                15.0MB/s ± 2%  15.4MB/s ± 2%  +2.66%  (p=0.000 n=20+20)
      RegexpMatchEasy0_32-12     317MB/s ± 2%   319MB/s ± 2%    ~     (p=0.052 n=20+20)
      RegexpMatchEasy0_1K-12    3.08GB/s ± 1%  2.99GB/s ± 1%  -3.07%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12     386MB/s ± 3%   386MB/s ± 2%    ~     (p=0.939 n=19+20)
      RegexpMatchEasy1_1K-12    2.10GB/s ± 1%  2.07GB/s ± 1%  -1.46%  (p=0.000 n=17+20)
      RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.64MB/s ± 1%    ~     (p=0.702 n=19+20)
      RegexpMatchMedium_1K-12   25.9MB/s ± 1%  26.1MB/s ± 2%  +0.99%  (p=0.000 n=18+20)
      RegexpMatchHard_32-12     15.7MB/s ± 1%  15.7MB/s ± 2%    ~     (p=0.723 n=20+20)
      RegexpMatchHard_1K-12     16.6MB/s ± 2%  16.7MB/s ± 2%    ~     (p=0.052 n=18+20)
      Revcomp-12                 481MB/s ± 2%   477MB/s ± 1%  -0.83%  (p=0.003 n=20+19)
      Template-12               27.5MB/s ± 2%  27.3MB/s ± 2%    ~     (p=0.062 n=20+19)
      [Geo mean]                99.4MB/s       99.1MB/s       -0.35%
      
      Change-Id: I914d8cadded5a230509d118164a4c201601afc06
      Reviewed-on: https://go-review.googlesource.com/16298
      Reviewed-by: Rick Hudson <rlh@golang.org>
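A sketch of the putfull idea under stated assumptions: the push and enlist hooks stand in for the global work list and the scheduler, and the counters are illustrative, not the runtime's.

```go
package sketch

import "sync/atomic"

type workbuf struct{ obj []uintptr } // illustrative, not the runtime's type

var (
	workersRunning int32     // dedicated mark workers currently running
	workersWanted  int32 = 2 // target implied by the CPU goal
)

// putfull publishes a full work buffer and, per this commit, enlists
// another worker when we are running below target, instead of waiting
// for the scheduler's regular preemption logic to get around to us.
func putfull(b *workbuf, push func(*workbuf), enlist func()) {
	push(b)
	if atomic.LoadInt32(&workersRunning) < workersWanted {
		enlist() // e.g., preempt a random P so it can run a mark worker
	}
}
```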
    • runtime: eliminate getfull barrier from concurrent mark · 62ba520b
      Austin Clements authored
      Currently dedicated mark workers participate in the getfull barrier
      during concurrent mark. However, the getfull barrier wasn't designed
      for concurrent work and this causes no end of headaches.
      
      In the concurrent setting, participants come and go. This makes mark
      completion susceptible to live-lock: since dedicated workers are only
      periodically polling for completion, it's possible for the program to
      be running some transient worker each time one of the dedicated workers
      wakes up to check if it can exit the getfull barrier. It also
      complicates reasoning about the system because dedicated workers
      participate directly in the getfull barrier, but transient workers
      must instead use trygetfull because they have exit conditions that
      aren't captured by getfull (e.g., fractional workers exit when
      preempted). The complexity of implementing these exit conditions
      contributed to #11677. Furthermore, the getfull barrier is inefficient
      because we could be running user code instead of spinning on a P. In
      effect, we're dedicating 25% of the CPU to marking even if that means
      we have to spin to make that 25%. It also causes issues on Windows
      because we can't actually sleep for 100µs (#8687).
      
      Fix this by making dedicated workers no longer participate in the
      getfull barrier. Instead, dedicated workers simply return to the
      scheduler when they fail to get more work, regardless of what other
      workers are doing, and the scheduler only starts new dedicated workers
      if there's work available. Everything that needs to be handled by this
      barrier is already handled by detection of mark completion.
      
      This makes the system much more symmetric because all workers and
      assists now use trygetfull during concurrent mark. It also loosens the
      25% CPU target so that we can give some of that 25% back to user code
      if there isn't enough work to keep the mark worker busy. And it
      eliminates the problematic 100µs sleep on Windows during concurrent
      mark (though not during mark termination).
      
      The downside of this is that if we hit a bottleneck in the heap graph
      that then expands back out, the system may shut down dedicated workers
      and take a while to start them back up. We'll address this in the next
      commit.
      
      Updates #12041 and #8687.
      
      No effect on the go1 benchmarks. This slows down the garbage benchmark
      by 9%, but we'll more than make it up in the next commit.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)
      
      Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
      Reviewed-on: https://go-review.googlesource.com/16297
      Reviewed-by: Rick Hudson <rlh@golang.org>
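The worker behavior after this change can be sketched as a simple drain loop, assuming hypothetical tryGetFull and scan hooks:

```go
package sketch

type workbuf struct{ obj []uintptr } // illustrative

// dedicatedWorkerLoop sketches the worker after this change: use a
// non-blocking trygetfull and return to the scheduler when no work is
// available, rather than spinning in the getfull barrier. Completion
// is detected elsewhere, and the scheduler starts new dedicated
// workers if work reappears.
func dedicatedWorkerLoop(tryGetFull func() *workbuf, scan func(*workbuf)) {
	for {
		b := tryGetFull()
		if b == nil {
			return // no work right now: yield the P back to user code
		}
		scan(b)
	}
}
```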
  6. 03 Nov, 2015 2 commits
    • runtime: cache two workbufs to reduce contention · b6c0934a
      Austin Clements authored
      Currently the gcWork abstraction caches a single work buffer. As a
      result, if a worker is putting and getting pointers right at the
      boundary of a work buffer, it can flap between work buffers and
      (potentially significantly) increase contention on the global work
      buffer lists.
      
      This change modifies gcWork to instead cache two work buffers and
      switch off between them. This introduces one buffer's worth of
      hysteresis and eliminates the above performance worst case by
      amortizing the cost of getting or putting a work buffer over at least
      one buffer's worth of work.
      
      In practice, it's difficult to trigger this worst case with reasonably
      large work buffers. On the garbage benchmark, this reduces the max
      writes/sec to the global work list from 32K to 25K and the median from
      6K to 5K. However, if a workload were to trigger this worst case
      behavior, it could significantly drive up this contention.
      
      This has negligible effects on the go1 benchmarks and slightly speeds
      up the garbage benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.90ms ± 3%  5.83ms ± 4%  -1.18%  (p=0.011 n=18+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.22s ± 4%     3.17s ± 3%  -1.57%  (p=0.009 n=19+20)
      Fannkuch11-12                2.44s ± 1%     2.53s ± 4%  +3.78%  (p=0.000 n=18+19)
      FmtFprintfEmpty-12          50.2ns ± 2%    50.5ns ± 5%    ~     (p=0.631 n=19+20)
      FmtFprintfString-12          167ns ± 1%     166ns ± 1%    ~     (p=0.141 n=20+20)
      FmtFprintfInt-12             162ns ± 1%     159ns ± 1%  -1.80%  (p=0.000 n=20+20)
      FmtFprintfIntInt-12          277ns ± 2%     263ns ± 1%  -4.78%  (p=0.000 n=20+18)
      FmtFprintfPrefixedInt-12     240ns ± 1%     232ns ± 2%  -3.25%  (p=0.000 n=20+20)
      FmtFprintfFloat-12           311ns ± 1%     315ns ± 2%  +1.17%  (p=0.000 n=20+20)
      FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 2%  -1.72%  (p=0.000 n=20+20)
      GobDecode-12                8.65ms ± 1%    8.71ms ± 2%  +0.68%  (p=0.001 n=19+20)
      GobEncode-12                6.51ms ± 1%    6.54ms ± 1%  +0.42%  (p=0.047 n=20+19)
      Gzip-12                      318ms ± 2%     315ms ± 2%  -1.20%  (p=0.000 n=19+19)
      Gunzip-12                   42.2ms ± 2%    42.1ms ± 1%    ~     (p=0.667 n=20+19)
      HTTPClientServer-12         62.5µs ± 1%    62.4µs ± 1%    ~     (p=0.110 n=20+18)
      JSONEncode-12               16.8ms ± 1%    16.8ms ± 2%    ~     (p=0.569 n=19+20)
      JSONDecode-12               60.8ms ± 2%    59.8ms ± 1%  -1.69%  (p=0.000 n=19+19)
      Mandelbrot200-12            3.87ms ± 1%    3.85ms ± 0%  -0.61%  (p=0.001 n=20+17)
      GoParse-12                  3.76ms ± 2%    3.76ms ± 1%    ~     (p=0.698 n=20+20)
      RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%    ~     (p=0.065 n=19+20)
      RegexpMatchEasy0_1K-12       342ns ± 2%     333ns ± 1%  -2.82%  (p=0.000 n=20+19)
      RegexpMatchEasy1_32-12      83.3ns ± 2%    83.2ns ± 2%    ~     (p=0.692 n=20+19)
      RegexpMatchEasy1_1K-12       498ns ± 2%     490ns ± 1%  -1.52%  (p=0.000 n=18+20)
      RegexpMatchMedium_32-12      131ns ± 2%     131ns ± 2%    ~     (p=0.464 n=20+18)
      RegexpMatchMedium_1K-12     39.3µs ± 2%    39.6µs ± 1%  +0.77%  (p=0.000 n=18+19)
      RegexpMatchHard_32-12       2.04µs ± 2%    2.06µs ± 1%  +0.69%  (p=0.009 n=19+20)
      RegexpMatchHard_1K-12       61.4µs ± 2%    62.1µs ± 1%  +1.21%  (p=0.000 n=19+20)
      Revcomp-12                   534ms ± 1%     529ms ± 1%  -0.97%  (p=0.000 n=19+16)
      Template-12                 70.4ms ± 2%    70.0ms ± 1%    ~     (p=0.070 n=19+19)
      TimeParse-12                 359ns ± 3%     344ns ± 1%  -4.15%  (p=0.000 n=19+19)
      TimeFormat-12                357ns ± 1%     361ns ± 2%  +1.05%  (p=0.002 n=20+20)
      [Geo mean]                  62.4µs         62.0µs       -0.56%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.7MB/s ± 1%  88.1MB/s ± 2%  -0.68%  (p=0.001 n=19+20)
      GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.42%  (p=0.046 n=20+19)
      Gzip-12                   60.9MB/s ± 2%  61.7MB/s ± 2%  +1.21%  (p=0.000 n=19+19)
      Gunzip-12                  460MB/s ± 2%   461MB/s ± 1%    ~     (p=0.661 n=20+19)
      JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%    ~     (p=0.555 n=19+20)
      JSONDecode-12             31.9MB/s ± 2%  32.5MB/s ± 1%  +1.72%  (p=0.000 n=19+19)
      GoParse-12                15.4MB/s ± 2%  15.4MB/s ± 1%    ~     (p=0.653 n=20+20)
      RegexpMatchEasy0_32-12     317MB/s ± 2%   315MB/s ± 2%    ~     (p=0.141 n=19+20)
      RegexpMatchEasy0_1K-12    2.99GB/s ± 2%  3.07GB/s ± 1%  +2.86%  (p=0.000 n=20+19)
      RegexpMatchEasy1_32-12     384MB/s ± 2%   385MB/s ± 2%    ~     (p=0.672 n=20+19)
      RegexpMatchEasy1_1K-12    2.06GB/s ± 2%  2.09GB/s ± 1%  +1.54%  (p=0.000 n=18+20)
      RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.63MB/s ± 2%    ~     (p=0.800 n=20+18)
      RegexpMatchMedium_1K-12   26.0MB/s ± 1%  25.8MB/s ± 1%  -0.77%  (p=0.000 n=18+19)
      RegexpMatchHard_32-12     15.7MB/s ± 2%  15.6MB/s ± 1%  -0.69%  (p=0.010 n=19+20)
      RegexpMatchHard_1K-12     16.7MB/s ± 2%  16.5MB/s ± 1%  -1.19%  (p=0.000 n=19+20)
      Revcomp-12                 476MB/s ± 1%   481MB/s ± 1%  +0.97%  (p=0.000 n=19+16)
      Template-12               27.6MB/s ± 2%  27.7MB/s ± 1%    ~     (p=0.071 n=19+19)
      [Geo mean]                99.1MB/s       99.3MB/s       +0.27%
      
      Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6
      Reviewed-on: https://go-review.googlesource.com/16042
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
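A sketch of the two-buffer cache, assuming illustrative workbuf and gcWork types (the real gcWork differs in detail): with two cached buffers, flapping at a buffer boundary no longer touches the global lists on every put or get.

```go
package sketch

const bufSize = 512 // illustrative capacity

type workbuf struct {
	obj  [bufSize]uintptr
	nobj int
}

// gcWork caches two work buffers, giving one buffer's worth of
// hysteresis before falling back to the global lists.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
}

func (w *gcWork) put(p uintptr, getEmpty func() *workbuf, putFull func(*workbuf)) {
	if w.wbuf1.nobj == len(w.wbuf1.obj) {
		// Primary is full: switch to the secondary first.
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if w.wbuf1.nobj == len(w.wbuf1.obj) {
			// Both full: only now pay the global-list cost.
			putFull(w.wbuf1)
			w.wbuf1 = getEmpty()
		}
	}
	w.wbuf1.obj[w.wbuf1.nobj] = p
	w.wbuf1.nobj++
}
```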
    • runtime: replace assist sleep loop with park/ready · 15aa6bbd
      Austin Clements authored
      GC assists must block until the assist can be satisfied (either
      through stealing credit or doing work) or the GC cycle ends.
      Currently, this is implemented as a retry loop with a 100 µs delay.
      This obviously isn't ideal, as it wastes CPU and delays mutator
      execution. It also has the somewhat peculiar downside that sleeping a
      G requires allocation, and this requires working around recursive
      allocation.
      
      Replace this timed delay with a proper scheduling queue. When an
      assist can't be satisfied immediately, it adds the allocating G to a
      queue and parks it. Any time background scan credit is flushed, it
      consults this queue, directly satisfies the debt of queued assists,
      and wakes up satisfied assists before flushing any remaining credit to
      the background credit pool.
      
      No effect on the go1 benchmarks. Slightly speeds up the garbage
      benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.81ms ± 1%  5.72ms ± 4%  -1.65%  (p=0.011 n=20+20)
      
      Updates #12041.
      
      Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
      Reviewed-on: https://go-review.googlesource.com/15890
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
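A user-space analogy of the assist queue, with channels standing in for park/ready and all names hypothetical: assists that cannot be satisfied record their debt and park; whoever flushes background scan credit pays off queued assists first and banks only the remainder.

```go
package sketch

import "sync"

type assist struct {
	debt int64
	done chan struct{} // closed when the debt has been paid
}

var (
	mu       sync.Mutex
	queue    []*assist
	bgCredit int64 // background credit pool
)

// flushScanCredit adds newly earned scan credit, directly satisfying
// queued assists and waking them before flushing any remaining credit
// to the background pool, as the commit describes.
func flushScanCredit(credit int64) {
	mu.Lock()
	defer mu.Unlock()
	for len(queue) > 0 && credit >= queue[0].debt {
		a := queue[0]
		queue = queue[1:]
		credit -= a.debt
		close(a.done) // "ready" the parked assist
	}
	bgCredit += credit
}
```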
  7. 30 Oct, 2015 3 commits
    • runtime: perform mark 2 root re-scanning in GC workers · fbf27325
      Austin Clements authored
      This moves another root scanning task out of the GC coordinator and
      parallelizes it on the GC workers.
      
      This has negligible effect on the go1 benchmarks and the garbage
      benchmark.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.24ms ± 1%  5.26ms ± 1%  +0.30%  (p=0.007 n=18+17)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.20s ± 5%     3.21s ± 5%    ~     (p=0.264 n=20+18)
      Fannkuch11-12                2.46s ± 1%     2.54s ± 2%  +3.09%  (p=0.000 n=18+20)
      FmtFprintfEmpty-12          49.9ns ± 4%    50.0ns ± 5%    ~     (p=0.356 n=20+20)
      FmtFprintfString-12          170ns ± 1%     170ns ± 2%    ~     (p=0.815 n=19+20)
      FmtFprintfInt-12             160ns ± 1%     159ns ± 1%  -0.63%  (p=0.003 n=18+19)
      FmtFprintfIntInt-12          270ns ± 1%     267ns ± 1%  -1.00%  (p=0.000 n=19+18)
      FmtFprintfPrefixedInt-12     238ns ± 1%     232ns ± 1%  -2.28%  (p=0.000 n=19+19)
      FmtFprintfFloat-12           310ns ± 2%     313ns ± 2%  +0.93%  (p=0.000 n=19+19)
      FmtManyArgs-12              1.06µs ± 1%    1.04µs ± 1%  -1.93%  (p=0.000 n=20+19)
      GobDecode-12                8.63ms ± 1%    8.70ms ± 1%  +0.81%  (p=0.001 n=20+19)
      GobEncode-12                6.52ms ± 1%    6.56ms ± 1%  +0.66%  (p=0.000 n=20+19)
      Gzip-12                      318ms ± 1%     319ms ± 1%    ~     (p=0.405 n=17+18)
      Gunzip-12                   42.1ms ± 2%    42.0ms ± 1%    ~     (p=0.771 n=20+19)
      HTTPClientServer-12         62.6µs ± 1%    62.9µs ± 1%  +0.41%  (p=0.038 n=20+20)
      JSONEncode-12               16.9ms ± 1%    16.9ms ± 1%    ~     (p=0.077 n=18+20)
      JSONDecode-12               60.7ms ± 1%    62.3ms ± 1%  +2.73%  (p=0.000 n=20+20)
      Mandelbrot200-12            3.86ms ± 1%    3.85ms ± 1%    ~     (p=0.084 n=19+20)
      GoParse-12                  3.75ms ± 2%    3.73ms ± 1%    ~     (p=0.107 n=20+19)
      RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%  +0.97%  (p=0.001 n=20+19)
      RegexpMatchEasy0_1K-12       342ns ± 2%     332ns ± 2%  -2.86%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12      83.2ns ± 2%    82.8ns ± 2%    ~     (p=0.108 n=19+20)
      RegexpMatchEasy1_1K-12       495ns ± 2%     490ns ± 2%  -1.04%  (p=0.000 n=18+19)
      RegexpMatchMedium_32-12      130ns ± 2%     131ns ± 2%    ~     (p=0.291 n=20+20)
      RegexpMatchMedium_1K-12     39.3µs ± 1%    39.9µs ± 1%  +1.54%  (p=0.000 n=18+20)
      RegexpMatchHard_32-12       2.02µs ± 1%    2.05µs ± 2%  +1.19%  (p=0.000 n=19+19)
      RegexpMatchHard_1K-12       60.9µs ± 1%    61.5µs ± 1%  +0.99%  (p=0.000 n=18+18)
      Revcomp-12                   535ms ± 1%     531ms ± 1%  -0.82%  (p=0.000 n=17+17)
      Template-12                 73.0ms ± 1%    74.1ms ± 1%  +1.47%  (p=0.000 n=20+20)
      TimeParse-12                 356ns ± 2%     348ns ± 1%  -2.30%  (p=0.000 n=20+20)
      TimeFormat-12                347ns ± 1%     353ns ± 1%  +1.68%  (p=0.000 n=19+20)
      [Geo mean]                  62.3µs         62.4µs       +0.12%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.9MB/s ± 1%  88.2MB/s ± 1%  -0.81%  (p=0.001 n=20+19)
      GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.66%  (p=0.000 n=20+19)
      Gzip-12                   60.9MB/s ± 1%  60.8MB/s ± 1%    ~     (p=0.409 n=17+18)
      Gunzip-12                  461MB/s ± 2%   462MB/s ± 1%    ~     (p=0.765 n=20+19)
      JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.078 n=18+20)
      JSONDecode-12             32.0MB/s ± 1%  31.1MB/s ± 1%  -2.65%  (p=0.000 n=20+20)
      GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 1%    ~     (p=0.111 n=20+19)
      RegexpMatchEasy0_32-12     318MB/s ± 2%   314MB/s ± 2%  -1.27%  (p=0.000 n=20+19)
      RegexpMatchEasy0_1K-12    2.99GB/s ± 1%  3.08GB/s ± 2%  +2.94%  (p=0.000 n=19+19)
      RegexpMatchEasy1_32-12     385MB/s ± 2%   386MB/s ± 2%    ~     (p=0.105 n=19+20)
      RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.09GB/s ± 2%  +1.06%  (p=0.000 n=18+19)
      RegexpMatchMedium_32-12   7.64MB/s ± 2%  7.61MB/s ± 1%    ~     (p=0.179 n=20+20)
      RegexpMatchMedium_1K-12   26.1MB/s ± 1%  25.7MB/s ± 1%  -1.52%  (p=0.000 n=18+20)
      RegexpMatchHard_32-12     15.8MB/s ± 1%  15.6MB/s ± 2%  -1.18%  (p=0.000 n=19+19)
      RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.6MB/s ± 1%  -0.90%  (p=0.000 n=19+18)
      Revcomp-12                 475MB/s ± 1%   479MB/s ± 1%  +0.83%  (p=0.000 n=17+17)
      Template-12               26.6MB/s ± 1%  26.2MB/s ± 1%  -1.45%  (p=0.000 n=20+20)
      [Geo mean]                99.0MB/s       98.7MB/s       -0.32%
      
      Change-Id: I6ea44d7a59aaa6851c64695277ab65645ff9d32e
      Reviewed-on: https://go-review.googlesource.com/16070
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
    • runtime: perform concurrent scan in GC workers · 82d14d77
      Austin Clements authored
      Currently the concurrent root scan is performed in its entirety by the
      GC coordinator before entering concurrent mark (which enables GC
      workers). This scan is done sequentially, which can prolong the scan
      phase, delay the mark phase, and means that the scan phase does not
      obey the 25% CPU goal. Furthermore, there's no need to complete the
      root scan before starting marking (in fact, we already allow GC
      assists to happen during the scan phase), so this acts as an
      unnecessary barrier between root scanning and marking.
      
      This change shifts the root scan work out of the GC coordinator into
      the GC workers. The coordinator simply sets up the scan state and
      enqueues the right number of root scan jobs. The GC workers then drain
      the root scan jobs prior to draining heap scan jobs.
      
      This parallelizes the root scan process, makes it obey the 25% CPU
      goal, and effectively eliminates root scanning as an isolated phase,
      allowing the system to smoothly transition from root scanning to heap
      marking. This also eliminates a major non-STW responsibility of the GC
      coordinator, which will make it easier to switch to a decentralized
      state machine. Finally, it puts us in a good position to perform root
      scanning in assists as well, which will help satisfy assists at the
      beginning of the GC cycle.
      
      This is mostly straightforward. One tricky aspect is that we have to
      deal with preemption deadlock, where two non-preemptible goroutines
      are trying to preempt each other to perform a stack scan. Given the
      context where this happens, the only instance of this is two
      background workers trying to scan each other. We avoid this by simply
      not scanning the stacks of background workers during the concurrent
      phase; this is safe because we'll scan them during mark termination
      (and their stacks are *very* small and should not contain any new
      pointers).
      
      This change also switches the root marking during mark termination to
      use the same gcDrain-based code path as concurrent mark. This
      shouldn't affect performance because STW root marking was already
      parallel and tasks switched to heap marking immediately when no more
      root marking tasks were available. However, it simplifies the code and
      unifies these code paths.
      
      This has negligible effect on the go1 benchmarks. It slightly slows
      down the garbage benchmark, possibly by making GC run slightly more
      frequently.
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
      Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
      FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
      FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
      FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
      FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
      FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
      FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
      FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
      GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
      Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
      Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
      HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
      JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
      JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
      Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
      GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
      RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
      RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
      RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
      RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
      RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
      RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
      RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
      RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
      Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
      Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
      TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
      TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
      [Geo mean]                  62.1µs         62.3µs       +0.31%
      
      name                      old speed      new speed      delta
      GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
      GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
      Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
      Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
      JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
      JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
      GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
      RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
      RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
      RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
      RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
      RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
      RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
      RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
      RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
      Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
      Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
      [Geo mean]                 100MB/s       99.0MB/s       -0.82%
      
      Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
      Reviewed-on: https://go-review.googlesource.com/16059
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
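The enqueue-and-drain scheme above can be sketched with an atomic job counter; the variable names below are modeled on the idea, not copied from the source:

```go
package sketch

import "sync/atomic"

var (
	markrootNext uint32 // next root job to claim (illustrative)
	markrootJobs uint32 // total root jobs enqueued for this cycle
)

// drainRoots sketches how workers drain root scan jobs before heap
// scan jobs: each worker claims jobs with an atomic counter, which is
// what parallelizes the root scan across GC workers.
func drainRoots(markroot func(i uint32)) {
	for {
		i := atomic.AddUint32(&markrootNext, 1) - 1
		if i >= atomic.LoadUint32(&markrootJobs) {
			return // all root jobs claimed; fall through to heap marking
		}
		markroot(i)
	}
}
```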
    • runtime: consolidate "out of GC work" checks · 4cca1cc0
      Austin Clements authored
      We already have gcMarkWorkAvailable, but the check for GC mark work is
      open-coded in several places. Generalize gcMarkWorkAvailable slightly
      and replace these open-coded checks with calls to gcMarkWorkAvailable.
      
      In addition to cleaning up the code, this puts us in a better position
      to make this check slightly more complicated.
      
      Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
      Reviewed-on: https://go-review.googlesource.com/16058
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
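A plausible shape for the consolidated predicate, with simplified parameters standing in for the runtime-internal state the real function consults:

```go
package sketch

// gcMarkWorkAvailable (sketch): work is available if this P has
// cached work, the global full-buffer list is non-empty, or root scan
// jobs remain unclaimed.
func gcMarkWorkAvailable(localWork, globalFull int, rootJobsLeft bool) bool {
	return localWork > 0 || globalFull > 0 || rootJobsLeft
}
```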
  8. 26 Oct, 2015 1 commit
    • runtime: partition data and BSS root marking · d3df04cd
      Austin Clements authored
      Currently data and BSS root marking are each a single markroot job.
      This makes them difficult to load balance, which can draw out mark
      termination time if they are large.
      
      Fix this by splitting both into 256K chunks. While we're putting in
      the infrastructure for dynamic roots, we also replace the fixed
      sharding of the span roots with sharding into fixed sizes. In
      addition to helping balance root marking, this also paves the way to
      parallelizing concurrent scan and to letting assists help with root
      marking.
      
      Updates #10345. This fixes the data and BSS aspects of that bug; it
      does not partition scanning of large heap objects.
      
      This has negligible effect on either the go1 benchmarks or the garbage
      benchmark:
      
      name              old time/op  new time/op  delta
      XBenchGarbage-12  4.90ms ± 1%  4.91ms ± 2%   ~     (p=0.058 n=17+16)
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              3.11s ± 4%     3.12s ± 4%    ~     (p=0.512 n=20+20)
      Fannkuch11-12                2.53s ± 2%     2.47s ± 2%  -2.28%  (p=0.000 n=20+18)
      FmtFprintfEmpty-12          49.1ns ± 1%    50.0ns ± 4%  +1.68%  (p=0.008 n=18+20)
      FmtFprintfString-12          170ns ± 0%     172ns ± 1%  +1.05%  (p=0.000 n=14+19)
      FmtFprintfInt-12             174ns ± 1%     162ns ± 1%  -6.81%  (p=0.000 n=18+17)
      FmtFprintfIntInt-12          284ns ± 1%     277ns ± 1%  -2.42%  (p=0.000 n=20+19)
      FmtFprintfPrefixedInt-12     252ns ± 1%     244ns ± 1%  -2.84%  (p=0.000 n=18+20)
      FmtFprintfFloat-12           317ns ± 0%     311ns ± 0%  -1.95%  (p=0.000 n=19+18)
      FmtManyArgs-12              1.08µs ± 1%    1.11µs ± 1%  +3.43%  (p=0.000 n=18+19)
      GobDecode-12                8.56ms ± 1%    8.61ms ± 1%  +0.50%  (p=0.020 n=20+20)
      GobEncode-12                6.58ms ± 1%    6.57ms ± 1%    ~     (p=0.792 n=20+19)
      Gzip-12                      317ms ± 3%     317ms ± 2%    ~     (p=0.840 n=19+19)
      Gunzip-12                   41.6ms ± 0%    41.6ms ± 0%  +0.07%  (p=0.027 n=18+15)
      HTTPClientServer-12         62.2µs ± 1%    62.3µs ± 1%    ~     (p=0.283 n=19+20)
      JSONEncode-12               16.5ms ± 2%    16.5ms ± 1%    ~     (p=0.857 n=20+19)
      JSONDecode-12               58.5ms ± 1%    61.3ms ± 1%  +4.67%  (p=0.000 n=18+17)
      Mandelbrot200-12            3.84ms ± 0%    3.84ms ± 0%    ~     (p=0.259 n=17+17)
      GoParse-12                  3.70ms ± 2%    3.74ms ± 2%  +0.96%  (p=0.009 n=19+20)
      RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 0%  +0.31%  (p=0.040 n=19+15)
      RegexpMatchEasy0_1K-12       340ns ± 1%     340ns ± 1%    ~     (p=0.411 n=17+19)
      RegexpMatchEasy1_32-12      82.7ns ± 2%    82.3ns ± 1%    ~     (p=0.456 n=20+19)
      RegexpMatchEasy1_1K-12       498ns ± 2%     495ns ± 0%    ~     (p=0.108 n=19+17)
      RegexpMatchMedium_32-12      130ns ± 1%     130ns ± 2%    ~     (p=0.405 n=18+19)
      RegexpMatchMedium_1K-12     39.4µs ± 2%    39.1µs ± 1%  -0.64%  (p=0.002 n=20+19)
      RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 0%    ~     (p=0.561 n=20+17)
      RegexpMatchHard_1K-12       61.1µs ± 2%    60.8µs ± 1%    ~     (p=0.615 n=19+18)
      Revcomp-12                   532ms ± 2%     531ms ± 1%    ~     (p=0.470 n=19+19)
      Template-12                 68.5ms ± 1%    69.1ms ± 1%  +0.87%  (p=0.000 n=17+17)
      TimeParse-12                 344ns ± 2%     344ns ± 1%  +0.25%  (p=0.032 n=19+18)
      TimeFormat-12                347ns ± 1%     362ns ± 1%  +4.27%  (p=0.000 n=17+19)
      [Geo mean]                  62.3µs         62.3µs       -0.04%
      
      name                      old speed      new speed      delta
      GobDecode-12              89.6MB/s ± 1%  89.2MB/s ± 1%  -0.50%  (p=0.019 n=20+20)
      GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.797 n=20+19)
      Gzip-12                   61.3MB/s ± 3%  61.2MB/s ± 2%    ~     (p=0.834 n=19+19)
      Gunzip-12                  467MB/s ± 0%   466MB/s ± 0%  -0.07%  (p=0.027 n=18+15)
      JSONEncode-12              117MB/s ± 2%   117MB/s ± 1%    ~     (p=0.851 n=20+19)
      JSONDecode-12             33.2MB/s ± 1%  31.7MB/s ± 1%  -4.47%  (p=0.000 n=18+17)
      GoParse-12                15.6MB/s ± 2%  15.5MB/s ± 2%  -0.95%  (p=0.008 n=19+20)
      RegexpMatchEasy0_32-12     321MB/s ± 2%   320MB/s ± 1%  -0.57%  (p=0.002 n=17+17)
      RegexpMatchEasy0_1K-12    3.01GB/s ± 1%  3.01GB/s ± 1%    ~     (p=0.132 n=17+18)
      RegexpMatchEasy1_32-12     387MB/s ± 2%   389MB/s ± 1%    ~     (p=0.423 n=20+19)
      RegexpMatchEasy1_1K-12    2.05GB/s ± 2%  2.06GB/s ± 0%    ~     (p=0.129 n=19+17)
      RegexpMatchMedium_32-12   7.64MB/s ± 1%  7.66MB/s ± 1%    ~     (p=0.258 n=18+19)
      RegexpMatchMedium_1K-12   26.0MB/s ± 2%  26.2MB/s ± 1%  +0.64%  (p=0.002 n=20+19)
      RegexpMatchHard_32-12     15.7MB/s ± 2%  15.8MB/s ± 1%    ~     (p=0.510 n=20+17)
      RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.8MB/s ± 1%    ~     (p=0.603 n=19+18)
      Revcomp-12                 477MB/s ± 2%   479MB/s ± 1%    ~     (p=0.470 n=19+19)
      Template-12               28.3MB/s ± 1%  28.1MB/s ± 1%  -0.85%  (p=0.000 n=17+17)
      [Geo mean]                 100MB/s        100MB/s       -0.26%
      
      Change-Id: Ib0bfe0145675ce88c5a8791752f7486ac98805b4
      Reviewed-on: https://go-review.googlesource.com/16043
      Reviewed-by: Rick Hudson <rlh@golang.org>
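The chunking arithmetic is worth spelling out; a sketch, assuming the 256K chunk size named above (the function names are illustrative):

```go
package sketch

// rootBlockBytes is the 256K chunk size named in the commit.
const rootBlockBytes = 256 << 10

// rootBlocks is the chunking arithmetic: a root of n bytes becomes
// ceil(n/rootBlockBytes) markroot jobs.
func rootBlocks(n uintptr) uintptr {
	return (n + rootBlockBytes - 1) / rootBlockBytes
}

// blockRange computes the byte range for job i of a root spanning
// [base, base+n); only the last chunk may be short.
func blockRange(base, n, i uintptr) (start, size uintptr) {
	start = base + i*rootBlockBytes
	size = rootBlockBytes
	if remain := base + n - start; remain < size {
		size = remain
	}
	return
}
```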
  9. 21 Oct, 2015 2 commits
  10. 19 Oct, 2015 3 commits
  11. 16 Oct, 2015 1 commit
  12. 09 Oct, 2015 8 commits
    • runtime: assist before allocating · 65aa2da6
      Austin Clements authored
      Currently, when the mutator allocates, the runtime first allocates the
      memory and then, if that G has done "enough" allocation, the runtime
      checks whether the G has assist debt to pay off and, if so, pays it
      off. This approach leads to under-assisting, where a G can allocate a
      large region (or many small regions) before paying for it, or can even
      exit with outstanding debt.
      
      This commit flips this around so that a G always acquires enough
      credit for an allocation before it can perform that allocation. We
      continue to amortize the cost of assists by requiring that they
      over-assist when triggered to build up credit for many allocations.
      
      Fixes #11967.
      
      Change-Id: Idac9f11133b328535667674d837be72c23ebd899
      Reviewed-on: https://go-review.googlesource.com/15409
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
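The flipped ordering can be sketched as follows; the assist and alloc hooks and the mallocSketch name are hypothetical, and gcAssistBytes follows the credit scheme described in the next entry:

```go
package sketch

type g struct {
	gcAssistBytes int64 // allocation credit in bytes; negative means debt
}

// mallocSketch shows the flipped ordering: acquire enough credit
// before the allocation, rather than allocating first and settling up
// later. assist pays off a debt (by stealing credit or doing scan
// work) and returns the credit gained.
func mallocSketch(gp *g, size int64, assist func(debt int64) int64, alloc func(int64) []byte) []byte {
	gp.gcAssistBytes -= size
	if gp.gcAssistBytes < 0 {
		// In debt: assist now. Assists over-assist when triggered,
		// building up credit so many allocations amortize one assist.
		gp.gcAssistBytes += assist(-gp.gcAssistBytes)
	}
	return alloc(size)
}
```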
    • runtime: directly track GC assist balance · 89c341c5
      Austin Clements authored
      Currently we track the per-G GC assist balance as two monotonically
      increasing values: the bytes allocated by the G this cycle (gcalloc)
      and the scan work performed by the G this cycle (gcscanwork). The
      assist balance is hence assistRatio*gcalloc - gcscanwork.
      
      This works, but has two important downsides:
      
      1) It requires floating-point math to figure out if a G is in debt or
         not. This makes it inappropriate to check for assist debt in the
         hot path of mallocgc, so we only do this when a G allocates a new
         span. As a result, Gs can operate "in the red", leading to
         under-assist and extended GC cycle length.
      
      2) Revising the assist ratio during a GC cycle can lead to an "assist
         burst". If you think of plotting the scan work performed versus
         heaps size, the assist ratio controls the slope of this line.
         However, in the current system, the target line always passes
         through 0 at the heap size that triggered GC, so if the runtime
         increases the assist ratio, there has to be a potentially large
         assist to jump from the current amount of scan work up to the new
         target scan work for the current heap size.
      
      This commit replaces this approach with directly tracking the GC
      assist balance in terms of allocation credit bytes. Allocating N bytes
      simply decreases this by N and assisting raises it by the amount of
      scan work performed divided by the assist ratio (to get back to
      bytes).
      
      This will make it cheap to figure out if a G is in debt, which will
      let us efficiently check if an assist is necessary *before* performing
      an allocation and hence keep Gs "in the black".
      
      This also fixes assist bursts because the assist ratio is now in terms
      of *remaining* work, rather than work from the beginning of the GC
      cycle. Hence, the plot of scan work versus heap size becomes
      continuous: we can revise the slope, but this slope always starts from
      where we are right now, rather than where we were at the beginning of
      the cycle.
      
      Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c
      Reviewed-on: https://go-review.googlesource.com/15408
      Reviewed-by: Rick Hudson <rlh@golang.org>
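The balance arithmetic described above, as a sketch with illustrative names: allocation debits bytes directly, and assist work is converted back to bytes by dividing by the assist ratio, so the hot-path debt check is a cheap integer comparison.

```go
package sketch

type gAssist struct {
	gcAssistBytes int64 // allocation credit in bytes; negative means debt
}

// noteAllocation debits the balance directly in bytes; no floating
// point is needed on the malloc path.
func (g *gAssist) noteAllocation(bytes int64) {
	g.gcAssistBytes -= bytes
}

// noteScanWork converts scan work back to bytes by dividing by the
// assist ratio (scan work per allocated byte), as described above.
func (g *gAssist) noteScanWork(scanWork int64, assistRatio float64) {
	g.gcAssistBytes += int64(float64(scanWork) / assistRatio)
}
```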
    • runtime: ensure minimum heap distance via heap goal · 9e77c898
      Austin Clements authored
      Currently we ensure a minimum heap distance of 1MB when computing the
      assist ratio. Rather than enforcing this minimum on the heap distance,
      it makes more sense to enforce that the heap goal itself is at least
      1MB over the live heap size at the beginning of GC. Currently the two
      approaches are semantically equivalent, but this will let us switch to
      basing the assist ratio on current heap distance rather than the
      initial heap distance, since we can't enforce this minimum on the
      current heap distance (the GC may never finish because the goal posts
      will always be 1MB away).
      
      Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903
      Reviewed-on: https://go-review.googlesource.com/15604
      Reviewed-by: Rick Hudson <rlh@golang.org>
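As arithmetic, the constraint is a clamp on the goal rather than on the distance; a sketch with illustrative names:

```go
package sketch

const minHeapDistance = 1 << 20 // the 1MB minimum named above

// heapGoal clamps the goal itself: it must sit at least 1MB above the
// live heap size at the beginning of GC.
func heapGoal(liveHeap, proposedGoal uint64) uint64 {
	if floor := liveHeap + minHeapDistance; proposedGoal < floor {
		return floor
	}
	return proposedGoal
}
```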
    • runtime: update gcController.scanWork regularly · 8e8219de
      Austin Clements authored
      Currently, gcController.scanWork is updated as lazily as possible
      since it is only read at the end of the GC cycle. We're about to read
      it during the GC cycle to improve the assist ratio revisions, so
      modify gcDrain* to regularly flush to gcController.scanWork in much
      the same way as we regularly flush to gcController.bgScanCredit.
      
      One consequence of this is that it's difficult to keep gcw.scanWork
      monotonic, so we give up on that and simply return the amount of scan
      work done by gcDrainN rather than calculating it in the caller.
      
      Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d
      Reviewed-on: https://go-review.googlesource.com/15407
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: control background scan credit flushing with flag · c18b163c
      Austin Clements authored
      Currently callers of gcDrain control whether it flushes scan work
      credit to gcController.bgScanCredit by passing a value other than -1
      for the flush threshold. Shortly we're going to make this always flush
      scan work to gcController.scanWork and optionally also flush scan work
      to gcController.bgScanCredit. This will be much easier if the flush
      threshold is simply a constant (which it is in practice) and callers
      merely control whether or not the flush includes the background
      credit. Hence, replace the flush threshold argument with a flag.
      
      Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e
      Reviewed-on: https://go-review.googlesource.com/15406
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: consolidate gcDrain and gcDrainUntilPreempt · 9b3cdaf0
      Austin Clements authored
      These functions were nearly identical. Consolidate them by adding a
      flags argument. In addition to cleaning up this code, this makes
      further changes that affect both functions easier.
      
      Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f
      Reviewed-on: https://go-review.googlesource.com/15405
      Reviewed-by: Rick Hudson <rlh@golang.org>
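The consolidation pattern, sketched with hypothetical flag and hook names: one drain function with a bitmask of behaviors replaces two near-identical functions.

```go
package sketch

type gcDrainFlags int

const (
	gcDrainUntilPreempt gcDrainFlags = 1 << iota
	gcDrainFlushBgCredit
)

// gcDrainSketch drains work one unit at a time; the flags select the
// behaviors that previously distinguished the two functions.
func gcDrainSketch(flags gcDrainFlags, preempted, scanOneUnit func() bool, flushCredit func()) {
	for scanOneUnit() {
		if flags&gcDrainUntilPreempt != 0 && preempted() {
			break // only the until-preempt variant honors preemption
		}
	}
	if flags&gcDrainFlushBgCredit != 0 {
		flushCredit() // optionally flush to the background credit pool
	}
}
```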
    • runtime: explain why continuous assist revising is necessary · 39ed6822
      Austin Clements authored
      Change-Id: I950af8d80433b3ae8a1da0aa7a8d2d0b295dd313
      Reviewed-on: https://go-review.googlesource.com/15404
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: fix comment for assistRatio · 3e57b17d
      Austin Clements authored
      The comment for assistRatio claimed it to be the reciprocal of what it
      actually is.
      
      Change-Id: If7f9bb853d75d0097facff3aa6704b224d9108b8
      Reviewed-on: https://go-review.googlesource.com/15402
      Reviewed-by: Russ Cox <rsc@golang.org>
  13. 02 Oct, 2015 2 commits
    • runtime: remove sweep wait loop in finishsweep_m · 9a31d38f
      Austin Clements authored
      In general, finishsweep_m must block until any spans that are
      concurrently being swept have been swept. It accomplishes this by
      looping over all spans, which, as in the previous commit, takes
      ~1ms/heap GB. Unfortunately, we do this during the STW sweep
      termination phase, so multi-gigabyte heaps can push our STW time past
      10ms.
      
      However, there's no need to do this wait if the world is stopped
      because, in effect, stopping the world already had to wait for
      anything that was sweeping (and if it didn't, the wait in
      finishsweep_m would deadlock). Hence, we can simply skip this loop if
      the world is stopped, such as during sweep termination. In fact,
      currently all calls to finishsweep_m are STW, but this hasn't always
      been the case and may not be the case in the future, so we keep the
      logic around.
      
      For 24GB heaps, this reduces max pause time by 75% relative to tip and
      by 90% relative to Go 1.5. Notably, all pauses are now well under
      10ms. Here are the results for the garbage benchmark:
      
                     ------------- max pause ------------
      Heap   Procs   after change   before change   1.5.1
      24GB     12        3.8ms          16ms         37ms
      24GB      4        3.7ms          16ms         37ms
       4GB      4        3.7ms           3ms        6.9ms
      
      In the 4GB/4P case, it seems the "before change" run got lucky: the
      max went up, but the 99%ile pause time went down from 3ms to 2.04ms.
      
      Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31
      Reviewed-on: https://go-review.googlesource.com/15071
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • runtime: remove in-use page count loop from STW · dac220b0
      Austin Clements authored
      In order to compute the sweep ratio, the runtime needs to know how
      many pages belong to spans in state _MSpanInUse. Currently it finds
      this out by looping over all spans during mark termination. However,
      this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our
      STW time past 10ms.
      
      Replace the loop with an actively maintained count of in-use pages.
      
      For multi-gigabyte heaps, this reduces max mark termination pause time
      by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This
      shifts the longest pause time for large heaps to the sweep termination
      phase, so it only slightly decreases max pause time, though it roughly
      halves mean pause time. Here are the results for the garbage
      benchmark:
      
                     ---- max mark termination pause ----
      Heap   Procs   after change   before change   1.5.1
      24GB     12        1.9ms          18ms         37ms
      24GB      4        3.7ms          18ms         37ms
       4GB      4        920µs         3.8ms        6.9ms
      
      Fixes #11484.
      
      Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac
      Reviewed-on: https://go-review.googlesource.com/15070
      Reviewed-by: Rick Hudson <rlh@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
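A sketch of the actively maintained count, using sync/atomic and illustrative names: span allocation and freeing keep the counter up to date, so the sweep ratio can read it in O(1) during mark termination instead of looping over all spans.

```go
package sketch

import "sync/atomic"

// pagesInUse replaces the all-spans loop (~1ms per heap GB) with a
// counter maintained at span state changes.
var pagesInUse uint64

func noteSpanAllocated(npages uint64) { atomic.AddUint64(&pagesInUse, npages) }

// noteSpanFreed subtracts using two's complement, the documented way
// to decrement with atomic.AddUint64.
func noteSpanFreed(npages uint64) { atomic.AddUint64(&pagesInUse, ^(npages - 1)) }

func currentPagesInUse() uint64 { return atomic.LoadUint64(&pagesInUse) }
```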