1. 29 Sep, 2016 8 commits
  2. 28 Sep, 2016 8 commits
  3. 27 Sep, 2016 21 commits
  4. 26 Sep, 2016 3 commits
    • runtime: optimize defer code · f8b2314c
      Austin Clements authored
      This optimizes deferproc and deferreturn in various ways.
      
      The most important optimization is that it more carefully arranges to
      prevent preemption or stack growth. Currently we do this by switching
      to the system stack on every deferproc and every deferreturn. While we
      need to be on the system stack for the slow path of allocating and
      freeing defers, in the common case we can fit in the nosplit stack.
      Hence, this change pushes the system stack switch down into the slow
      paths and makes everything now exposed to the user stack nosplit. This
      also eliminates the need for various acquirem/releasem pairs, since we
      are now preventing preemption by preventing stack split checks.
      
      As another smaller optimization, we special case the common cases of
      zero-sized and pointer-sized defer frames to respectively skip the
      copy and perform the copy in line instead of calling memmove.
      
      This speeds up the runtime defer benchmark by 42%:
      
      name           old time/op  new time/op  delta
      Defer-4        75.1ns ± 1%  43.3ns ± 1%  -42.31%   (p=0.000 n=8+10)
      
      In reality, this speeds up defer by about 2.2X. The two benchmarks
      below compare a Lock/defer Unlock pair (DeferLock) with a Lock/Unlock
      pair (NoDeferLock). NoDeferLock establishes a baseline cost, so these
      two benchmarks together show that this change reduces the overhead of
      defer from 61.4ns to 27.9ns.
      
      name           old time/op  new time/op  delta
      DeferLock-4    77.4ns ± 1%  43.9ns ± 1%  -43.31%  (p=0.000 n=10+10)
      NoDeferLock-4  16.0ns ± 0%  15.9ns ± 0%   -0.39%    (p=0.000 n=9+8)
      
      This also shaves 34ns off cgo calls:
      
      name       old time/op  new time/op  delta
      CgoNoop-4   122ns ± 1%  88.3ns ± 1%  -27.72%  (p=0.000 n=8+9)
      
      Updates #14939, #16051.
      
      Change-Id: I2baa0dea378b7e4efebbee8fca919a97d5e15f38
      Reviewed-on: https://go-review.googlesource.com/29656
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime: implement getcallersp in Go · d211c2d3
      Austin Clements authored
      This makes it possible to inline getcallersp. getcallersp is on the
      hot path of defers, so this slightly speeds up defer:
      
      name           old time/op  new time/op  delta
      Defer-4        78.3ns ± 2%  75.1ns ± 1%  -4.00%   (p=0.000 n=9+8)
      
      Updates #14939.
      
      Change-Id: Icc1cc4cd2f0a81fc4c8344432d0b2e783accacdd
      Reviewed-on: https://go-review.googlesource.com/29655
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • runtime: update malloc.go documentation · aaf4099a
      Austin Clements authored
      The big documentation comment at the top of malloc.go has gotten
      woefully out of date. Update it.
      
      Change-Id: Ibdb1bdcfdd707a6dc9db79d0633a36a28882301b
      Reviewed-on: https://go-review.googlesource.com/29731
      Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
      Reviewed-by: Rick Hudson <rlh@golang.org>