1. 06 Sep, 2016 13 commits
    • runtime: bound scanobject to ~100 µs · cf4f1d07
      Austin Clements authored
      Currently the time spent in scanobject is proportional to the size of
      the object being scanned. Since scanobject is non-preemptible, large
      objects can cause significant goroutine (and even whole application)
      delays through several means:
      
      1. If a GC assist picks up a large object, the allocating goroutine is
         blocked for the whole scan, even if that scan well exceeds that
         goroutine's debt.
      
      2. Since the scheduler does not run on the P performing a large object
         scan, goroutines in that P's run queue do not run unless they are
         stolen by another P (which can take some time). If there are a few
         large objects, all of the Ps may get tied up so the scheduler
         doesn't run anywhere.
      
      3. Even if a large object is scanned by a background worker and other
         Ps are still running the scheduler, the large object scan doesn't
         flush background credit until the whole scan is done. This can
         easily cause all allocations to block in assists, waiting for
         credit, causing an effective STW.
      
      Fix this by splitting large objects into 128 KB "oblets" and scanning
      at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
      to bounding scanobject at roughly 100 µs. This improves assist
      behavior both because assists can no longer get "unlucky" and be stuck
      scanning a large object, and because it causes the background worker
      to flush credit and unblock assists more frequently when scanning
      large objects. This also improves GC parallelism if the heap consists
      primarily of a small number of very large objects by letting multiple
      workers scan a large object in parallel.
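The splitting step described above can be sketched as follows. This is a minimal illustration, not the runtime's code: the `oblet` type and the `splitOblets` helper are hypothetical names, and only the 128 KB bound comes from the commit message. At the quoted 1–2 MB/ms scan rate, one 128 KB oblet takes about 62.5–125 µs, which is where the "roughly 100 µs" figure comes from.

```go
package main

import "fmt"

// maxObletBytes mirrors the 128 KB oblet size named in the commit message.
const maxObletBytes = 128 << 10

// oblet is a hypothetical descriptor for one scan unit: the byte range
// [off, off+n) inside a large object.
type oblet struct {
	off, n uintptr
}

// splitOblets divides an object of the given size into consecutive
// oblets of at most maxObletBytes each. A scanner that handles one
// oblet per call does a bounded amount of work per call.
func splitOblets(size uintptr) []oblet {
	var out []oblet
	for off := uintptr(0); off < size; off += maxObletBytes {
		n := size - off
		if n > maxObletBytes {
			n = maxObletBytes
		}
		out = append(out, oblet{off: off, n: n})
	}
	return out
}

func main() {
	// A 1 MB + 1 byte object needs eight full oblets plus a 1-byte tail.
	obs := splitOblets(1<<20 + 1)
	fmt.Println(len(obs), obs[len(obs)-1])
}
```

Because each oblet is an independent unit of work, different workers could pick up different oblets of the same object, which is the parallelism benefit the message mentions.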
      
      Fixes #10345. Fixes #16293.
      
      This substantially improves goroutine latency in the benchmark from
      issue #16293, which exercises several forms of very large objects:
      
      name                 old max-latency    new max-latency    delta
      SliceNoPointer-12           154µs ± 1%        155µs ±  2%     ~     (p=0.087 n=13+12)
      SlicePointer-12             314ms ± 1%       5.94ms ±138%  -98.11%  (p=0.000 n=19+20)
      SliceLivePointer-12        1148ms ± 0%       4.72ms ±167%  -99.59%  (p=0.000 n=19+20)
      MapNoPointer-12           72509µs ± 1%        408µs ±325%  -99.44%  (p=0.000 n=19+18)
      ChanPointer-12              313ms ± 0%       4.74ms ±140%  -98.49%  (p=0.000 n=18+20)
      ChanLivePointer-12         1147ms ± 0%       3.30ms ±149%  -99.71%  (p=0.000 n=19+20)
      
      name                 old P99.9-latency  new P99.9-latency  delta
      SliceNoPointer-12           113µs ±25%         107µs ±12%     ~     (p=0.153 n=20+18)
      SlicePointer-12          309450µs ± 0%         133µs ±23%  -99.96%  (p=0.000 n=20+20)
      SliceLivePointer-12         961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
      MapNoPointer-12            448µs ±288%         119µs ±18%  -73.34%  (p=0.000 n=18+20)
      ChanPointer-12           309450µs ± 0%         134µs ±23%  -99.96%  (p=0.000 n=20+19)
      ChanLivePointer-12          961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
      
      This has negligible effect on all metrics from the garbage, JSON, and
      HTTP x/benchmarks.
      
      It shows slight improvement on some of the go1 benchmarks,
      particularly Revcomp, which uses some multi-megabyte buffers:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.46s ± 1%     2.47s ± 1%  +0.32%  (p=0.012 n=20+20)
      Fannkuch11-12                2.82s ± 0%     2.81s ± 0%  -0.61%  (p=0.000 n=17+20)
      FmtFprintfEmpty-12          50.8ns ± 5%    50.5ns ± 2%    ~     (p=0.197 n=17+19)
      FmtFprintfString-12          131ns ± 1%     132ns ± 0%  +0.57%  (p=0.000 n=20+16)
      FmtFprintfInt-12             117ns ± 0%     116ns ± 0%  -0.47%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          180ns ± 0%     179ns ± 1%  -0.78%  (p=0.000 n=16+20)
      FmtFprintfPrefixedInt-12     186ns ± 1%     185ns ± 1%  -0.55%  (p=0.000 n=19+20)
      FmtFprintfFloat-12           263ns ± 1%     271ns ± 0%  +2.84%  (p=0.000 n=18+20)
      FmtManyArgs-12               741ns ± 1%     742ns ± 1%    ~     (p=0.190 n=19+19)
      GobDecode-12                7.44ms ± 0%    7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
      GobEncode-12                6.22ms ± 1%    6.21ms ± 1%    ~     (p=0.336 n=20+19)
      Gzip-12                      220ms ± 1%     219ms ± 1%    ~     (p=0.130 n=19+19)
      Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=1.000 n=20+19)
      HTTPClientServer-12         82.5µs ± 3%    82.6µs ± 3%    ~     (p=0.776 n=20+19)
      JSONEncode-12               16.4ms ± 1%    16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
      JSONDecode-12               53.7ms ± 1%    54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
      Mandelbrot200-12            4.19ms ± 1%    4.20ms ± 1%    ~     (p=0.452 n=19+19)
      GoParse-12                  3.38ms ± 1%    3.37ms ± 1%    ~     (p=0.123 n=19+19)
      RegexpMatchEasy0_32-12      72.1ns ± 1%    71.8ns ± 1%    ~     (p=0.397 n=19+17)
      RegexpMatchEasy0_1K-12       242ns ± 0%     242ns ± 0%    ~     (p=0.168 n=17+20)
      RegexpMatchEasy1_32-12      72.1ns ± 1%    72.1ns ± 1%    ~     (p=0.538 n=18+19)
      RegexpMatchEasy1_1K-12       385ns ± 1%     384ns ± 1%    ~     (p=0.388 n=20+20)
      RegexpMatchMedium_32-12      112ns ± 1%     112ns ± 3%    ~     (p=0.539 n=20+20)
      RegexpMatchMedium_1K-12     34.4µs ± 2%    34.4µs ± 2%    ~     (p=0.628 n=18+18)
      RegexpMatchHard_32-12       1.80µs ± 1%    1.80µs ± 1%    ~     (p=0.522 n=18+19)
      RegexpMatchHard_1K-12       54.0µs ± 1%    54.1µs ± 1%    ~     (p=0.647 n=20+19)
      Revcomp-12                   387ms ± 1%     369ms ± 5%  -4.89%  (p=0.000 n=17+19)
      Template-12                 62.3ms ± 1%    62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
      TimeParse-12                 314ns ± 1%     314ns ± 0%    ~     (p=1.011 n=20+13)
      TimeFormat-12                358ns ± 0%     354ns ± 0%  -1.12%  (p=0.000 n=17+20)
      [Geo mean]                  53.5µs         53.3µs       -0.23%
      
      Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
      Reviewed-on: https://go-review.googlesource.com/23540
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: clean up more traces of the old mark bit · b275e55d
      Austin Clements authored
      Commit 59877bfa renamed bitMarked to bitScan, since the bitmap is no
      longer used for marking. However, there were several other references
      to this strewn about comments and in some other constant names. Fix
      these up, too.
      
      Change-Id: I4183d28c6b01977f1d75a99ad55b150f2211772d
      Reviewed-on: https://go-review.googlesource.com/28450
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • cmd/compile: remove nil check if followed by storezero on ARM64, MIPS64 · 4d5bb762
      Cherry Zhang authored
      Change-Id: Ib90c92056fa70b27feb734837794ef53e842c41a
      Reviewed-on: https://go-review.googlesource.com/28513
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: remove ld/st-followed nil checks for PPC64 · 0e0ab203
      David Chase authored
      Enabled checks (except for DUFF-ops which aren't implemented yet).
      Added ppc64le to relevant test.
      
      Also updated register list to reflect no-longer-reserved-
      for-constants status (file was missed in that change).
      
      Updates #16010.
      
      Change-Id: I31b1aac19e14994f760f2ecd02edbeb1f78362e7
      Reviewed-on: https://go-review.googlesource.com/28548
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/link: remove outdated cast and comment · b926bf83
      David Crawshaw authored
      This program is written in Go now.
      
      Change-Id: Ieec21a1bcac7c7a59e88cd1e1359977659de1757
      Reviewed-on: https://go-review.googlesource.com/28549
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • regexp: reduce mallocs in Regexp.Find* and Regexp.ReplaceAll*. · bea39e63
      Aliaksandr Valialkin authored
      This improves Regexp.Find* and Regexp.ReplaceAll* speed:
      
      name                  old time/op    new time/op    delta
      Find-4                   345ns ± 1%     314ns ± 1%    -8.94%    (p=0.000 n=9+8)
      FindString-4             341ns ± 1%     308ns ± 0%    -9.85%   (p=0.000 n=10+9)
      FindSubmatch-4           440ns ± 1%     404ns ± 0%    -8.27%   (p=0.000 n=10+8)
      FindStringSubmatch-4     426ns ± 0%     387ns ± 0%    -9.07%   (p=0.000 n=10+9)
      ReplaceAll-4            1.75µs ± 1%    1.67µs ± 0%    -4.45%   (p=0.000 n=9+10)
      
      name                  old alloc/op   new alloc/op   delta
      Find-4                   16.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindString-4             16.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindSubmatch-4           80.0B ± 0%     48.0B ± 0%   -40.00%  (p=0.000 n=10+10)
      FindStringSubmatch-4     64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=10+10)
      ReplaceAll-4              152B ± 0%      104B ± 0%   -31.58%  (p=0.000 n=10+10)
      
      name                  old allocs/op  new allocs/op  delta
      Find-4                    1.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindString-4              1.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindSubmatch-4            2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
      FindStringSubmatch-4      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
      ReplaceAll-4              8.00 ± 0%      5.00 ± 0%   -37.50%  (p=0.000 n=10+10)
      
      Fixes #15643
      
      Change-Id: I594fe51172373e2adb98d1d25c76ca2cde54ff48
      Reviewed-on: https://go-review.googlesource.com/23030
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: generate table of main symbol types · 5923df1a
      David Crawshaw authored
      For each exported symbol in package main, add its name and type to
      the go.plugin.tabs symbol. This is used by the runtime when loading
      a plugin to return a typed interface{} value.
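The value of returning a typed interface{} is that the caller can recover the concrete type with a type assertion. The sketch below only illustrates that idea: `symbolTable`, `lookup`, `Greet`, and `Version` are hypothetical names, and the map stands in for the compiler-generated table; it does not load a real plugin.

```go
package main

import "fmt"

// symbolTable is a hypothetical stand-in for a name-to-symbol table.
// Each value is stored as an interface{} that carries its dynamic type.
type symbolTable map[string]interface{}

// lookup returns the symbol registered under name, or an error if the
// table has no such entry.
func (t symbolTable) lookup(name string) (interface{}, error) {
	v, ok := t[name]
	if !ok {
		return nil, fmt.Errorf("symbol %q not found", name)
	}
	return v, nil
}

// Version and Greet play the role of exported symbols in package main.
var Version = "1.0"

func Greet(name string) string { return "hello, " + name }

func main() {
	// Variables are registered by pointer, functions by value (an
	// assumption made for this sketch).
	tab := symbolTable{"Version": &Version, "Greet": Greet}

	sym, err := tab.lookup("Greet")
	if err != nil {
		panic(err)
	}
	// The interface{} keeps the function's type, so the assertion succeeds.
	greet, ok := sym.(func(string) string)
	if !ok {
		panic("Greet has an unexpected type")
	}
	fmt.Println(greet("gopher"))
}
```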
      
      Change-Id: I23c39583e57180acb8f7a74d218dae4368614f46
      Reviewed-on: https://go-review.googlesource.com/27818
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • math: fix sqrt regression on AMD64 · 6e703ae7
      Ilya Tocar authored
      1.7 introduced a significant regression compared to 1.6:
      
      SqrtIndirect-4  2.32ns ± 0%  7.86ns ± 0%  +238.79%        (p=0.000 n=20+18)
      
      This is caused by SQRTSD preserving the upper part of the destination
      register, which introduces a dependency on the previous value of X0.
      In 1.6, the benchmark loop didn't use X0 immediately after the call:
      
      callq  *%rbx
      movsd  0x8(%rsp),%xmm2
      movsd  0x20(%rsp),%xmm1
      addsd  %xmm2,%xmm1
      mov    0x18(%rsp),%rax
      inc    %rax
      jmp    loop
      
      In 1.7, however, xmm0 is used just after the call:
      
      callq  *%rbx
      mov    0x10(%rsp),%rcx
      lea    0x1(%rcx),%rax
      movsd  0x8(%rsp),%xmm0
      movsd  0x18(%rsp),%xmm1
      
      I've verified that this is caused by the dependency by inserting
      XORPS X0,X0 at the beginning of math.Sqrt, which puts performance back at the 1.6 level.
      
      Splitting SQRTSD mem,reg into:
      MOVSD mem,reg
      SQRTSD reg,reg
      
      removes the dependency, because MOVSD (load version)
      doesn't need to preserve the upper part of a register,
      and the reg,reg operation is handled by the register renamer in the CPU.
      
      As a result of this change, the regression is gone:
      SqrtIndirect-4  7.86ns ± 0%  2.33ns ± 0%  -70.36%  (p=0.000 n=18+17)
      
      This also removes the old Sqrt benchmarks in favor of benchmarks measuring latency.
      Only SqrtIndirect is kept, to show the impact of this patch.
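A benchmark in the spirit of SqrtIndirect can be sketched as below. This is an illustration only, not the benchmark from the math package: `sqrtIndirect` and `sink` are hypothetical names, and a compiler may still optimize the call through the function value differently from what the commit measured.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// sink keeps the result live so the compiler cannot discard the loop.
var sink float64

// sqrtIndirect calls math.Sqrt through a function value n times and
// accumulates the results, so the floating-point register holding the
// result is reused on every iteration.
func sqrtIndirect(n int, y float64) float64 {
	f := math.Sqrt
	x := 0.0
	for i := 0; i < n; i++ {
		x += f(y)
	}
	return x
}

func main() {
	// testing.Benchmark runs the function with increasing b.N and
	// reports the time per operation.
	res := testing.Benchmark(func(b *testing.B) {
		sink = sqrtIndirect(b.N, 10.0)
	})
	fmt.Println("SqrtIndirect sketch:", res)
}
```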
      
      Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872
      Reviewed-on: https://go-review.googlesource.com/28392
      Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/go: run mkalldocs.sh · 6bcca5e9
      Josh Bleecher Snyder authored
      This should have happened as part of CL 28485.
      
      Change-Id: I63cd31303e542ceaec3f4002c5573f186a1e9a52
      Reviewed-on: https://go-review.googlesource.com/28547
      Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
      Reviewed-by: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: fix intrinsifying sync/atomic.Swap* on AMD64 · 644c16c7
      Cherry Zhang authored
      It should alias to Xchg instead of Swap. Found when testing #16985.
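For context, the sync/atomic.Swap* functions are atomic exchanges: they store a new value and return the old one, which is why an exchange primitive is the natural mapping. The sketch below only shows that user-visible behavior; the `exchange` wrapper is a hypothetical name and says nothing about how the compiler intrinsifies the call.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// exchange atomically stores newVal into *p and returns the value that
// was there before, using sync/atomic.SwapInt32.
func exchange(p *int32, newVal int32) int32 {
	return atomic.SwapInt32(p, newVal)
}

func main() {
	var v int32 = 1
	old := exchange(&v, 2)
	fmt.Println(old, atomic.LoadInt32(&v)) // prints "1 2"
}
```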
      
      Change-Id: If9fd734a1f89b8b2656f421eb31b9d1b0d95a49f
      Reviewed-on: https://go-review.googlesource.com/28512
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • cmd/compile: mark some AMD64 atomic ops as clobberFlags · f1ef5a06
      Cherry Zhang authored
      Fixes #16985.
      
      Change-Id: I5954db28f7b70dd3ac7768e471d5df871a5b20f9
      Reviewed-on: https://go-review.googlesource.com/28510
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Keith Randall <khr@golang.org>
    • syscall: add yet more TestGetfsstat debugging · 6db13e07
      Brad Fitzpatrick authored
      Updates #16937
      
      Change-Id: I98aa203176f8f2ca2fcca6e334a65bc60d6f824d
      Reviewed-on: https://go-review.googlesource.com/28535
      Reviewed-by: Ian Lance Taylor <iant@golang.org>
    • runtime: remove redundant expression from SetFinalizer · 66121ce8
      Erik Staab authored
      The previous if condition already checks the same expression and doesn't
      have side effects.
      
      Change-Id: Ieaf30a786572b608d0a883052b45fd3f04bc6147
      Reviewed-on: https://go-review.googlesource.com/28475
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
  2. 05 Sep, 2016 7 commits
  3. 04 Sep, 2016 11 commits
  4. 03 Sep, 2016 2 commits
  5. 02 Sep, 2016 7 commits