1. 08 Sep, 2016 14 commits
  2. 07 Sep, 2016 13 commits
  3. 06 Sep, 2016 13 commits
    • bytes: make IndexRune faster · e10286ae
      Hiroshi Ioka authored
      Re-implement IndexRune in terms of IndexByte and Index, which are
      already well optimized, to gain performance.
      
      name                  old time/op   new time/op     delta
      IndexRune/10-4         53.2ns ± 1%     29.1ns ± 1%    -45.32%  (p=0.008 n=5+5)
      IndexRune/32-4          191ns ± 1%       27ns ± 1%    -85.75%  (p=0.008 n=5+5)
      IndexRune/4K-4         23.5µs ± 1%      1.0µs ± 1%    -95.77%  (p=0.008 n=5+5)
      IndexRune/4M-4         23.8ms ± 0%      1.0ms ± 2%    -95.90%  (p=0.008 n=5+5)
      IndexRune/64M-4         384ms ± 1%       15ms ± 1%    -95.98%  (p=0.008 n=5+5)
      IndexRuneASCII/10-4    61.5ns ± 0%     10.3ns ± 4%    -83.17%  (p=0.008 n=5+5)
      IndexRuneASCII/32-4     203ns ± 0%       11ns ± 5%    -94.68%  (p=0.008 n=5+5)
      IndexRuneASCII/4K-4    23.4µs ± 0%      0.3µs ± 2%    -98.60%  (p=0.008 n=5+5)
      IndexRuneASCII/4M-4    24.0ms ± 1%      0.3ms ± 1%    -98.60%  (p=0.008 n=5+5)
      IndexRuneASCII/64M-4    386ms ± 2%        6ms ± 1%    -98.57%  (p=0.008 n=5+5)
      
      name                  old speed     new speed       delta
      IndexRune/10-4        188MB/s ± 1%    344MB/s ± 1%    +82.91%  (p=0.008 n=5+5)
      IndexRune/32-4        167MB/s ± 0%   1175MB/s ± 1%   +603.52%  (p=0.008 n=5+5)
      IndexRune/4K-4        174MB/s ± 1%   4117MB/s ± 1%  +2262.71%  (p=0.008 n=5+5)
      IndexRune/4M-4        176MB/s ± 0%   4299MB/s ± 2%  +2340.46%  (p=0.008 n=5+5)
      IndexRune/64M-4       175MB/s ± 1%   4354MB/s ± 1%  +2388.57%  (p=0.008 n=5+5)
      IndexRuneASCII/10-4   163MB/s ± 0%    968MB/s ± 4%   +494.66%  (p=0.008 n=5+5)
      IndexRuneASCII/32-4   157MB/s ± 0%   2974MB/s ± 4%  +1788.59%  (p=0.008 n=5+5)
      IndexRuneASCII/4K-4   175MB/s ± 0%  12481MB/s ± 2%  +7027.71%  (p=0.008 n=5+5)
      IndexRuneASCII/4M-4   175MB/s ± 1%  12510MB/s ± 1%  +7061.15%  (p=0.008 n=5+5)
      IndexRuneASCII/64M-4  174MB/s ± 2%  12143MB/s ± 1%  +6881.70%  (p=0.008 n=5+5)
      
      Change-Id: I0632eadb83937c2a9daa7f0ce79df1dee64f992e
      Reviewed-on: https://go-review.googlesource.com/28537
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime/debug: enable TestFreeOSMemory on all arches · 8259cf3c
      Austin Clements authored
      TestFreeOSMemory was disabled on many arches because of issue #9993.
      Since that's been fixed, enable the test everywhere.
      
      Change-Id: I298c38c3e04128d9c8a1f558980939d5699bea03
      Reviewed-on: https://go-review.googlesource.com/27403
      Run-TryBot: Austin Clements <austin@google.com>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Reviewed-by: Minux Ma <minux@golang.org>
    • syscall: make Getpagesize return page size from runtime · 1b9499b0
      Austin Clements authored
      syscall.Getpagesize currently returns hard-coded page sizes on all
      architectures (some of which are probably always wrong, and some of
      which are definitely not always right). The runtime now has this
      information, queried from the OS during runtime init, so make
      syscall.Getpagesize return the page size that the runtime knows.
      
      Updates #10180.
      
      Change-Id: I4daa6fbc61a2193eb8fa9e7878960971205ac346
      Reviewed-on: https://go-review.googlesource.com/25051
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    • runtime: don't hard-code physical page size · 6dda7b2f
      Austin Clements authored
      Now that the runtime fetches the true physical page size from the OS,
      make the physical page size used by heap growth a variable instead of
      a constant. This isn't used in any performance-critical paths, so it
      shouldn't be an issue.
      
      sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
      clear that it's not necessarily the true page size. There are no uses
      of this constant any more, but we'll keep it around for now.
      
      Updates #12480 and #10180.
      
      Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
      Reviewed-on: https://go-review.googlesource.com/25022
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: fetch physical page size from the OS · 276a52de
      Austin Clements authored
      Currently the physical page size assumed by the runtime is hard-coded.
      On Linux the runtime at least fetches the OS page size during init and
      sanity checks against the hard-coded value, but they may still differ.
      On other OSes we wouldn't even notice.
      
      Add support on all OSes to fetch the actual OS physical page size
      during runtime init and lift the sanity check of PhysPageSize from the
      Linux init code to general malloc init. Currently this is the only use
      of the retrieved page size, but we'll add more shortly.
      
      Updates #12480 and #10180.
      
      Change-Id: I065f2834bc97c71d3208edc17fd990ec9058b6da
      Reviewed-on: https://go-review.googlesource.com/25050
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: assume 64kB physical pages on ARM · d7de8b6d
      Austin Clements authored
      Currently we assume the physical page size on ARM is 4kB. While this
      is usually true, the architecture also supports 16kB and 64kB physical
      pages, and Linux (and possibly other OSes) can be configured to use
      these larger page sizes.
      
      With Go 1.6, such a configuration could potentially run, but generally
      resulted in memory corruption or random panics. With current master,
      this configuration will cause the runtime to panic during init on
      Linux when it checks the true physical page size (and will still cause
      corruption or panics on other OSes).
      
      However, the assumed physical page size only has to be a multiple of
      the true physical page size, the scavenger can now deal with large
      physical page sizes, and the rest of the runtime can deal with a
      larger assumed physical page size than the true size. Hence, there's
      little disadvantage to conservatively setting the assumed physical
      page size to 64kB on ARM.
      
      This may result in some extra memory use, since we can only return
      memory at multiples of the assumed physical page size. However, it is
      a simple change that should make Go run on systems configured for
      larger page sizes. The following commits will make the runtime query
      the actual physical page size from the OS, but this is a simple first
      step toward that.
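
The memory-use trade-off mentioned above can be illustrated with a short sketch. It assumes, for illustration only, that memory is released by rounding a free range inward to page boundaries; `releasable` and the addresses below are hypothetical and do not reproduce the runtime's scavenger.

```go
package main

import "fmt"

// releasable returns the largest sub-range of [start, end) whose bounds are
// both multiples of pageSize, or (0, 0) if no whole page fits.
// pageSize must be a power of two. (Hypothetical helper for illustration.)
func releasable(start, end, pageSize uintptr) (lo, hi uintptr) {
	lo = (start + pageSize - 1) &^ (pageSize - 1) // round start up
	hi = end &^ (pageSize - 1)                    // round end down
	if lo >= hi {
		return 0, 0
	}
	return lo, hi
}

func main() {
	// A made-up free range of 100 KiB that begins 4 KiB into the address space.
	start, end := uintptr(4<<10), uintptr(104<<10)
	for _, ps := range []uintptr{4 << 10, 64 << 10} {
		lo, hi := releasable(start, end, ps)
		fmt.Printf("assumed page size %2d KiB: can release %3d KiB\n", ps>>10, (hi-lo)>>10)
	}
}
```

With these made-up numbers, an assumed 4 KiB page lets the whole 100 KiB range be released, while an assumed 64 KiB page lets none of it be released, because no 64 KiB-aligned page fits entirely inside the range.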
      
      Updates #12480.
      
      Change-Id: I851829595bc9e0c76235c847a7b5f62ad82b5302
      Reviewed-on: https://go-review.googlesource.com/25021
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Minux Ma <minux@golang.org>
    • runtime: bound scanobject to ~100 µs · cf4f1d07
      Austin Clements authored
      Currently the time spent in scanobject is proportional to the size of
      the object being scanned. Since scanobject is non-preemptible, large
      objects can cause significant goroutine (and even whole application)
      delays through several means:
      
      1. If a GC assist picks up a large object, the allocating goroutine is
         blocked for the whole scan, even if that scan well exceeds that
         goroutine's debt.
      
      2. Since the scheduler does not run on the P performing a large object
         scan, goroutines in that P's run queue do not run unless they are
         stolen by another P (which can take some time). If there are a few
         large objects, all of the Ps may get tied up so the scheduler
         doesn't run anywhere.
      
      3. Even if a large object is scanned by a background worker and other
         Ps are still running the scheduler, the large object scan doesn't
         flush background credit until the whole scan is done. This can
         easily cause all allocations to block in assists, waiting for
         credit, causing an effective STW.
      
      Fix this by splitting large objects into 128 KB "oblets" and scanning
      at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
      to bounding scanobject at roughly 100 µs. This improves assist
      behavior both because assists can no longer get "unlucky" and be stuck
      scanning a large object, and because it causes the background worker
      to flush credit and unblock assists more frequently when scanning
      large objects. This also improves GC parallelism if the heap consists
      primarily of a small number of very large objects by letting multiple
      workers scan a large object in parallel.
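
The splitting and the timing estimate above can be sketched as follows. The names `maxObletBytes`, `oblet`, `splitOblets`, and `scanMicros` are assumptions made for this illustration; the real scan loop also has to track pointer bitmaps and queue the remaining oblets as work, which is omitted here.

```go
package main

import "fmt"

// maxObletBytes is the oblet size given in the commit message (128 KB).
// The constant name is an assumption for this sketch.
const maxObletBytes = 128 << 10

// oblet describes one chunk of a large object: a byte offset and a length.
type oblet struct{ off, size uintptr }

// splitOblets divides an object of the given size into consecutive oblets
// of at most maxObletBytes each. Only the last oblet may be shorter.
func splitOblets(size uintptr) []oblet {
	var out []oblet
	for off := uintptr(0); off < size; off += maxObletBytes {
		n := size - off
		if n > maxObletBytes {
			n = maxObletBytes
		}
		out = append(out, oblet{off, n})
	}
	return out
}

// scanMicros estimates the scan time in microseconds for n bytes at the
// given scan rate in MB/ms.
func scanMicros(n uintptr, mbPerMs float64) float64 {
	mb := float64(n) / (1 << 20)
	return mb / mbPerMs * 1000
}

func main() {
	obs := splitOblets(300 << 10) // a hypothetical 300 KiB object
	fmt.Println("oblets:", len(obs), "last size (KiB):", obs[len(obs)-1].size>>10)
	fmt.Println("µs per oblet at 1 MB/ms:", scanMicros(maxObletBytes, 1))
	fmt.Println("µs per oblet at 2 MB/ms:", scanMicros(maxObletBytes, 2))
}
```

At the quoted 1–2 MB/ms, one 128 KB oblet takes between 62.5 µs and 125 µs to scan, which is where the "roughly 100 µs" bound comes from.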
      
      Fixes #10345. Fixes #16293.
      
      This substantially improves goroutine latency in the benchmark from
      issue #16293, which exercises several forms of very large objects:
      
      name                 old max-latency    new max-latency    delta
      SliceNoPointer-12           154µs ± 1%        155µs ±  2%     ~     (p=0.087 n=13+12)
      SlicePointer-12             314ms ± 1%       5.94ms ±138%  -98.11%  (p=0.000 n=19+20)
      SliceLivePointer-12        1148ms ± 0%       4.72ms ±167%  -99.59%  (p=0.000 n=19+20)
      MapNoPointer-12           72509µs ± 1%        408µs ±325%  -99.44%  (p=0.000 n=19+18)
      ChanPointer-12              313ms ± 0%       4.74ms ±140%  -98.49%  (p=0.000 n=18+20)
      ChanLivePointer-12         1147ms ± 0%       3.30ms ±149%  -99.71%  (p=0.000 n=19+20)
      
      name                 old P99.9-latency  new P99.9-latency  delta
      SliceNoPointer-12           113µs ±25%         107µs ±12%     ~     (p=0.153 n=20+18)
      SlicePointer-12          309450µs ± 0%         133µs ±23%  -99.96%  (p=0.000 n=20+20)
      SliceLivePointer-12         961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
      MapNoPointer-12            448µs ±288%         119µs ±18%  -73.34%  (p=0.000 n=18+20)
      ChanPointer-12           309450µs ± 0%         134µs ±23%  -99.96%  (p=0.000 n=20+19)
      ChanLivePointer-12          961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
      
      This has negligible effect on all metrics from the garbage, JSON, and
      HTTP x/benchmarks.
      
      It shows slight improvement on some of the go1 benchmarks,
      particularly Revcomp, which uses some multi-megabyte buffers:
      
      name                      old time/op    new time/op    delta
      BinaryTree17-12              2.46s ± 1%     2.47s ± 1%  +0.32%  (p=0.012 n=20+20)
      Fannkuch11-12                2.82s ± 0%     2.81s ± 0%  -0.61%  (p=0.000 n=17+20)
      FmtFprintfEmpty-12          50.8ns ± 5%    50.5ns ± 2%    ~     (p=0.197 n=17+19)
      FmtFprintfString-12          131ns ± 1%     132ns ± 0%  +0.57%  (p=0.000 n=20+16)
      FmtFprintfInt-12             117ns ± 0%     116ns ± 0%  -0.47%  (p=0.000 n=15+20)
      FmtFprintfIntInt-12          180ns ± 0%     179ns ± 1%  -0.78%  (p=0.000 n=16+20)
      FmtFprintfPrefixedInt-12     186ns ± 1%     185ns ± 1%  -0.55%  (p=0.000 n=19+20)
      FmtFprintfFloat-12           263ns ± 1%     271ns ± 0%  +2.84%  (p=0.000 n=18+20)
      FmtManyArgs-12               741ns ± 1%     742ns ± 1%    ~     (p=0.190 n=19+19)
      GobDecode-12                7.44ms ± 0%    7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
      GobEncode-12                6.22ms ± 1%    6.21ms ± 1%    ~     (p=0.336 n=20+19)
      Gzip-12                      220ms ± 1%     219ms ± 1%    ~     (p=0.130 n=19+19)
      Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=1.000 n=20+19)
      HTTPClientServer-12         82.5µs ± 3%    82.6µs ± 3%    ~     (p=0.776 n=20+19)
      JSONEncode-12               16.4ms ± 1%    16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
      JSONDecode-12               53.7ms ± 1%    54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
      Mandelbrot200-12            4.19ms ± 1%    4.20ms ± 1%    ~     (p=0.452 n=19+19)
      GoParse-12                  3.38ms ± 1%    3.37ms ± 1%    ~     (p=0.123 n=19+19)
      RegexpMatchEasy0_32-12      72.1ns ± 1%    71.8ns ± 1%    ~     (p=0.397 n=19+17)
      RegexpMatchEasy0_1K-12       242ns ± 0%     242ns ± 0%    ~     (p=0.168 n=17+20)
      RegexpMatchEasy1_32-12      72.1ns ± 1%    72.1ns ± 1%    ~     (p=0.538 n=18+19)
      RegexpMatchEasy1_1K-12       385ns ± 1%     384ns ± 1%    ~     (p=0.388 n=20+20)
      RegexpMatchMedium_32-12      112ns ± 1%     112ns ± 3%    ~     (p=0.539 n=20+20)
      RegexpMatchMedium_1K-12     34.4µs ± 2%    34.4µs ± 2%    ~     (p=0.628 n=18+18)
      RegexpMatchHard_32-12       1.80µs ± 1%    1.80µs ± 1%    ~     (p=0.522 n=18+19)
      RegexpMatchHard_1K-12       54.0µs ± 1%    54.1µs ± 1%    ~     (p=0.647 n=20+19)
      Revcomp-12                   387ms ± 1%     369ms ± 5%  -4.89%  (p=0.000 n=17+19)
      Template-12                 62.3ms ± 1%    62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
      TimeParse-12                 314ns ± 1%     314ns ± 0%    ~     (p=1.011 n=20+13)
      TimeFormat-12                358ns ± 0%     354ns ± 0%  -1.12%  (p=0.000 n=17+20)
      [Geo mean]                  53.5µs         53.3µs       -0.23%
      
      Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
      Reviewed-on: https://go-review.googlesource.com/23540
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • runtime: clean up more traces of the old mark bit · b275e55d
      Austin Clements authored
      Commit 59877bfa renamed bitMarked to bitScan, since the bitmap is no
      longer used for marking. However, there were several other references
      to this strewn about comments and in some other constant names. Fix
      these up, too.
      
      Change-Id: I4183d28c6b01977f1d75a99ad55b150f2211772d
      Reviewed-on: https://go-review.googlesource.com/28450
      Run-TryBot: Austin Clements <austin@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Rick Hudson <rlh@golang.org>
    • cmd/compile: remove nil check if followed by storezero on ARM64, MIPS64 · 4d5bb762
      Cherry Zhang authored
      Change-Id: Ib90c92056fa70b27feb734837794ef53e842c41a
      Reviewed-on: https://go-review.googlesource.com/28513
      Run-TryBot: Cherry Zhang <cherryyz@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: David Chase <drchase@google.com>
    • cmd/compile: remove ld/st-followed nil checks for PPC64 · 0e0ab203
      David Chase authored
      Enabled checks (except for DUFF-ops which aren't implemented yet).
      Added ppc64le to relevant test.
      
      Also updated register list to reflect no-longer-reserved-
      for-constants status (file was missed in that change).
      
      Updates #16010.
      
      Change-Id: I31b1aac19e14994f760f2ecd02edbeb1f78362e7
      Reviewed-on: https://go-review.googlesource.com/28548
      Run-TryBot: David Chase <drchase@google.com>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Cherry Zhang <cherryyz@google.com>
    • cmd/link: remove outdated cast and comment · b926bf83
      David Crawshaw authored
      This program is written in Go now.
      
      Change-Id: Ieec21a1bcac7c7a59e88cd1e1359977659de1757
      Reviewed-on: https://go-review.googlesource.com/28549
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • regexp: reduce mallocs in Regexp.Find* and Regexp.ReplaceAll*. · bea39e63
      Aliaksandr Valialkin authored
      This improves Regexp.Find* and Regexp.ReplaceAll* speed:
      
      name                  old time/op    new time/op    delta
      Find-4                   345ns ± 1%     314ns ± 1%    -8.94%    (p=0.000 n=9+8)
      FindString-4             341ns ± 1%     308ns ± 0%    -9.85%   (p=0.000 n=10+9)
      FindSubmatch-4           440ns ± 1%     404ns ± 0%    -8.27%   (p=0.000 n=10+8)
      FindStringSubmatch-4     426ns ± 0%     387ns ± 0%    -9.07%   (p=0.000 n=10+9)
      ReplaceAll-4            1.75µs ± 1%    1.67µs ± 0%    -4.45%   (p=0.000 n=9+10)
      
      name                  old alloc/op   new alloc/op   delta
      Find-4                   16.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindString-4             16.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindSubmatch-4           80.0B ± 0%     48.0B ± 0%   -40.00%  (p=0.000 n=10+10)
      FindStringSubmatch-4     64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=10+10)
      ReplaceAll-4              152B ± 0%      104B ± 0%   -31.58%  (p=0.000 n=10+10)
      
      name                  old allocs/op  new allocs/op  delta
      Find-4                    1.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindString-4              1.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=10+10)
      FindSubmatch-4            2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
      FindStringSubmatch-4      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
      ReplaceAll-4              8.00 ± 0%      5.00 ± 0%   -37.50%  (p=0.000 n=10+10)
      
      Fixes #15643
      
      Change-Id: I594fe51172373e2adb98d1d25c76ca2cde54ff48
      Reviewed-on: https://go-review.googlesource.com/23030
      Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
      Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
    • cmd/compile: generate table of main symbol types · 5923df1a
      David Crawshaw authored
      For each exported symbol in package main, add its name and type to
      the go.plugin.tabs symbol. The runtime uses this table when loading
      a plugin to return a typed interface{} value.
      
      Change-Id: I23c39583e57180acb8f7a74d218dae4368614f46
      Reviewed-on: https://go-review.googlesource.com/27818
      Run-TryBot: David Crawshaw <crawshaw@golang.org>
      TryBot-Result: Gobot Gobot <gobot@golang.org>
      Reviewed-by: Ian Lance Taylor <iant@golang.org>