  1. 25 Mar, 2014 4 commits
  2. 14 Mar, 2014 3 commits
  3. 12 Mar, 2014 1 commit
    • Russ Cox's avatar
      runtime: fix empty string handling in garbage collector · 54c901cd
      Russ Cox authored
      The garbage collector uses type information to guide the
      traversal of the heap. If it sees a field that should be a string,
      it marks the object pointed at by the string data pointer as
      visited but does not bother to look at the data, because
      strings contain bytes, not pointers.
      
      If you save s[len(s):] somewhere, though, the string data pointer
      actually points just beyond the string data; if the string data
      were exactly the size of an allocated block, the string data
      pointer would actually point at the next block. It is incorrect
      to mark that next block as visited and not bother to look at
      the data, because the next block may be some other type
      entirely.
      
      The fix is to ignore strings with zero length during collection:
      they are empty and can never become non-empty: the base
      pointer will never be used again. The handling of slices already
      does this (but using cap instead of len).
      
      This was not a bug in Go 1.2, because until January all string
      allocations included a trailing NUL byte not included in the
      length, so s[len(s):] still pointed inside the string allocation
      (at the NUL).
      
      This bug was causing the crashes in test/run.go. Specifically,
      the parsing of a regexp in package regexp/syntax allocated a
      []syntax.Inst with rounded size 1152 bytes. In fact it
      allocated many such slices, because during the processing of
      test/index2.go it creates thousands of regexps that are all
      approximately the same complexity. That takes a long time, and
      test/run works on other tests in other goroutines. One such
      other test is chan/perm.go, which uses an 1152-byte source
      file. test/run reads that file into a []byte and then calls
      strings.Split(string(src), "\n"). The string(src) creates an
      1152-byte string - and there's a very good chance of it
      landing next to one of the many many regexp slices already
      allocated - and then because the file ends in a \n,
      strings.Split records the tail empty string as the final
      element in the slice. A garbage collection happens at this
      point, the collection finds that string before encountering
      the []syntax.Inst data it now inadvertently points to, and the
      []syntax.Inst data is not scanned for the pointers that it
      contains. Each syntax.Inst contains a []rune, those are
      missed, and the backing rune arrays are freed for reuse. When
      the regexp is later executed, the runes being searched for are
      no longer runes at all, and there is no match, even on text
      that should match.
      
      On 64-bit machines the pointer in the []rune inside the
      syntax.Inst is larger (along with a few other pointers),
      pushing the []syntax.Inst backing array into a larger size
      class, avoiding the collision with chan/perm.go's
      inadvertently sized file.
      
      I expect this was more prevalent on OS X than on Linux or
      Windows because those managed to run faster or slower and
      didn't overlap index2.go with chan/perm.go as often. On the
      ARM systems, we only run one errorcheck test at a time, so
      index2 and chan/perm would never overlap.
      
      It is possible that this bug is the root cause of other crashes
      as well. For now we only know it is the cause of the test/run crash.
      
      Many thanks to Dmitriy for help debugging.
      
      Fixes #7344.
      Fixes #7455.
      
      LGTM=r, dvyukov, dave, iant
      R=golang-codereviews, dave, r, dvyukov, delpontej, iant
      CC=golang-codereviews, khr
      https://golang.org/cl/74250043
      54c901cd
  4. 11 Mar, 2014 2 commits
  5. 10 Mar, 2014 1 commit
  6. 07 Mar, 2014 1 commit
    • Russ Cox's avatar
      runtime: fix memory leak in runfinq · b08156cd
      Russ Cox authored
      One reason the sync.Pool finalizer test can fail is that
      this function's ef1 contains uninitialized data that just
      happens to point at some of the old pool. I've seen this cause
      retention of a single pool cache line (32 elements) on arm.
      
      Really we need liveness information for C functions, but
      for now we can be more careful about data in long-lived
      C functions that block.
      
      LGTM=bradfitz, dvyukov
      R=golang-codereviews, bradfitz, dvyukov
      CC=golang-codereviews, iant, khr
      https://golang.org/cl/72490043
      b08156cd
  7. 06 Mar, 2014 3 commits
  8. 27 Feb, 2014 2 commits
    • Keith Randall's avatar
      runtime: move stack shrinking until after sweepgen is incremented. · e9445547
      Keith Randall authored
      Before GC, we flush all the per-P allocation caches.  Doing
      stack shrinking mid-GC causes these caches to fill up.  At the
      end of gc, the sweepgen is incremented which causes all of the
      data in these caches to be in a bad state (cached but not yet
      swept).
      
      Move the stack shrinking until after sweepgen is incremented,
      so any caching that happens as part of shrinking is done with
      already-swept data.
      
      Reenable stack copying.
      
      LGTM=bradfitz
      R=golang-codereviews, bradfitz
      CC=golang-codereviews
      https://golang.org/cl/69620043
      e9445547
    • Keith Randall's avatar
      runtime: grow stack by copying · 1665b006
      Keith Randall authored
      On stack overflow, if all frames on the stack are
      copyable, we copy the frames to a new stack twice
      as large as the old one.  During GC, if a G is using
      less than 1/4 of its stack, copy the stack to a stack
      half its size.
      
      TODO
      - Do something about C frames.  When a C frame is in the
        stack segment, it isn't copyable.  We allocate a new segment
        in this case.
        - For idempotent C code, we can abort it, copy the stack,
          then retry.  I'm working on a separate CL for this.
        - For other C code, we can raise the stackguard
          to the lowest Go frame so the next call that Go frame
          makes triggers a copy, which will then succeed.
      - Pick a starting stack size?
      
      The plan is that eventually we reach a point where the
      stack contains only copyable frames.
      
      LGTM=rsc
      R=dvyukov, rsc
      CC=golang-codereviews
      https://golang.org/cl/54650044
      1665b006
  9. 26 Feb, 2014 1 commit
    • Keith Randall's avatar
      runtime: get rid of the settype buffer and lock. · 3b5278fc
      Keith Randall authored
      MCaches now hold an MSpan for each sizeclass which they have
      exclusive access to allocate from, so no lock is needed.
      
      Modifying the heap bitmaps also no longer requires a cas.
      
      runtime.free gets more expensive.  But we don't use it
      much any more.
      
      It's not much faster on 1 processor, but it's a lot
      faster on multiple processors.
      
      benchmark                 old ns/op    new ns/op    delta
      BenchmarkSetTypeNoPtr1           24           23   -0.42%
      BenchmarkSetTypeNoPtr2           33           34   +0.89%
      BenchmarkSetTypePtr1             51           49   -3.72%
      BenchmarkSetTypePtr2             55           54   -1.98%
      
      benchmark                old ns/op    new ns/op    delta
      BenchmarkAllocation          52739        50770   -3.73%
      BenchmarkAllocation-2        33957        34141   +0.54%
      BenchmarkAllocation-3        33326        29015  -12.94%
      BenchmarkAllocation-4        38105        25795  -32.31%
      BenchmarkAllocation-5        68055        24409  -64.13%
      BenchmarkAllocation-6        71544        23488  -67.17%
      BenchmarkAllocation-7        68374        23041  -66.30%
      BenchmarkAllocation-8        70117        20758  -70.40%
      
      LGTM=rsc, dvyukov
      R=dvyukov, bradfitz, khr, rsc
      CC=golang-codereviews
      https://golang.org/cl/46810043
      3b5278fc
  10. 25 Feb, 2014 1 commit
    • Dave Cheney's avatar
      all: merge NaCl branch (part 1) · 7c8280c9
      Dave Cheney authored
      See golang.org/s/go13nacl for design overview.
      
      This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support; there are probably two or three more large merges to come.
      
      CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.
      
      The exact change lists included are
      
      15050047: syscall: support for Native Client
      15360044: syscall: unzip implementation for Native Client
      15370044: syscall: Native Client SRPC implementation
      15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
      15410048: runtime: support for Native Client
      15410049: syscall: file descriptor table for Native Client
      15410050: syscall: in-memory file system for Native Client
      15440048: all: update +build lines for Native Client port
      15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
      15570045: os: support for Native Client
      15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
      15690044: net: support for Native Client
      15690048: runtime: support for fake time like on Go Playground
      15690051: build: disable various tests on Native Client
      
      LGTM=rsc
      R=rsc
      CC=golang-codereviews
      https://golang.org/cl/68150047
      7c8280c9
  11. 24 Feb, 2014 3 commits
  12. 20 Feb, 2014 1 commit
    • Russ Cox's avatar
      runtime: use goc2c as much as possible · 67c83db6
      Russ Cox authored
      Package runtime's C functions written to be called from Go
      started out written in C using carefully constructed argument
      lists and the FLUSH macro to write a result back to memory.
      
      For some functions, the appropriate parameter list ended up
      being architecture-dependent due to differences in alignment,
      so we added 'goc2c', which takes a .goc file containing Go func
      declarations but C bodies, rewrites the Go func declaration to
      equivalent C declarations for the target architecture, adds the
      needed FLUSH statements, and writes out an equivalent C file.
      That C file is compiled as part of package runtime.
      
      Native Client's x86-64 support introduces the most complex
      alignment rules yet, breaking many functions that could until
      now be portably written in C. Using goc2c for those avoids the
      breakage.
      
      Separately, Keith's work on emitting stack information from
      the C compiler would require the hand-written functions
      to add #pragmas specifying how many arguments are result
      parameters. Using goc2c for those avoids maintaining #pragmas.
      
      For both reasons, use goc2c for as many Go-called C functions
      as possible.
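For readers unfamiliar with the format, a .goc file looks roughly like the following schematic sketch (the function and names here are hypothetical, not taken from the tree): a Go-style func declaration with a C body, which goc2c expands into architecture-correct C parameter declarations plus the needed FLUSH calls.

```
package runtime
#include "runtime.h"

func add(x int32, y int32) (sum int32) {
	// C statements; goc2c emits the arch-specific argument layout
	// and flushes 'sum' back to the caller's result slot.
	sum = x + y;
}
```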
      
      This CL is a replay of the bulk of CL 15400047 and CL 15790043,
      both of which were reviewed as part of the NaCl port and are
      checked in to the NaCl branch. This CL is part of bringing the
      NaCl code into the main tree.
      
      No new code here, just reformatting and occasional movement
      into .h files.
      
      LGTM=r
      R=dave, alex.brainman, r
      CC=golang-codereviews
      https://golang.org/cl/65220044
      67c83db6
  13. 13 Feb, 2014 2 commits
  14. 12 Feb, 2014 3 commits
    • Russ Cox's avatar
      runtime: fix non-concurrent sweep · 73a30435
      Russ Cox authored
      State of the world:
      
      CL 46430043 introduced a new concurrent sweep but is broken.
      
      CL 62360043 made the new sweep non-concurrent
      to try to fix the world while we understand what's wrong with
      the concurrent version.
      
      This CL fixes the non-concurrent form to run finalizers.
      This CL is just a band-aid to get the build green again.
      
      Dmitriy is working on understanding and then fixing what's
      wrong with the concurrent sweep.
      
      TBR=dvyukov
      CC=golang-codereviews
      https://golang.org/cl/62370043
      73a30435
    • Dmitriy Vyukov's avatar
      runtime: temporary disable concurrent GC sweep · 3cac829f
      Dmitriy Vyukov authored
      We see failures on builders, e.g.:
      http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4
      
      LGTM=rsc, dave
      R=rsc, dave
      CC=golang-codereviews
      https://golang.org/cl/62360043
      3cac829f
    • Dmitriy Vyukov's avatar
      runtime: concurrent GC sweep · 3c3be622
      Dmitriy Vyukov authored
      Moves sweep phase out of stoptheworld by adding
      background sweeper goroutine and lazy on-demand sweeping.
      
      It turned out to be somewhat trickier than I expected,
      because there is no point in time when we know the size of the live heap
      or a consistent number of mallocs and frees.
      So everything related to next_gc, mprof, memstats, etc. becomes trickier.
      
      At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
      which is much larger than real value. But after every sweep
      next_gc is decremented by freed*GOGC. So when everything is swept
      next_gc becomes what it should be.
      
      For mprof I had to introduce a 3-generation scheme (allocs, recent_allocs, prev_allocs),
      because by the end of GC we know the number of frees only for the *previous* GC.
      
      Significant caution is required not to cross the yet-unknown real value of next_gc.
      This is achieved by two means:
      1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
      2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
      returned to the heap.
      This provides quite strong guarantees that the heap does not grow when it should not.
      
      http-1
      allocated                    7036         7033      -0.04%
      allocs                         60           60      +0.00%
      cputime                     51050        46700      -8.52%
      gc-pause-one             34060569      1777993     -94.78%
      gc-pause-total               2554          133     -94.79%
      latency-50                 178448       170926      -4.22%
      latency-95                 284350       198294     -30.26%
      latency-99                 345191       220652     -36.08%
      rss                     101564416    101007360      -0.55%
      sys-gc                    6606832      6541296      -0.99%
      sys-heap                 88801280     87752704      -1.18%
      sys-other                 7334208      7405928      +0.98%
      sys-stack                  524288       524288      +0.00%
      sys-total               103266608    102224216      -1.01%
      time                        50339        46533      -7.56%
      virtual-mem             292990976    293728256      +0.25%
      
      garbage-1
      allocated                 2983818      2990889      +0.24%
      allocs                      62880        62902      +0.03%
      cputime                  16480000     16190000      -1.76%
      gc-pause-one            828462467    487875135     -41.11%
      gc-pause-total            4142312      2439375     -41.11%
      rss                    1151709184   1153712128      +0.17%
      sys-gc                   66068352     66068352      +0.00%
      sys-heap               1039728640   1039728640      +0.00%
      sys-other                37776064     40770176      +7.93%
      sys-stack                 8781824      8781824      +0.00%
      sys-total              1152354880   1155348992      +0.26%
      time                     16496998     16199876      -1.80%
      virtual-mem            1409564672   1402281984      -0.52%
      
      LGTM=rsc
      R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
      CC=golang-codereviews, khr
      https://golang.org/cl/46430043
      3c3be622
  15. 30 Jan, 2014 1 commit
    • Dmitriy Vyukov's avatar
      runtime: increase page size to 8K · e48751e2
      Dmitriy Vyukov authored
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      This is the exact reincarnation of already LGTMed:
      https://golang.org/cl/45770044
      which must not break darwin/freebsd after:
      https://golang.org/cl/56630043
      TBR=iant
      
      LGTM=khr, iant
      R=iant, khr
      CC=golang-codereviews
      https://golang.org/cl/58230043
      e48751e2
  16. 27 Jan, 2014 1 commit
    • Dmitriy Vyukov's avatar
      runtime: fix windows build · 86a3a542
      Dmitriy Vyukov authored
      Currently Windows crashes because early allocs in schedinit
      try to allocate tiny memory blocks, but m->p is not yet set up.
      I've considered calling procresize(1) earlier in schedinit,
      but this refactoring is better and should fix the issue as well.
      Fixes #7218.
      
      R=golang-codereviews, r
      CC=golang-codereviews
      https://golang.org/cl/54570045
      86a3a542
  17. 24 Jan, 2014 3 commits
    • Dmitriy Vyukov's avatar
      runtime: combine small NoScan allocations · 1fa70294
      Dmitriy Vyukov authored
      Combine NoScan allocations < 16 bytes into a single memory block.
      Reduces number of allocations on json/garbage benchmarks by 10+%.
      
      json-1
      allocated                 8039872      7949194      -1.13%
      allocs                     105774        93776     -11.34%
      cputime                 156200000    100700000     -35.53%
      gc-pause-one              4908873      3814853     -22.29%
      gc-pause-total            2748969      2899288      +5.47%
      rss                      52674560     43560960     -17.30%
      sys-gc                    3796976      3256304     -14.24%
      sys-heap                 43843584     35192832     -19.73%
      sys-other                 5589312      5310784      -4.98%
      sys-stack                  393216       393216      +0.00%
      sys-total                53623088     44153136     -17.66%
      time                    156193436    100886714     -35.41%
      virtual-mem             256548864    256540672      -0.00%
      
      garbage-1
      allocated                 2996885      2932982      -2.13%
      allocs                      62904        55200     -12.25%
      cputime                  17470000     17400000      -0.40%
      gc-pause-one            932757485    925806143      -0.75%
      gc-pause-total            4663787      4629030      -0.75%
      rss                    1151074304   1133670400      -1.51%
      sys-gc                   66068352     65085312      -1.49%
      sys-heap               1039728640   1024065536      -1.51%
      sys-other                38038208     37485248      -1.45%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1152485952   1135417920      -1.48%
      time                     17478088     17418005      -0.34%
      virtual-mem            1343709184   1324204032      -1.45%
      
      LGTM=iant, bradfitz
      R=golang-codereviews, dave, iant, rsc, bradfitz
      CC=golang-codereviews, khr
      https://golang.org/cl/38750047
      1fa70294
    • Dmitriy Vyukov's avatar
      sync: scalable Pool · f8e0057b
      Dmitriy Vyukov authored
      Introduce fixed-size P-local caches.
      When a local cache overflows or underflows, a batch of items
      is transferred to or from a global mutex-protected cache.
      
      benchmark                    old ns/op    new ns/op    delta
      BenchmarkPool                    50554        22423  -55.65%
      BenchmarkPool-4                 400359         5904  -98.53%
      BenchmarkPool-16                403311         1598  -99.60%
      BenchmarkPool-32                367310         1526  -99.58%
      
      BenchmarkPoolOverlflow            5214         3633  -30.32%
      BenchmarkPoolOverlflow-4         42663         9539  -77.64%
      BenchmarkPoolOverlflow-8         46919        11385  -75.73%
      BenchmarkPoolOverlflow-16        39454        13048  -66.93%
      
      BenchmarkSprintfEmpty                    84           63  -25.68%
      BenchmarkSprintfEmpty-2                 371           32  -91.13%
      BenchmarkSprintfEmpty-4                 465           22  -95.25%
      BenchmarkSprintfEmpty-8                 565           12  -97.77%
      BenchmarkSprintfEmpty-16                498            5  -98.87%
      BenchmarkSprintfEmpty-32                492            4  -99.04%
      
      BenchmarkSprintfString                  259          229  -11.58%
      BenchmarkSprintfString-2                574          144  -74.91%
      BenchmarkSprintfString-4                651           77  -88.05%
      BenchmarkSprintfString-8                868           47  -94.48%
      BenchmarkSprintfString-16               825           33  -95.96%
      BenchmarkSprintfString-32               825           30  -96.28%
      
      BenchmarkSprintfInt                     213          188  -11.74%
      BenchmarkSprintfInt-2                   448          138  -69.20%
      BenchmarkSprintfInt-4                   624           52  -91.63%
      BenchmarkSprintfInt-8                   691           31  -95.43%
      BenchmarkSprintfInt-16                  724           18  -97.46%
      BenchmarkSprintfInt-32                  718           16  -97.70%
      
      BenchmarkSprintfIntInt                  311          282   -9.32%
      BenchmarkSprintfIntInt-2                333          145  -56.46%
      BenchmarkSprintfIntInt-4                642          110  -82.87%
      BenchmarkSprintfIntInt-8                832           42  -94.90%
      BenchmarkSprintfIntInt-16               817           24  -97.00%
      BenchmarkSprintfIntInt-32               805           22  -97.17%
      
      BenchmarkSprintfPrefixedInt             309          269  -12.94%
      BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
      BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
      BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
      BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
      BenchmarkSprintfPrefixedInt-32          824           50  -93.83%
      
      BenchmarkSprintfFloat                   418          398   -4.78%
      BenchmarkSprintfFloat-2                 295          203  -31.19%
      BenchmarkSprintfFloat-4                 585          128  -78.12%
      BenchmarkSprintfFloat-8                 873           60  -93.13%
      BenchmarkSprintfFloat-16                884           33  -96.24%
      BenchmarkSprintfFloat-32                881           29  -96.62%
      
      BenchmarkManyArgs                      1097         1069   -2.55%
      BenchmarkManyArgs-2                     705          567  -19.57%
      BenchmarkManyArgs-4                     792          319  -59.72%
      BenchmarkManyArgs-8                     963          172  -82.14%
      BenchmarkManyArgs-16                   1115          103  -90.76%
      BenchmarkManyArgs-32                   1133           90  -92.03%
      
      LGTM=rsc
      R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
      CC=golang-codereviews
      https://golang.org/cl/46010043
      f8e0057b
    • Russ Cox's avatar
      cmd/gc: add zeroing to enable precise stack accounting · a81692e2
      Russ Cox authored
      There is more zeroing than I would like right now -
      temporaries used for the new map and channel runtime
      calls need to be eliminated - but it will do for now.
      
      This CL only has an effect if you are building with
      
              GOEXPERIMENT=precisestack ./all.bash
      
      (or make.bash). It costs about 5% in the overall time
      spent in all.bash. That number will come down before
      we make it on by default, but this should be enough for
      Keith to try using the precise maps for copying stacks.
      
      amd64 only (and it's not really great generated code).
      
      TBR=khr, iant
      CC=golang-codereviews
      https://golang.org/cl/56430043
      a81692e2
  18. 23 Jan, 2014 2 commits
    • Dmitriy Vyukov's avatar
      undo CL 45770044 / d795425bfa18 · 8371b014
      Dmitriy Vyukov authored
      Breaks darwin and freebsd.
      
      ««« original CL description
      runtime: increase page size to 8K
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      R=golang-codereviews, dave, khr, rsc, khr
      CC=golang-codereviews
      https://golang.org/cl/45770044
      »»»
      
      R=golang-codereviews
      CC=golang-codereviews
      https://golang.org/cl/56060043
      8371b014
    • Dmitriy Vyukov's avatar
      runtime: increase page size to 8K · 6d603af6
      Dmitriy Vyukov authored
      Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
      Only Chromium uses 4K pages today (in "slow but small" configuration).
      The general tendency is to increase page size, because it reduces
      metadata size and DTLB pressure.
      This change reduces GC pause by ~10% and slightly improves other metrics.
      
      json-1
      allocated                 8037492      8038689      +0.01%
      allocs                     105762       105573      -0.18%
      cputime                 158400000    155800000      -1.64%
      gc-pause-one              4412234      4135702      -6.27%
      gc-pause-total            2647340      2398707      -9.39%
      rss                      54923264     54525952      -0.72%
      sys-gc                    3952624      3928048      -0.62%
      sys-heap                 46399488     46006272      -0.85%
      sys-other                 5597504      5290304      -5.49%
      sys-stack                  393216       393216      +0.00%
      sys-total                56342832     55617840      -1.29%
      time                    158478890    156046916      -1.53%
      virtual-mem             256548864    256593920      +0.02%
      
      garbage-1
      allocated                 2991113      2986259      -0.16%
      allocs                      62844        62652      -0.31%
      cputime                  16330000     15860000      -2.88%
      gc-pause-one            789108229    725555211      -8.05%
      gc-pause-total            3945541      3627776      -8.05%
      rss                    1143660544   1132253184      -1.00%
      sys-gc                   65609600     65806208      +0.30%
      sys-heap               1032388608   1035599872      +0.31%
      sys-other                37501632     22777664     -39.26%
      sys-stack                 8650752      8781824      +1.52%
      sys-total              1144150592   1132965568      -0.98%
      time                     16364602     15891994      -2.89%
      virtual-mem            1327296512   1313746944      -1.02%
      
      R=golang-codereviews, dave, khr, rsc, khr
      CC=golang-codereviews
      https://golang.org/cl/45770044
      6d603af6
  19. 22 Jan, 2014 1 commit
    • Dmitriy Vyukov's avatar
      runtime: remove locks from netpoll hotpaths · 9cbd2fb1
      Dmitriy Vyukov authored
      Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park.
      This mechanism does not require a backing mutex to protect the wait predicate.
      Use it in netpoll. See the comment in netpoll.goc for details.
      This slightly reduces contention between reader, writer and read/write io notifications,
      and eliminates a bunch of mutex operations from hotpaths, thus making them faster.
      
      benchmark                             old ns/op    new ns/op    delta
      BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
      BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
      BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
      BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
      BenchmarkTCP4Persistent                    9411         9240   -1.82%
      BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
      BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
      BenchmarkTCP4Persistent-8                  3943         3857   -2.18%
      
      R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
      CC=golang-codereviews, khr
      https://golang.org/cl/45700043
      9cbd2fb1
  20. 21 Jan, 2014 3 commits
    • Dmitriy Vyukov's avatar
      runtime: do not collect GC roots explicitly · cb133c66
      Dmitriy Vyukov authored
      Currently we collect (add) all roots into a global array in a single-threaded GC phase.
      This hinders parallelism.
      With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
      Then the parallel for callback decides whether it needs to scan the stack of a goroutine,
      scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
      This requires storing all goroutines in an array instead of a linked list
      (to allow direct indexing).
      This CL also removes the DebugScan functionality. It is broken because it uses
      an unbounded stack, so it cannot run on g0. When it was working, I found
      it unhelpful for debugging because the two algorithms are too different now.
      This change would require updating DebugScan, so it's simpler to just delete it.
      
      With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.
      
      garbage-8
      allocated                 2987886      2989221      +0.04%
      allocs                      62885        62887      +0.00%
      cputime                  21286000     21272000      -0.07%
      gc-pause-one             26633247     24885421      -6.56%
      gc-pause-total             873570       811264      -7.13%
      rss                     242089984    242515968      +0.18%
      sys-gc                   13934336     13869056      -0.47%
      sys-heap                205062144    205062144      +0.00%
      sys-other                12628288     12628288      +0.00%
      sys-stack                11534336     11927552      +3.41%
      sys-total               243159104    243487040      +0.13%
      time                      2809477      2740795      -2.44%
      
      R=golang-codereviews, rsc
      CC=cshapiro, golang-codereviews, khr
      https://golang.org/cl/46860043
      cb133c66
    • Dmitriy Vyukov's avatar
      runtime: per-P defer pool · 1ba04c17
      Dmitriy Vyukov authored
      Instead of a per-goroutine stack of defers for all sizes,
      introduce per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.
      
      For a program that starts 1e6 goroutines and then joins them:
      old: rss=6.6g virtmem=10.2g time=4.85s
      new: rss=4.5g virtmem= 8.2g time=3.48s
      
      R=golang-codereviews, rsc
      CC=golang-codereviews
      https://golang.org/cl/42750044
      1ba04c17
    • Dmitriy Vyukov's avatar
      runtime: zero 2-word memory blocks in-place · d5a36cd6
      Dmitriy Vyukov authored
      Currently for 2-word blocks we set the flag to clear the flag. Makes no sense.
      In particular on 32-bits we call memclr always.
      
      R=golang-codereviews, dave, iant
      CC=golang-codereviews, khr, rsc
      https://golang.org/cl/41170044
      d5a36cd6
  21. 16 Jan, 2014 1 commit